Thursday 17 December 2015

Incorporating and accessing binary data into a C program

The other day I needed to incorporate a large blob of binary data in a C program. One simple way is to use xxd. For example, given the binary data in a file named "blob", one can run:

xxd --include blob 

 unsigned char blob[] = {  
  0xc8, 0xe5, 0x54, 0xee, 0x8f, 0xd7, 0x9f, 0x18, 0x9a, 0x63, 0x87, 0xbb,  
  0x12, 0xe4, 0x04, 0x0f, 0xa7, 0xb6, 0x16, 0xd0, 0x70, 0x06, 0xbc, 0x57,  
  0x4b, 0xaf, 0xae, 0xa2, 0xf2, 0x6b, 0xf4, 0xc6, 0xb1, 0xaa, 0x93, 0xf2,  
  0x12, 0x39, 0x19, 0xee, 0x7c, 0x59, 0x03, 0x81, 0xae, 0xd3, 0x28, 0x89,  
  0x05, 0x7c, 0x4e, 0x8b, 0xe5, 0x98, 0x35, 0xe8, 0xab, 0x2c, 0x7b, 0xd7,  
  0xf9, 0x2e, 0xba, 0x01, 0xd4, 0xd9, 0x2e, 0x86, 0xb8, 0xef, 0x41, 0xf8,  
  0x8e, 0x10, 0x36, 0x46, 0x82, 0xc4, 0x38, 0x17, 0x2e, 0x1c, 0xc9, 0x1f,  
  0x3d, 0x1c, 0x51, 0x0b, 0xc9, 0x5f, 0xa7, 0xa4, 0xdc, 0x95, 0x35, 0xaa,  
  0xdb, 0x51, 0xf6, 0x75, 0x52, 0xc3, 0x4e, 0x92, 0x27, 0x01, 0x69, 0x4c,  
  0xc1, 0xf0, 0x70, 0x32, 0xf2, 0xb1, 0x87, 0x69, 0xb4, 0xf3, 0x7f, 0x3b,  
  0x53, 0xfd, 0xc9, 0xd7, 0x8b, 0xc3, 0x08, 0x8f  
 };  
 unsigned int blob_len = 128;  

...and redirecting the output from xxd into a C source file and compiling it is simple and easy to do.

However, for large binary blobs, the C source can be huge, so an alternative way is to use the linker ld as follows:

ld -s -r -b binary -o blob.o blob  

...and this generates the blob.o object file. To reference the data in a program, one needs to determine the symbol names of the start, the end and perhaps the length too. One can use objdump to find these as follows:

 objdump -t blob.o  
 blob.o:   file format elf64-x86-64  
 0000000000000000 l  d .data        0000000000000000 .data  
 0000000000000080 g    .data        0000000000000000 _binary_blob_end  
 0000000000000000 g    .data        0000000000000000 _binary_blob_start  
 0000000000000080 g    *ABS*        0000000000000000 _binary_blob_size  

To access the data in C, use something like the following:

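 cat test.c  
 #include <stdio.h>  
 int main(void)  
 {  
         /* Symbols generated by ld, declared as byte arrays so that  
            the pointer subtraction below is well defined */  
         extern const unsigned char _binary_blob_start[], _binary_blob_end[];  
         const unsigned char *start = _binary_blob_start,  
                             *end = _binary_blob_end;  
         printf("Data: %p..%p (%zu bytes)\n",  
                 (const void *)start, (const void *)end, (size_t)(end - start));  
         return 0;  
 }  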

...and link and run as follows:

 gcc test.c blob.o -o test  
 ./test  
 Data: 0x601038..0x6010b8 (128 bytes)  

So for large blobs, I personally favour using ld to do the hard work for me since I don't need another tool (such as xxd) and it removes the need to convert a blob into C and then compile this.


  1. I didn't know ld could do this.
    Console homebrew developers use a tool called bin2o.

  2. When the data gets too large for the .o you can zlib compress it > xxd, and uncompress it at runtime too. Tiny binaries with binary. :D

  3. You can do it differently - without running any external tool. No ld, no xxd. Just inline asm:

  4. You can also use the _binary_NAME_size instead of computing the difference between start and end pointers.
    And there is no need for "-s" on the ld(1) call -- there is nothing to strip.

  5. If anyone is curious, doing this in Java is only possible if the resulting Java file is no bigger than 65kb and does not include more than 65k literals / constants.

  6. Another alternative is to use objcopy like this:

    objcopy -I binary -O binary blob blob.o

  7. It's unnecessary in Java because resources can live happily in the jar file.

  8. ..or using just the core utilities, here is a one-liner I figured out earlier this week:

    echo "const unsigned char binary_blob[] ={" $(od -tx1 -An -v < binary.blob | sed -e 's/[0-9a-f][0-9a-f]/0x&,/g' -e '$ s/.$//') "};"

  9. It's interesting how the same idea surfaces in
    different forms. Here's how Solaris does it:

  10. The objcopy method that works for x86-64 is:

    objcopy -I binary -O elf64-x86-64 -B i386:x86-64 blob blob.o