Embedding a File in an Executable, aka Hello World, Version 5967

I recently had the need to embed a file in an executable.
Since I'm working at the command line with gcc et al., and not with
a fancy RAD tool that makes it all happen magically, it wasn't immediately
obvious to me how to make this happen.
A bit of searching on the net found a hack to essentially cat it onto
the end of the executable and then decipher where it was based on a
bunch of information I didn't want to know about.
Seemed like there ought to be a better way...

And there is: objcopy to the rescue.
objcopy converts object files or executables from one format to another.
One of the formats it understands is "binary", which is basically
any file that's not in one of the other formats that it understands.
So you've probably envisioned the idea:
convert the file that we want to embed into an
object file, which can then simply be linked in with the rest of our code.

Let's say we have a file named data.txt that
we want to embed in our executable:

# cat data.txt
Hello world

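To convert this into an object file that we can link with our program
we just use objcopy to produce a ".o" file:

# objcopy --input-target binary --output-target elf32-i386 --binary-architecture i386 data.txt data.o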

This tells objcopy that our input file is in the "binary" format,
that our output file should be in the "elf32-i386" format (object files on the x86).
The --binary-architecture option tells objcopy that the
output file is meant to "run" on an x86. This is needed so that ld
will accept the file for linking with other files for the x86.
One would think that specifying the output format as "elf32-i386" would imply this,
but it does not.

Now that we have an object file we only need to include it when
we run the linker:

# gcc main.c data.o

When we run the result we get the prayed-for output:

# ./a.out
Hello world

Of course, I haven't told the whole story yet, nor shown you main.c.
When objcopy does the above conversion it adds some "linker" symbols to the
converted object file:

_binary_data_txt_start
_binary_data_txt_end

After linking, these symbols specify the start and
end of the embedded file.
The symbol names are formed by prepending _binary_ and appending _start or _end
to the file name.
If the file name contains any characters that would be invalid in a symbol name
they are converted to underscores (e.g., data.txt becomes data_txt).
If you get unresolved names when linking using these symbols,
do a hexdump -C on the object file and look at the end of the dump for the names
that objcopy chose (running nm on the object file lists them directly).

The code to actually use the embedded file should now be reasonably obvious:

One important and subtle thing to note is that the symbols added to
the object file aren't "variables".
They don't contain any data; rather, their address is their value.
I declare them as type char because it's convenient for this
example: the embedded data is character data.
However, you could declare them as anything:
as int if the data is an array of integers,
or as struct foo_bar_t if the data is an array of foo bars.
If the embedded data is not uniform, then char is probably
the most convenient: take its address and cast the pointer to the proper type
as you traverse the data.

The version number is the version of the "hello world" program, not the article. And could somebody please come up with a new standard first program? If I see "hello world" in one more language I'm gonna spit up :).

I was facing exactly the same problem when I wanted to embed 4tH bytecode into an executable. The trick is to convert the file into a C-file that can be compiled properly with any C compiler. 4tH features a program to do that. In essence it works like this: you read the file in binary mode byte by byte and convert those bytes to unsigned characters. A converted file looks like this:

'unit' is equivalent to 'unsigned char'. You can even embed several files like this. IMHO this method is more transparent to both the programmer and the compiler. The source to do this is pretty trivial:

Using objcopy does this without the extra compilation step,
although using the result is a bit more obscure.
The other thing I like about using objcopy is that it doesn't
leave a "temporary" ".c" file sitting around. Makes me nervous deleting ".c" files.

PS Try this, the hexdump command looks freaky but it actually does work!

That is one of the most interesting things I have ever seen in this magazine. It's almost an introduction to how a linker works. It would be really excellent to expand upon this article, although I'm not expert enough to suggest in what way.

(Thank you for the initial code that got me started.)
I turned the code into a macro, got rid of the global data_end and replaced it with data_len. You could go one big step forward and create a common header file containing the assembly and C macros. It could also contain a macro for C++. Then, just ifdef the macros based on the compiler flags. Then, you can just #include the same file, I think, in many places.

It's converting a data file, of any type of data, into text that is valid assembly language. The resulting output can then be passed to the assembler and "assembled" (i.e., compiled by the assembler) into an object file.

Some of the other comments mention converting it to C and then compiling the C; this is the same idea, only the target language is assembly language rather than C.

The Linux assembler is invoked with the command "as"; it is sometimes referred to as "gas", for the GNU Assembler.