Playing with Binary Formats

This article explains how kernel modules can add new binary formats to a system and shows a pair of examples.

A Sample Implementation: Displaying Images

To turn the theory into sound practice, let's extend our bluff
module into bloom (Binary Loader for Outrageously Ostentatious
Modules). The complete source of the new module is distributed
together with bluff.

The role of bloom is to display executable images. Give
execution permission to your GIF images and load the module, then
call an image as if it were a command, and xv will display it.

This code is neither particularly original (most of it comes
from binfmt_script.c) nor particularly smart (text-only people like
me would rather use an ASCII viewer, and others prefer a different
viewer altogether). I still find this kind of example quite
didactic, and anyone who can run an X server and has root access
to load modules can easily try it.

The source file is made up of little more than 50 lines and
is able to execute GIF, TIFF and the various PBM formats; needless
to say, you must give your images execute permissions
(chmod +x) in advance. The viewer is
configurable at load time and defaults to /usr/X11R6/bin/xv. Here
is a sample session copied from my text console:

The next question that I hear you ask is “How can I set up
things so that kerneld can
automatically load my module?”

Well, actually it isn't always possible. The code in
fs/exec.c only tries to use kerneld when at least one of the first
four bytes is not printable. This behaviour avoids wasting time on
kerneld requests when the file being executed is a text file
without the #! line. While real binary formats have at least one
non-printable byte among the first four, this isn't always true
for generic data types.

The net result of this behaviour is that kerneld can't
automatically load bloom when you invoke a GIF or PBM file by name.
Both formats begin with a text string and are therefore ignored by
the auto-loader.

When, on the other hand, the file has a non-printing
character within the first four bytes, the kernel issues a kerneld
request for binfmt-number, where the exact string is generated by
this statement:

sprintf(modname, "binfmt-%hd", *(short*)(&bprm->buf));

The number in the module name is the first two bytes of the
disk file, read as a signed short in the host's byte order
(little-endian on x86). If you try to execute TIFF files, kerneld
looks for binfmt-19789 or binfmt-18761. A gzipped file calls for
binfmt--29921 (negative). GIF files, on the other hand, are passed
to /bin/sh because of their leading text string. If you want to
know the number associated with each binary format, look in the
/usr/lib/magic file and convert the values to decimal.
Alternatively, you can pass the debug argument to kerneld and watch
its messages as it tries to load the corresponding binary format
when you execute your data files.

It's interesting to note that kernel versions 2.1.23 and
newer switched to a simpler and more meaningful ID by using the
following line:

sprintf(modname, "binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));

This new ID string represents the third and fourth bytes of
the binary file and is hexadecimal instead of decimal, which leads
to better-formatted strings with no ugly “minus-minus” appearing
now and then.

What's This Worth?

While calling images by name can be fun, it has no real role
in a computer system. I personally prefer calling my viewer by
name, and I do not believe in the object-orientedness of the
approach. In my opinion, this kind of feature is best suited to the
file manager, where it can be tailored through configuration files
without putting kernel bloat in the way of every program execution.

What is really interesting about binary
formats is the ability to run program files that don't fit the
handy #! notation. This includes executable files belonging to
other operating systems or platforms, as well as interpreted
languages that were not designed for Unix: all those languages
that complain about a #! in the first line.

If you want to play such a game, you can try the fail module.
This “Format for Automatically Interpreting Lisp” is a wrapper that
runs Emacs whenever a byte-compiled Emacs Lisp program is called by
name. Such practice is definitely failure-prone, as it makes little
sense to load several megabytes of program code to run a few lines
of Lisp. Moreover, Emacs Lisp is not suited to command-line
handling. Together with fail you'll also find a pair of sample Lisp
executables to test it with.

A real-world Linux system is full of interesting examples of
interpreted binary formats such as the Java binary format. Other
examples are the binary format that allows the Alpha platform to
run Linux-x86 binaries and the one included in recent DOSEMU
distributions that is able to run old DOS programs transparently
(although the program must be specifically tailored in
advance).

Kernel version 2.1.43 and newer include generic support for
interpreted binary formats. binfmt_misc is somewhat like bloom but
much more powerful: you can add new interpreted binary formats to
the module by writing the relevant information to files in
/proc/sys/fs/binfmt_misc.
