Sunday, 11 January 2009

On ABI and API compatibility

If you are going to create a new execution environment, such as a new OS, it can make things simpler if it presents the same interfaces as an existing system. It makes porting easier. So,

If you can, keep the ABI the same.

If you can't keep the ABI the same, at least keep the API the same.

Don't be tempted to say "we're changing X; we may as well take this opportunity to change Y, which has always bugged me". Only change things if there is a good reason.

For an example, let's look at the case of GNU Hurd.

In principle, the Hurd's glibc could present the same ABI as Linux's glibc (they share the same codebase, after all), but partly because of a different in threading libraries, they were made incompatible. Unifying the ABIs was planned, but it appears that 10 years later it has not happened (Hurd has a libc0.3 package instead of libc6).

Using the same ABI would have meant that the same executables would work on Linux and the Hurd. Debian would not have needed to rebuild all its packages for a separate "hurd-i386" architecture. It would have saved a lot of effort.

I suspect that if glibc were ported to the Hurd today, it would not be hard to make the ABIs the same. The threading code has changed a lot in the intervening time. I think it is cleaner now.

The Hurd's glibc also changed the API: they decided not to define PATH_MAX. The idea was that if there was a program that used fixed-length buffers for storing filenames, you'd be forced to fix it. Well, that wasn't a good idea. It just created unnecessary work. Hurd developers and users had enough on their plates without having to fix unrelated implementation quality issues in programs they wanted to use.

Similarly, it would help if glibc for NaCl (Native Client) could have the same ABI as glibc for Linux. Programs have to be recompiled for NaCl with nacl-gcc to meet the requirements of NaCl's validator, but could the resulting code still run directly on Linux, linked with the regular i386 Linux glibc and other libraries?
The problem here is that code compiled for NaCl will expect all code addresses to be 32-byte-aligned. If the NaCl code is given a function address or a return address which is not 32-byte-aligned, things will go badly wrong when it tries to jump to it.
Some background: In normal x86 code, to return from a function you write this:

ret

This pops an address off the stack and jumps to it. In NaCl, this becomes:

popl %ecx
and $0xffffffe0, %ecx
jmp *%ecx

This pops an address off the stack, rounds it down to the nearest 32 byte boundary and jumps to it. If the calling function's call instruction was not placed at the end of a 32 byte block (which NaCl's assembler will arrange), the return address will not be aligned and this code will jump to the wrong location.

However, there is a way around this. We can get the NaCl assembler and linker to keep a list of all the places where a forcible alignment instruction (the and $0xffffffe0, %ecx above) was inserted, and put this list into the executable or library in a special section or segment. Then when we want to run the executable or library directly on Linux, we can rewrite all these locations so that the sequence above becomes

popl %ecx
nop
nop
jmp *%ecx

or maybe even just

ret
nop
nop
nop
nop
nop

We can reuse the relocations mechanism to store these rewrites. The crafty old linker already does something similar for thread-local variable accesses. When it knows that a thread-local variable is being accessed from the library where it is defined, it can rewrite the general-purpose-but-slow instruction sequence for TLS variable access into a faster instruction sequence. The general purpose instruction sequence even contains nops to allow for rewriting to the slightly-longer fast sequence.

This arrangement for running NaCl-compiled code could significantly simplify the process of building and testing code when porting it to NaCl. It can help us avoid the difficulties associated with cross-compiling.