C ABI Hacks

The C standard describes how a C program behaves, given an initial set of source files to be compiled into an executable. To do this, it describes an "abstract machine" on which the program can be thought of as running. Much of the success of the C programming language is due to the fact that this abstract machine is very close to how real computers actually work. As a result, individual C statements tend to compile down to very few assembly instructions, giving the programmer a great deal of control.

Of course, real C programs do not run on abstract machines; they run on real machines under real operating systems. If we go beyond the C standard, we can detect and use this fact. In effect, once we invoke "undefined" or "implementation-defined" behaviour, the details of the underlying implementation become important. Going beyond the C standard has its drawbacks: as far as the standard is concerned, the resulting program could legally do anything. For example, early versions of the gcc compiler would try to launch the rogue computer game if they encountered a #pragma directive during compilation.

Fortunately for us, there exist other standards that describe how real machines implement the abstract machine that C programs run on. The most important of these is the Application Binary Interface, or ABI. This details how arguments are passed to functions and how values are returned from them. It allows libraries and programs compiled with different compilers to talk to each other without problems. Thus if we invoke behaviour that is undefined or implementation-defined by the C standard, but defined by the ABI, the results will still be consistent across all compilers that follow that ABI on a given machine architecture. The results will actually be quite portable if the architecture chosen is common.

For this article, we choose the System V ABI for AMD64 machines. This is used by the vast majority of Unix-like systems in 64-bit mode on commodity hardware. Note that the 64-bit Microsoft Windows ABI also passes its leading arguments in registers, although it uses a different and smaller set of them (%rcx, %rdx, %r8 and %r9 for integers), so the code in this article would need some adaptation to work on those machines.

The main detail of the 64-bit ABI is that most arguments are passed to functions in registers, and most values are returned in them. This allows a carefully crafted C program to read and write individual machine registers! Normally, that trick would require assembly source code.

Reading and Writing Registers from C

The first set of registers we will access are the integer registers used to pass parameters to a function. The ABI specifies that %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are used in turn for the first six integer or pointer arguments. Thus, if we define a function that returns one of its integer parameters, we can see what the corresponding register contains. The trick is that the read functions must be called without arguments, so that the call itself doesn't overwrite the register values we want to obtain. The callers therefore need to see a declaration with no parameters while the definition has them, which requires compiling the read functions in a separate compilation unit from whatever uses them. (It also requires turning off link-time optimization, which would otherwise see through the mismatch.)
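
As a minimal sketch of the read functions described above (not necessarily the article's own listing), the definitions might look as follows. The typedef for u64b and the file split shown in the comments are assumptions made for illustration. Called with matching prototypes these are perfectly ordinary functions; the hack lies entirely in the mismatched declarations that callers would see.

```c
#include <stdint.h>

typedef uint64_t u64b; /* assumed definition of the article's u64b */

/*
 * Definitions, kept in their own compilation unit.
 * The Nth integer parameter arrives in the Nth register of the sequence
 * %rdi, %rsi, %rdx, %rcx, %r8, %r9, so returning it exposes that register.
 */
u64b read_rdi(u64b p1)
{
    return p1;
}

u64b read_rsi(u64b p1, u64b p2)
{
    (void)p1;
    return p2;
}

u64b read_rdx(u64b p1, u64b p2, u64b p3)
{
    (void)p1; (void)p2;
    return p3;
}

u64b read_rcx(u64b p1, u64b p2, u64b p3, u64b p4)
{
    (void)p1; (void)p2; (void)p3;
    return p4;
}

u64b read_r8(u64b p1, u64b p2, u64b p3, u64b p4, u64b p5)
{
    (void)p1; (void)p2; (void)p3; (void)p4;
    return p5;
}

u64b read_r9(u64b p1, u64b p2, u64b p3, u64b p4, u64b p5, u64b p6)
{
    (void)p1; (void)p2; (void)p3; (void)p4; (void)p5;
    return p6;
}

/*
 * Callers live in a different compilation unit and would see these
 * mismatched declarations instead (undefined behaviour by the C standard):
 *
 *     u64b read_rdi(void);
 *     u64b read_rsi(void);
 *     ...
 *     u64b read_r9(void);
 */
```

Because the caller believes the functions take no arguments, it loads nothing into the argument registers before the call, and the callee returns whatever happened to be there.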

Writing values is a little trickier. Here, we need to make sure we don't alter registers that we don't want changed. To do that, we need to read them all, and then just change the ones we want. Thus we need something like:
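
One way to sketch the "read them all, change one" step is shown below. All names here (int_regs, write_one_reg, record_regs) are hypothetical. In the hack itself the snapshot would be filled from the read functions and the writer would be an empty function, defined as taking no parameters in another compilation unit but declared to callers with six. So that the sketch can be run without undefined behaviour, a stand-in writer that simply records its arguments is used instead.

```c
#include <stdint.h>

typedef uint64_t u64b; /* assumed definition of the article's u64b */

/* The six integer argument registers, in ABI order. */
enum { REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9, REG_COUNT };

/* A snapshot of those registers. */
struct int_regs {
    u64b r[REG_COUNT];
};

/* The type callers would use for the writer: six integer parameters. */
typedef void (*write_regs_fn)(u64b, u64b, u64b, u64b, u64b, u64b);

/*
 * Change one register and pass the others through unchanged.
 * Calling the writer loads all six argument registers; an empty
 * writer then returns without disturbing them.
 */
void write_one_reg(write_regs_fn writer, struct int_regs snap,
                   int which, u64b value)
{
    if (which < 0 || which >= REG_COUNT)
        return;
    snap.r[which] = value;
    writer(snap.r[REG_RDI], snap.r[REG_RSI], snap.r[REG_RDX],
           snap.r[REG_RCX], snap.r[REG_R8], snap.r[REG_R9]);
}

/* Stand-in writer for demonstration: records what it was given. */
u64b last_written[REG_COUNT];

void record_regs(u64b a, u64b b, u64b c, u64b d, u64b e, u64b f)
{
    last_written[REG_RDI] = a;
    last_written[REG_RSI] = b;
    last_written[REG_RDX] = c;
    last_written[REG_RCX] = d;
    last_written[REG_R8] = e;
    last_written[REG_R9] = f;
}
```

Passing the snapshot by value keeps the caller's copy intact, which mirrors the requirement that only the chosen register ends up different.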

Another integer register we may modify is %rax. This is designated as the standard return register for integer values. Thus to set it, we just need to return something. To read it, we need to read the return value of a function that doesn't actually return anything.

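/* Defined as returning nothing, so %rax is left untouched. */
void read_rax(void)
{
}

/* Returning p1 places it in %rax. */
u64b write_rax(u64b p1)
{
    return p1;
}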

Again, the declarations seen by callers differ from the definitions:

u64b read_rax(void);
void write_rax(u64b);

Another register that is useful to read is the stack pointer. Here, we need to think laterally because the obvious trick of returning an offset from the address of a local variable doesn't quite work: the compiler knows that is undefined behaviour, and doesn't generate the code we want. However, it turns out that long double arguments are always passed on the stack. Since the address of the first stack argument is the value the stack pointer had just before the function call, we can simply return that.

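/* A long double argument is passed on the stack, so its address
   is where the stack pointer was just before the call. */
void *read_rsp(long double x)
{
    return &x;
}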

The corresponding declaration uses a void * rather than an unsigned long long, since the stack pointer is more useful as a pointer.

void *read_rsp(void);

Unfortunately, although it is possible to change the value of the stack pointer, it isn't possible to do it cleanly. If the stack pointer is altered in C (say by allocating a variable-length array, or by calling a function), the compiler will always generate code to restore it once the allocated space is no longer needed. The problem is that the ABI doesn't specify how this restoration is done, so overriding it in a portable manner isn't possible.

The remaining alterable register is the instruction pointer, %rip. This can be inspected by looking at the return address of a function on its stack. By changing that address, you can get the function to return anywhere you like. This technique is used by some malware to get around the limitations of a non-executable stack. However, the problem with using this trick in a normal program is that the ABI specifies that several registers must be preserved across function calls. Using C alone, it is impossible to read or write these registers in the required way. (Note that they can be saved and restored via the jmp_buf type, using setjmp and longjmp or their sigsetjmp and siglongjmp variants. However, glibc now mangles some of the stored register values for security reasons.)

The next set of registers we can inspect are the SSE floating-point registers. The first eight of these (%xmm0 to %xmm7) are used to pass float, double and __m128 arguments to functions. By using the same tricks as we used for the integer registers, we may thus read and write them from C code.

Finally, it is also possible to inspect the first two registers on the old x87 floating-point stack. These are used to return long double and complex long double values from a function. Unfortunately, we can't read the second item on the stack separately, only together with the first. The corresponding code to do this is:
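
A rough sketch of such code, with hypothetical function names, could be the following. A returned long double travels in %st0, and a returned complex long double travels in %st0 (real part) and %st1 (imaginary part). As with %rax, the read side relies on functions that return nothing but are declared to callers as if they returned a value.

```c
#include <complex.h>

/* Returning a long double places it in %st0. */
long double write_st0(long double x)
{
    return x;
}

/* Returning a complex long double places the real part in %st0
   and the imaginary part in %st1. */
long double complex write_st0_st1(long double complex z)
{
    return z;
}

/* Defined as returning nothing, so the x87 stack is left untouched. */
void read_st0(void)
{
}

void read_st0_st1(void)
{
}

/*
 * Callers in another compilation unit would see these mismatched
 * declarations instead (undefined behaviour by the C standard):
 *
 *     long double read_st0(void);
 *     long double complex read_st0_st1(void);
 *
 * There is no return type that uses %st1 alone, which is why the
 * second item can only be read together with the first.
 */
```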

The above functions allow us to read and write many of the registers on a 64-bit Unix machine directly from C. No assembly source code or inline assembly is required. However, real implementations do have easy access to assemblers, so the above is mostly of academic interest.

C to C Foreign Function Interface

There is one good use for the code in the previous section though. We can use it to do the seemingly impossible: get C to build, dynamically at run time, the arguments for a call to an arbitrary C function. The reverse task is relatively easy. The C standard provides the <stdarg.h> interface (va_list, va_start and va_arg), which allows a C function to parse whatever arguments are sent to it. The usual way to pass such arguments on is to hand a va_list to a variant of the function you want to call, which requires the existence of "v" functions like vsprintf, vfprintf, etc.
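
For illustration, the standard forwarding pattern looks like this. The wrapper name format_message is hypothetical; vsnprintf and the <stdarg.h> macros are standard C.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * A variadic wrapper cannot call snprintf() "with the same arguments".
 * All it can do is hand its va_list to the "v" variant.
 */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    return n;
}
```

This only works because vsnprintf exists. For a function with no va_list variant, standard C offers no equivalent.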

The problem is that a va_list version of the function you want to call dynamically may not exist. Unfortunately, the C standard provides no way to surmount this limitation. However, nonstandard approaches exist. The best known is libffi, the foreign function interface library, which allows different programming languages to call each other. It uses quite a large amount of assembly code to work its magic, however. Instead, we will provide a pure C version for the limited case of C calling C on the 64-bit SysV ABI.

The first problem we need to tackle is the description of user defined types. To do this, we need some sort of opaque structure to store the information required by the ffi library. A user may then call functions to create and destroy such structures, together with definitions for all the default C types. Such an interface may look like:
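
As a hedged sketch of what such an interface could look like: the names type_struct and type_struct_create come from the text, while type_struct_destroy, the predefined descriptors and the fields are assumptions. A real header would expose only the forward declaration, keeping the structure opaque. This version records just size and alignment, and computes the layout of a struct from its members.

```c
#include <stddef.h>
#include <stdlib.h>

/* Callers would only ever see this forward declaration. */
typedef struct type_struct type_struct;

/* Private to the library. */
struct type_struct {
    size_t size;   /* sizeof the described type */
    size_t align;  /* its alignment requirement */
};

/* Descriptors for a few built-in types (hypothetical names). */
const type_struct type_char    = { sizeof(char),   _Alignof(char) };
const type_struct type_int     = { sizeof(int),    _Alignof(int) };
const type_struct type_double  = { sizeof(double), _Alignof(double) };
const type_struct type_pointer = { sizeof(void *), _Alignof(void *) };

/* Round n up to a multiple of a. */
static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Describe a struct with the given members, in order.
   Returns NULL if memory cannot be allocated. */
type_struct *type_struct_create(const type_struct *const *members,
                                size_t count)
{
    type_struct *t = malloc(sizeof *t);
    size_t i;

    if (!t)
        return NULL;

    t->size = 0;
    t->align = 1;
    for (i = 0; i < count; i++) {
        /* Pad up to the member's alignment, then place it. */
        t->size = round_up(t->size, members[i]->align) + members[i]->size;
        if (members[i]->align > t->align)
            t->align = members[i]->align;
    }
    /* Tail padding so that arrays of the type stay aligned. */
    t->size = round_up(t->size, t->align);

    return t;
}

void type_struct_destroy(type_struct *t)
{
    free(t);
}
```

Since type_struct_create takes descriptors as input and produces one as output, compound types can be nested by passing a created descriptor to a later call.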

The opaque type_struct needs to store the information for the number and type of registers used to pass something of the type it describes. It also needs to have the information to describe new compound types containing itself in further calls to type_struct_create. Some code that does this is:

Here the "C_XXX" macros describe the different register classes required by the ABI; the class is stored together with the offset and size. Note that since the maximal size of compound objects passed in registers is sixteen bytes, at most two registers are ever needed. One final caveat: the algorithm the ABI uses to decide what is passed where is rather complex, so the above may still contain bugs. It has been tested with a few simple cases, but unknown problems may unfortunately still exist for more complex types.
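
The core of the ABI's classification can be sketched as below. This is a simplification and not the library's own code: the macro names merely follow the "C_XXX" pattern, and long double, __m128 and nested aggregates are ignored. Each eight-byte chunk ("eightbyte") of a small aggregate gets one class, and fields sharing an eightbyte are merged with the integer class winning over SSE.

```c
#include <stddef.h>

/* Hypothetical names following the article's "C_XXX" pattern. */
#define C_NONE 0  /* eightbyte not used */
#define C_INT  1  /* passed in a general-purpose register */
#define C_SSE  2  /* passed in an SSE register */
#define C_MEM  3  /* passed on the stack */

struct field {
    size_t offset;  /* byte offset within the aggregate */
    size_t size;    /* size in bytes */
    int    cls;     /* C_INT or C_SSE for a scalar field */
};

/* Merge the classes of two fields that share an eightbyte. */
static int merge_class(int a, int b)
{
    if (a == b)
        return a;
    if (a == C_NONE)
        return b;
    if (b == C_NONE)
        return a;
    if (a == C_MEM || b == C_MEM)
        return C_MEM;
    if (a == C_INT || b == C_INT)
        return C_INT;
    return C_SSE;
}

/* Classify an aggregate of total_size bytes.
   out[0] and out[1] receive the class of each eightbyte. */
void classify(const struct field *f, size_t n, size_t total_size, int out[2])
{
    size_t i;

    out[0] = out[1] = C_NONE;

    /* Anything larger than sixteen bytes goes on the stack. */
    if (total_size > 16) {
        out[0] = out[1] = C_MEM;
        return;
    }

    for (i = 0; i < n; i++) {
        size_t first, last;

        if (f[i].size == 0)
            continue;
        first = f[i].offset / 8;
        last = (f[i].offset + f[i].size - 1) / 8;

        /* A field straddling two eightbytes is unaligned: use memory. */
        if (first != last || last > 1) {
            out[0] = out[1] = C_MEM;
            return;
        }
        out[first] = merge_class(out[first], f[i].cls);
    }

    /* If any part needs memory, the whole argument does. */
    if (out[0] == C_MEM || out[1] == C_MEM)
        out[0] = out[1] = C_MEM;
}
```

For example, a struct holding a double followed by an int uses one SSE register and one integer register, whereas a float and an int packed into the same eightbyte travel together in a single integer register.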

The next problem is creating a function that uses such type descriptors to call an arbitrary function returning void. (The lack of a return value simplifies things a bit.) However, before we can do this, we need a few more C-assembly interface functions defined. Basically, since we will be setting many registers simultaneously, it is more efficient to do that with one function call, rather than many. In addition, we need to take special care with the %rax register. Calling functions that may use va_start requires storing the number of arguments passed in SSE registers in %rax (strictly, in its low byte, %al). The most obvious way of setting %rax may be used if we don't have further items to pass on the stack. If the stack is also used, we need to get the compiler to set %rax for us in another way. The simplest way to do that is to declare variable argument functions with the correct number of SSE arguments. Thus the functions we need to write have the following interfaces:

Using the above, it is possible to now construct a function to call any function returning void. We first scan the arguments to see how many registers are required of each type, and how much stack space is required for any remaining arguments. We then allocate the required amount of stack space, and store the stack arguments at the correct offsets mandated by the ABI. Note that since we don't have direct access to the stack pointer in C, we need to use a hack via the alloca() function to make sure enough room exists. Finally, we use the C-asm hacks to store the arguments in the correct registers and call the function.
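
The scanning step can be sketched as follows, under simplifying assumptions: the struct layouts and names are hypothetical, each argument already carries the class of its (at most two) eightbytes, and the extra alignment that some stack arguments need is ignored. The important ABI rule it illustrates is that an argument is never split: if any of its eightbytes cannot get a register, the whole argument goes on the stack.

```c
#include <stddef.h>

/* Hypothetical class codes following the article's "C_XXX" pattern. */
#define C_INT 1
#define C_SSE 2
#define C_MEM 3

#define MAX_INT_REGS 6  /* %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
#define MAX_SSE_REGS 8  /* %xmm0 to %xmm7 */

struct arg {
    size_t size;    /* size of the argument in bytes */
    int    cls[2];  /* class of each eightbyte, 0 if unused */
};

struct call_plan {
    int    int_regs;     /* integer registers used */
    int    sse_regs;     /* SSE registers used */
    size_t stack_bytes;  /* stack space for the remaining arguments */
};

struct call_plan plan_call(const struct arg *args, size_t n)
{
    struct call_plan p = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        int need_int = 0, need_sse = 0, k;

        for (k = 0; k < 2; k++) {
            if (args[i].cls[k] == C_INT)
                need_int++;
            else if (args[i].cls[k] == C_SSE)
                need_sse++;
        }

        if (args[i].cls[0] != C_MEM &&
            p.int_regs + need_int <= MAX_INT_REGS &&
            p.sse_regs + need_sse <= MAX_SSE_REGS) {
            /* The whole argument fits in registers. */
            p.int_regs += need_int;
            p.sse_regs += need_sse;
        } else {
            /* Otherwise it goes on the stack, in whole eightbytes. */
            p.stack_bytes += (args[i].size + 7) / 8 * 8;
        }
    }

    return p;
}
```

The resulting sse_regs count is also the value a variadic callee expects to find in %al.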

Handling a function that returns a value is slightly more complex. For example, if the return value is larger than sixteen bytes, a hidden first parameter is passed that is a pointer to a location in which to store it. The rest of the logic is the same as above, provided that the argument list is altered to include this new pointer.

The other cases can be enumerated: we need to handle each possible pairing of integer, SSE and x87 floating-point stack registers as a return value. By using memcpy() we can copy the result into the object pointed to by the return argument pointer. (A simple store will not work because, e.g., a char return value will appear in %rax, which is eight times as large. We don't want to stomp on other data with a write that is too large.)
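
A small sketch of that copy, using the hypothetical helper name store_int_return and an assumed typedef for u64b, is shown below. It relies on the machine being little-endian (as AMD64 is), so that the low-order bytes of the register value come first in memory.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64b; /* assumed definition of the article's u64b */

/*
 * Copy only `size` bytes of a value that came back in %rax into *ret.
 * Assumes a little-endian machine, so the low-order bytes come first.
 */
void store_int_return(void *ret, u64b rax, size_t size)
{
    if (size > sizeof rax)
        size = sizeof rax;
    memcpy(ret, &rax, size);
}
```

With a size of one, only a single byte of the destination changes, so whatever sits next to a char result is left alone.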

Conclusion

The result is an FFI library written purely in C. We've used undefined and implementation-defined behaviour to set the required registers to get argument passing to work. Such an implementation isn't as efficient as the assembly-based version in libffi, but it shows how powerful the C programming language actually is. The fact that it compiles down to a predictable set of assembly instructions allows us to use it in unintended ways.

Finally, here is some example code using the above C to C ffi library. (A .tar.gz of the library code is in the downloads directory.)