Sunday, May 31, 2009

QEMU dyngen

Though I've played with QEMU for a while and have run some successful experiments on its translation process, it is still important to get a better and more accurate understanding of how its dynamic translation works. What follows are excerpts from QEMU's official document, QEMU Internals, together with my understanding of them.

The basic idea is to split every x86 instruction into fewer simpler instructions called micro operations.

I think this idea is similar to Intel's micro-instructions: every x86 instruction is first translated into a few simple instructions. For example, the call instruction (opcode 0xff) used in my PoC example is translated into:

    movtl_T1_im(nextip)  // save next ip in T1
    push_T1              // push next ip onto stack
    jmp_T0               // jump to the target ip saved in T0

Each simple instruction is implemented by a piece of C code.

Each simple instruction, such as push_T1, is implemented by a piece of C code (a simple function in op.c). This code is then compiled into binary code for the host platform (the machine QEMU itself runs on), and the resulting code snippets are used to generate the dynamic translator.
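To make the idea concrete, here is a minimal sketch of how the three micro operations of the call sequence could look as plain C functions. It is only an illustration: the ToyCPU struct, the op_ prefixed function names, emulate_call, toy_reset and toy_stack_top are all hypothetical names of mine, not QEMU's real definitions, and the temporaries are ordinary globals here.

```c
#include <assert.h>
#include <stdint.h>

/* Toy guest CPU state. Hypothetical layout, not QEMU's real CPU state. */
typedef struct {
    uint32_t eip;            /* guest instruction pointer */
    uint32_t esp;            /* guest stack pointer, an offset into stack_mem */
    uint8_t  stack_mem[64];  /* tiny guest stack */
} ToyCPU;

static ToyCPU cpu;
static uint32_t T0, T1;      /* temporaries shared by the micro operations */

/* T1 = immediate (here: the address of the instruction after the call). */
static void op_movtl_T1_im(uint32_t imm) { T1 = imm; }

/* Push T1 onto the guest stack, little-endian, 4 bytes. */
static void op_push_T1(void)
{
    assert(cpu.esp >= 4);
    cpu.esp -= 4;
    for (int i = 0; i < 4; i++)
        cpu.stack_mem[cpu.esp + i] = (uint8_t)(T1 >> (8 * i));
}

/* Jump to the address held in T0. */
static void op_jmp_T0(void) { cpu.eip = T0; }

/* Reset the toy CPU: empty stack, execution starts at start_ip. */
static void toy_reset(uint32_t start_ip)
{
    cpu.eip = start_ip;
    cpu.esp = sizeof cpu.stack_mem;
}

/* Read the 32-bit value currently on top of the guest stack. */
static uint32_t toy_stack_top(void)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)cpu.stack_mem[cpu.esp + i] << (8 * i);
    return v;
}

/* Emulate an indirect call by running the three micro operations in order. */
static void emulate_call(uint32_t target, uint32_t next_ip)
{
    T0 = target;              /* assumed to be loaded by an earlier micro op */
    op_movtl_T1_im(next_ip);  /* save next ip in T1 */
    op_push_T1();             /* push next ip onto the stack */
    op_jmp_T0();              /* jump to the target ip saved in T0 */
}
```

Calling toy_reset(0x1000) and then emulate_call(0x2000, 0x1005) leaves the return address 0x1005 on top of the toy stack and sets eip to 0x2000, which is the visible effect of a call instruction.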

QEMU is no more difficult to port than a dynamic linker.

The generated dynamic translator works much like a linker: it links the precompiled binary code snippets together to form a basic block, then executes the block. One interesting detail is that the parameters a snippet needs are passed to it at link time through code patching. An example is given in the paper QEMU, a Fast and Portable Dynamic Translator.

This example shows that, because operation parameters are passed by patching the code at run time, we no longer need to worry about the run-time values of parameters when writing the functions for micro operations. This also makes it possible to instrument the C code snippets directly to add features such as dynamic taint tracking and target address checking.
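The copy-and-patch step can be imitated in portable C. The sketch below is my own simplification, not the listing from the paper: real dyngen copies host machine code, while here the "snippets" are made-up bytecode with a 4-byte hole for the parameter, and a small interpreter (run_block) stands in for the host CPU. The Snippet struct, the opcodes and all function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A toy "precompiled snippet": raw bytes plus the offset of a 4-byte hole
   for its parameter (patch_off < 0 means the snippet takes no parameter). */
typedef struct {
    const uint8_t *code;
    size_t len;
    int patch_off;
} Snippet;

/* Made-up opcodes understood only by run_block() below. */
enum { OPC_MOV_T0_IM = 0x01, OPC_ADD_T0_IM = 0x02, OPC_EXIT = 0xFF };

static const uint8_t mov_t0_im_code[] = { OPC_MOV_T0_IM, 0, 0, 0, 0 };
static const uint8_t add_t0_im_code[] = { OPC_ADD_T0_IM, 0, 0, 0, 0 };
static const uint8_t exit_code[]      = { OPC_EXIT };

static const Snippet SNIP_MOV_T0_IM = { mov_t0_im_code, sizeof mov_t0_im_code, 1 };
static const Snippet SNIP_ADD_T0_IM = { add_t0_im_code, sizeof add_t0_im_code, 1 };
static const Snippet SNIP_EXIT      = { exit_code, sizeof exit_code, -1 };

/* "Link" one snippet: copy its bytes, then patch the parameter into the
   hole. Returns the new write position in the output buffer. */
static uint8_t *emit(uint8_t *out, const Snippet *s, uint32_t param)
{
    memcpy(out, s->code, s->len);
    if (s->patch_off >= 0)
        memcpy(out + s->patch_off, &param, sizeof param);
    return out + s->len;
}

/* Stand-in for the host CPU: runs a linked block and returns T0. */
static uint32_t run_block(const uint8_t *pc)
{
    uint32_t T0 = 0, imm;
    for (;;) {
        switch (*pc++) {
        case OPC_MOV_T0_IM: memcpy(&imm, pc, 4); pc += 4; T0 = imm;  break;
        case OPC_ADD_T0_IM: memcpy(&imm, pc, 4); pc += 4; T0 += imm; break;
        case OPC_EXIT:      return T0;
        default:            assert(!"unknown opcode"); return 0;
        }
    }
}

/* Build a block that computes base + delta, then execute it. */
static uint32_t translate_and_run(uint32_t base, uint32_t delta)
{
    uint8_t block[32];
    uint8_t *p = block;
    p = emit(p, &SNIP_MOV_T0_IM, base);   /* T0 = base, patched in */
    p = emit(p, &SNIP_ADD_T0_IM, delta);  /* T0 += delta, patched in */
    p = emit(p, &SNIP_EXIT, 0);           /* end of block */
    assert((size_t)(p - block) <= sizeof block);
    return run_block(block);
}
```

Note that the snippet bytes themselves never change: the same add_t0_im_code template serves every addition, and only the patched copy in the block carries the concrete value. That is why the author of a micro operation does not need to know the parameter when writing it.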

P.S. Since version 0.10.0, QEMU has used TCG (Tiny Code Generator), which brings some differences. I think describing these changes may require another blog post.