VMware didn't wait for Intel or AMD to solve the "x86 stealth instructions" problem
and launched its solution at the end of the previous century (1999). To uncloak
the stealthy x86 instructions, VMware used Binary Translation (unfortunately, a
Tachyon detection grid proved too expensive). VMware's Binary Translation is a lot
lighter than the binary translation technology that the Intel Itanium (x86 to IA64),
Transmeta (x86 to VLIW), Digital FX!32 (x86 to Alpha), or Rosetta software use. It
doesn't have to translate from one Instruction Set Architecture (ISA) to another;
it is an x86 to x86 translator. In fact, in some cases it just makes an exact copy
of the original instruction.

VMware translates, on the fly, the binary code that the kernel of a guest OS wants
to execute and stores the adapted x86 code in a Translator Cache (TC). User applications
will not be touched by VMware's Binary Translator (BT), as it assumes that user code
is safe. User mode applications are executed directly as if they were running natively.

User applications are
not translated, but run directly. Binary Translation only happens when the
guest OS kernel gets called.

It is the kernel code that has to go through the "x86 to slightly longer x86" code
translation. You could say that the kernel of the guest OS is no longer running. The
kernel code in memory is nothing more than an input for the BT; it is the BT-translated
kernel that will run in ring 1.

In many cases,
the translated kernel code will be an exact copy. However, there are several cases
where BT must make the "translated" kernel code a bit longer than the
original code. If the kernel of the guest OS has to run a privileged instruction,
the BT will change this kind of code into "safer" user mode code. If the
kernel needs to get control of the physical hardware, the BT will replace that binary
code with code that manipulates the virtual hardware.
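
The copy-or-replace idea described above can be sketched in a few lines. This is only a toy model under loud assumptions: instructions are plain strings rather than real x86 encodings, the SENSITIVE set is an illustrative subset, and translate_block and the vmm_emulate_* routine names are hypothetical, not VMware's actual implementation.

```python
# Toy model: instructions are plain strings, not real x86 encodings.
# Illustrative subset only; not VMware's actual list of rewritten instructions.
SENSITIVE = {"cli", "sti", "popf"}


def translate_block(block):
    """Translate one block of guest kernel code (hypothetical helper)."""
    translated = []
    for instr in block:
        opcode = instr.split()[0]
        if opcode in SENSITIVE:
            # Swap in a call to an (assumed) VMM routine that updates the
            # virtual CPU state instead of the real one.
            translated.append(f"call vmm_emulate_{opcode}")
        else:
            # Innocuous instruction: copied unchanged.
            translated.append(instr)
    return translated


guest_kernel = ["mov eax, ebx", "cli", "add eax, 1"]
print(translate_block(guest_kernel))
```

Only the middle instruction changes here; the other two come out as exact copies, which matches the "in many cases an exact copy" behavior described in the text.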

Binary translation is all about scanning the code that the kernel of the guest OS
should execute at a certain moment in time and replacing it with something safe
(virtualized) on the fly. By "safe", we mean safe for the other guest OSes and the
VMM. VMware also keeps the overhead of the translation as low as possible. The BT
does not optimize the binary instruction stream, and an instruction stream that has
been translated is kept in a cache. In the case of a loop, this means that the
translation is done only once.
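
To illustrate why a loop is translated only once, here is a minimal sketch of a cache keyed by guest code address. The class TranslatorCache, its fetch method, the stand-in copy_translate function, and the address 0x1000 are all assumptions made for illustration; the real TC is certainly more involved.

```python
class TranslatorCache:
    """Toy Translator Cache keyed by guest code address (hypothetical)."""

    def __init__(self, translate):
        self._translate = translate  # function that does the actual translation
        self._blocks = {}            # guest address -> translated block
        self.translations = 0        # how often the translator really ran

    def fetch(self, guest_addr, guest_block):
        # Translate on the first visit only; later visits hit the cache.
        if guest_addr not in self._blocks:
            self._blocks[guest_addr] = self._translate(guest_block)
            self.translations += 1
        return self._blocks[guest_addr]


def copy_translate(block):
    # Stand-in translator: an identical copy of the block.
    return list(block)


tc = TranslatorCache(copy_translate)
loop_body = ["dec ecx", "jnz loop_start"]
for _ in range(1000):          # the guest kernel runs the loop 1000 times
    tc.fetch(0x1000, loop_body)
print(tc.translations)         # prints 1
```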

The TC is not only a Translator Cache but also a bit of a Trace Cache, as it keeps
track of the control flow of the program. Each time the kernel jumps to another
address location, the BT cannot copy the jump exactly. If the original code had to
jump 100 bytes, for example, it is very unlikely that the translated part of the
kernel in the TC has to jump the same number of bytes. The BT has probably lengthened
the "in between" code a bit.
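
The jump problem can be made concrete with a little arithmetic. The sketch below assumes a toy layout in which each instruction is represented only by its size in bytes; the sizes, the helper names layout and relative_jump, and the assumption that the jump instruction itself keeps its size are all illustrative.

```python
def layout(sizes):
    """Return the start offset of each instruction, given its size in bytes."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


def relative_jump(sizes, jump_index, target_index):
    """Displacement from the end of the jump instruction to the target's start."""
    offsets = layout(sizes)
    return offsets[target_index] - (offsets[jump_index] + sizes[jump_index])


# Original kernel code: jmp (2 bytes), cli (1), mov (2), jump target (1).
original_sizes = [2, 1, 2, 1]
# Translated code: the 1-byte cli became a 5-byte call into the VMM (assumed sizes).
translated_sizes = [2, 5, 2, 1]

print(relative_jump(original_sizes, 0, 3))    # prints 3
print(relative_jump(translated_sizes, 0, 3))  # prints 7
```

The same logical jump needs a displacement of 3 bytes in the original code but 7 bytes in the translated code, so the BT has to recompute the target rather than copy it.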

It is clear that replacing code with "safer" code is a lot less costly than letting
privileged instructions result in traps and then handling those traps afterwards.
Nevertheless, that doesn't mean that the overhead of this kind of virtualization is
always low. The "Translator overhead" is rather low, and its impact gets lower and
lower over time, courtesy of the Translator Cache. However, BT cannot completely
crack several hard nuts: