HP's Dynamo

Transmeta recently made headlines with
the Code Morphing capabilities of its new
processor, Crusoe. In Transmeta-speak, Code Morphing refers to the run-time
translation of a program written for one instruction set (x86, in Crusoe's
case) into the native instruction set of the host processor. Along with ISA
translation, the Code Morphing layer also profiles the translated code and
aggressively optimizes its most-used parts. (For more on this, see Hannibal's
Crusoe tech article, Crusoe Explored.)

Although Transmeta has certainly garnered the most intense press coverage, a
number of other CPU technologies already make, or soon will make, extensive use
of on-the-fly translation from one binary format to another. Modern x86
processors are "RISC-like" at their core, translating x86 instructions into
simpler internal operations as they decode them. Intel's 64-bit Itanium will use
some form of translation to execute existing 32-bit x86 and PA-RISC binaries.
Finally, many Java virtual machines "interpret" Java bytecode by first
translating it to native code and then executing that code natively. Sun's
upcoming MAJC CPU architecture will make even more extensive use of Just-In-Time
(JIT) compilation than Java does today.
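To make the idea of x86 instructions being split into simpler operations concrete, here is a minimal Python sketch. The `MicroOp` record, the `crack` function, the `tmp0` scratch register, and the operand syntax are all hypothetical; real x86 decoders use undocumented internal formats that differ from chip to chip. The only point is that one x86 instruction with a memory destination turns into a separate load, ALU operation, and store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """A hypothetical RISC-like internal operation (not any real chip's format)."""
    op: str
    dst: str
    srcs: tuple


def crack(instr: str) -> list:
    """Split a toy two-operand x86-style instruction into micro-ops.

    Only the 'mnemonic dst, src' form is handled; a destination written as
    [reg] is treated as a memory operand addressed by that register.
    """
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        # Read-modify-write on memory becomes load / ALU op / store.
        return [
            MicroOp("load", "tmp0", (addr,)),
            MicroOp(mnemonic, "tmp0", ("tmp0", src)),
            MicroOp("store", addr, ("tmp0",)),
        ]
    # Register-to-register forms map onto a single micro-op.
    return [MicroOp(mnemonic, dst, (dst, src))]


for uop in crack("add [ebx], eax"):
    print(uop)
```

A register-only instruction such as `sub eax, ecx` comes back as a single micro-op, which is why simple instructions cost real x86 chips little to translate.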

Last week I attended a talk at MIT that underscored the growing use of dynamic
translation software as an interface between new processors and legacy
binaries. Vasanth Bala from HP Labs in Cambridge (Massachusetts, not England)
gave an overview of Dynamo, a "prototype software dynamic optimizer that
transparently accelerates native binaries at runtime". That sounds quite
different from the translators mentioned above, but as we'll see, it is really
the same kind of thing. Even better, Dynamo's architecture is remarkably similar
to Transmeta's Code Morphing, which isn't surprising, because Transmeta hired
away a number of researchers who had been working on Dynamo. The goal of this
article is twofold: to describe some pretty remarkable technology (Dynamo), and
to explain how it might relate to Crusoe's Code Morphing and to the future of
microprocessors.

What Dynamo does

Dynamo is an odd beast. It is, in essence, an interpreter for HP's PA-RISC
instruction set that itself runs on a PA-RISC processor, the PA-8000. That's
right: it interprets programs that could just as easily be executed natively on
the same hardware. For a research prototype, this isn't as strange as it seems.
The Dynamo project was started to investigate what was seen as an increasingly
important area: dynamic translation of non-native binaries to native code. For
that purpose it doesn't really matter whether the original binaries are native
or not. What matters is that, whatever they are, they're read into some internal
form, munged, and spit back out for native execution. The only question is, "How
can this translation be made efficient, both in time and in space?"

What's surprising is that Dynamo "inadvertently" became practical: programs
"interpreted" by Dynamo often run faster than they do natively, sometimes by 20%
or more.
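Interpreting code that the host could simply run directly sounds circular, but a software analogue makes the setup easy to picture. The sketch below is a tree-walking interpreter for a small, hypothetical subset of Python expressions, written in Python itself, much as Dynamo interprets PA-RISC code on a PA-RISC machine. It only illustrates the "same instruction set on both sides" idea and is not how Dynamo is built; on its own, an interpreter like this is slower than direct evaluation, not faster.

```python
import ast
import operator

# The only operators this toy interpreter understands.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def interpret(node, env):
    """Evaluate a parsed expression by walking its syntax tree."""
    if isinstance(node, ast.Expression):
        return interpret(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = interpret(node.left, env)
        right = interpret(node.right, env)
        return _BINOPS[type(node.op)](left, right)
    raise NotImplementedError(ast.dump(node))


source = "x * x + 3 * x - 7"
env = {"x": 5}
tree = ast.parse(source, mode="eval")

interpreted = interpret(tree, env)                            # guest-style execution
native = eval(compile(tree, "<expr>", "eval"), {}, dict(env))  # direct execution
print(interpreted, native)
```

Both paths compute the same value; the interesting question, for Dynamo as for this toy, is what the interpreter can learn while it runs that direct execution never exploits.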

To understand how programs can run faster under Dynamo than they do natively,
let's look at how Dynamo works. We'll start with a quick overview of what Dynamo
does before walking through it step by step.

Dynamo is a program that, unlike Crusoe's Code Morphing layer, runs in user
mode, so it sits on top of the OS like any other application. Its job is to take
portions of a binary executable, translate them into optimized fragments, and
run those fragments natively. This sounds quite similar to Transmeta's Code
Morphing, no?
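The "interpret, then translate the busy parts" loop can be sketched in Python on a toy two-instruction ISA. Everything here is an illustrative assumption rather than Dynamo's actual design: the `addi`/`bne` instructions, the choice to treat targets of backward taken branches as candidate loop heads, the `HOT_THRESHOLD` value, and the use of Python closures as a stand-in for native code. The only "optimization" a fragment performs in this sketch is skipping per-instruction fetch and decode.

```python
from collections import Counter

HOT_THRESHOLD = 50  # hypothetical: times a loop head must be reached before translation


def step(program, pc, regs):
    """Interpret a single toy instruction and return the next pc."""
    op, *args = program[pc]               # fetch + decode on every execution
    if op == "addi":                      # rd <- rs + imm
        rd, rs, imm = args
        regs[rd] = regs[rs] + imm
        return pc + 1
    if op == "bne":                       # if rs != rt: jump to target
        rs, rt, target = args
        return target if regs[rs] != regs[rt] else pc + 1
    raise ValueError(f"unknown opcode {op!r}")


def translate_block(program, start):
    """Pre-decode the straight-line run from `start` up to its 'bne' into a closure."""
    body = []
    pc = start
    while program[pc][0] != "bne":
        _, rd, rs, imm = program[pc]
        body.append((rd, rs, imm))
        pc += 1
    _, br_rs, br_rt, target = program[pc]
    fallthrough = pc + 1

    def fragment(regs):
        # Operands were decoded once at translation time; just execute.
        for rd, rs, imm in body:
            regs[rd] = regs[rs] + imm
        return target if regs[br_rs] != regs[br_rt] else fallthrough

    return fragment


def run(program, regs):
    """Interpret `program`, switching to a fragment once a loop head gets hot."""
    counts = Counter()
    fragments = {}                        # loop-head pc -> translated fragment
    pc = 0
    while pc < len(program):
        if pc in fragments:
            nxt = fragments[pc](regs)     # fast path: run the cached fragment
        else:
            nxt = step(program, pc, regs) # slow path: interpret one instruction
        if nxt <= pc:                     # taken backward branch: nxt is a loop head
            counts[nxt] += 1
            if counts[nxt] >= HOT_THRESHOLD and nxt not in fragments:
                fragments[nxt] = translate_block(program, nxt)
        pc = nxt
    return regs, fragments


loop = [
    ("addi", "r2", "r2", 1),              # 0: i += 1
    ("addi", "r1", "r1", 2),              # 1: acc += 2
    ("bne", "r2", "r3", 0),               # 2: repeat while i != limit
]
regs, fragments = run(loop, {"r1": 0, "r2": 0, "r3": 100})
print(regs, sorted(fragments))
```

With a limit of 100 the loop head at pc 0 crosses the threshold halfway through and the remaining iterations run through the fragment; with a limit of 10 the loop never gets hot and is simply interpreted. Either way the final register values are identical, which is the property any such translator has to preserve.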

Code Morphing can translate an entire group of x86 instructions at
once, creating a translation, whereas a superscalar x86 translates single
instructions in isolation. Moreover, while a traditional x86 translates each
x86 instruction every time it is executed, Transmeta's software translates
instructions once, saving the resulting translation in a translation cache.
The next time the (now translated) x86 code is executed, the system skips the
translation step and directly executes the existing optimized translation.
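The translate-once, reuse-many-times behavior described above boils down to a cache keyed by the address of the guest code. Here is a minimal Python sketch under that assumption; `TranslationCache`, `toy_translate`, and the addresses are hypothetical names chosen for illustration, and a real system would also have to bound the cache's size and discard translations whose source code has changed.

```python
from typing import Callable, Dict

Fragment = Callable[[], None]


class TranslationCache:
    """Translate each guest address at most once, then reuse the result."""

    def __init__(self, translate: Callable[[int], Fragment]):
        self._translate = translate           # hypothetical translator: guest address -> host code
        self._cache: Dict[int, Fragment] = {}
        self.translations = 0                 # how many times the translator actually ran

    def execute(self, address: int) -> None:
        fragment = self._cache.get(address)
        if fragment is None:
            # First execution: pay the translation cost and remember the result.
            fragment = self._translate(address)
            self._cache[address] = fragment
            self.translations += 1
        # Every later execution skips translation and runs the cached code.
        fragment()


runs = []


def toy_translate(address: int) -> Fragment:
    """Stand-in translator: the 'host code' just records which address ran."""
    return lambda: runs.append(address)


cache = TranslationCache(toy_translate)
for _ in range(1000):
    cache.execute(0x1000)
cache.execute(0x2000)
print(cache.translations, len(runs))
```

Executing the same address a thousand times triggers a single translation, so the translation cost is amortized over every later execution; that amortization is what makes translating up front worthwhile for frequently executed code.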

Substitute "fragment" for
"translation," and you've got a reasonable
description of Dynamo.