Saturday, March 19, 2011

The first serious projects I coded in Haskell were compilers, code analyzers, and the like. I think it's a domain that really plays to the strengths of the language. So it makes sense that we'd want a top-notch disassembler library for Haskell.

udis86 is a fast, complete, flexible disassembler for x86 and x86-64 / AMD64. It provides a clean C API, which was no trouble to import using hsc2hs. But any C API is going to feel clunky next to high-level idiomatic Haskell code. So I built two additional layers of wrapping, to make something that will feel like a natural part of your next Haskell app.

You can download my library hdis86 from Hackage, or browse the source at GitHub. By default, a copy of udis86 is embedded in hdis86. If you already have udis86 installed as a shared library, you can link against that instead.

Getting started

Let's try it out. We'll feed in a ByteString of machine code, and pretty-print the result with groom.

If you're not familiar with x86, you might be surprised by the range of simple and complicated instructions. The first instruction is a humble int3, which executes a trap to a debugger. It takes no operands and has no prefixes.

The second instruction is an increment of a 32-bit memory region, whose address is computed by summing the value in register RSI, the value in register RBX times a scale factor of 4, and a fixed offset of 15. This instruction also has a lock prefix, which changes its concurrency semantics. Some prefixes have a direct meaning like this; others (like Rex) will change how the disassembler decodes operands.

In reality, this analysis is grossly incomplete. Some instructions have implicit register operands, and some pairs of registers overlap, like EBX and RBX. And we have to understand contextual requirements. It's not okay to remap RAX to RBX if some calling code is expecting a return value in RAX.

The complexity of x86 (not to mention the halting problem) means that binary code analysis will never be easy. Hopefully hdis86 can be one of many useful tools in this domain. As always, suggestions or patches are welcome.