There are two types of virtual machine software protections: A) those that convert x86 machine code into virtual machine bytecode and execute it at runtime; and B) those that execute some arbitrary code in a virtual environment. I've discussed the latter several times in the past, and by now there exists a wealth of literature on that variety. But breaking the former kind remains an unsolved problem.

In my article I said "basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it's repetitious, and the high-level details are obscured by the flood of low-level details". The more I thought about this, the more I realized that the word "basically" is out of place: virtualizing software protections are programming language interpreters, albeit for weird languages.
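To make the interpreter comparison concrete, here is a minimal OCaml sketch of the fetch-decode-dispatch loop at the heart of such a VM. The bytecode format (opcode `0x01` = MOV r, imm8; `0x02` = ADD rd, rs; `0xFF` = HALT), the eight-register file and the name `run` are all hypothetical, invented for illustration; this is not ReWolf's actual encoding. Each match arm plays the role of one VM handler.

```ocaml
(* Hypothetical toy bytecode, NOT ReWolf's format:
     0x01 r imm   -> MOV r, imm   (imm is one unsigned byte)
     0x02 rd rs   -> ADD rd, rs
     0xFF         -> HALT *)
let run (code : int array) : int array =
  let regs = Array.make 8 0 in
  (* Fetch the opcode at [pc], dispatch to its "handler", continue. *)
  let rec step pc =
    match code.(pc) with
    | 0x01 ->
        regs.(code.(pc + 1)) <- code.(pc + 2);
        step (pc + 3)
    | 0x02 ->
        let rd = code.(pc + 1) and rs = code.(pc + 2) in
        regs.(rd) <- regs.(rd) + regs.(rs);
        step (pc + 3)
    | 0xFF -> regs
    | op -> failwith (Printf.sprintf "unknown opcode 0x%02X at %d" op pc)
  in
  step 0
```

Running `run [| 0x01; 0; 5; 0x01; 1; 7; 0x02; 0; 1; 0xFF |]` leaves 12 in register 0 and 7 in register 1. Reverse engineering a protected binary with a debugger amounts to watching this loop spin, one handler at a time.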

Consequently, an idea struck me: what we want here is not an interpreter, but a compiler that translates the bytecode back into x86 machine code. I spent a week coding one (~1000 lines) in OCaml to test this theory, and I'm able to report that, indeed, it works. I chose ReWolf's x86 Virtualizer, a simple target that uses some of the same techniques as the heavy hitters in this area. Here is a walkthrough of the analysis and recompilation of a small function with one basic block. The compiler works equally well for arbitrarily large functions, but walking through one here would make this posting unnecessarily long and complicated.
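The core idea can be illustrated on a toy bytecode: instead of executing each handler, emit x86 machine code with the same effect. This is a minimal sketch, not the compiler described in this post. The bytecode format (`0x01` = MOV r, imm8; `0x02` = ADD rd, rs; `0xFF` = HALT) and the names `imm32` and `compile` are hypothetical, and VM register numbers 0 through 7 are assumed to map directly onto the x86 register encodings EAX through EDI.

```ocaml
(* Little-endian bytes of a 32-bit immediate. *)
let imm32 n =
  [ n land 0xFF; (n lsr 8) land 0xFF; (n lsr 16) land 0xFF; (n lsr 24) land 0xFF ]

(* Translate hypothetical toy bytecode into x86 bytes, one handler at a time.
   Assumes well-formed bytecode terminated by HALT (0xFF). *)
let compile (code : int array) : int list =
  let rec go pc acc =
    match code.(pc) with
    | 0x01 ->
        (* MOV r, imm8  ->  B8+r imm32  (mov r32, imm32) *)
        let r = code.(pc + 1) and imm = code.(pc + 2) in
        go (pc + 3) (List.rev_append ((0xB8 + r) :: imm32 imm) acc)
    | 0x02 ->
        (* ADD rd, rs  ->  01 /r with mod=11, reg=rs, r/m=rd  (add rd, rs) *)
        let rd = code.(pc + 1) and rs = code.(pc + 2) in
        go (pc + 3) (List.rev_append [ 0x01; 0xC0 lor (rs lsl 3) lor rd ] acc)
    | 0xFF -> List.rev (0xC3 :: acc) (* HALT -> ret *)
    | op -> failwith (Printf.sprintf "unknown opcode 0x%02X at %d" op pc)
  in
  go 0 []
```

Compiling `[| 0x01; 0; 5; 0x01; 1; 7; 0x02; 0; 1; 0xFF |]` yields `B8 05 00 00 00 B9 07 00 00 00 01 C8 C3`, i.e. `mov eax, 5; mov ecx, 7; add eax, ecx; ret`. A one-to-one translation like this carries over any redundancy in the bytecode, which is why an optimization pass sits between disassembly and code generation.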

Step -2: Protect something with the virtualizer. In this case I just used ReWolf's sample executable itself.

Step -1: Analyze the virtual machine. Although this was not strictly necessary in this case because ReWolf provided source code, I decided to ignore it and reverse the VM manually, since you don't always have such niceties.

Step 0: Break the polymorphism in the instruction set. I made use of two remarkably ghetto hacks here, one of which may be considered elegant. To avoid provoking any arms races I'll omit the details.

Step 1: Disassemble the relevant region into VM bytecode. In the process, construct a graph in which each vertex is an instruction and each edge represents a possible flow of control from one instruction to another.
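A minimal OCaml sketch of the graph construction, assuming the bytecode has already been decoded into a hypothetical `insn` type whose jump targets are indices into the decoded array (real VM bytecode uses addresses or offsets that would first have to be resolved). The function names `successors` and `build_cfg` are illustrative. It visits every instruction reachable from the entry point and records one edge per possible control transfer.

```ocaml
(* Hypothetical decoded instructions; targets are indices into the array. *)
type insn =
  | Op          (* any straight-line instruction: falls through *)
  | Jmp of int  (* unconditional jump *)
  | Jcc of int  (* conditional jump: target or fall through *)
  | Ret         (* leaves the function: no successors *)

let successors code i =
  let next = if i + 1 < Array.length code then [ i + 1 ] else [] in
  match code.(i) with
  | Op -> next
  | Jmp t -> [ t ]
  | Jcc t -> t :: next
  | Ret -> []

(* Depth-first walk from [entry], collecting (source, destination) edges. *)
let build_cfg code entry =
  let visited = Hashtbl.create 16 in
  let edges = ref [] in
  let rec visit i =
    if not (Hashtbl.mem visited i) then begin
      Hashtbl.add visited i ();
      List.iter
        (fun j ->
          edges := (i, j) :: !edges;
          visit j)
        (successors code i)
    end
  in
  visit entry;
  List.rev !edges
```

For `[| Op; Jcc 3; Op; Ret |]` the edges are (0,1), (1,2), (1,3) and (2,3). Basic blocks then fall out of this graph as maximal chains of vertices with a single predecessor and a single successor.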

Step 3: Optimize the code within the basic block. The goal is to convert sequences of VM instructions into a new language more conducive to being compiled back into x86. The optimizer is the most powerful component of my compiler: it can remove obfuscation automatically simply as a side effect of being an optimizer (not that ReWolf's VM has any obfuscation, but others do), and it employs no pattern matching.
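The post does not describe the optimizer's internals, so the following is only a sketch of one standard way to move from VM instructions to a friendlier representation: symbolically evaluate a basic block, tracking each register as an expression over the block's inputs and folding constants as expressions are built. The accumulator-style instruction set (everything routed through a scratch slot), the `expr` language and the name `lift_block` are hypothetical, and the scratch slot is assumed to be dead at the end of the block.

```ocaml
type reg = EAX | ECX | EDX | EBX

(* Hypothetical accumulator-style VM: all work goes through a scratch slot. *)
type vm_insn =
  | Load_reg of reg   (* scratch := reg *)
  | Load_imm of int   (* scratch := constant *)
  | Add_reg of reg    (* scratch := scratch + reg *)
  | Add_imm of int    (* scratch := scratch + constant *)
  | Store_reg of reg  (* reg := scratch *)

(* Target language: expressions over the block's input registers. *)
type expr = Reg of reg | Const of int | Add of expr * expr

(* Build an addition, folding it (mod 2^32) when both sides are constants. *)
let add a b =
  match (a, b) with
  | Const x, Const y -> Const ((x + y) land 0xFFFFFFFF)
  | _ -> Add (a, b)

(* Symbolically execute one basic block and return the final expression
   for every register it writes, sorted by register. *)
let lift_block block =
  let env = Hashtbl.create 4 in
  let lookup r = match Hashtbl.find_opt env r with Some e -> e | None -> Reg r in
  let scratch = ref (Const 0) in
  List.iter
    (function
      | Load_reg r -> scratch := lookup r
      | Load_imm n -> scratch := Const n
      | Add_reg r -> scratch := add !scratch (lookup r)
      | Add_imm n -> scratch := add !scratch (Const n)
      | Store_reg r -> Hashtbl.replace env r !scratch)
    block;
  List.sort compare (Hashtbl.fold (fun r e acc -> (r, e) :: acc) env [])
```

The block `[Load_imm 2; Add_imm 3; Store_reg ECX; Load_reg EAX; Add_reg ECX; Store_reg EAX]` lifts to `ecx := 5` and `eax := eax + 5`, each of which maps onto a single x86 instruction. Nothing here searches for specific obfuscation idioms; the simplification falls out of evaluating the block.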

Step 5: Stuff the original bytes back into the binary and perform the specified fixups. If you can convert between hex and decimal in your head, you'll notice that the bytes above correspond to those below, modulo fixups. For multi-basic-block functions, this is harder, as you have to sequence the blocks and decide between short and long jumps.
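The choice between short and long jumps is commonly made by iterative relaxation. The sketch below is a generic version of that technique, not necessarily what this compiler does: start with every jump short (`EB rel8`, 2 bytes), lay out the blocks, promote any jump whose displacement doesn't fit in a signed byte to the long form (`E9 rel32`, 5 bytes), and repeat until nothing changes. The `block` record (a body size plus an optional unconditional jump at its end) and the names `relax` and `encode_jmp` are hypothetical simplifications; conditional jumps (`7x rel8` versus `0F 8x rel32`) would be handled the same way with different sizes.

```ocaml
(* Hypothetical layout unit: [body_size] bytes of code, then an optional
   unconditional jmp to the block at index [jump_to]. *)
type block = { body_size : int; jump_to : int option }

let jmp_size long = if long then 5 else 2

(* Returns each block's offset and whether its jump needs the long form. *)
let relax (blocks : block array) =
  let n = Array.length blocks in
  let long = Array.make n false in (* optimistic start: all jumps short *)
  let offsets = Array.make n 0 in
  let changed = ref true in
  while !changed do
    changed := false;
    (* Lay out the blocks using the current jump sizes. *)
    let pos = ref 0 in
    Array.iteri
      (fun i b ->
        offsets.(i) <- !pos;
        let jsize = match b.jump_to with Some _ -> jmp_size long.(i) | None -> 0 in
        pos := !pos + b.body_size + jsize)
      blocks;
    (* Promote short jumps whose rel8 displacement is out of range. *)
    Array.iteri
      (fun i b ->
        match b.jump_to with
        | Some t when not long.(i) ->
            let jmp_end = offsets.(i) + b.body_size + 2 in
            let disp = offsets.(t) - jmp_end in
            if disp < -128 || disp > 127 then begin
              long.(i) <- true;
              changed := true
            end
        | _ -> ())
      blocks
  done;
  (offsets, long)

(* Encode a jmp at [src] to [dst]; displacements count from the jmp's end. *)
let encode_jmp ~long ~src ~dst =
  if long then (
    let d = dst - (src + 5) in
    [ 0xE9; d land 0xFF; (d asr 8) land 0xFF; (d asr 16) land 0xFF; (d asr 24) land 0xFF ])
  else [ 0xEB; (dst - (src + 2)) land 0xFF ]
```

Because a jump only ever grows from short to long, the loop is guaranteed to terminate. For a 10-byte block jumping over a 200-byte block, the first layout gives a displacement of 200, the jump is promoted, and the second pass settles on offsets 0, 15 and 215.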

Step 6: Celebrate. ReWolf's x86 Virtualizer was simple, and breaking the harder ones is surely, well, harder, but I believe that the general principles espoused here should apply to them as well.

I'll have more to say about this in the future, including source code.

Reminds me a bit of Frédéric Perriot's Virus Bulletin 2003 paper, "Defeating Polymorphism Through Code Optimization", on using optimization techniques to try to get a canonical representation of some obfuscated code. He did not apply it to VMs, though.

Really good work, Rolf. By the way, how did you like OCaml? I've only played with it briefly.

Ero: I'd never seen this paper before, but it makes sense. Certain obfuscations are "de-optimizations". For example, worthless instructions inserted into the stream are definitions with no uses, which dead code elimination removes; using several instructions to produce a constant value is constant unfolding, which constant folding/propagation undoes; and so on. Of course, this doesn't apply to all forms of obfuscation.
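Both de-optimizations can be undone with textbook passes. Here is a minimal OCaml sketch over a hypothetical three-address form (the types `operand`, `op`, `insn` and the names `const_fold` and `dce` are illustrative). It assumes straight-line code whose instructions have no side effects beyond writing their destination, so no memory, flags or calls, and treats arithmetic as 32-bit.

```ocaml
type operand = Var of string | Imm of int
type op = Copy of operand | Add of operand * operand | Xor of operand * operand
type insn = { dst : string; rhs : op }

let mask32 x = x land 0xFFFFFFFF

(* Constant folding + propagation: forward pass tracking known constants. *)
let const_fold code =
  let known = Hashtbl.create 8 in
  let subst = function
    | Var v when Hashtbl.mem known v -> Imm (Hashtbl.find known v)
    | o -> o
  in
  let fold i =
    let rhs =
      match i.rhs with
      | Copy a -> Copy (subst a)
      | Add (a, b) -> (
          match (subst a, subst b) with
          | Imm x, Imm y -> Copy (Imm (mask32 (x + y)))
          | a', b' -> Add (a', b'))
      | Xor (a, b) -> (
          match (subst a, subst b) with
          | Imm x, Imm y -> Copy (Imm (x lxor y))
          | a', b' -> Xor (a', b'))
    in
    (match rhs with
     | Copy (Imm n) -> Hashtbl.replace known i.dst n
     | _ -> Hashtbl.remove known i.dst);
    { i with rhs = rhs }
  in
  List.rev (List.fold_left (fun acc i -> fold i :: acc) [] code)

let vars_of ops = List.filter_map (function Var v -> Some v | Imm _ -> None) ops

let uses = function
  | Copy a -> vars_of [ a ]
  | Add (a, b) | Xor (a, b) -> vars_of [ a; b ]

(* Dead code elimination: backward pass dropping definitions with no uses. *)
let dce live_out code =
  let live = Hashtbl.create 8 in
  List.iter (fun v -> Hashtbl.replace live v ()) live_out;
  List.fold_left
    (fun kept i ->
      if Hashtbl.mem live i.dst then begin
        Hashtbl.remove live i.dst;
        List.iter (fun v -> Hashtbl.replace live v ()) (uses i.rhs);
        i :: kept
      end
      else kept)
    [] (List.rev code)
```

For example, `t1 := 0x1234; t2 := t1 xor 0x1111; junk := t2 + 7; eax := ebx + t2`, with only `eax` live afterwards, reduces to the single instruction `eax := ebx + 0x325`. Folding runs first so that the constant temporaries lose their last uses and DCE can then discard them.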

Regarding OCaml, the Windows OCaml interpreter needs work. For one thing, it keeps leaving zombie processes around; I had to kill literally two dozen associated processes earlier today. For another, if you paste in, say, 25 KB of code at one time, the interpreter will freeze. So I have to copy and paste my VM compiler into the interpreter bit by bit, which is annoying. I suppose I should start working with the actual compiler, not the interpreter.

Regarding the language itself, I'm still too green (and not enough of a language maven) to say anything really substantial about it, so I'll let people smarter than me do the talking; see here, and here's a quote from Benjamin C. Pierce from "Types and Programming Languages" on which languages are good choices for static analysis:

"The most important requirements are automatic storage management (garbage collection) and easy facilities for defining recursive functions by pattern matching over structured data types. Functional languages ... with pattern-matching ... are fine choices. Languages with garbage collection but without pattern matching, such as Java ... are somewhat heavy for the sorts of programming we'll be doing. Languages with neither, such as C, are even less suitable."

The biggest problem I see (after re-reading a bit :) ) is being able to set up rules for the VM script. Say the opcode size is hard-coded into the block that executes it, as in a certain protection, so that you don't know the size of the opcode until the end of its execution. You'd still have to do some tracing or byte scanning, at least to be able to automate the whole thing. Or am I missing something, Rolf? :)