Unfolding obfuscated code (part 1)

This article is the first one in a series of two. We present an overview of some reverse engineering capabilities of REVEN-Axion, applied to a publicly available challenge, namely F4b_XOR_W4kfu, the most valued at Grehack 2015’s CTF contest (500 points). A more detailed write-up will be published soon for those interested.

A short reminder of what REVEN does

REVEN is a system-level, symbolic, execution engine, enriched with code analysis plugins interacting with the core through a Python or a C/C++ API. The essential differentiator of REVEN is the ability to perform execution for all execution threads in the system at all x86 privilege levels, whether Ring 0 (Kernel level) or Ring 3 (User level).

A reverse engineering project with REVEN starts with the recording of a scenario, aka a run of the target system. In our case, the scenario executes F4b_XOR_W4kfu with some password input and terminates when the binary exits.

At this stage, REVEN is able to compute all the instructions executed system wide during the scenario, instead of running them on real hardware. Consequently, REVEN helps countering anti-debugging/instrumenting techniques.

Where necessary, the power of REVEN for dynamic analysis can be combined with static analysis tools via synchronization.

In this article, we show how REVEN considerably helps in analyzing strongly obfuscated binaries. Some other examples of use cases are:

Analysis and debug of complex interactions:

Hardware/software.

User space/kernel space.

Low-level application/application.

Random crashes analysis.

Malware and obfuscated code analysis.

Performance analysis.

Etc.

Getting our hands dirty (though not that much)

F4b_XOR_W4kfu is a 32 bits PE binary, without any fancy GUI, it asks for a password from the standard input and then prints Nop! or Yes!. The mission is to find out the good password (one that makes the program print Yes!).

./F4b_XOR_W4kfu.exe
Welcome!
Password? 1234aqzert
Nop!

We submit the binary to REVEN, record a scenario and produce the execution trace.

REVEN unfolds the whole actual execution of F4b_XOR_W4kfu by the CPU along with registers and memory states and provides powerful tools to understand how this code works. Although this operation is CPU intensive and still quite long, REVEN can work alone without human interaction.

REVEN-Axion in action

Launching REVEN-Axion, we note that the trace for the challenge is extremely big with 2.716.465.511 instructions executed until the first password checking procedure! As we’ll see later, this is the consequence of a nested multiprocess virtual machine.

The forward dynamic tainting analysis by REVEN shows that the instructions handling the input are not local but distributed all over the trace.

The tool helps noticing the trace exhibits lots of execution after write (i.e. some memory addresses are overwritten before getting executed). In fact, most instructions in the binary are encrypted: they are decrypted just before execution and immediately encrypted right after execution, so the binary cannot be unpacked in the classical sense.

Indeed, the backward dynamic tainting analysis offered by REVEN shows a chain of read/write/execute for a single address.

To quickly get an intuition about what is going on, we extract a partial flow control graph from the trace produced by REVEN. The graph below is constructed after a trace of 10.000.000 instructions starting at address 0x402048.

The form of the graph suggests that we may be in front of a virtual machine (VM) with a switch-based dispatcher. The typical form of such a VM consists of a dispatcher spread over several basic blocks; and there exist many opcode handlers, each located in a “non trivial” basic block to which control flow is transferred from a much smaller number of “dispatch points”.

Reversing the virtual machine

Further analysis shows that the dispatcher can be divided into two phases with transition instructions in between. The figure below shows the connection between two opcode handlers x and y.

The first phase is analyzed as follows, working on the trace with REVEN:

Recompute the boundaries of the memory chunk containing the last executed opcode handler (i.e. x).

Encrypt this chunk (i.e. xor it with the string IsThisTheFlag?).

Retrieve an entry from a dispatch table.

Calculate the address of the transition code based on this entry.

Jump to the transition code.

Recording several scenarios with different inputs with REVEN, we observe that the code modification happens in both phases. The transition code sometimes generates effects on the opcode handlers.

The second phase is analyzed as follows in summary:

Calculate the address of the next handler to execute (i.e. y).

Compute the boundaries of the memory chunk containing y.

Decrypt this chunk (i.e. xor it with the string IsThisTheFlag?).

Forward the control flow to y.

We are now able to decrypt all the opcode handlers from the binary and their control flow. Which leads to the reconstruction of an equivalent program consisting only of opcodes, the VM’s runtime effect being abstracted. Constructing the control flow graph without the dispatcher is just a procedural work that can be automated.

(this is a vector image, click on it to observe the details)

This control flow graph is less verbose than the original one, but still quite sophisticated. In the next article, we will go through the reverse engineering of this graph.