Using fcd to solve the CSAW Wyvern challenges

Feb 21, 2016

One of fcd’s touted abilities is to simplify code before presenting its pseudo-C
output. This can be extremely helpful to reverse engineer obfuscated programs. A
good showcase would be the CSAW 2015 Wyvern challenges.

The qualification round featured a first instalment, and the finals saw a
second episode. Both challenges were rated at 500 points, the highest score for
an individual challenge at the CSAW CTF. This blog post focuses on the first
wyvern challenge.

Of course, running strings over wyvern won’t have any useful output. The
next logical step is to hand it to a disassembler, like objdump. However, the
result is an absolute catastrophe. Every symbol worth inspecting (and most that
aren’t even worth it too) are overrun with code branching off at very random
points:

The condition resolves to something like ((x * (x - 1)) & 0x1) == 0 || y < 10.
Folks lucky enough to have Hex-Rays can see that the program has been completely
drowned in them, to the point where there is rarely (if ever) more than one
useful instruction between each block of 14 garbage instructions. This obscures
the control flow a great deal.

During the qualifications, Ryan Stortz said through his
withzombies
Twitter account that the
first instalment would be best dealt using dynamic analysis. Most teams (mine
not included) used Intel’s PIN to instrument the code and figure out the
input that makes the program run for the longest time. I’m not sure what people
did for the second round; since fcd couldn’t deal with programs that big this
last November, I ended up erasing the nasty conditions by hand until the code
made sense. The exercise was as tedious as it sounds, and the sleep deprivation
certainly didn’t help.

Fast-forward about three months later. Fcd is still not the best decompiler
available, but it certainly can handle this odd job.

Fighting dragons with dragons

A major strength of fcd, especially when compared to other decompilers, is that
users can write optimization passes in Python to supplement the standard LLVM
passes to simplify program code. LLVM is excellent at finding code with a
constant outcome, deleting useless bits and simplifying the rest, so we can
focus on creating a tiny pass that will tip the garbage code just over the edge
of unpredictability into predictability, and let the framework do all the
heavy lifting.

In this case, for both instalments, the variables used in the conditions are
loaded from the data segment, but they are never modified. This means that they
are always equal to zero. We can share this knowledge with fcd using a tiny
optimization pass written in Python.

If you’re actually doing the challenge, you first need to identify interesting
functions. This is more an fcd demo than a Wyvern write-up, so you’ll have to
bear with me. After some manual inspection or dynamic analysis, you should
figure out that the interesting functions are sanitize_input (0x401cc0) and
transform_input (0x4014b0), which transform and test the input line in
mysterious ways.

For this example, we’ll do two invocations of fcd: one to save a LLVM assembly
file corresponding to the executable (because generating this file takes a long
time), and one to run fcd’s optimization passes (and our custom pass) over it.
First generating an assembly file will save us lots of time if we screw up our
custom pass.

$ fcd -p -n -e 0x4014b0 -e 0x401cc0 wyvern > wyvern.ll

The manual goes into greater depth about these options, but the gist is the
following:

-p (“partial”) tells fcd that we’re interested in a few functions only;

-e (“entry”) specifies the virtual addresses of the functions that we’re
interested with;

-n says that we want a LLVM assembly file for output.

This takes some time (about 20 seconds on my computer). It’s actually nothing
compared to wyvern2, which takes about 7 minutes (though it still seems to
work). To my poor 2011 MacBook Pro’s defence, I compile fcd against a debug LLVM
build, which can be up to 10 times slower than a release build.

The next step will be to write a Python script to act as an optimization pass.
We can get away with just a little under 40 lines.

Looking at the assembly code above, we see that the obfuscated code loads two
values (that are known to be zero), manipulates them a little bit and branches
off. LLVM is powerful enough that if we can only tell it that the loaded values
are always zero, it should be able to get rid of everything else. This is what
we’ll do here.

Running nm wyvern2 | sort, we can see that all the x and y variables all
live between the addresses 0x610318 and 0x6105ac (inclusively). The x86
instructions that load from them are mov instructions, which eventually
translate to LLVM load instructions. We will build our pass to replace
load LLVM instructions to these addresses with a constant zero value.

Optimization scripts need a passName variable, and either a runOnFunction or
a runOnModule global function. This should come at no surprise to people
remotely familiar with LLVM’s pass infrastructure. In our case, we only ever
need to access individual functions, so that’s what we’ll use.

The runOnFunction function is run when the wrapper LLVM pass is executed. It
must return True if the pass modified the function. It simply takes every
basic block in the function and passes it to _runOnBasicBlock.
_runOnBasicBlock, on its end, iterates over every instruction in the basic
block and passes them to _runOnInst.

The _runOnInst function checks if the current instruction is a load
instruction that loads from the address of a x or y variable. If so, it
replaces every use of the loaded value with the constant zero value. And just
with that, the other optimization passes will correctly detect that the
conditions have a constant outcome, and constant propagation will eat them away.

We can use our Python pass (saved as a wyvern.py file) with this invocation of
fcd:

$ fcd -m --cc=x86_64/sysv -O wyvern.py wyvern.ll

This time around, the options mean:

-m: the input file is a LLVM module;

--cc=x86_64/sysv: since the input is a LLVM module, fcd can’t gather enough
metadata to guess the calling convention, so we need to specify that the
program uses the System V x86_64 calling convention;

-O wyvern.py tells fcd to load the wyvern.py file as an optimization pass.

What hurts the most in the output is that fcd’s alias analysis (which is mostly
LLVM’s alias analysis) isn’t smart enough to figure out that a lot of these
functions don’t modify any memory. If we could help fcd understand that
vector::size() and vector::operator[] don’t modify memory, LLVM’s scalar
replacement of aggregates pass should be able, among other things, to promote
transform_input’s stackframe variable to SSA values. This would eliminate
the pointer indirections and make the function a whole lot more readable.
Unfortunately, there’s currently no way to do that, so we’re stuck with the
pointer-happy versions of the functions.

Still, transform_input isn’t particularly hard to analyze. There’s a loop that
tests if what’s in field0 is smaller than the size of the vector pointed to by
phi10 (rdi); else, the loop breaks. If field0 is indeed smaller than the
vector’s size, we use operator[] with field0 as the index to get a pointer
to the element in the vector (as this is how references translate), and then we
load that value and accumulate it in field9. We increment the loop counter by
hand.

In other words, transform_input might originally have looked like this:

Maybe fcd will one day output something that good! Until then, a little bit of
analysis is still required.

sanitize_input is unfortunately much more cluttered. The stack frame is much
larger and stack recovery didn’t figure out every stack pointer access, and now
we’re stuck with strange pointer arithmetic based off stackframe fields. There
are several dead stores, and there are rampant double indirections that we
manually have to take care of. Several of these issues could be dealt with if
transform_input and sanitize_input were on a different optimization pipeline
(or if we just ran fcd twice with different optimizations), but the pass
pipeline is not very configurable without doing some source editing, so we will
have to deal with this.

The loop body looks quite bad and the simplest way to analyze it is probably to
look at the calls it makes and then thread them together.

Next, it calls push_back with field54 (aka anon8) as the vector
(constructed earlier), and the result of operator[] as the parameter (through
anon75), which is the value loaded from the pointer returned by the previous
call. In other words, it took arg0[i] and appended it to the vector.

This uses operator[] on the global vector at address 0x6102f8. Its symbol name
is hero. It is built earlier and it is the concatenation of all the secret*
global variables in the program.

The index parameter is quite the aliasing maze. anon102 is an index loaded by
dereferencing field53. The pointer is set to point to field6. It is not
used directly, but it is then set to zero through *anon36 at the beginning of
the function. It’s then referred to as phi38 and it is incremented after each
successful loop iteration. In other words, it’s also a (or the?) loop counter,
and each iteration of the loop gets the next number from the list of “secret”
values.

This copy-constructs a vector to be passed to transform_input. anon112 is
the new vector’s address and field54 (the vector made out of the string
argument) as the vector being copied. It then calls transform_input on that
vector (summing all of its elements) and destroys the vector copy. It tests
whether field27 (which contains the “hero” value for this iteration) is the
same as the vector’s sum and saves that result in field23.

At this point, things get pretty messy and and walking back aliasing values is
no longer very useful. However, we’ve got a crucial piece of information: the
program tests that the sum of every character so far in the loop is equal to the
values of the hero vector.

The dragon’s secret

We’ve already talked about the secret variables in the program. Running
nm wyvern | grep secret | sort, we can have an ordered list of them.
Conveniently enough, the content of the secret variables is show directly in
their name, so secret_1222 contains 1222.

The hypothesis is that at index N, the sum of the ASCII codes of the input
string should be equal to the corresponding secret variable. We can easily
write a tiny Python script to see what this would look like.

Tending to your wounds

At this point, I feel that fcd did an excellent job at cleaning up spurious
branches with truly minimal effort. However, because of alias analysis problems,
it fell short of providing effortlessly analyzable output. Unfortunately, alias
analysis is known to be at least undecidable since the early nineties, so it’s
unlikely that fcd will ever solve the General Problem.

That said, fcd is still a work in progress and development started less than a
year ago, with a team of one. Coming this close to beating the wyvern to a pulp
is very exciting and somewhat of a milestone.

So with this, I’ll retreat from the battlefield, train some more, and maybe try
the fight again once I feel that relevant progress was made. Keep an eye open
for it!