November 30, 2014

I made a very explicit, pragmatic design decision with afl-fuzz: for performance and reliability reasons, I did not want to get into static analysis or symbolic execution to understand what the program is actually doing with the data we are feeding to it. The basic algorithm of the fuzzer can be summed up as randomly mutating the input files and gently nudging the process toward new state transitions discovered in the targeted binary. The discovery part is handled by lightweight and extremely simple instrumentation injected by the compiler.

I had a working theory that this would make the fuzzer a bit smarter than a potato, but I wasn't expecting any fireworks. So, when the algorithm managed to not only find some useful real-world bugs, but to successfully synthesize a JPEG file out of nothing, I was genuinely surprised by the outcome.

Of course, while it was an interesting result, it wasn't an impossible one. In the end, the fuzzer simply managed to wiggle its way through a long and winding sequence of conditionals that operated on individual bytes, making them well-suited for the guided brute-force approach. What seemed perfectly clear, though, was that the algorithm wouldn't be able to get past "atomic", large-search-space checks such as:

This constraint made the tool less useful for properly exploring extremely verbose, human-readable formats such as HTML or JavaScript.

Some doubts started to set in when afl-fuzz effortlessly pulled out four-byte magic values and synthesized ELF files when testing programs such as objdump or file. As I later found out, this particular example is often used as a benchmark for complex static analysis or symbolic execution frameworks.
But still, guessing four bytes could have been dumb luck: with fast targets, the fuzzer can pull off billions of execs per day on a single machine, so an occasional correct 32-bit guess isn't entirely out of the question.

(As an aside: to deal with strings, I had this very speculative idea of special-casing memory comparison functions such as strcmp() and memcmp() by replacing them with non-optimized versions that can be instrumented easily. I have one simple demo of that principle bundled with the fuzzer in experimental/instrumented_cmp/, but I never got around to actually implementing it in the fuzzer itself.)

Anyway, nothing quite prepared me for what the recent versions were capable of doing with libxml2. I seeded the session with:
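Judging by the b="c" attribute that keeps resurfacing in the synthesized test cases shown below, the seed was a trivial, well-formed document along these lines (an educated reconstruction, not necessarily the exact file):

```xml
<a b="c">d</a>
```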

...and simply used that as the input for a vanilla copy of xmllint. I was merely hoping to stress-test the very basic aspects of the parser, without getting into any higher-order features of the language. Yet, after two days on a single machine, I found this buried in test case #4641 in the output directory:

...<![<CDATA[C%Ada b="c":]]]>...

What the heck?!

As most of you probably know, CDATA is a special, differently parsed section within XML, separated from everything else by fairly complex syntax - a nine-character sequence of bytes that can't be realistically discovered by just randomly flipping bits.

The finding is actually not magic; there are two possible explanations:

As a recent "well, it's cheap, so let's see what happens" optimization, AFL automatically sets -O3 -funroll-loops when calling the compiler for instrumented binaries, and some of the shorter fixed-string comparisons will actually just be expanded inline. For example, if the stars align just right, strcmp(buf, "foo") may be unrolled to:

...which, by virtue of having a series of explicit and distinct branch points, can be readily instrumented on a per-character basis by afl-fuzz.

If that fails, it just so happens that some of the string comparisons in libxml2 in parser.c are done using a bunch of macros that will compile to similarly-structured code (as spotted by Ben Hawkes). This is presumably done so that the compiler can optimize this into a tree-style parser - whereas a linear sequence of strcmp() calls would lead to repeated and unnecessary comparisons of the already-examined chars.

(Although done by hand in this particular case, the pattern is fairly common for automatically generated parsers of all sorts.)

The progression of test cases seems to support both of these possibilities:

<![
<![C b="c">
<![CDb m="c">
<![CDAĹĹ@
<![CDAT<!
...

I find this result a bit spooky because it's an example of the fuzzer defiantly and secretly working around one of its intentional and explicit design limitations - and definitely not something I was aiming for =)

Of course, treat this first and foremost as a novelty; there are many other circumstances where similar types of highly verbose text-based syntax would not be discoverable to afl-fuzz - or where, even if the syntax could be discovered through some special-cased shims, it would be a waste of CPU time to do it with afl-fuzz, rather than a simple syntax-aware, template-based tool.

(Coming up with an API to make template-based generators pluggable into AFL may be a good plan.)

By the way, here are some other gems from the randomly generated test cases:

One stupid heuristic would be to run strings on the binary and the shared libraries it uses. That turns up "<![CDATA[" and a bunch of other tags in libxml2: http://pastebin.com/JniN4bwM It would massively increase the search space, so you might have to prune it to exclude strings with whitespace. Most of the other strings seem to be various kinds of error messages.

I've been thinking about that! The main problem is with dynamically linked libraries. If somebody tries to fuzz a dynamically linked program, we won't see the relevant strings from the parser library at all, and if we run additional checks on every loaded .so, we'll end up with a lot of irrelevant stuff. But hopefully, there's some decent middle ground.

It would be interesting (and simple enough) to use AFL with a modified compiler that always produces such code for strcmp and similar built-ins, regardless of the string's length, as long as the string is known at compile time.