January 09, 2015

One of the most significant limitations of afl-fuzz is that its mutation engine is syntax-blind and optimized for compact data formats, such as binary files (e.g., archives, multimedia) or terse human-readable languages (RTF, shell scripts). Any general-purpose fuzzer will have a harder time dealing with more verbose dialects, such as SQL or HTTP. You can improve your odds in a variety of ways, and the results can be surprisingly good - but ultimately, it's never easy to get from Set-Cookie: FOO=BAR to Content-Length: -1 by randomly flipping bits.

The common wisdom is that if you want to fuzz data formats with such ornate grammars, you need to build a one-off, protocol-specific mutation engine with the appropriate syntax templates baked in. Of course, writing such code isn't easy. In essence, you need to manually build a model precise enough so that the generated test cases almost always make sense to the targeted parser - but creative enough to trigger unintended behaviors in that codebase. It takes considerable experience and a fair amount of time to get it just right.

I was thinking about using afl-fuzz to reach some middle ground between the two worlds. I quickly realized that if you give the fuzzer a list of basic syntax tokens - say, the set of reserved keywords defined in the spec - the instrumentation-guided nature of the tool means that even if we just mindlessly clobber the tokens together, we will be able to distinguish between combinations that are nonsensical and ones that actually follow the rules of the underlying grammar and therefore trigger new states in the instrumented binary. By discarding that first class of inputs and refining the other, we could progressively construct more complex and meaningful syntax as we go.
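To make the idea concrete, here is a toy sketch in Python. It is not afl-fuzz's actual code: the four-rule "grammar", the toy_coverage() stand-in for the instrumented binary, and the grow_corpus() helper are all made up for illustration. The sketch inserts every dictionary token at every position of every queued input, and keeps only the results that reach a parser state not seen before.

```python
# Hypothetical stand-in for an instrumented parser: it returns the set of
# parser states reached, playing the role of coverage feedback.
GRAMMAR = {
    ("start", b"SELECT"): "select",
    ("select", b"*"): "columns",
    ("columns", b"FROM"): "from",
    ("from", b"t"): "table",
}


def toy_coverage(words):
    state, reached = "start", set()
    for word in words:
        state = GRAMMAR.get((state, word))
        if state is None:
            break  # syntax error: the toy parser bails out early
        reached.add(state)
    return reached


def grow_corpus(tokens, run_target):
    """Blindly splice tokens into queued inputs; keep what hits new states."""
    queue = [[]]   # start from the empty input
    seen = set()   # every state observed so far
    i = 0
    while i < len(queue):
        parent = queue[i]
        i += 1
        for pos in range(len(parent) + 1):
            for tok in tokens:
                child = parent[:pos] + [tok] + parent[pos:]
                cov = run_target(child)
                if not cov <= seen:      # something new was reached
                    seen |= cov
                    queue.append(child)  # refine this input further
    return [b" ".join(entry) for entry in queue]


corpus = grow_corpus([b"FROM", b"SELECT", b"t", b"*"], toy_coverage)
```

Nonsensical combinations such as "FROM SELECT" add no new states and are thrown away, while each kept entry extends the previous one by a single valid token. The loop terminates because the queue only grows when a new state is found, and the number of states is finite.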

Ideas are cheap, but when I implemented this one, it turned out to be a good bet. For example, I tried it against sqlite, with the fuzzer fed a collection of keywords grabbed from the project's docs (-x testcases/_extras/sql/). Equipped with this knowledge, afl-fuzz quickly spewed out a range of valid if unusual statements, such as:

All right, all right: grabbing keywords is much easier than specifying the underlying grammar, but it still takes some work. I've been wondering how to scratch that itch, too - and came up with a fairly simple algorithm that can help those who do not have the time or the inclination to construct a proper dictionary.

To explain the approach, it's useful to rely on the example of a PNG file. The PNG format uses four-byte, human-readable magic values to indicate the beginning of a section, say:

The algorithm in question can identify "IHDR" as a syntax token by piggybacking on top of the deterministic, sequential bit flips that are already being performed by afl-fuzz across the entire file. It works by identifying runs of bytes that satisfy a simple property: that flipping them triggers an execution path that is distinct from the product of flipping stuff in the neighboring regions, yet consistent across the entire sequence of bytes.

This signal strongly implies that touching any of the affected bytes causes the failure of an underlying atomic check, such as header.magic_value == 0xDEADBEEF or strcmp(name, "Set-Cookie"). When such a behavior is detected, the entire blob of data is added to the dictionary, to be randomly recombined with other dictionary tokens later on.
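The detection step can be sketched roughly as follows. This is a simplified illustration in Python, not the fuzzer's real implementation: it flips a single bit per byte rather than walking every bit, the path_id callback stands in for a checksum of the execution trace, and the min_len / max_len bounds are assumed values chosen for the example.

```python
def find_tokens(data, path_id, min_len=3, max_len=32):
    """Collect runs of bytes whose corruption leads to one shared path."""
    baseline = path_id(data)  # path taken by the unmodified input
    tokens = []
    run = bytearray()
    prev = baseline
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0x01  # flip the lowest bit of byte i
        cur = path_id(bytes(mutated))
        if cur != prev:
            # The path changed relative to the previous byte: the run ends.
            if min_len <= len(run) <= max_len:
                tokens.append(bytes(run))
            run = bytearray()
            prev = cur
        if cur != baseline:
            # This byte matters to the parser; it belongs to the current run.
            run.append(data[i])
    if min_len <= len(run) <= max_len:
        tokens.append(bytes(run))
    return tokens


def toy_path(data):
    # Hypothetical parser: 4-byte length, atomic magic check, ignored payload.
    if data[4:8] != b"IHDR":
        return "bad-magic"
    return "ok"


sample = b"\x00\x00\x00\x0dIHDR\x00\x00\x02\x00"
found = find_tokens(sample, toy_path)
```

In this toy setup, corrupting any of the four magic bytes leads to the same "bad-magic" path, while touching the bytes on either side changes nothing, so the run "IHDR" is the only token reported.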

This second trick is not a substitute for a proper, hand-crafted list of keywords; for one, it will only know about the syntax tokens that were present in the input files, or could be synthesized easily. It will also not do much when pitted against optimized, tree-based parsers that do not perform atomic string comparisons. (The fuzzer itself can often clear that last obstacle anyway, but the process will be slow.)

Well, that's it. If you want to try out the new features, click here and let me know how it goes!

6 comments:

The input format for the keywords is really awkward. I was trying out afl* by fuzzing the 'ledger' tool, and I thought I'd give it a hand by using this dictionary feature, but the format seems to be a bunch of files dumped into a directory, each containing a single keyword...? It's easy for me to generate potential keywords ('find . -name "*.journal" -exec cat {} \; | tr '[:space:]' '\n' | sort --unique') but not easy to assign each hit to its own file without collisions.

Maybe a simple newline-delimited format would be better?

* pretty easy to use so far, although the errors when you don't specify a heaping helping of memory like '-m 500' are deeply inscrutable, and I wonder why the written-out crashing data isn't automatically minimized with afl-tmin? I was very disappointed when I saw that the hundreds of crashes all seemed to be boiling down to the same minimized test of 344 bytes of 0 which triggered the same buffer overflow issue.

Oh, and for the minimization part: the reason why they aren't minimized is essentially to keep them close to their "source", non-crashing files in the queue, so that it's easier to understand which bitflip or other change causes the crash. It's a tricky trade-off.

It's just an awkward way to input anything. Can you think of any other commandline tool which accepts as configuration input a collection of tokens written as one-token-per-arbitrarily-named-file-in-a-directory? I can't. I wound up not bothering - dumping in a bunch of ledger tests and my own personal files seemed to be enough to teach afl the syntax.

As far as minimizing goes, fair enough: if afl needs to read the original, keep the original. So perhaps instead, afl-fuzz could create a third directory with the minimized version of each? It's some overhead to do this by default, yes, but how many people using afl-fuzz *don't* want the minimized versions?

Also, while minimizing my ledger crash examples and deleting all the duplicates to get a sense of how much I had to file bug reports for, I noticed that running afl-tmin doesn't seem to be a fixed point, as in, I could run it on all of them like 5 times before they stopped shrinking. Is this deliberate?

Newer versions of afl-tmin are recursive, so that last problem should go away =)

For the input data... well, not many programs need to accept a collection of snippets that may be printable text, may be ASCII with newlines or control characters, or may be opaque binary :-) I see your pain, but am not 100% sure how to solve it: if I had a flat file, people would probably complain loudly about having to escape binary data, which is arguably more painful than splitting lines into files. Let me think about it a bit more. Perhaps just a script to convert one format to another would do (similar to what I pasted above).
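A minimal sketch of such a converter in Python, assuming the input is a newline-delimited list of printable tokens. The write_dictionary() name and the token_NNNNNN file naming scheme are invented for this example; numbering the files sequentially is what sidesteps the name collision problem.

```python
import os
import tempfile


def write_dictionary(tokens_text, out_dir):
    """Write each unique non-empty line to its own file; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    tokens = sorted({line for line in tokens_text.splitlines() if line})
    for idx, tok in enumerate(tokens):
        # Sequential names cannot collide, whatever the token contains.
        path = os.path.join(out_dir, f"token_{idx:06d}")
        with open(path, "wb") as handle:
            handle.write(tok.encode("utf-8"))
    return len(tokens)


# Usage: duplicate and empty lines are dropped.
with tempfile.TemporaryDirectory() as demo_dir:
    count = write_dictionary("assert\naccount\nassert\n\n", demo_dir)
    names = sorted(os.listdir(demo_dir))
```

The resulting directory could then be handed to the fuzzer with -x, as in the sqlite example in the post. Binary tokens would still need to be written by hand or by a different script, since a line-based format cannot carry embedded newlines.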