How I nearly almost saved the Internet, starring afl-fuzz and dnsmasq

2015年11月18日 18:21:55

阅读数：472

If you know me, you know that
I love DNS. I'm not exactly sure how that happened, but I suspect that
Ed Skoudis is at least partly to blame.

Anyway, a project came up to evaluate dnsmasq, and being a DNS server - and a key piece of Internet infrastructure - I thought it would be fun! And it was! By fuzzing in a somewhat creative way, I found a really cool vulnerability that's almost certainly
exploitable (though I haven't proven that for reasons that'll become apparent later).

Although I started writing an exploit, I didn't finish it. I think it's almost certainly exploitable, so if you have some free time and you want to learn about exploit development, it's worthwhile having a look!
Here's a link to the actual distribution of a vulnerable version, and I'll discuss the work I've done so far at the end of this post.

You can also download
my branch, which is similar to the vulnerable version (branched from it), the only difference is that it contains a bunch of fuzzing instrumentation and debug output around parsing names.

dnsmasq

For those of you who don't know,
dnsmasq is a service that you can run that handles a number of different protocols designed to configure your network: DNS, DHCP, DHCP6, TFTP, and more. We'll focus on DNS - I fuzzed the other interfaces and didn't find anything, though when it comes to
fuzzing, absence of evidence isn't the same as evidence of absence.

It's primarily developed by a single author, Simon Kelley. It's had a reasonably clean history in terms of vulnerabilities, which may be a good thing (it's coded well) or a bad thing (nobody's looking) :)

At any rate, the author's response was impressive. I made a little timeline:

May 12, 2015: Discovered

May 14, 2015: Reported to project

May 14, 20252015: Project responded with a patch candidate

May 15, 2015: Patch committed

The fix was actually pushed out faster than I reported it! (I didn't report for a couple days because I was trying to determine how exploitable / scary it actually is - it turns out that yes, it's exploitable, but no, it's not scary - we'll get to why at
the end).

DNS - the important bits

The vulnerability is in the DNS name-parsing code, so it makes sense to spend a little time making sure you're familiar with DNS. If you're already familiar with how DNS packets and names are encoded, you can skip this section.

Note that I'm only going to cover the parts of DNS that matter to this particular vulnerability, which means I'm going to leave out a bunch of stuff. Check out the RFCs (rfc1035, among others) or Wikipedia for complete details. As a general rule, I encourage
everybody to learn enough to manually make requests to DNS servers, because that's an important skill to have - plus, it's only like 16 bytes to remember. :)

DNS, at its core, is actually rather simple. A client wants to look up a hostname, so it sends a DNS packet containing a question to a DNS server (on UDP port 53, normally, but TCP can be used as well). Some magic happens, involving caches and recursion,
then the server replies with a DNS message containing the original question, and zero or more answers.

The last four fields - questions, answers, authorities, and additionals - are collectively called "resource records". Resource records of different types have different properties, but we aren't going to worry about that. The general structure of a question
record is:

(variable) name (the important part!)

(int16) type (A/AAAA/CNAME/etc.)

(int16) class (basically always 0x0001, for Internet addresses)

DNS names

Questions and answers typically contain a domain name. A domain name, as we typically see it, looks like:

this.is.a.name.skullseclabs.org

But in a resource records, there aren't actually any periods, instead, each field is preceded by its length, with a null terminator (or a zero-length field) at the end:

\x04this\x02is\x01a\x04name\x0cskullseclabs\x03org\x00

The maximum length of a field is 63 - 0x3f - bytes. If a field starts with 0x40, 0x80, 0xc0, and possibly others, it has a special meaning (we'll get to that shortly).

Questions and answers

When you send a question to a DNS server, the packet looks something like:

(header)

question count = 1

question 1: ANY record for skullsecurity.org?

and the response looks like:

(header)

question count = 1

answer count = 11

question 1: ANY record for "skullsecurity.org"?

answer 1: "skullsecurity.org" has a TXT record of "oh hai NSA"

answer 2: "skullsecurity.org" has a MX record for "ASPMX.L.GOOGLE.com".

answer 3: "skullsecurity.org" has a A record for "206.220.196.59"

...

(yes, those are some of my real records :) )

If you do the math, you'll see that "skullsecurity.org" takes up 18 bytes, and would be included in the response packet 12 times, counting the question, which means we're effectively wasting 18 * 11 or close to 200 bytes. In the old days, 200 bytes were
a lot. Heck, in the new days, 200 bytes are still a lot when you're dealing with millions of requests.

Record pointers

Remember how I said that name fields starting with numbers above 63 - 0x3f - are special? Well, the one we're going to pay attention to is 0xc0.

0xc0 effectively means, "the next byte is a pointer, starting from the first byte of the packet, to where you can find the rest of the name".

So typically, you'll see:

12-bytes header (trn_id + flags + counts)

question 1: ANY record for "skullsecurity.org"

answer 1: \xc0\x0c has a TXT record of "oh hai NSA"

answer 2: \xc0\x0c ...

"\xc0" indicates a pointer is coming, and "\x0c" says "look 0x0c (12) bytes from the start of the packet", which is immediately after the header. You can also use it as part of a domain name, so your answer could be "\x03www\xc0\x0c", which would become
"www.skullsecurity.org" (assuming that string was 12 bytes from the start).

This is only mildly relevant, but a common problem that DNS parsers (both clients and servers) have to deal with is the infinite loop attack. Basically, the following packet structure:

12-byte header

question 1: ANY record for "\xc0\x0c"

Because question 1 is self-referential, it reads itself over and over and the name never finishes parsing. dnsmasq solves this by limiting reference to 256 hops - that decision prevents a denial-of-service attack, but it's also what makes this vulnerability
likely exploitable. :)

Setting up the fuzz

All right, by now we're DNS experts, right? Good, because we're going to be building a DNS packet by hand right away!

Before we get to the actual vulnerability, I want to talk about how I set up the fuzzing. Being a networked application, it makes sense to use a network fuzzer; however, I really wanted to try out
afl-fuzz from
lcamtuf, which is a file-format fuzzer.

afl-fuzz works as an intelligent file-format fuzzer that will instrument the executable (either by specially compiling it or using binary analysis) to determine whether or not it's hitting "new" code on each execution. It optimizes each cycle to take advantage
of all the new code paths it's found. It's really quite cool!

Unfortunately, DNS doesn't use files, it uses packets. But because the client and server each process only one single packet at a time, I decided to modify dnsmasq to read a packet from a file, parse it (either as a request or a response), then exit. That
made it possible to fuzz with afl-fuzz.

Unfortunately, that was actually pretty non-trivial. The parsing code and networking code were all mixed together. I ended up re-implementing "recv_msg()" and "recv_from()", among other things, and replacing their calls to those functions. That could also
be done with a LD_PRELOAD hook, but because I had source that wasn't necessary. If you want to see the changes I made to make it possible to fuzz, you can
search the codebase for "#ifdef FUZZ" - I made the fuzzing stuff entirely optional.

If you want to follow along, you should be able to reproduce the crash with the following commands (I'm on 64-bit Linux, but I don't see why it wouldn't work elsewhere):

Warning: DNS is recursive, and in my fuzzing modifications I didn't disable the recursive requests. That means that dnsmasq
will forward some of your traffic to upstream DNS servers, and that traffic
could impact those severs (and I actually proved that, by accident; but we won't get into that :) ).

Doing the actual fuzzing

Once you've set up the program to be fuzzable, fuzzing it is actually really easy.

First, you need a DNS request and response - that way, we can fuzz both sides (though ultimately, we don't need to for this particular vulnerability, since both the request and response parse names).

If you've wasted your life like I have, you can just write the request by hand and send it to a server, then capture the response:

Notice how it starts with "\x12\x34" (the same transaction id I sent), has a question count of 1, has an answer count of 0x0b (11), and contains "\x06google\x03com\x00" 12 bytes in (that's the question). That's basically what we discussed earlier. But the
important part is, it has "\xc0\x0c" throughout. In fact, every answer starts with "\xc0\x0c", because every answer is to the first and only question.

That's exactly what I was talking about earlier - each of those 11 instances of "\xc0\x0c" saved about 10 bytes, so the packet is 110 bytes shorter than it would otherwise have been.

Now that we have a base case for both the client and the server, we can compile the binary with afl-fuzz's instrumentation. Obviously, this command assumes that afl-fuzz is stored in "~/tools/afl-1.77b" - change as necessary. If you're trying to compile
the original code, it doesn't accept CC= or CFLAGS= on the commandline unless you apply
this patch first.

...which, in spite of being seeded with "hello" instead of an actual DNS packet, actually found an order of magnitude more crashes than the proper packets, except with much, much uglier proofs of concept.. :)

Fuzz results

I let this run overnight, specifically to re-create the crashes for this blog. In the morning (after roughly 20 hours of fuzzing), the results were:

Triage

Although we have over a hundred crashes, I know from experience that they're all caused by the same core problem. But not knowing that, I need to pick something to triage! The difference between starting from a well formed request and starting from a "hello"
string is noticeable... to take the smallest PoC from "hello", we have:

The first three crashes are interesting, because they're very similar. The only differences are the flags field (0x0100 or 0x0800) and the count fields (the first is unmodified, the second has 0xe100 "authority" records listed, and the third has 0xeb00 "question"
records). Presumably, that stuff doesn't matter, since random-looking values work.

Also note that near the end of every message, we see our old friend again: "\xc0\x0c".

We can run afl-tmin on the first one to get the tightest message we can:

As predicted, the question and answer counts don't matter. All that matters is the name's length fields and the "\xc0\x0c". Oddly it included the "o" from google.com, which is probably a bug (my fuzzing instrumentation isn't perfect because due to requests
going to the Internet, the result isn't always deterministic).

Honestly, when I got here, I lost steam. Null-pointer dereferences need to be fixed, especially because they can hide other bugs, but they aren't going to earn me l33t status. So I would have to fix it or deal with hundreds of crappy results.

If we look at the packet in more detail, the name it's parsing is essentially: "\x06AAAAAA\x03AAA\xc0\x0c" (changed '0' to 'A' to make it easier on the eyes). The "\xc0\x0c" construct reference 12 bytes into the message, which is the start of the name. When
it's parsed, after one round, it'll be "\x06AAAAAA\x03AAA\x06AAAAAA\x03AAA\xc0\x0c". But then it reaches the "\xc0\x0c" again, and goes back to the beginning. Basically, it infinite loops in the name parser.

So, it's obvious that a self-referential name causes the problem. But why?

I tracked down the code that handles 0xc0. It's in rfc1035.c, and looks like:

If look at that code, everything looks pretty okay (and for what it's worth, the printf()s are my instrumentation and aren't in the original). If that's not the problem, the only other field type being parsed is the name part (ie, the part without 0x40/0xc0/etc.
in front). Here's the code (with a bunch of stuff removed and the indents re-flowed):

This code runs for each segment that starts with a value less than 64 ("google" and "com", for example).

At the start, l is the length of the segment (so 6 in the case of "google"). It adds that to the current TOTAL length -
namelen - then checks if it's too long - this is the check that prevents a buffer overflow.

Then it reads in l bytes, one at a time, and copies them into a buffer -
cp - which happens to be on the heap. the namelen check prevents that from overflowing.

Then it copies a period into the buffer and doesn't increment namelen.

Do you see the problem there? It adds l to the total length of the buffer, then it reads in
l + 1 bytes, counting the period. Oops?

It turns out, you can mess around with the length and size of substrings quite a bit to get a lot of control over what's written where, but exploiting it is as simple as doing a lookup for "\x08AAAAAAAA\xc0\x0c":

However, there are two termination conditions: it'll only loop a grand total of 255 times, and it stops after
namelen reaches 1024 (non-period) bytes. So coming up with the best possible balance to overwrite what you want is actually pretty tricky - possibly even requires a bit of calculus (or, if you're an engineer,
a program that can optimize it for you :) ).

I should also mention: the reason the "\xc0\x0c" is needed in the first place is that it's impossible to have a name string in that's 1024 bytes - somewhere along the line, it runs afoul of a length check. The "\xc0\x0c" method lets us repeat stuff over
and over, sort of like decompressing a small string into memory, overflowing the buffer.

Woah.. that's not a null-pointer dereference! That's a write-NUL-byte-to-arbitrary-memory! Those might be exploitable!

As I mentioned earlier, this is actually a heap overflow. The interesting part is, the heap memory is allocated once - immediately after the program starts - and right after, a heap for the global settings object (daemon) is allocated. That means
that we have effectively full control of this object, at least the first couple hundred bytes:

I haven't measured how far into that structure you can write, but the total number of bytes we can write into the 1024-byte buffer is 1368 bytes, so somewhere in the realm of the first 300 bytes are at risk.

The reason we saw a "null pointer dereference" and also a "write NUL byte to arbitrary memory" are both because we overwrote variables from that structure that are used later.

Patch

The patch is pretty straight forward: add 1 to namelen for the periods. There was a second version of the same vulnerability (forgotten period) in the 0x40 handler as well.

But..... I'm concerned about the whole idea of building a string and tracking the length next to it. That's a dangerous design pattern, and the chances of regressing when modifying any of the name parsing is high.

Exploit so-far

I started writing an exploit for it. Before I stopped, I basically found a way to brute-force build a string that would overwrite an arbitrary number of bytes by adding the right amount of padding and the right number of periods. That turned out to be a
fairly difficult job, because there are various things you have to juggle (the padding at the front of the string and the size of the repeated field). It turns out, the maximum length you can get is 1368 bytes put into a 1024-byte buffer.

...why it never got famous

I held this back throughout the blog because it's the sad part. :)

It turns out, since I was working from the git HEAD version, it was brand new code. After bissecting versions to figure out where the vulnerable code came from, I determined that it was present only in 2.73rc5 - 2.73rc7. After I reported it, the
author rolled out 2.73rc8 with the fix.

It was disappointing, to say the least, but on the plus side the process was interesting enough to write about! :)

Conclusion

So to summarize everything...

I modified dnsmasq to read packets from a file instead of the network, then used afl-fuzz to fuzz and crash it.

I found a vulnerability that was recently introduced, when parsing "\xc0\x0c" names + using periods.

I triaged the vulnerability, and started writing an exploit.

Determined that the vulnerability was in brand new code, so I gave up on the exploit and decided to write a blog instead.

And who knows, maybe somebody will develop one for fun? If anybody does, I'll give them
a month of Reddit Gold!!!! :)

(I'm kidding about using that as a motivator, but I'll really do it if anybody bothers :P)