Understanding Kaminsky's DNS Bug

Over the past few days details have surfaced about the nature of the DNS bug that Dan Kaminsky announced two weeks ago. Yes, it is as big and as scary as we were told.

As you may remember, Kaminsky coordinated the announcement with many major software vendors and promised not to disclose a way to exploit the bug until August 6 at the annual Black Hat security conference in Las Vegas. This would give ISPs 30 days to patch their systems and protect their users. Kaminsky also asked members of the security community to refrain from publicly speculating about the issue until the details were officially announced. That may have been too much to ask, as it only took 13 days for the issue to find its way into the public eye.

Now that the details are out, and Kaminsky has himself commented on it, let's explore the issue and try to understand exactly how severe it is.

DNS Fundamentals

First, we need to understand the basics of how DNS traffic works. There are three players in the typical scenario:

your computer,

your ISP's recursive DNS servers, and

a website's authoritative DNS servers.

Authoritative DNS servers publish the IP addresses of domain names. Recursive DNS servers talk to authoritative servers to find addresses for those domain names. Your computer only talks to recursive DNS servers, which locate the address of the domain name for you. Your computer and the ISP's recursive DNS servers are both types of resolvers: they look elsewhere for answers. Of course, this is a simplified version of the process, but it's enough for our purposes.

So let's look at a typical DNS request and response. In this case we will use example.com and a couple of fake DNS servers. (Unnecessary dig output has been stripped.)

Here we have queried an authoritative DNS server, ns1.example.com, and asked it for the address of www.example.com. As you can see, the response contained the IP address of www.example.com, along with two other sets of records: authority and additional. The authority section simply contains a list of the authoritative DNS servers for the domain in question. The additional section contains IP addresses for those servers. The fact that all of this information comes back in one single response is crucial to the recent exploit.
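A response of that shape, with invented addresses from the documentation range, looks roughly like this:

```
;; QUESTION SECTION:
;www.example.com.            IN  A

;; ANSWER SECTION:
www.example.com.   120    IN  A   192.0.2.10

;; AUTHORITY SECTION:
example.com.       86400  IN  NS  ns1.example.com.
example.com.       86400  IN  NS  ns2.example.com.

;; ADDITIONAL SECTION:
ns1.example.com.   86400  IN  A   192.0.2.1
ns2.example.com.   86400  IN  A   192.0.2.2
```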

Bailiwick Checking

Consider the following analogy. Suppose you and I are traveling on a highway and our car breaks down. I ask you for the phone number to the local repair shop, and you respond with "I don't know their phone number, you should call them to find out." Without already knowing the shop's phone number how can I call them to learn their phone number?

Now let's think about a similar situation in DNS. We want to find the address of www.example.com, so we ask a root DNS server for the list of .com DNS servers. The root server gives us a list of .com DNS servers, so we pick one and ask it for the list of DNS servers for example.com. If the .com server simply responded with ns1.example.com and ns2.example.com, we would be stuck: we are trying to find information about example.com, and along the way we are being told to ask the example.com DNS servers for answers. To remedy this chicken-and-egg problem, two "A" records are provided in the additional section to supply the missing link. These are called "glue" records.
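Such a referral from a .com server, again with invented addresses, might look like:

```
;; QUESTION SECTION:
;www.example.com.            IN  A

;; AUTHORITY SECTION:
example.com.      172800  IN  NS  ns1.example.com.
example.com.      172800  IN  NS  ns2.example.com.

;; ADDITIONAL SECTION:
ns1.example.com.  172800  IN  A   192.0.2.1
ns2.example.com.  172800  IN  A   192.0.2.2
```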

Glue records are just normal "A" records that are supplied along with the response. The above response could have just as easily included irrelevant records:
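For instance, a malicious server might tack an unrelated record onto the additional section (addresses invented):

```
;; ADDITIONAL SECTION:
ns1.example.com.       172800  IN  A  192.0.2.1
ns2.example.com.       172800  IN  A  192.0.2.2
www.linuxjournal.com.  604800  IN  A  10.10.10.20
```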

Notice the extra line at the end? Here we asked a question about something in the example.com domain, but the sneaky server also responded with information about www.linuxjournal.com. Is this server even authorized to answer DNS requests for linuxjournal.com? Resolvers use a technique called bailiwick checking to decide whether or not to accept these extra records. This simply means that any records that aren't in the same domain as the question are ignored. If we ask for information about ftp.example.com, then we only accept records in the additional section that are about example.com.

Since around 1997, nearly all DNS resolvers have used bailiwick checking to protect themselves from this type of cache-poisoning attack.

UDP and Query IDs

Most DNS traffic is sent over UDP, a connectionless protocol. This means that a resolver (either your computer or a recursive DNS server) sends out a request and simply waits to be told an answer, by anyone. Usually there are many DNS requests in flight at the same time, so the resolver needs a way to match up the questions it asks with the answers it receives. To do this, each request carries a number between 0 and 65535 called the query ID (QID). The server always sends back the answer with the same QID as the one it received in the request.
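As a sketch (not a full DNS implementation), the QID is the first 16-bit field of every query packet, and a reply is only matched to its question by echoing those two bytes back:

```python
import random
import struct

def build_query(qid, name):
    """Build a minimal DNS query packet (header + question) by hand."""
    # Header: QID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question: length-prefixed labels, then QTYPE=A (1), QCLASS=IN (1)
    labels = b"".join(bytes([len(p)]) + p.encode() for p in name.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)

qid = random.randrange(0, 65536)            # the only 16 bits of entropy
packet = build_query(qid, "www.example.com")
# A responder must echo the same QID in the first two bytes of its reply.
echoed = struct.unpack("!H", packet[:2])[0]
print(echoed == qid)                        # prints True
```

An off-path attacker never sees the packet, so guessing `qid` is all that stands between a forged reply and acceptance.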

There have been several exploits based on guessing the QID. In the old days resolvers simply incremented the QID with each request, making it trivial to guess. After that was exploited, most DNS resolvers began using pseudorandom QIDs. But because the QID is a 16-bit number, there is only a small pool to pick from (just 65,536 values), and security researchers have demonstrated ways of predicting these random numbers.

The Exploit

Now that we have covered the basics, let's get to Kaminsky's exploit.

Imagine that a resolver asks for the IP address of doesnotexist.example.com. An attacker sends back a response that looks like this:
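A forged response of that kind (addresses invented) might look like:

```
;; QUESTION SECTION:
;doesnotexist.example.com.        IN  A

;; ANSWER SECTION:
doesnotexist.example.com.  120     IN  A   10.10.10.10

;; AUTHORITY SECTION:
example.com.               604800  IN  NS  www.example.com.

;; ADDITIONAL SECTION:
www.example.com.           604800  IN  A   10.10.10.20
```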

The attacker is trying to trick the resolver into believing that www.example.com now lives at 10.10.10.20, and to remember that for 604800 seconds (7 days). This passes the bailiwick check because the names in the authority and additional sections belong to the same domain as the question, example.com. Remember, though, the response still needs to carry the correct QID for the resolver to accept it. Since most of this traffic travels over UDP, nothing prevents an attacker from sending a flood of forged responses to the resolver. But sending answers to questions that have never been asked would be pointless, and the forged response also needs to arrive before the real one does.

It could also be hard for an attacker to guess what questions will be asked. Instead, the attacker can set up a webpage with lots of images on it, say 1,000 images pointing to various hostnames, like this:
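Hypothetical markup along these lines would do it:

```html
<img src="http://aaaa.example.com/img.jpg">
<img src="http://aaab.example.com/img.jpg">
<img src="http://aaac.example.com/img.jpg">
<!-- ... and so on, 1,000 unique hostnames in all ... -->
```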

When a browser tries to render this page, it will ask the resolver to look up the address of aaaa.example.com, aaab.example.com, and so on until it has looked up all 1,000. Each of these requests goes out with its own QID. If the attacker constantly floods the resolver with forged answers carrying, say, QID 12345, eventually one of the requests will use that QID and the forged response will be accepted. (If it helps, you could also imagine the attacker sending back 65,536 different responses at once, one for each possible QID; then a match would be certain.)
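To get a feel for the odds, here is a small sketch under a simplifying assumption: for each forced query the attacker lands some number of forged responses with distinct QIDs before the real answer arrives, and each query draws its QID uniformly at random.

```python
# Chance that at least one of the forced lookups gets poisoned, assuming the
# attacker lands `spoofs` distinct-QID forgeries per query before the real
# answer arrives, across `queries` forced lookups (uniform 16-bit QIDs).
def poison_probability(queries, spoofs):
    survive_one = 1 - spoofs / 65536    # one query outlives its forgeries
    return 1 - survive_one ** queries   # at least one query is poisoned

# With 1,000 forced lookups and only 100 forgeries per query, the attacker
# already wins most of the time.
print(round(poison_probability(1000, 100), 3))
```

This is why forcing many fresh, uncached lookups is the heart of the attack: a single query is hard to hit, but a thousand of them are not.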

But what about bailiwick checking? Remember, the attacker doesn't control example.com; he's simply sending his own answers to the resolver, so he can craft the responses to appear to be in bailiwick. The responses might look something like this:
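Each forged response in the flood (addresses invented) might look like:

```
;; QUESTION SECTION:
;aaaa.example.com.        IN  A

;; ANSWER SECTION:
aaaa.example.com.  120     IN  A   10.10.10.10

;; AUTHORITY SECTION:
example.com.       604800  IN  NS  www.example.com.

;; ADDITIONAL SECTION:
www.example.com.   604800  IN  A   10.10.10.20
```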

The attacker has sent a response to your resolver and instructed it to store 10.10.10.20 as the IP address of www.example.com and to keep it for 7 days. Worse, future lookups under example.com will be sent to the attacker's server too, since the forged NS record is also cached.

In this exploit it doesn't matter what the question was; the authority and additional sections are always the same. All that matters is that the forged response arrives before the real one, for any one of the requests. By flooding the resolver with answers, the attacker raises the chances that one of them, any one, will be accepted. Kaminsky's discovery was that the QID weakness and in-bailiwick spoofing could be combined to poison caches reliably.

Before the recent patches were released, if a single person browsing the web visited the attacker's webpage, everyone else at that ISP (Comcast, AT&T, Verizon, etc.) would be affected. The ISP's recursive DNS servers could send all traffic for www.bankofamerica.com to an IP address operated by someone with shady morals.

The Exploit Is Real

Yes, the exploit is real, and it is severe. Cricket Liu, noted expert and co-author of the ubiquitous O'Reilly book "DNS and BIND", has suggested that this may be the biggest DNS security issue in the history of the Internet, and most other experts seem to agree with him. Dan Kaminsky has said that he was able to exploit unpatched systems in less than 10 seconds. That means an attacker could control your bank account, your email, your eBay account, or anything else you do online, in a matter of seconds. And you wouldn't even have to do anything: any one of the thousands or hundreds of thousands of customers of your ISP could fall for it, and it would affect you. This is why it is so important that these systems be patched right away.

Head over to Dan Kaminsky's website, DoxPara Research, and click the "Check My DNS" button on the right to see whether your ISP has patched. If you are still vulnerable, consider using OpenDNS in the meantime.
