Security Benefits

Introduction: DNS security threats and mitigations

Because of the open, distributed design of the Domain Name System, and its use
of the User Datagram Protocol (UDP), DNS is vulnerable to various forms of
attack.
Public or "open" recursive DNS resolvers are especially at risk, since they do
not restrict incoming packets to a set of allowable source IP addresses.
We are mostly concerned with two common types of attacks:

Spoofing attacks leading to DNS cache poisoning.
Various types of DNS spoofing and forgery exploits abound, which aim to
redirect users from legitimate sites to malicious websites.
These include so-called Kaminsky attacks, in which attackers take
authoritative control of an entire DNS zone.

Denial-of-service (DoS) attacks.
Attackers may launch DDoS attacks against the resolvers themselves,
or hijack resolvers to launch DoS attacks on other systems.
Attacks that use DNS servers to launch DoS attacks on other systems by
exploiting large DNS record/response size are known as amplification
attacks.

Each class of attack is discussed further below.

Cache poisoning attacks

There are several variants of DNS spoofing attacks that can result in cache
poisoning, but the general scenario is as follows:

The attacker sends a target DNS resolver multiple queries for a domain name
for which the server is known not to be authoritative, and which is unlikely
to be in the server's cache.

The resolver sends out requests to other name servers
(whose IP addresses the attacker can also predict).

In the meantime, the attacker floods the victim server with forged responses
that appear to originate from the delegated name server.
The responses contain records that ultimately resolve the requested domain
to IP addresses controlled by the attacker.
They might contain answer records for the resolved name or, worse, they may
further delegate authority to an attacker-owned name server, giving the
attacker control of an entire zone.

If one of the forged responses matches the resolver's request (for example,
by query name, type, ID and resolver source port) and is received before a
response from the genuine name server, the resolver accepts the forged
response and caches it, and discards the genuine response.

Future queries for the compromised domain or zone are answered with the
forged DNS resolutions from the cache.
If the attacker has specified a very long time-to-live on the forged
response, the forged records stay in the cache for as long as possible
without being refreshed.

DoS and amplification attacks

DNS resolvers are subject to the usual DoS threats that plague any networked
system.
However, amplification attacks are of particular concern because DNS resolvers
are attractive targets to attackers who exploit the resolvers' large
response-to-request size ratio to gain additional free bandwidth.
Resolvers that support EDNS0 (Extension Mechanisms for DNS) are especially
vulnerable because of the substantially larger packet size that they can return.

In an amplification scenario, the attack proceeds as follows:

The attacker sends a victim DNS server queries using a forged source IP
address.
The queries may be sent from a single system or a network of systems all
using the same forged IP address.
The queries are for records that the attacker knows will result in much
larger responses, up to several dozen times[1]
the size of the original queries (hence the name "amplification" attack).

The victim server sends the large responses to the source IP address passed
in the forged requests, overwhelming the system and causing a DoS situation.
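The arithmetic behind the name is simple. The following Python sketch computes the amplification factor for one exchange and the traffic that reaches the victim. The packet sizes and the 10 Mbit/s attacker bandwidth are illustrative assumptions, not measurements, and the model ignores packet loss and any rate limiting on the victim server.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Ratio of response size to query size for one DNS exchange."""
    if query_bytes <= 0:
        raise ValueError("query size must be positive")
    return response_bytes / query_bytes


def victim_traffic_mbps(attacker_mbps: float, factor: float) -> float:
    """Bandwidth reflected at the victim if every forged query gets a full response."""
    return attacker_mbps * factor


# Illustrative (assumed) sizes: a 64-byte query eliciting a 3,200-byte EDNS0 response.
factor = amplification_factor(64, 3200)
print(f"amplification factor: {factor:.0f}x")                         # amplification factor: 50x
print(f"victim receives: {victim_traffic_mbps(10, factor):.0f} Mbit/s")  # victim receives: 500 Mbit/s
```

Because the victim server does all the sending, the attacker's own uplink only has to carry the small queries.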

Mitigations

The standard system-wide solution to DNS vulnerabilities is DNSSEC.
However, until it is universally implemented, open DNS resolvers need to
independently take measures to mitigate known threats.
Many techniques have been proposed; see
IETF RFC 5452, "Measures for Making DNS More Resilient against Forged Answers",
for an overview of most of them.
In Google Public DNS, we have implemented, and we recommend,
the following approaches:

Overprovisioning machine resources to protect against direct
DoS attacks on the resolvers themselves.
Since IP addresses are trivial for attackers to forge, it's impossible to
block queries based on IP address or subnet;
the only effective way to handle such attacks is to simply absorb the load.

Implementing basic validity-checking of response packets
and of name server credibility, to protect against simple cache poisoning.
These are standard mechanisms and sanity checks that any standards-compliant
caching resolver should perform.

Adding entropy to request messages, to reduce the
probability of more sophisticated spoofing/cache poisoning attacks such as
Kaminsky attacks.
There are many recommended techniques for adding entropy, including
randomizing source ports; randomizing the choice of name servers
(destination IP addresses); randomizing case in name requests; and appending
nonce prefixes to name requests.
Below, we give an overview of the benefits, limitations, and challenges of
each of these techniques, and discuss how we implemented them in Google
Public DNS.

Monitoring the service for the client IPs using the most bandwidth and
experiencing the highest response-to-request size ratio.

DNSSEC

The Domain Name System Security Extensions (DNSSEC) standard is specified in
several IETF RFCs:
4033, 4034, 4035, and 5155.

Resolvers that implement DNSSEC counter cache poisoning attacks by verifying the
authenticity of responses received from name servers.
Each DNS zone maintains a set of private/public key pairs, and for each set of
DNS records with the same name and type (an RRset), a digital signature is
generated using the private key.
The corresponding public key is then authenticated via a chain of trust by keys
belonging to parent zones.
DNSSEC-compliant resolvers reject responses that do not contain the correct
signatures.
DNSSEC effectively prevents responses from being tampered with, because in
practice, signatures are almost impossible to forge without access to private
keys.

As of January 2013, Google Public DNS fully supports DNSSEC.
We accept and forward DNSSEC-formatted messages and validate responses for
correct authentication.
We strongly encourage other resolvers to do the same.

DNS-over-HTTPS

As of April 2016, Google Public DNS offers DNS-over-HTTPS, DNS resolution
over an encrypted HTTPS connection.
DNS-over-HTTPS prevents tampering, eavesdropping and spoofing, greatly
enhancing privacy and security between a client and Google Public DNS.
It complements DNSSEC to provide end-to-end authenticated DNS lookups.

Implementing basic validity checking

Some DNS cache corruption can be due to unintentional, not necessarily
malicious, mismatches between requests and responses (for example, because of a
misconfigured name server or a bug in DNS software).
At a minimum, DNS resolvers should perform checks to verify the credibility and
relevance of name servers' responses.
We recommend (and implement) all of the following defenses:

Do not set the recursive bit in outgoing requests, and always follow
delegation chains explicitly.
Disabling the recursive bit ensures that your resolver operates in
"iterative" mode so that you query each name server in the delegation chain
explicitly, rather than allowing another name server to perform these
queries on your behalf.

Reject suspicious response messages.
See below for details of what we consider to be "suspicious".

Do not return A records to clients based on glue records cached from
previous requests.
For example, if you receive a client query for ns1.example.com, you should
re-resolve the address, rather than sending an A record based on cached glue
records returned from a .com TLD name server.
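A minimal Python sketch of the first defense in the list above: building an outgoing query with the recursion-desired (RD) bit cleared, using the standard DNS header and question layout from RFC 1035. The function names are hypothetical, and a real resolver would also add EDNS0 options and the entropy techniques discussed later.

```python
import secrets
import struct

FLAG_RD = 0x0100  # "recursion desired" bit in the DNS header flags word


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def build_iterative_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a standard query (QTYPE 1 = A, class IN) with RD left cleared."""
    query_id = secrets.randbelow(1 << 16)  # unpredictable 16-bit query ID
    flags = 0  # QR=0 (query), opcode 0 (standard query), RD deliberately not set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # QDCOUNT=1
    return query_id, header + encode_qname(name) + struct.pack("!HH", qtype, 1)


qid, packet = build_iterative_query("www.example.com")
(wire_flags,) = struct.unpack("!H", packet[2:4])
print((wire_flags & FLAG_RD) == 0)  # True: the resolver walks the delegation chain itself
```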

Rejecting responses that do not meet required criteria

Google Public DNS rejects all of the following:

Unparseable or malformed responses.

Responses in which the query ID, source IP, source port, or query name do
not match those of the request.

Records which are not relevant to the request.

Answer records for which we cannot reconstruct the CNAME chain.

Records (in the answer, authority, or additional sections) for which the
responding name server is not credible.
We determine the "credibility" of a name server by its place in the
delegation chain for a given domain.
Google Public DNS caches delegation chain information, and we verify each
incoming response against the cached information to determine the responding
name server's credibility for responding to a particular request.
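The following Python sketch shows how a subset of these checks might fit together: the transaction-field match, and a simplified credibility test that only accepts records whose owner names fall inside the zone the responding server is delegated for (often called a "bailiwick" check). The data classes and field names are hypothetical; parsing, relevance filtering, and CNAME-chain reconstruction are omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutstandingRequest:
    query_id: int
    server_ip: str     # destination IP of our request
    source_port: int   # our (randomized) source port
    qname: str
    server_zone: str   # zone the server is delegated for, from the cached delegation chain


@dataclass
class Response:
    query_id: int
    source_ip: str
    dest_port: int
    qname: str
    record_owners: list[str] = field(default_factory=list)


def in_zone(name: str, zone: str) -> bool:
    """True if name equals zone or is a subdomain of it (case-insensitive)."""
    name, zone = name.lower().rstrip("."), zone.lower().rstrip(".")
    return zone == "" or name == zone or name.endswith("." + zone)


def accept(req: OutstandingRequest, resp: Response) -> bool:
    # Transaction fields must all match the request we actually sent.
    if (resp.query_id, resp.source_ip, resp.dest_port) != (
        req.query_id, req.server_ip, req.source_port
    ):
        return False
    if resp.qname.lower() != req.qname.lower():
        return False
    # Credibility: every record must lie inside the responder's delegated zone.
    return all(in_zone(owner, req.server_zone) for owner in resp.record_owners)


req = OutstandingRequest(0x1A2B, "192.0.2.53", 40123, "www.example.com", "example.com")
good = Response(0x1A2B, "192.0.2.53", 40123, "www.example.com", ["www.example.com"])
poisoned = Response(0x1A2B, "192.0.2.53", 40123, "www.example.com", ["www.bank.test"])
print(accept(req, good), accept(req, poisoned))  # True False
```

The `poisoned` response matches every transaction field but still fails, because a server delegated for example.com is not credible for records about another zone.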

Adding entropy to requests

Once a resolver does enforce basic sanity checks, an attacker has to flood the
victim resolver with responses in an effort to match the query ID, UDP port
(of the request), IP address (of the response), and query name of the original
request before the legitimate name server does.

Unfortunately, this is not difficult to achieve, as the one uniquely identifying
field, the query ID, is only 16 bits long (a single forged response has a 1 in
65,536 chance of guessing it correctly).
The other fields are also limited in range, making the total number of unique
combinations a relatively low number.
See RFC 5452, Section 7 for a calculation of the combinatorics involved.

Therefore, the challenge is to add as much entropy to the request packet as
possible, within the standard format of the DNS message, to make it more
difficult for attackers to successfully match a valid combination of fields
within the window of opportunity.
We recommend, and have implemented, all the techniques discussed in the
following sections.
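As a rough illustration of what each technique contributes, the Python sketch below adds up the bits an attacker must guess for one outgoing request. It is an upper bound that assumes every field is chosen uniformly and independently; the four candidate name servers are an assumption, nonce labels are left out, and RFC 5452, Section 7 gives the more careful treatment.

```python
import math


def letters_in(name: str) -> int:
    """Each ASCII letter can carry one bit of 0x20 case entropy."""
    return sum(1 for ch in name if ch.isascii() and ch.isalpha())


def entropy_bits(num_ports: int, num_servers: int, qname: str, use_0x20: bool) -> float:
    bits = 16.0                     # query ID
    bits += math.log2(num_ports)    # randomized source port
    bits += math.log2(num_servers)  # randomized choice of name server
    if use_0x20:
        bits += letters_in(qname)   # randomized case of the query name
    return bits


# Assumed inputs: about 32,768 usable ports (15 bits) and 4 candidate name servers.
for flag in (False, True):
    b = entropy_bits(32_768, 4, "www.example.com", flag)
    print(f"0x20={flag}: {b:.0f} bits, 1 in {2 ** b:,.0f} per forged packet")
```

Under these assumptions the total grows from 33 to 46 bits once case randomization is applied to www.example.com.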

Randomizing source ports

As a basic step, never allow outgoing request packets to use the default UDP
port 53, or to use a predictable algorithm for assigning multiple ports
(e.g. simple incrementing).
Use as wide a range of ports from 1024 to 65535 as allowable in your system,
and use a reliable random number generator to assign ports.
For example, Google Public DNS uses ~15 bits, to allow for approximately 32,000
different port numbers.
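A minimal Python sketch of port randomization, assuming the 1024 to 65535 range suggested above (the usable range is OS-dependent). It draws ports from the `secrets` module, which is backed by the operating system's cryptographically secure generator, and simply draws again if the chosen port is already in use. The function names are hypothetical.

```python
import secrets
import socket

PORT_MIN, PORT_MAX = 1024, 65535  # range suggested above; usable range is OS-dependent


def random_source_port() -> int:
    """Draw a port uniformly from [PORT_MIN, PORT_MAX] with a CSPRNG (never port 53)."""
    return PORT_MIN + secrets.randbelow(PORT_MAX - PORT_MIN + 1)


def bind_random_udp_socket(host: str = "0.0.0.0", attempts: int = 10) -> socket.socket:
    """Bind a UDP socket to a freshly drawn random port, retrying on collisions."""
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, random_source_port()))
            return sock
        except OSError:  # port already in use (or reserved): draw again
            sock.close()
    raise RuntimeError("could not bind a random source port")
```

Drawing a new port per request, rather than incrementing from the last one, is what keeps the port unpredictable; the full range used here gives slightly under 16 bits.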

Note that if your servers are deployed behind firewalls, load-balancers,
or other devices that perform network address translation (NAT), those devices
may de-randomize ports on outgoing packets.
Make sure you configure NAT devices to disable port de-randomization.

Randomizing choice of name servers

Some resolvers, when sending out requests to root, TLD, or other name servers,
select the name server's IP address based on the shortest distance (latency).
We recommend that you randomize destination IP addresses to add entropy to the
outgoing requests.
In Google Public DNS, we simply pick a name server randomly among configured
name servers for each zone, somewhat favoring fast and reliable name servers.
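One way to "somewhat favor" fast and reliable servers while keeping the choice random is a weighted random draw, sketched below in Python. The weighting formula, the floor on the success rate, and the field names are assumptions for illustration, not the formula Google Public DNS uses.

```python
import random
from dataclasses import dataclass

_rng = random.SystemRandom()  # uses os.urandom(); seeding has no effect


@dataclass(frozen=True)
class NameServer:
    ip: str
    srtt_ms: float       # smoothed round-trip time (hypothetical metric)
    success_rate: float  # fraction of recent queries answered, 0..1 (hypothetical)


def weight(ns: NameServer) -> float:
    """Faster, more reliable servers weigh more; no server's weight drops to zero."""
    return max(ns.success_rate, 0.05) / max(ns.srtt_ms, 1.0)


def pick_server(servers: list[NameServer]) -> NameServer:
    return _rng.choices(servers, weights=[weight(s) for s in servers], k=1)[0]
```

Keeping every weight above zero means even slow servers are occasionally chosen, so an attacker cannot rely on the fastest server always being the destination.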

If you are concerned about latency, you can use round-trip time (RTT) banding,
which consists of randomizing within a range of addresses that are below a
certain latency threshold (e.g. 30 ms, 300 ms, etc.).
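RTT banding can be sketched in a few lines of Python: try the tightest latency band first, pick uniformly within the first non-empty band, and fall back to all servers if none qualify. The function name and the default bands (taken from the example thresholds above) are assumptions.

```python
import secrets


def rtt_band_choice(rtts_ms: dict[str, float], bands_ms: tuple[float, ...] = (30.0, 300.0)) -> str:
    """Choose uniformly among servers in the tightest non-empty latency band."""
    for limit in bands_ms:
        band = [ip for ip, rtt in rtts_ms.items() if rtt <= limit]
        if band:
            return secrets.choice(band)
    return secrets.choice(list(rtts_ms))  # every server is slow: randomize across all
```

For example, with servers at 12 ms, 25 ms, and 140 ms, only the first two are ever chosen, and each is equally likely.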

Randomizing case in query names

The DNS standards require that name servers treat names case-insensitively.
That is, the names example.com and EXAMPLE.COM should resolve to the same IP
address[2].
However, in the response, most name servers echo back the name as it appeared in
the request, preserving the original case.

Therefore, another way to add entropy to requests is to randomly vary the case
of letters in domain names queried.
This technique, also known as "0x20" because bit 0x20 is used to set the case of
US-ASCII letters, was first proposed in the IETF internet draft
Use of Bit 0x20 in DNS Labels to Improve Transaction Identity.
With this technique, the name server response must match not only the query name
but the case of every letter in the name string;
for example, wWw.eXaMpLe.CoM or WwW.ExamPLe.COm.
This may add little or no entropy to queries for the top-level and root domains,
but it's effective for most hostnames.
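A minimal Python sketch of the 0x20 technique: each ASCII letter has bit 0x20 toggled with probability 1/2, and a response is accepted only if it echoes the name with identical case. The function names are hypothetical.

```python
import secrets


def randomize_case(name: str) -> str:
    """Toggle bit 0x20 of each ASCII letter with probability 1/2."""
    return "".join(
        chr(ord(ch) ^ 0x20) if ch.isascii() and ch.isalpha() and secrets.randbits(1) else ch
        for ch in name
    )


def echo_matches(sent: str, echoed: str) -> bool:
    """A response is accepted only if it echoes the name with identical case."""
    return sent == echoed


sent = randomize_case("www.example.com")
print(sent.lower() == "www.example.com")  # True: same name to a case-insensitive server
```

A forger who echoes www.example.com in lowercase matches `sent` only if all 13 letters happened to stay lowercase, a 1 in 8,192 (2^13) chance.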

One significant challenge we discovered when implementing this technique is that
some name servers do not follow the expected response behavior:

Some name servers respond with complete case-insensitivity:
they correctly return the same results regardless of case in the request,
but the response does not match the exact case of the name in the request.

Other name servers respond with complete case-sensitivity (in violation of
the DNS standards):
they handle equivalent names differently depending on case in the request,
either failing to reply at all or returning incorrect NXDOMAIN responses
that match the exact case of the name in the request.

For both of these types of name servers, altering the case of the query name
would produce undesirable results:
for the first group, the response would be indistinguishable from a forged
response;
for the second group, the response (if any) could be totally incorrect.

Our current solution to this problem is to create a whitelist of name servers
which we know apply the standards correctly, and to only apply the case
randomization technique in requests to those servers.
We also list the appropriate exception subdomains for each of them, based on
analyzing our logs.
If a response that appears to come from those servers does not contain the
correct case, we reject the response.
The whitelisted name servers comprise more than 70% of our traffic.
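The whitelist policy can be sketched in Python as follows: case randomization is used only for whitelisted servers and not for their exception subdomains, and only those queries require an exact-case echo. The whitelist entries, IP addresses, and function names are hypothetical placeholders, not real data.

```python
# Hypothetical whitelist: server IP -> exception subdomains where 0x20 must not be used.
CASE_PRESERVING = {
    "192.0.2.53": {"legacy.example.net"},
    "198.51.100.10": set(),
}


def _under(name: str, zone: str) -> bool:
    name, zone = name.lower().rstrip("."), zone.lower().rstrip(".")
    return name == zone or name.endswith("." + zone)


def use_0x20(server_ip: str, qname: str) -> bool:
    exceptions = CASE_PRESERVING.get(server_ip)
    if exceptions is None:
        return False  # not whitelisted: send the name unchanged
    return not any(_under(qname, z) for z in exceptions)


def accept_qname(server_ip: str, sent: str, echoed: str) -> bool:
    if use_0x20(server_ip, sent):
        return echoed == sent  # whitelisted server must echo the case exactly
    return echoed.lower() == sent.lower()
```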

Note that while upper and lower case letters are allowed in domain names,
no significance is attached to the case.
That is, two names with the same spelling but different case are to be treated
as if identical.

Prepending nonce labels to query names

If a resolver cannot directly resolve a name from the cache, or cannot directly
query an authoritative name server, then it must follow referrals from a root or
TLD name server.
In most cases, requests to the root or TLD name servers will result in a
referral to another name server,
rather than an attempt to resolve the name to an IP address.
For such requests, it should therefore be safe to attach a random label to a
query name to increase the entropy of the request, while not risking a failure
to resolve a non-existent name.
That is, sending a request to a referring name server for a name prefixed with a
nonce label, such as entriih-f10r3.www.google.com, should return the same
result as a request for www.google.com.
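A minimal Python sketch of nonce prepending. The referral's question section should echo the nonce-prefixed name, which the resolver can check before following the delegation for the original name. The 12-character lowercase-and-digit label (about 62 bits) and the function names are assumptions for illustration.

```python
import secrets
import string

_ALPHABET = string.ascii_lowercase + string.digits


def nonce_label(length: int = 12) -> str:
    """Random label from [a-z0-9], roughly 5.2 bits of entropy per character."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))


def add_nonce(qname: str) -> tuple[str, str]:
    """Return (nonce, nonce-prefixed name) for a query expected to yield a referral."""
    nonce = nonce_label()
    return nonce, f"{nonce}.{qname}"


def strip_nonce(nonce: str, echoed: str) -> str:
    """Verify the response echoes our nonce, then recover the original name."""
    prefix = nonce + "."
    if not echoed.lower().startswith(prefix):
        raise ValueError("response does not echo our nonce label")
    return echoed[len(prefix):]
```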

Although in practice such requests make up less than 3% of outgoing requests,
assuming normal traffic (since most queries can be answered directly from the
cache or by a single query), these are precisely the types of requests that an
attacker tries to force a resolver to issue.
Therefore, this technique can be very effective at preventing Kaminsky-style
exploits.

Implementing this technique requires that nonce labels only be used for requests
that are guaranteed to result in referrals;
that is, requests whose responses contain no records in the answer section.
However, we encountered several challenges when attempting to define the set of
such requests:

Some country-code TLD (ccTLD) name servers are actually authoritative for
other second-level TLDs (2LDs).
Although they have two labels, 2LDs behave just like TLDs, which is why they
are often handled by ccTLD name servers.
For example, the .uk name servers are also authoritative for the mod.uk
and nic.uk zones, and, hence, hostnames contained in those zones, such as
www.nic.uk, www.mod.uk, and so on.
In other words, requests to ccTLD name servers for resolution of such
hostnames will not result in referrals, but in authoritative answers;
prepending nonce labels to such hostnames will cause the names to be
unresolvable.

Sometimes generic TLD (gTLD) name servers return non-authoritative responses
for name servers.
That is, there are some name server hostnames that happen to live in a gTLD
zone rather than in the zone for their domain.
A gTLD will return a non-authoritative answer for these hostnames,
using whatever glue record it happens to have in its database, rather than
returning a referral.
For example, the name server ns3.indexonlineserver.com used to be in the
.COM gTLD zone rather than in the indexonlineserver.com zone.
When we issued a request to a gTLD server for ns3.indexonlineserver.com,
we got an IP address for it, rather than a referral.
However, if we prepended a nonce label, we got a referral to
indexonlineserver.com, which was then unable to resolve the hostname.
Therefore, we cannot prepend nonce labels to the names of name servers that
require a resolution from a gTLD server.

Authorities for zones and hostnames change over time.
This can cause a nonce-prepended hostname that was once resolvable to become
unresolvable if the delegation chain changes.

To address these challenges, we created a "blacklist" file containing exceptions
for which we cannot prepend nonce labels.
The file is populated with hostnames for which TLD name servers return
non-referring responses, according to our server logs.
We continually review the exceptions list to ensure that it stays valid over
time.
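The exception check might look like the Python sketch below. The file format (one name per line, `#` comments) is an assumption, and the sample entries are simply the examples from the text, not the contents of the actual file.

```python
def load_exceptions(lines) -> set[str]:
    """Parse one name per line; blank lines and '#' comments are ignored (assumed format)."""
    return {ln.split("#", 1)[0].strip().lower() for ln in lines} - {""}


def _suffix_match(name: str, entry: str) -> bool:
    name, entry = name.lower().rstrip("."), entry.lower().rstrip(".")
    return name == entry or name.endswith("." + entry)


# Illustrative entries taken from the examples above, not the real exceptions list.
NONCE_EXCEPTIONS = load_exceptions(["nic.uk", "mod.uk", "ns3.indexonlineserver.com"])


def may_prepend_nonce(qname: str, server_is_root_or_tld: bool) -> bool:
    if not server_is_root_or_tld:
        return False  # only root/TLD queries are expected to return referrals
    return not any(_suffix_match(qname, e) for e in NONCE_EXCEPTIONS)
```

Because the list is reloaded from a file, it can be regenerated from server logs as delegations change, without redeploying the resolver.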

Removing duplicate queries

DNS resolvers are vulnerable to "birthday attacks", so called because they
exploit the mathematical "birthday paradox": a match between randomly chosen
values becomes likely with far fewer inputs than intuition suggests.
Birthday attacks involve flooding the victim server not only with forged
responses but also with initial queries, counting on the resolver to issue
multiple requests for a single name resolution.
The greater the number of issued outgoing requests, the greater the probability
that the attacker will match one of those requests with a forged response:
an attacker only needs on the order of 300 in-flight requests for a 50% success
chance at matching a forged response, and 700 requests for close to 100%
success.
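The figures above appear consistent with the classical birthday calculation over the 16-bit query ID space alone, that is, assuming no port, name server, or case randomization. The Python sketch below computes both the exact probability and the standard approximation 1 - exp(-n(n-1)/2N).

```python
import math

ID_SPACE = 1 << 16  # 65,536 possible query IDs


def birthday_match_probability(n: int, space: int = ID_SPACE) -> float:
    """P(at least two of n uniformly random values coincide), computed exactly."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= 1 - k / space
    return 1 - p_distinct


def approx(n: int, space: int = ID_SPACE) -> float:
    """Standard approximation 1 - exp(-n(n-1) / 2N)."""
    return 1 - math.exp(-n * (n - 1) / (2 * space))


for n in (300, 700):
    print(n, round(birthday_match_probability(n), 3), round(approx(n), 3))
```

Passing a larger space, such as `1 << 31` for the query ID plus about 15 bits of source-port entropy, shows how sharply the risk falls: the 700-request case drops to roughly 0.01%.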

To guard against this attack strategy, you should be sure to discard all
duplicate queries from the outbound queue.
For example, Google Public DNS never allows more than a single outstanding
request for the same query name, query type, and destination IP address.
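A minimal asyncio sketch of this rule: duplicate requests are coalesced rather than sent, so later callers wait on the first request's result. The class name and the `send` callback are hypothetical; lowercasing the name in the key makes case-randomized variants of the same name count as duplicates.

```python
import asyncio
from typing import Awaitable, Callable

Key = tuple[str, int, str]  # (lowercased query name, query type, destination IP)


class OutboundDeduplicator:
    """Allow at most one in-flight request per (qname, qtype, server IP)."""

    def __init__(self, send: Callable[[str, int, str], Awaitable[bytes]]):
        self._send = send
        self._inflight: dict[Key, asyncio.Future] = {}

    async def query(self, qname: str, qtype: int, server_ip: str) -> bytes:
        key = (qname.lower(), qtype, server_ip)
        fut = self._inflight.get(key)
        if fut is None:  # first caller: actually send, and share the result
            fut = asyncio.ensure_future(self._send(qname, qtype, server_ip))
            self._inflight[key] = fut
            fut.add_done_callback(lambda _: self._inflight.pop(key, None))
        # shield() keeps one waiter's cancellation from cancelling the shared request
        return await asyncio.shield(fut)
```

However many identical client queries an attacker injects, only one matching request is outstanding at a time, which removes the many-targets setup a birthday attack relies on.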

Rate-limiting queries

Open recursive resolvers are attractive targets for launching amplification
attacks.
They are high-capacity, high-reliability servers and can produce larger
responses than a typical authoritative name server—especially if an
attacker can inject a large response into their cache.
It is incumbent on any developer of an open DNS service to prevent their
servers from being used to launch attacks on other systems.

Amplification attacks can be difficult to detect while they are occurring.
Attackers can launch an attack via thousands of open resolvers, so that each
resolver only sees a small fraction of the overall query volume and cannot
extract a clear signal that it has been compromised.

Malicious traffic must be blocked without any disruption or degradation of the
DNS service to normal users.
DNS is an essential network service, so shutting down servers to cut off an
attack is not an option, nor is denying service to any given client IP for
too long.
Resolvers must be able to quickly block an attack as soon as it starts,
and restore fully operational service as soon as the attack ends.

The best approach for combating DoS attacks is to impose a rate-limiting or
"throttling" mechanism.
Google Public DNS implements two kinds of rate control:

Rate control of outgoing requests to other name servers.
To protect other DNS name servers against DoS attacks that could be launched
from our resolver servers, Google Public DNS enforces queries-per-second (QPS)
limits on outgoing requests from each serving cluster for each name server
IP address.

Rate control of outgoing responses to clients.
To protect any other systems against amplification and traditional
distributed DoS (botnet) attacks that could be launched from our resolver
servers, Google Public DNS performs two types of rate limiting on client
queries:

To protect against traditional volume-based attacks, each server imposes
per-client-IP QPS and average bandwidth limits.

To guard against amplification attacks, in which large responses to
small queries are exploited, each server enforces a per-client-IP
maximum average amplification factor.
The average amplification factor is a configurable ratio of
response-to-query size, determined from historical traffic patterns
observed in our server logs.
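A minimal Python sketch of the per-client limits described above: a token bucket for QPS, a second bucket measured in bytes for average bandwidth, and exponential moving averages of query and response sizes for the average amplification factor. All limits, the smoothing weight, and the starting averages are assumptions for illustration; the text does not say how Google Public DNS computes these averages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Classic token bucket: refills at `rate` units/s, up to `burst` units."""
    rate: float
    burst: float
    tokens: float = field(init=False)
    last: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst

    def allow(self, cost: float, now: float) -> bool:
        if self.last is not None:
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


@dataclass
class ClientLimits:
    """Per-client-IP limits; every number here is an illustrative assumption."""
    qps: TokenBucket = field(default_factory=lambda: TokenBucket(rate=100, burst=200))
    bandwidth: TokenBucket = field(default_factory=lambda: TokenBucket(rate=64_000, burst=256_000))
    max_amplification: float = 10.0
    alpha: float = 0.05         # weight of the newest sample in the moving averages
    avg_query: float = 64.0     # assumed starting averages (ratio 2)
    avg_response: float = 128.0

    def admit_query(self, now: float) -> bool:
        """QPS limit: excess queries are simply dropped."""
        return self.qps.allow(1, now)

    def response_ok(self, query_bytes: int, response_bytes: int, now: float) -> bool:
        """Average bandwidth and average amplification limits for one UDP response."""
        a = self.alpha
        self.avg_query += a * (query_bytes - self.avg_query)
        self.avg_response += a * (response_bytes - self.avg_response)
        within_bw = self.bandwidth.allow(response_bytes, now)
        within_amp = self.avg_response <= self.max_amplification * self.avg_query
        return within_bw and within_amp


clients: dict[str, ClientLimits] = {}


def limits_for(ip: str) -> ClientLimits:
    return clients.setdefault(ip, ClientLimits())
```

Because the amplification check uses moving averages, a single large response passes while a sustained stream of them trips the limit. The same `TokenBucket` could also be keyed by name server IP to cap outgoing requests.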

If DNS queries from one source IP address exceed the maximum QPS rate,
excess queries will be dropped.
If DNS queries over UDP from one source IP address exceed
the average bandwidth or amplification limit consistently
(the occasional large response will pass),
queries may be dropped or only a small response may be sent.
Small responses may be an error response or
an empty response with the truncation bit set
(so that most legitimate queries will be retried via TCP and succeed).
Not all systems or programs will retry via TCP,
and DNS over TCP may be blocked by firewalls on the client side,
so some applications may not operate correctly when replies are truncated.
Nonetheless, truncation allows RFC-compliant clients to work properly in
most cases.
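The "small response" option can be sketched in Python as follows. When a client is over its limit, the resolver either drops the query or replies with only the header and question section and the TC (truncation) bit set, the signal that standards-compliant clients use to retry over TCP. The function names and the `drop` switch are hypothetical, and the parser assumes an uncompressed question section, which is typical but not guaranteed.

```python
import struct
from typing import Optional

FLAG_TC = 0x0200  # "truncated" bit in the DNS header flags word


def truncated_reply(response: bytes) -> bytes:
    """Keep the header and question, drop all records, and set TC."""
    query_id, flags, qdcount = struct.unpack("!HHH", response[:6])
    pos = 12
    for _ in range(qdcount):
        while response[pos] != 0:
            if response[pos] & 0xC0:
                raise ValueError("compressed question names are not handled")
            pos += 1 + response[pos]
        pos += 1 + 4  # terminating root label, then QTYPE and QCLASS
    header = struct.pack("!HHHHHH", query_id, flags | FLAG_TC, qdcount, 0, 0, 0)
    return header + response[12:pos]


def udp_reply(response: bytes, over_limit: bool, drop: bool = False) -> Optional[bytes]:
    """Full answer within limits; otherwise drop (None) or send a small TC reply."""
    if not over_limit:
        return response
    return None if drop else truncated_reply(response)
```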