First things first

If you operate your own DNS resolver, no matter what brand it is, please upgrade to the latest version now. (Also, if you are disappointed that you have to rush the upgrade, talk to your vendor and ask about early notification for security releases.)

If you want to know how the attack works and how you can protect your systems next time, read on.

NXNSAttack principle

The newly discovered vulnerability abuses the DNS delegation mechanism to force DNS resolvers to generate more DNS queries to authoritative servers of the attacker’s choice. How is that possible?

The whole DNS is built on the delegation principle: authoritative DNS servers responsible for upper levels of the DNS hierarchy delegate (we could also say redirect) questions for lower-level domains to different servers, which removes the need to maintain one huge database with DNS data for the whole Internet.

For example, this is how the authoritative DNS server named “a.gtld-servers.net.”, which is responsible for the “com.” domain, delegates the question “example.com. A” to a different set of servers:

Here we can see that even though we asked for the name “example.com.” with type A, the server authoritative for the “com.” domain sent us a delegation for “example.com.”, which contains the names of two other authoritative DNS servers but does not contain an answer to our original question.

This is a so-called glueless delegation, i.e. a delegation which contains only the names of authoritative DNS servers (a.iana-servers.net. and b.iana-servers.net.) but not their IP addresses. Obviously a DNS resolver cannot send a query to a “name”, so the resolver first needs to obtain an IPv4 or IPv6 address of the authoritative server “a.iana-servers.net.” or “b.iana-servers.net.”, and only then can it continue resolving the original query “example.com. A”.
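The extra work caused by a glueless delegation can be sketched in a few lines of Python. This is only an illustration: the Delegation class and the helper name are invented for this example and do not come from any resolver implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Delegation:
    """Hypothetical model of a referral: NS names plus optional glue addresses."""
    zone: str
    ns_names: List[str]
    glue: Dict[str, List[str]] = field(default_factory=dict)


def names_needing_resolution(delegation: Delegation) -> List[str]:
    """Return NS names the resolver must look up before it can continue."""
    return [name for name in delegation.ns_names if not delegation.glue.get(name)]


# The delegation described in the text: two NS names, no glue at all.
referral = Delegation(
    zone="example.com.",
    ns_names=["a.iana-servers.net.", "b.iana-servers.net."],
)
print(names_needing_resolution(referral))
```

Every name returned by the helper costs the resolver at least one additional lookup, and that is exactly the lever the attack pulls.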

Impact

The main discovery in the NXNSAttack paper is that attackers can amplify a single DNS query toward a DNS resolver plus a single DNS answer with fake delegations (i.e. 2 packets) into multiple random queries fired at victim authoritative servers, effectively using a standards-compliant DNS resolver as an amplifier for a random subdomain attack. In practice the packet amplification factor (PAF) depends very much on the strategy employed by the DNS resolver implementation involved in the attack. For example:

The BIND 9.12.3 resolver resolves IPv4 and IPv6 addresses for all NS names obtained from a delegation in parallel, leading to a packet amplification factor of 1000x.

Knot Resolver 5.1.0 resolves NS names one at a time and places other limits on the number of resolution steps generated by a single client query, limiting PAF to the order of tens. In fact, half of its packet amplification factor of 48x is caused by workarounds for non-compliant authoritative servers. Without the workarounds for RFC 8020 and RFC 7816 non-compliance, the PAF of Knot Resolver would be only 24x. (Yet another example that workarounds are bad, but that’s another story.)

None of these strategies is inherently wrong; they just represent different trade-offs between resources invested into a single client query vs. processing multiple client queries in parallel.

In the end, spare capacity on the resolver and on the authoritative servers determines which party will be the “victim” of NXNSAttack, because one of them gets overloaded first. As long as the capacity is sufficient, servers will continue to operate just fine, possibly leaving one of the parties virtually unaffected and, in the absence of appropriate monitoring, oblivious to the attack.
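To make the trade-off more tangible, here is a toy Python model of the two strategies. It is a sketch under loud assumptions: it counts only outgoing address queries, the function names, the fake NS names and the step budget of 20 are invented for illustration, and it does not attempt to reproduce the measured factors quoted above.

```python
ATTACKER_PACKETS = 2  # one client query + one answer with fake delegations (see text)
ADDR_TYPES = ("A", "AAAA")


def queries_all_in_parallel(ns_names):
    """Strategy 1: ask for every address type of every NS name at once."""
    return [(name, rtype) for name in ns_names for rtype in ADDR_TYPES]


def queries_one_at_a_time(ns_names, max_steps):
    """Strategy 2: walk NS names sequentially and stop at a step budget.

    max_steps is a hypothetical limit, not a value taken from any resolver.
    """
    queries = []
    for name in ns_names:
        for rtype in ADDR_TYPES:
            if len(queries) >= max_steps:
                return queries
            queries.append((name, rtype))
    return queries


def amplification(queries):
    """Outgoing queries per packet the attacker had to send."""
    return len(queries) / ATTACKER_PACKETS


# A fake delegation with 100 made-up NS names under the victim's domain.
fake_ns = [f"random{i}.victim.example." for i in range(100)]
print(amplification(queries_all_in_parallel(fake_ns)))
print(amplification(queries_one_at_a_time(fake_ns, max_steps=20)))
```

In this model the parallel strategy scales with the size of the fake delegation, while the sequential one is capped by its budget no matter what the attacker sends.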

Unfortunately NXNSAttack abuses the very basic principle of the DNS protocol, which practically means there is no fix, only mitigation. Luckily the researchers followed a responsible disclosure process and allowed vendors to implement and release mitigations before making the attack public.

NXNSAttack is a special case of the well-known random subdomain attack, so mitigation approaches fall into two categories: specific to NXNSAttack and generic for random subdomain attacks.

NXNSAttack mitigation

Unlike traditional random subdomain attacks, in the case of NXNSAttack the queries are generated by the resolver itself. This difference allows vendors to implement simple mitigation techniques, such as limiting the number of names resolved when processing a single delegation.

The obvious advantage is that it is simple, at least in theory.

The disadvantage of mitigation based on counters is that it requires vendors to invent arbitrary limits not grounded in the DNS protocol specification, which basically determine the maximum packet amplification factor. At the same time these arbitrary limits might break resolution for some domains, because they put additional constraints on the resolution process.

This is a very practical problem, because recently published research estimates that 4% of second-level domains (example.com.) have a problem in their delegation from the top level (com.), so any change which adds arbitrary limits to retries during the resolution process has to be weighed very carefully.

In the upcoming days we will see how successful vendors were in determining their magic numbers and whether they get away without breaking any major domains.
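The following Python sketch shows why such a counter is a double-edged sword. Everything in it is hypothetical: the function, the server names and the limit values are made up to illustrate the trade-off and do not describe any vendor's actual mitigation.

```python
def first_working_server(ns_names, is_reachable, max_names):
    """Try at most max_names NS names from a delegation.

    Returns the first reachable server name, or None when the budget runs out.
    max_names is the arbitrary "magic number" discussed in the text.
    """
    for name in ns_names[:max_names]:
        if is_reachable(name):
            return name
    return None


# A legitimate but badly maintained domain: five dead servers, one working.
delegation = [f"dead{i}.example.net." for i in range(5)] + ["alive.example.net."]


def reachable(name):
    return name == "alive.example.net."


print(first_working_server(delegation, reachable, max_names=5))   # budget too small
print(first_working_server(delegation, reachable, max_names=10))  # budget sufficient
```

A tight limit caps the amplification but gives up on the domain before reaching its only working server; a generous limit resolves it but leaves more room for the attacker.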

Generic Random Subdomain Attack mitigation

Any random subdomain attack, NXNSAttack included, generates random query names to bypass the DNS cache. It follows that a generic mitigation has to prevent attackers from bypassing the cache, and luckily we already have the technology to do that!

Aggressive Use of DNSSEC-Validated Cache (RFC 8198) uses DNSSEC “metadata” in the form of NSEC(3) and RRSIG records to generate negative answers without the need to contact authoritative servers. How does that work? First let’s have a look at example NSEC records:

We sent the DNS query example. A to one of the root DNS servers, and it answered with NXDOMAIN, indicating that the name does not exist. At the same time we received two proofs of nonexistence in the form of NSEC records (and their DNSSEC signatures in RRSIG records).

The first NSEC record, events. 86400 IN NSEC exchange. NS DS RRSIG NSEC, means that the root zone contains the domain events. with record types NS DS RRSIG NSEC and, more importantly, that there are no domains between the names events. and exchange..

The second NSEC record, . 86400 IN NSEC aaa. NS SOA RRSIG NSEC DNSKEY, means that the root zone contains the DNS root . (surprise!) with record types NS SOA RRSIG NSEC DNSKEY, and also that there are no domains between the names . and aaa.. This proves there is no wildcard record *. and thus NXDOMAIN really is the correct answer to the query example. A.

Each of the records has a time-to-live of 86400 seconds. This allows resolvers to synthesize NXDOMAIN answers for any queries falling into the indicated ranges (. – aaa., events. – exchange.) for one full day, effectively cutting traffic towards authoritative servers.
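The range check behind this synthesis can be sketched in Python. This is a simplified illustration, not resolver code: the helper names are invented, only ASCII names are handled, and the wrap-around case of the last NSEC record in a zone, signature validation and TTL handling are all left out. The comparison follows the canonical DNS name ordering, which compares labels from the rightmost one, case-insensitively.

```python
def canonical_key(name):
    """Sort key for canonical DNS name order: lowercase labels, rightmost first."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return tuple(label.encode("ascii") for label in reversed(labels))


def nsec_covers(owner, next_name, qname):
    """True if qname falls strictly between the NSEC owner and its next name."""
    return canonical_key(owner) < canonical_key(qname) < canonical_key(next_name)


# The two NSEC ranges quoted in the text.
cached_ranges = [("events.", "exchange."), (".", "aaa.")]


def can_synthesize_nxdomain(qname):
    """Both the query name and the matching wildcard must be proven absent."""
    name_covered = any(nsec_covers(o, n, qname) for o, n in cached_ranges)
    wildcard_covered = any(nsec_covers(o, n, "*.") for o, n in cached_ranges)
    return name_covered and wildcard_covered


print(can_synthesize_nxdomain("example."))   # inside events. .. exchange.
print(can_synthesize_nxdomain("evil."))      # also inside events. .. exchange.
print(can_synthesize_nxdomain("zebra."))     # not covered by the cached ranges
```

Any random name an attacker generates inside a cached range is answered from the cache, so it never reaches the authoritative servers.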

As a consequence, querying a DNS zone which contains N names at random will populate the resolver’s cache in roughly O(N) answers. In other words, the cost of eliminating a random subdomain attack between a DNSSEC-validating resolver and authoritative servers for the duration of the TTL is linear in the number of names in the target DNS zone. It works surprisingly well even for large zones with 1 million domains in them; pretty charts about this setup can be found in my older presentation (from 2018).
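The linear bound is easy to check with a small simulation. The model below is deliberately crude and entirely an assumption of this sketch: names are represented as random points on the interval [0, 1), every gap between two neighbouring zone names stands for one NSEC range, and TTL expiry is ignored.

```python
import bisect
import random


def upstream_queries(zone_size, attack_queries, seed=0):
    """Count attack queries that still reach the authoritative servers.

    Each random query falls into one gap between existing names; the first
    hit in a gap goes upstream and caches the range, later hits are served
    from the cache.
    """
    rng = random.Random(seed)
    zone_names = sorted(rng.random() for _ in range(zone_size))
    cached_gaps = set()
    upstream = 0
    for _ in range(attack_queries):
        gap = bisect.bisect_left(zone_names, rng.random())
        if gap not in cached_gaps:
            cached_gaps.add(gap)
            upstream += 1
    return upstream


zone_size = 1000
sent = 100_000
print(f"{upstream_queries(zone_size, sent)} of {sent} queries went upstream")
```

However many random queries are sent, the upstream count in this model can never exceed the number of gaps, which is zone_size + 1.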

What next?

First of all, upgrade your DNS resolvers to get at least some NXNSAttack mitigation.

Once the dust settles, please consider deploying DNSSEC on authoritative servers, and also on DNS resolvers.

Aggressive Use of DNSSEC-Validated Cache limits the impact of random subdomain attacks. It is already implemented in Knot Resolver. Unbound also has partial support (NSEC only) and BIND has a prototype as well. If your DNS resolver vendor does not offer it at the moment, ask for the feature and stop random subdomain attacks once and for all!

If you are not used to speaking to your DNS software vendor, please fill in the cross-vendor survey.