Don't Let DNS Be Your Single Point of Failure

DNS is the address book for the Internet. It is how users, customers and partners find your online presence and communicate and transact with it. Without DNS, the Web, e-mail, hosted applications, and virtually every mission-critical thing done online would become unusable. Yet, because DNS mostly “just works,” many organizations treat the protection of their DNS infrastructure as an afterthought even though they know, or should know, that it is the lynchpin of almost everything they do online. In fact, most senior-level executives, even senior-level technology executives, cannot say what their DNS availability strategy is.

Just as you would not consider running an Internet company using only one connectivity provider or one Web server, you should not keep all your DNS eggs in one basket. Even if your organization rarely sees targeted attacks against it, it is increasingly common to see collateral damage during large Internet attacks: notice the number of recent incidents in which popular social media sites were degraded by attacks against a single user. Sharing infrastructure, such as name servers, with the victim of an attack can cause innocent third parties to suffer. Today's attackers, who control botnets sometimes comprising tens of thousands of nodes, have more than enough bandwidth to disrupt all but the best-provisioned Internet presence.

When it comes to ensuring the availability of your DNS, diversity should be your motto. Diversity is a simple concept that can be baked into the design of any architecture and can be applied at every part of the DNS infrastructure from software to hardware to the providers you deal with.

If you haven’t already created a DNS strategy that ensures you don’t have a single point of failure, here are three things you can do today to boost your reliability and increase your availability.

1. Define an SLA for your DNS systems

If you are a large enterprise, your IT staff are running your DNS service in house, have contracted a managed DNS provider, or are using some combination of the two. A smaller organization may be saving money by relying on popular free or bundled DNS services.

Regardless of whether your DNS is outsourced or run in house, it is essential to create a Service Level Agreement (SLA) for your DNS service and ensure that your IT staff and/or outsourced provider adhere to it.

Do not settle for anything less than “five nines” or 99.999% uptime for your DNS systems. After all, if your DNS goes down, no Internet traffic gets to your machines and none of your e-mail or essential services work either. A 99% uptime SLA allows 14.4 minutes of downtime per day, which adds up to 7.2 hours of downtime over a 30-day month. That is almost an entire work day per month lost!
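The arithmetic behind these figures is easy to reproduce. The following is a minimal sketch, not a standard tool: the function name `allowed_downtime_seconds` and the 30-day month are assumptions made for illustration.

```python
# Downtime permitted by an availability target (illustrative arithmetic only).
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def allowed_downtime_seconds(availability_percent: float, days: int = 1) -> float:
    """Return the seconds of downtime an availability target permits over `days` days."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * SECONDS_PER_DAY * days


# Compare a 99% target with progressively stricter ones, up to five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    per_day = allowed_downtime_seconds(target)
    per_month = allowed_downtime_seconds(target, DAYS_PER_MONTH)
    print(f"{target}% uptime: {per_day:.3f} s/day, {per_month / 60:.2f} min/month")
```

At 99% the budget is 864 seconds (14.4 minutes) per day. At five nines it shrinks to under one second per day, or roughly 26 seconds over a 30-day month.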

2. Add geographic and topographic diversity to your nodes

Relying upon a single server in a single physical or logical location is an extremely risky proposition. If that name server should lose connectivity, or be hit by a denial-of-service attack or even a natural disaster, it essentially takes every other DNS-reliant system down with it.

Your DNS should have sufficient geographical and topographical diversity to protect against these sorts of failures and attacks. In particular, you should consider distributing your DNS servers across multiple power grids and continents.

The DNS root server operators and top-level domain (TLD) operators, managers of possibly the most-attacked pieces of Internet infrastructure, have raised geographical diversity to an art. While the DNS is often conversationally described as having “a root”, there are in fact 13 DNS root servers in theory, each with a single IPv4 address. In practice, there are over 200 nodes in the root server network, located at data centers everywhere from New York to Moscow to Johannesburg to Reykjavik, each advertising the same 13 addresses using IP Anycast. TLD operators work on a similar principle, and manage large IP Anycast networks with advertised and unadvertised TLD servers, set up in the locations where traffic converges the most. That's massive topographical diversity, and it makes it very difficult for the whole system to fall to an attack. Criminals regularly throw enormous distributed denial-of-service attacks at the root and TLD servers, and these operators have responded by hardening their networks and adding significant geographic and topographic diversity.

For organizations with more modest needs, there's probably little need for a 200-node DNS network, but distributing zone data across more than one location and more than one subnet, and using more than one resolution provider, can immediately reduce the risk of downtime from DNS outages.
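A first, rough check of subnet diversity can be automated. The sketch below assumes you already have the list of IPv4 addresses your name servers answer on; the helper names are hypothetical, the addresses come from ranges reserved for documentation, and a /24 boundary is only a crude proxy, since two different /24 networks can still share a provider or a data center.

```python
import ipaddress


def distinct_networks(addresses, prefix_len=24):
    """Return the set of IPv4 networks (of the given prefix length) that contain the addresses."""
    networks = set()
    for address in addresses:
        # ip_interface accepts a host address with a prefix and exposes its enclosing network.
        interface = ipaddress.ip_interface(f"{address}/{prefix_len}")
        networks.add(interface.network)
    return networks


def has_subnet_diversity(addresses, prefix_len=24, minimum=2):
    """True if the name servers sit in at least `minimum` distinct networks."""
    return len(distinct_networks(addresses, prefix_len)) >= minimum


# Hypothetical name server addresses (documentation ranges, not real servers).
same_subnet = ["192.0.2.53", "192.0.2.54"]
spread_out = ["192.0.2.53", "198.51.100.53", "203.0.113.53"]

print(has_subnet_diversity(same_subnet))
print(has_subnet_diversity(spread_out))
```

The first list fails the check because both servers share one /24; the second passes. Passing says nothing about geography or provider diversity, which still need to be confirmed separately.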

3. Deploy diversity in your DNS software

The lack of heterogeneity in software is an over-familiar problem to security administrators, the dominance of Windows in desktop operating systems being the prime example. It may or may not be the case that Microsoft's software is less secure than other OS developers', but it is certainly the case that it is attacked more often and is found more vulnerable precisely because its popularity nears ubiquity. From a security perspective, software monocultures are risky.

BIND is by far the most popular DNS software, and as such it has had its fair share of vulnerabilities, some of them very serious. In recent years, the Kaminsky bug, which would have enabled widespread DNS cache poisoning, was considered so severe that providers had to patch their name servers in secret before the nature of the vulnerability was disclosed.

While the Internet Systems Consortium does a great job of patching newly found security holes in BIND, there will always be a window of vulnerability whenever a new zero-day problem is discovered. Just as it is good practice to have more than one browser installed on your desktop in the event of prolific attacks against unpatched vulnerabilities in one, it is good practice in DNS infrastructure to spread the risk between more than one type of name server software.

In addition, you should apply the same reasoning to critical zero-day exploits that may affect your other software, from your operating system to your network monitoring tools. If you run more than one type of software, you can take one out of service while it is being patched, reducing your exposure to exploits.
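One rough way to spot a software monoculture is to tally which name server implementation each of your servers runs. The sketch below assumes a hand-maintained inventory; the hostnames and the `largest_share` helper are hypothetical, and NSD appears only as an example of a name server other than BIND.

```python
from collections import Counter


def largest_share(inventory):
    """Return (implementation, fraction of servers) for the most common implementation."""
    if not inventory:
        raise ValueError("inventory is empty")
    counts = Counter(inventory.values())
    name, count = counts.most_common(1)[0]
    return name, count / len(inventory)


# Hypothetical inventory: hostname -> name server software it runs.
inventory = {
    "ns1.example.com": "BIND",
    "ns2.example.com": "BIND",
    "ns3.example.com": "NSD",
}

name, share = largest_share(inventory)
if share == 1.0:
    print(f"Monoculture: every name server runs {name}")
else:
    print(f"{name} runs on {share:.0%} of name servers")
```

A share of 100% means a single zero-day could expose every name server at once; any lower figure means at least one server can keep answering while the others are patched.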

The diversity principle can be applied in almost every area of an online organization: personnel, network and server hardware providers, power supplies, connectivity providers, all the way down to your power cable supplier. In the end, like everything in Internet security, it's all about balancing the cost of security against the cost of downtime when the security fails. But the key point is that DNS is every bit as important – if not more so – and every bit as vulnerable as any other part of your system architecture.

These three actions – defining an SLA, adding geographic and topographic diversity to your nodes, and deploying software diversity – can ensure greater availability and reliability. The more diverse your architecture, the greater your availability.

Ram Mohan is the Executive Vice President and Chief Technology Officer at Afilias, a global provider of Internet infrastructure services including domain name registry and DNS solutions. Ram also serves as the Security & Stability Advisory Committee's liaison to ICANN’s Board of Directors and has helped direct and write numerous policies affecting domain name registration and DNS security.