Internet Measurements and Traffic
Analysis

DNS Analysis

The Domain Name System (DNS) is an essential part of the Internet
infrastructure and most Internet applications depend on the proper
functioning of DNS. In this project, we seek to understand the
client-perceived performance and behavior of DNS and investigate the
effectiveness of its caching mechanisms. Our goal is to understand
the factors that affect DNS response latency, the errors and failure
modes of DNS, and its scalability. An analysis of the effectiveness
of DNS caching is especially important in light of several recent
changes in the way DNS is used. Content distribution networks (CDNs)
and popular Web sites with multiple servers increasingly use DNS as a
level of indirection to balance load across servers, to provide fault
tolerance, or to direct each client request to a topologically nearby
server.

Methodology

Our analysis is based on an extensive collection of packet traces.
The novel idea in our approach is to collect DNS packets jointly with
the associated TCP connection traffic: since TCP applications drive
most DNS traffic, a joint trace in which all TCP SYN, FIN, and RST
packets are captured alongside the DNS packets allows us to infer how
DNS is actually used. From DNS packets alone we could estimate
response latencies and characterize failure modes, but we could learn
little about the effectiveness of caching.
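
As a rough illustration of how a joint trace can be mined, the sketch below (hypothetical record formats and field names, not our actual analysis tools) attributes each TCP SYN to the most recent DNS answer, from the same client, whose A records contain the connection's destination address:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DnsAnswer:
    ts: float               # timestamp of the DNS response packet
    client: str             # client that received the answer
    name: str               # queried domain name
    addrs: Tuple[str, ...]  # A records returned in the answer

@dataclass
class TcpSyn:
    ts: float               # timestamp of the SYN packet
    client: str             # client opening the connection
    dst: str                # destination IP address

def attribute_syns(answers: List[DnsAnswer],
                   syns: List[TcpSyn]) -> List[Optional[str]]:
    """For each SYN, find the name in the latest earlier DNS answer
    (same client) whose A records include the SYN's destination."""
    attributed = []
    for syn in syns:
        best = None
        for ans in answers:
            if (ans.client == syn.client and ans.ts <= syn.ts
                    and syn.dst in ans.addrs
                    and (best is None or ans.ts > best.ts)):
                best = ans
        attributed.append(best.name if best else None)
    return attributed
```

A SYN with no attributable answer suggests the client resolved the name from a cache (or used a raw IP address), which is exactly the signal that DNS packets alone cannot provide.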

We have been collecting data at the border router connecting MIT's LCS
and AI Lab to the rest of the Internet since Fall 1999. We have
analyzed two weeks' worth of data collected in January 2000 and
December 2000. We also collected data from KAIST in Korea in Spring
2001. We have analyzed one week's worth of data from May 2001.

Our analysis has two parts: first, we study the packet traces to
characterize DNS performance as seen by our clients, and draw more
fundamental conclusions about its failure modes and its retransmission
protocol. Then, we conduct trace-driven simulations to explore the
effect of varying time-to-live (TTL) values and varying degrees of
cache sharing on DNS cache hit rates.
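
The core of such a trace-driven simulation can be sketched as follows. This is a deliberately simplified model (one shared cache, a single TTL override applied to every name), not our actual simulator:

```python
def cache_hit_rate(lookups, ttl):
    """Replay a trace of (timestamp, name) lookups through a cache in
    which every record lives for `ttl` seconds; return the hit rate.
    A lookup hits if the name was last fetched less than `ttl` ago."""
    expires = {}  # name -> expiry time of its cached record
    hits = 0
    for ts, name in lookups:
        if expires.get(name, 0.0) > ts:
            hits += 1
        else:
            expires[name] = ts + ttl  # miss: fetch and cache anew
    return hits / len(lookups) if lookups else 0.0
```

Varying `ttl` models the TTL experiments; partitioning the lookup stream into per-group traces and simulating one cache per group models varying degrees of cache sharing.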

We are currently developing a mathematical framework to capture the
asymptotic properties observed in our simulations.

Key Results

A paper detailing our current findings will appear at the First ACM
SIGCOMM Internet Measurement Workshop in November 2001.

Our most surprising, non-obvious findings and conclusions are:

About a quarter of all DNS lookups never get an answer. More than
50% of the DNS-related packets in the wide area correspond to such
lookups!

The DNS retransmission protocol appears to be overly persistent:
while most successful lookups complete within two or three
retransmissions, lookups that never receive an answer trigger many
more retransmissions, and a correspondingly large number of DNS
packets traverse the wide area.
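
In a trace, the retransmissions belonging to one lookup can be grouped by the client address and the 16-bit DNS query ID. A sketch (hypothetical tuple format, not our actual analysis code) that separates per-lookup packet counts for answered and unanswered lookups:

```python
from collections import defaultdict

def retransmission_counts(queries, answered_keys):
    """Count query packets per lookup, keyed by (client, query_id).
    `queries` is an iterable of (client, query_id) tuples, one per
    query packet seen; `answered_keys` is the set of keys that
    eventually received a response.  Returns two lists of per-lookup
    packet counts: answered lookups and unanswered lookups."""
    counts = defaultdict(int)
    for client, qid in queries:
        counts[(client, qid)] += 1
    answered = [n for k, n in counts.items() if k in answered_keys]
    unanswered = [n for k, n in counts.items() if k not in answered_keys]
    return answered, unanswered
```

Comparing the two distributions is what exposes the asymmetry: answered lookups need few packets, while unanswered ones account for a disproportionate share of wide-area DNS traffic.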

Reducing the TTLs of all A records (i.e., those for hosts other than
name servers) to a value as small as ten minutes is unlikely to
degrade the scalability of DNS in any noticeable way. This is because
of the heavy-tailed nature of accesses to names.
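
The intuition can be checked with a small synthetic experiment. Here Zipf-like name popularity and exponential inter-arrival times are stand-ins for the measured distributions, and the parameters below are illustrative, not drawn from our traces:

```python
import random

def zipf_trace(n_lookups, n_names, mean_gap, seed=0):
    """Synthetic (timestamp, name) lookups: Zipf-like popularity,
    exponentially distributed inter-arrival times."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_names)]
    names = rng.choices(range(n_names), weights, k=n_lookups)
    trace, ts = [], 0.0
    for name in names:
        ts += rng.expovariate(1.0 / mean_gap)
        trace.append((ts, name))
    return trace

def hit_rate(trace, ttl):
    """Hit rate of a single shared cache with a uniform TTL."""
    expires, hits = {}, 0
    for ts, name in trace:
        if expires.get(name, -1.0) > ts:
            hits += 1
        else:
            expires[name] = ts + ttl
    return hits / len(trace)

trace = zipf_trace(n_lookups=20000, n_names=5000, mean_gap=1.0)
ten_minutes = hit_rate(trace, 600.0)
one_day = hit_rate(trace, 86400.0)
```

The interesting quantity is how close `ten_minutes` comes to `one_day`: the heavy tail means the few popular names that dominate the lookup stream are re-referenced well within ten minutes, so a short TTL sacrifices relatively little of the achievable hit rate.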

The scalability of DNS has little to do with its hierarchical
organization or with good A-record caching. Most of the DNS name
space is a flat, two-level structure, and A-record caching seems to
add little to the per-host and per-application caching already done
by end clients today. Rather, the scalability derives from the good
name space partitioning achieved by the cacheability of NS records,
which keeps load off the root and top-level name servers.