The Effect of DNS on Tor’s Anonymity

The domain name system
(DNS)
is a fundamental part of the Internet, mapping human-readable domains
to machine-readable IP addresses. When fetching a web page in
a browser, a DNS request almost always precedes the actual web traffic. This is
also the case when using
Tor Browser,
the privacy-enhanced browser developed by
The Tor Project
to provide millions of users with anonymity online.

A lot of research
has gone into improving the Tor network, but its use of DNS has received little
attention. In this research project, we set out to learn how DNS can harm the
anonymity of Tor users, and how adversaries can leverage the DNS protocol to
deanonymize users, as illustrated by the diagram to the right. We study
(i) how exposed the DNS protocol is compared to web traffic, (ii)
how Tor exit relays are configured to use DNS, (iii) how existing
website fingerprinting attacks
can be enhanced with DNS, and (iv) how effective these enhanced website
fingerprinting attacks are at Internet-scale.

We show how an attacker can use DNS requests to mount highly precise website
fingerprinting attacks: Mapping DNS traffic to websites is highly accurate even
with simple techniques, and correlating the observed websites with a website
fingerprinting attack greatly improves the precision when monitoring relatively
unpopular websites. Our results show that DNS requests from Tor exit relays
traverse numerous autonomous systems that subsequent web traffic does not
traverse. We also find that a set of exit relays, at times comprising 40% of
Tor’s exit bandwidth, uses Google’s public DNS servers, an
alarmingly high share for a single organization. We believe that Tor relay
operators should take steps to ensure that the network maintains more diversity
in how exit relays resolve DNS domains.

What does our work mean for Tor users? As we outline in
our blog post,
we don’t believe that there is any immediate cause for concern. While our
attacks work well in simulations, not many entities are in a position to mount
them. Besides, they require non-trivial engineering effort to be reliable, and
The Tor Project is already working on
improved website fingerprinting defenses.

We have developed a tool,
ddptr, which
stands for “DNS Delegation Path Traceroute.” The tool determines
the DNS delegation path for a fully qualified domain name, and then runs UDP
traceroutes to all DNS servers on the path. These traceroutes are then compared
to a TCP traceroute to the web server behind the same fully qualified domain
name.
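
As a rough illustration of the first step, the sketch below lists the zones on
the delegation path of a fully qualified domain name. It is not ddptr's code:
the function name delegation_zones is hypothetical, and the sketch assumes a
zone cut at every label, which does not always hold in practice. A real tool
would go on to look up the name servers of each zone and traceroute to them.

```python
def delegation_zones(fqdn):
    """List the zones from the root down to the given name.

    Assumption: every label boundary is a zone cut. Real delegations
    can skip labels, so treat this as an upper bound on the path.
    """
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    zones = ["."]  # resolution starts at the root zone
    # Walk from the top-level domain towards the full name.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(delegation_zones("baidu.com"))  # ['.', 'com.', 'baidu.com.']
```

For baidu.com this yields three zones (root, com, and baidu.com), one for each
level of the delegation path.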

Now imagine that our machine is trying to establish a TCP connection to
baidu.com. How many autonomous systems will our network packets traverse? The
two images to the right show an example. First, our machine has to resolve the
domain before it can send
packets to the IP address. The left image shows UDP traceroutes to all
DNS servers in the delegation path for “baidu.com,” namely
192.58.128.30, 192.43.172.30, and 202.108.22.220. In total, these traceroutes
traversed 13 different autonomous systems, illustrated by the rectangular boxes.
The right image shows a TCP traceroute to “baidu.com.” The
traceroute traversed at least four autonomous systems. In this simple example,
we see that the DNS resolution process for baidu.com exposes our traffic to more
autonomous systems than the actual TCP connection, provided we run our own DNS
resolver.
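
The comparison in this example boils down to set arithmetic over autonomous
system numbers. The sketch below assumes that each traceroute has already been
mapped to the list of ASes it traversed. The AS numbers are made up (taken from
the private-use range), and the function name dns_only_ases is hypothetical.

```python
def dns_only_ases(dns_paths, tcp_path):
    """Return ASes seen on any DNS traceroute but not on the TCP traceroute."""
    dns_ases = set().union(*dns_paths)  # union over all DNS traceroutes
    return dns_ases - set(tcp_path)


# Made-up AS paths, one list per traceroute to a DNS server.
dns_paths = [
    [64512, 64513, 64520],
    [64512, 64514, 64521],
    [64512, 64513, 64515, 64522],
]
# Made-up AS path of the TCP traceroute to the web server.
tcp_path = [64512, 64513, 64515, 64530]

extra = dns_only_ases(dns_paths, tcp_path)
print(sorted(extra))  # [64514, 64520, 64521, 64522]
```

The ASes in the result see only the DNS traffic, not the web traffic that
follows it.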

We also publish the (mostly Python and R) scripts that we used to analyse and
plot our data. The git repository also contains the LaTeX source of our paper
and the project page you are looking at.

We publish the following datasets. Each tarball contains a README.txt file that
explains the respective dataset. We also want to encourage you to replicate our
work and reproduce all our datasets. Our
replication guide
is meant to ease this task.

Exit resolver dataset

The following dataset is a collection of .pcap files that we captured on the
authoritative DNS server for tor.nymity.ch. We used this dataset to identify
the DNS resolvers of Tor exit relays. The tarball contains a README file that
provides more details.

DNS request number dataset

The following dataset contains the number of DNS requests per five-minute
interval as recorded on our exit relay. The dataset contains two files, one for
a
reduced exit policy,
and one for an exit policy containing only ports 80 and 443.
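
To make the aggregation concrete, here is a minimal sketch of counting
requests per five-minute interval. It assumes the input is a list of Unix
timestamps, one per DNS request; that input format and the function name
requests_per_interval are assumptions for illustration and do not describe
the layout of the published files, which the README documents.

```python
from collections import Counter

INTERVAL = 5 * 60  # five minutes, in seconds


def requests_per_interval(timestamps, interval=INTERVAL):
    """Count requests per interval, keyed by the interval's start time."""
    counts = Counter((int(ts) // interval) * interval for ts in timestamps)
    return dict(sorted(counts.items()))


# Hypothetical timestamps, in seconds since the start of the measurement.
print(requests_per_interval([0, 10, 299, 300, 301, 900]))
# {0: 3, 300: 2, 900: 1}
```

Only request counts are kept in this aggregation; the queried domain names are
not part of the result.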

Internet-scale simulation dataset

The following dataset contains data for the (i) fraction of compromised
streams and (ii) time until first compromise for 10,000 simulated Tor
users. We generated the data with
TorPS and by running traceroutes.
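
The two metrics can be sketched for a single simulated user as follows. This is
an illustration under stated assumptions, not our analysis code: it assumes each
stream is a (timestamp, compromised) pair and measures time from the user's
first stream. The function name summarize_user is hypothetical.

```python
def summarize_user(streams):
    """Return (fraction of compromised streams, time until first compromise).

    streams: list of (timestamp, compromised) pairs for one simulated user.
    The time is None if no stream was ever compromised.
    """
    if not streams:
        return 0.0, None
    compromised_times = [t for t, compromised in streams if compromised]
    fraction = len(compromised_times) / len(streams)
    start = min(t for t, _ in streams)  # assumption: clock starts at first stream
    first = min(compromised_times) - start if compromised_times else None
    return fraction, first


# Hypothetical user with four streams, two of which are compromised.
print(summarize_user([(0, False), (60, True), (120, False), (180, True)]))
# (0.5, 60)
```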

DNS requests for Alexa top 1,000,000 domains

The following datasets contain all DNS requests recorded by Tor Browser 5.5.4,
configured not to browse over Tor, for the Alexa top 1,000,000 domains on April 15th, 2016.
The data was collected using tbdnsw
as part of the DefecTor toolset.

The following datasets contain website fingerprinting data with 100 samples of Alexa top 9,000
(monitored sites) and one sample each of Alexa top 909,000 (unmonitored sites), collected with Tor Browser 5.5.4.
The data was collected using tbw
as part of the DefecTor toolset. The toolset also
contains tools for extracting data. We use the same format for cells and extracted features as
Wang et al.