Content Cartography & DNS

In recent years, we have witnessed an unprecedented expansion of server infrastructures to cope with the ever-increasing
demand for content on the Internet. In this project, we develop methods, some of them based on DNS, to infer the footprints of major
content delivery infrastructures, to understand their strategies, and to make the resulting data available to the research community and decision
makers.

In a parallel effort, we study the deployment of DNS resolvers, both local and open ones, and examine their
proximity to end-users, their caching performance, and their effect on the user-to-server redirection performed by major CDNs today.

Web Content Cartography

Recent studies show that a significant part of Internet traffic is delivered
through Web-based applications. To cope with the increasing demand for Web
content, large scale content hosting and delivery infrastructures, such as
data-centers and content distribution networks, are continuously being
deployed. Being able to identify and classify such hosting infrastructures is
helpful not only to content producers, content providers, and ISPs, but also to
the research community at large, for example, to quantify the degree of hosting
infrastructure deployment on the Internet or the replication of Web content.
In this study, we introduce Web Content Cartography, i.e., the identification
and classification of content hosting and delivery infrastructures. We propose
a lightweight and fully automated approach to discover hosting infrastructures
based only on DNS measurements and BGP routing table snapshots. Our
experimental results show that our approach is feasible even with a limited
number of well-distributed vantage points. We find that some popular content is
served exclusively from specific regions and ASes. Furthermore, our
classification enables us to derive content-centric AS rankings that complement
existing AS rankings and shed light on recent observations about shifts in
inter-domain traffic and the AS topology.
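The core idea, mapping each hostname's DNS answers onto BGP prefixes and origin ASes and then grouping hostnames that share a footprint, can be sketched in Python. This is a simplified illustration under stated assumptions, not the classification algorithm used in the study: the routing table, the DNS answers, the greedy Jaccard grouping with its 0.7 threshold, and the per-AS weighting in the hypothetical `as_ranking` helper are all illustrative choices.

```python
import ipaddress
from collections import defaultdict

# Hypothetical BGP snapshot: (announced prefix, origin AS). Documentation prefixes
# and ASNs are used; real input would be a routing table dump.
BGP_TABLE = [
    ("203.0.113.0/24", 64500),
    ("198.51.100.0/24", 64501),
    ("198.51.0.0/16", 64502),
    ("192.0.2.0/24", 64503),
]

# Hypothetical A-record answers collected from several vantage points, per hostname.
ANSWERS = {
    "a.example": ["203.0.113.10", "198.51.100.7"],
    "b.example": ["203.0.113.20", "198.51.100.9"],
    "c.example": ["192.0.2.5"],
    "d.example": ["198.51.200.1"],
    "e.example": ["192.0.2.9"],
}


def longest_prefix_match(ip, table):
    """Return (network, origin_as) for the most specific prefix covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best


def footprints(answers, table):
    """Map each hostname to the sets of BGP prefixes and origin ASes its answers fall into."""
    result = {}
    for host, ips in answers.items():
        prefixes, ases = set(), set()
        for ip in ips:
            match = longest_prefix_match(ip, table)
            if match is not None:
                prefixes.add(match[0])
                ases.add(match[1])
        result[host] = (prefixes, ases)
    return result


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_hostnames(fp, threshold=0.7):
    """Greedily put hostnames with similar prefix sets into the same group."""
    groups = []  # list of (merged prefix set, member hostnames)
    for host in sorted(fp):
        prefixes = fp[host][0]
        for merged, members in groups:
            if jaccard(merged, prefixes) >= threshold:
                members.append(host)
                merged |= prefixes  # in-place update of the group's prefix set
                break
        else:
            groups.append((set(prefixes), [host]))
    return [members for _, members in groups]


def as_ranking(fp):
    """Rank ASes by hostnames served, splitting each hostname's weight over its ASes."""
    score = defaultdict(float)
    for _, ases in fp.values():
        for asn in ases:
            score[asn] += 1.0 / len(ases)
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
```

A linear scan keeps longest-prefix matching easy to follow; a full BGP table would call for a radix trie. Splitting each hostname's weight across the ASes that serve it is one simple way to keep heavily replicated content from dominating a content-centric ranking.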

EDNS-Client-Subnet Extension

The recently proposed DNS extension, EDNS-Client-Subnet (ECS), has been
quickly adopted by major Internet companies such as Google to better assign
user requests to their servers and improve end-user experience. In this
paper, we show that the adoption of ECS also offers unique, but likely
unintended, opportunities to uncover details about these companies'
operational practices at almost no cost. A key observation is that ECS
allows anyone to resolve domain names of ECS adopters on behalf of any
IP address or prefix on the Internet. In fact, by utilizing only a single
residential vantage point and relying solely on publicly available
information, we are able to (i) uncover the global footprint of ECS
adopters with very little effort, (ii) infer the DNS response cacheability
and end-user clustering of ECS adopters for an arbitrary network in the
Internet, and (iii) reveal the mapping of users to server locations as
practiced by major ECS adopters. While pointing out such new measurement
opportunities, our work is also intended to make current and future ECS
adopters aware of which operational information gets exposed when utilizing
this recent DNS extension.
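The key observation is that an ECS query carries a client prefix chosen by whoever sends it. The Python sketch below encodes such a query by hand following the wire format of RFC 7871 (option code 8 inside an EDNS0 OPT record, RFC 6891) and decodes the ECS option of an answer. The hostname, prefixes, query ID, and 4096-byte payload size are illustrative assumptions; this is not the measurement tooling used in the study.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS0 option code for Client Subnet (RFC 7871)
OPT_RR_TYPE = 41     # OPT pseudo-RR type (RFC 6891)


def ecs_option(client_prefix: str) -> bytes:
    """Encode an ECS option claiming the query is sent on behalf of client_prefix."""
    net = ipaddress.ip_network(client_prefix, strict=False)  # zeroes the host bits
    family = 1 if net.version == 4 else 2                    # IANA address family numbers
    source_len = net.prefixlen
    addr = net.network_address.packed[: (source_len + 7) // 8]  # only ceil(len/8) bytes are sent
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    data = struct.pack("!HBB", family, source_len, 0) + addr
    return struct.pack("!HH", ECS_OPTION_CODE, len(data)) + data


def parse_ecs_option(option: bytes):
    """Decode an ECS option; in a response, scope_len is the prefix the answer covers."""
    code, length = struct.unpack("!HH", option[:4])
    if code != ECS_OPTION_CODE:
        raise ValueError("not an ECS option")
    family, source_len, scope_len = struct.unpack("!HBB", option[4:8])
    addr_bytes = option[8 : 4 + length]
    size = 4 if family == 1 else 16
    addr = ipaddress.ip_address(addr_bytes.ljust(size, b"\x00"))
    return family, source_len, scope_len, addr


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels terminated by the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ecs_query(qname: str, client_prefix: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query (A record by default) with an ECS option in its OPT record."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)  # RD set, 1 question, 1 additional
    question = encode_qname(qname) + struct.pack("!HH", qtype, 1)  # QCLASS IN
    options = ecs_option(client_prefix)
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size, TTL=ext. RCODE/version/flags, RDLENGTH
    opt_rr = b"\x00" + struct.pack("!HHIH", OPT_RR_TYPE, 4096, 0, len(options)) + options
    return header + question + opt_rr
```

Sending such datagrams over UDP port 53 to an adopter's authoritative name server, once per prefix taken from a routing table, would be the basic loop for uncovering a footprint from a single vantage point. The scope prefix length returned in each answer's ECS option states the range of client addresses the answer is valid for, which is what makes cacheability and end-user clustering observable.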

DNS Resolvers in the Wild

The Domain Name System (DNS) is a fundamental building block of the Internet.
Today, the performance of more and more applications depends not only on the
responsiveness of DNS, but also on the exact answer returned by the queried DNS
resolver, e.g., for content served by Content Distribution Networks (CDNs).
In this study, we compare local DNS resolvers against GoogleDNS and OpenDNS for
a large set of vantage points. Our end-host measurements inside 50 commercial
ISPs reveal that two aspects have a significant impact on responsiveness: (1)
the latency to the DNS resolver, and (2) the content of the DNS cache when the
query is issued. We also observe significant diversity, even at the AS-level,
among the answers provided by the studied DNS resolvers. We attribute this
diversity to the location-awareness of CDNs as well as to the location of DNS
resolvers, which breaks the assumption CDNs make that an end-user is close to
its DNS resolver. Our findings pinpoint limitations within the DNS
deployment of some ISPs, as well as the way third-party DNS resolvers bias DNS
replies.
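To make the AS-level comparison concrete, the Python sketch below maps the answers that different resolvers return for one hostname to origin ASes and computes a pairwise Jaccard similarity. The IP-to-AS table and the answers are made up for illustration; in an actual measurement the mapping would come from a BGP snapshot and the answers from queries issued at the end-hosts.

```python
from itertools import combinations

# Made-up IP-to-origin-AS mapping (documentation prefixes and ASNs).
IP_TO_AS = {
    "198.51.100.10": 64500,
    "198.51.100.11": 64500,
    "203.0.113.5": 64501,
    "192.0.2.80": 64502,
}

# Made-up A-record answers for one hostname, as returned by three resolvers.
RESOLVER_ANSWERS = {
    "local": ["198.51.100.10", "198.51.100.11"],
    "GoogleDNS": ["203.0.113.5"],
    "OpenDNS": ["198.51.100.10", "192.0.2.80"],
}


def answer_ases(ips, ip_to_as):
    """Origin ASes of the returned addresses; unmapped addresses are ignored."""
    return {ip_to_as[ip] for ip in ips if ip in ip_to_as}


def as_level_agreement(answers, ip_to_as):
    """Jaccard similarity of the AS sets returned by every pair of resolvers."""
    as_sets = {name: answer_ases(ips, ip_to_as) for name, ips in answers.items()}
    result = {}
    for r1, r2 in combinations(sorted(as_sets), 2):
        a, b = as_sets[r1], as_sets[r2]
        union = a | b
        # Two empty answers are treated as agreeing.
        result[(r1, r2)] = len(a & b) / len(union) if union else 1.0
    return result


for pair, score in as_level_agreement(RESOLVER_ANSWERS, IP_TO_AS).items():
    print(pair, score)
```

Comparing at AS granularity rather than per IP address avoids counting two addresses of the same CDN cluster as a disagreement; a score of 0.0 means two resolvers steer the same user to entirely different networks.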