This annual report covers CAIDA's activities in 2009, summarizing
highlights from our research, infrastructure, data-sharing
and outreach activities.
Our current research projects span topology, routing, traffic,
economics, and policy. Our infrastructure activities support
several measurement-based studies of the Internet's core
infrastructure, with focus on the health and integrity of the global
Internet's topology, routing, addressing, and naming systems.

We made significant advances (again...) in Internet topology research,
supported by the expanding Ark measurement infrastructure and growing
interest in understanding more about the Internet's robustness,
security, and scalability. We continue to share the largest
Internet topology data sets (IPv4 and IPv6) available to academic
researchers, and we share many aggregated annotated derivative data
sets publicly, including rankings of ISPs annotated with
(our estimated) business relationships between autonomous networks.
Our topology measurement platform supports IPv6, and
ten of our hosting sites provide IPv6 connectivity. We have
developed substantial additional software to better support
distributed measurement experiments. Specific to our IPv4
topology mapping project, we have taken on the task of
optimizing and improving on existing techniques for IP address
alias resolution for large Internet graphs, and are planning to
package up and release an implementation of our algorithms next year.
In 2009 we expanded the capability of other researchers to
use the Ark infrastructure for independent experiments,
including an extensive Internet-wide test of network filtering hygiene.

On the theoretical side of topology research, we finally
published our topology modeling framework that treats annotations
as an extended correlation profile of a network, which supports
rescaling topologies while retaining the same (measured)
annotation profile. We also advanced our exploration of
geometric structure underlying Internet-like topologies
as observed in our and other measurements. Specifically,
hyperbolic geometry captures an important property
of complex networks: exponential expansion in space.
We explored even deeper connections between network topological
structure (e.g., degree distribution, clustering) and
physical phenomena such as curvature and temperature.

These discoveries about topology drive our routing
research agenda, a long-term objective of which is to
enable dramatically more scalable global Internet routing.
We explored the ramifications of the discoveries we made last year
regarding efficient routing on graph topologies statistically
similar to those of the Internet. Based on the evidence, e.g.,
clustering, observable on the Internet and other complex networks,
we found that underlying hyperbolic hidden metric spaces
provide a natural explanation
for why so many of these complex networks found in nature can
achieve such phenomenally efficient (greedy) routing without
distributing global topology knowledge. Since the distribution
of global knowledge about network structure is perhaps the most
critically limiting requirement of the current Internet interdomain
routing system, we are still investigating theoretical details
of a potentially radical solution to Internet routing scalability,
which takes advantage of what nature knows that we do not (yet).

We undertook several traffic analysis activities, including
creating a structured taxonomy
of Internet traffic classification papers and their data sets,
and analyzing the "Day in the Life of the Internet" 2009
data set, consisting of 24 hours of detailed DNS packet data
collected at many participating root servers as well as other
high-profile DNS servers. We have reduced our traffic analysis
activities in favor of pursuing progress in the policy space through
participation in DHS's PREDICT project (Protected Repository of Data for
Internet Cyber Threats). As part of this project, we have
proposed a more flexible privacy-sensitive data-sharing framework and
an experiment to test it on the UCSD network telescope
instrumentation next year.

We are growing the scope of our economics and policy
research. We responded to several requests from Internet governance
as well as U.S. government agencies for comments and guidance on
policy matters. We launched a workshop series in Internet economics,
to try to begin framing a research agenda for the emerging but stunted
field of Internet infrastructure economics.
On the theoretical side, we published an analytically tractable model of
Internet evolution at the level of Autonomous Systems (ASes), which
builds on the preferential attachment (PA) model but captures fundamental
differences between transit and non-transit networks. This multi-class
PA model predicts a definitive set of statistics characterizing the
AS topology structure, closing the "measure-model-validate-predict"
loop, and providing further evidence that preferential attachment
is the main driving force behind Internet evolution.

Finally, we engaged in a variety of tool development, data-sharing,
and outreach activities,
including web sites, peer-reviewed papers, technical reports,
presentations, blogging, animations, and workshops. Details of our
activities are below. CAIDA's program plan for 2010-2013 is available
at http://www.caida.org/home/about/progplan/progplan2010/.
Please do not hesitate to send comments or questions to info at caida dot
org.

CAIDA's topology research agenda includes three strategic
areas: 1) macroscopic topology measurement; 2) analysis of the
observable AS-level and router-level hierarchy; 3) topology modeling
in support of routing research.

Activities

Macroscopic Topology Measurements:

We continued large-scale macroscopic topology measurements using
Archipelago (Ark),
our state-of-the-art global measurement platform.
We completed the second full calendar year of the IPv4 Routed /24 Topology Dataset. By the end of 2009, we increased
the number of vantage points to 40 Ark monitors deployed in 22 countries.

We added more monitors with native IPv6 connectivity to the Ark
infrastructure. As of the end of 2009, Ark had 10 monitors collecting the
IPv6 Topology Dataset for researchers to get a view of the
emerging IPv6 global topology.

We continued to collect automated DNS reverse lookups for IP addresses
discovered by the Ark probes and annotated the IPv4 topology data with
corresponding DNS names.
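As a rough illustration of this annotation step, reverse lookups can be batched with the standard library. The function names below are hypothetical; the actual Ark pipeline uses its own bulk DNS lookup machinery rather than per-address queries.

```python
import socket

def reverse_lookup(ip):
    """Return the PTR (reverse DNS) name for an IP address, or None on failure."""
    try:
        name, _, _ = socket.gethostbyaddr(ip)
        return name
    except OSError:  # covers socket.herror / socket.gaierror
        return None

def annotate(ips):
    """Map each discovered IP address to its DNS name (None if unresolved)."""
    return {ip: reverse_lookup(ip) for ip in ips}
```

In practice, lookups for millions of probed addresses would be issued asynchronously and cached; this sketch only shows the shape of the annotation.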

Analysis of the Observable Topology:

We improved our measurement techniques and analysis methodologies for
alias resolution inferences. We use the Ark platform and run the
following three tools:
kapar,
iffinder and
MIDAR.
We then combine the outcomes in order to map IPs to routers as
accurately and completely as feasible.
Using publicly available data from many networks and ground-truth data
provided to us by a large ISP, we tested the efficiency
and veracity of various combinations of alias resolution methods.
Our preliminary results were submitted to ACM Computer Communications
Review (CCR), and appeared
("Internet-Scale IP Alias Resolution Techniques") in the January 2010 issue.
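One common way to combine alias pairs produced by multiple tools into router-level nodes is to treat each pair as an edge and take connected components. The sketch below (with hypothetical input lists, not our actual pipeline) uses union-find for this merge step.

```python
def resolve_aliases(pair_lists):
    """Merge alias pairs reported by several tools (e.g., outputs in the
    spirit of iffinder, MIDAR, kapar) into routers: each router is a
    connected component of the alias-pair graph, found via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for pairs in pair_lists:
        for a, b in pairs:
            union(a, b)

    routers = {}
    for ip in parent:
        routers.setdefault(find(ip), set()).add(ip)
    return list(routers.values())
```

Real alias resolution must also weigh conflicting or low-confidence pairs before merging, which this toy version ignores.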

We continued to produce the
AS-level topologies annotated with business relationships between ASes dataset on a bi-weekly basis.
We use our published algorithms to infer these relationships, recognizing
their directional nature, and annotate each link in an AS topology as a
customer-provider or a peer-to-peer (settlement-free interconnection)
relationship.

We created a new version of our popular
AS Core Graph visualizations for both IPv4 and IPv6
address space using January 2009 data collected by Ark monitors.

Topology Modeling:

We introduced a network topology modeling framework that treats annotations
as an extended correlation profile of a network. The framework includes
an algorithm
to rescale and construct networks of varying size that still reproduce the
original measured annotation profile. These results are published in a paper
"Graph Annotations in Modeling Complex Network Topologies"
in ACM Transactions on Modeling and Computer Simulation (TOMACS).

We developed an analytically tractable model of Internet
evolution at the level of Autonomous Systems (ASes) -- the multi-class
preferential attachment (MPA) model. All of the model parameters are
measurable from available Internet topology data. Given the estimated
values of these parameters, our analytic results predict a definitive
set of statistics characterizing the AS topology structure that is not
part of the model formulation. The MPA model thus closes the
"measure-model-validate-predict" loop, and provides further evidence
that preferential attachment is the main driving force behind Internet
evolution. The results were published in
"Evolution of the Internet AS-Level Ecosystem", presented at the First International
Conference on Complex Sciences: Theory and Applications (Complex'2009).
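For illustration, a minimal single-class preferential attachment generator is sketched below; the MPA model extends this basic mechanism with transit/non-transit node classes and empirically measurable parameters, all of which this toy version omits.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow an n-node graph where each new node attaches m edges to
    existing nodes with probability proportional to current degree
    (plain PA; the MPA model additionally distinguishes node classes)."""
    rng = random.Random(seed)
    targets = list(range(m))   # the initial m nodes
    repeated = []              # node list weighted by degree
    edges = []
    for v in range(m, n):
        for t in targets:
            edges.append((v, t))
        repeated.extend(targets)
        repeated.extend([v] * m)
        # sample m distinct, degree-biased attachment targets for the next node
        targets = []
        while len(targets) < m:
            t = rng.choice(repeated)
            if t not in targets:
                targets.append(t)
    return edges
```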

We established a connection between observed scale-free topologies and
hidden hyperbolic geometries of complex networks. Space
expands exponentially in hyperbolic geometry, and scale-free topologies
emerge as a consequence of this exponential expansion. Fermi-Dirac
statistics connects observed topology to hidden geometry: observed edges
are fermions, hidden distances are their energies; the curvature of the
hidden space affects the heterogeneity of the degree distribution, while
clustering is a function of temperature. Understanding the connection
between topology and geometry of complex networks contributes to studying
the efficiency of their functions, and may find practical applications in
many disciplines, ranging from Internet routing to brain, cell signaling,
or protein folding research. We published the paper "Curvature and Temperature of Complex Networks" in Physical Review E.

We showed that the global structure of some real networks is statistically
determined by the distributions of local motifs (small building blocks of
complex networks) of size at most 3, once we augment motifs to include node
degree information. We applied our analysis to various complex networks,
such as: a social web of trust, protein interactions, scientific
collaborations, air transportation, the Internet, and a power grid. In
all cases except the power grid, random networks that maintain the
degree-enriched connectivity profiles for node triples in the original
network reproduce all its local and global properties. Therefore,
network topology generators are guaranteed to reproduce essential local
and global network properties as soon as they reproduce 3-node
connectivity statistics. Our results are published on our web site
("How Small Are Building Blocks of Complex Networks")
and in arxiv.
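A minimal sketch of the underlying idea computes degree-annotated statistics of connected 3-node subgraphs (wedges and triangles). The names and the exact profile encoding here are illustrative, not the paper's implementation.

```python
from itertools import combinations

def triple_profile(edges):
    """Count degree-annotated connected 3-node subgraphs: for each wedge
    or triangle, record the sorted degrees of its nodes plus a flag for
    whether the triple is closed (a triangle) or open (a path)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    deg = {v: len(nb) for v, nb in adj.items()}
    profile = {}
    for v, nb in adj.items():       # v is the center of each wedge
        for a, b in combinations(sorted(nb), 2):
            closed = b in adj[a]
            if closed and v != min(v, a, b):
                continue            # count each triangle only once
            key = (tuple(sorted((deg[v], deg[a], deg[b]))), closed)
            profile[key] = profile.get(key, 0) + 1
    return profile
```

A generator that reproduces these degree-enriched triple counts would, per the result above, reproduce the network's local and global properties.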

The primary objective of CAIDA's research in Internet routing is to
develop and evaluate solutions to the impending routing scalability
problems. Our relevant activities focused on two related sub-topics:
greedy routing based on hidden metric spaces underlying real networks;
and the relationship between
routing efficiency and the structure of the network topology.
While motivated by Internet routing, our work in this area has profound
implications for network science in other disciplines (physics, biology,
chemistry, social sciences).

Activities

We studied the process of routing information through networks as a universal
phenomenon existing in both natural and man-made complex systems. In many
complex networks found in nature, nodes communicate efficiently even without
full knowledge of global network connectivity. We demonstrated that the
peculiar structural characteristics of observable complex networks are
consistent with maximizing communication efficiency when using greedy routing
approaches without global knowledge. We also described a general mechanism
that explains this connection between network structure and function, in
"Navigability of complex networks" published in Nature Physics
and given significant
press coverage.

Random scale-free networks are ultrasmall worlds since the average
length of the shortest paths in networks of size N scales as lnlnN. We
showed that these ultrasmall worlds can be navigated in ultrashort time.
Greedy routing on scale-free networks embedded in metric spaces uses only
local information yet finds asymptotically the shortest paths,
direct computation of which requires global topology knowledge.
Our findings imply that the peculiar structure of complex networks ensures
that the lack of global topological awareness has asymptotically no impact
on the length of communication paths. These results have important
consequences for communication systems such as the Internet, where
maintaining knowledge of current topology is a major scalability bottleneck.
We published "Navigating Ultrasmall Worlds in Ultrashort Time" in Physical Review
Letters. This paper received favorable press coverage in
Nature,
NewScientist,
and PhysOrg.
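The scaling contrast behind the "ultrasmall world" terminology can be written as:

```latex
% Average shortest-path length \langle\ell\rangle versus network size N
% (known scaling results for random graph families):
\langle \ell \rangle \sim
\begin{cases}
  \ln \ln N, & \text{random scale-free, } 2 < \gamma < 3 \quad \text{(ultrasmall world)}\\
  \ln N,     & \text{classical random graph} \quad \text{(small world)}
\end{cases}
```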

We showed that complex (scale-free) network topologies naturally emerge from
hyperbolic metric spaces. The negatively curved hyperbolic spaces also
ensure extremely efficient greedy forwarding on these
topologies, achieving almost 100% reachability and optimal (i.e., shortest)
path lengths, even under dynamic network conditions.
Our findings suggest that forwarding information through complex networks
like the Internet may be possible without the current overhead of
routing protocols, and may also find practical applications in overlay
networks for tasks such as application-level routing, information sharing,
and data distribution.
These results are published in
"Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces" in ACM SIGMETRICS Performance Evaluation Review.

CAIDA has a long history of passive traces acquisition and curation
aimed at traffic monitoring, classification, and workload characterization.
In 2009 we continued to host visiting researchers who, in collaboration
with CAIDA researchers, analyzed properties of available traces.

Maurizio Dusi developed and tested his new tool,
gt,
which gathers and indexes ground truth information about
passively collected network traffic. A paper describing the
tool, "GT: picking up the truth from the ground for Internet traffic" was
published in ACM SIGCOMM Computer Communication Review (CCR).

Working with traffic traces from backbone links in the US and in Sweden
collected over the period 2002-2009, visiting scholars Wolfgang John and
Mia Zhang analyzed
UDP traffic in the Internet. They found that most UDP flows use
random high ports and carry few packets with little content,
consistent with UDP's role in signaling protocols for increasingly
popular P2P applications.

CAIDA researchers conduct DNS measurements and develop tools, models, and
analysis methodologies for use by DNS operators and researchers.

Activities

NSF funding supporting CAIDA DNS research ended in August 2009.
However, we continued collection and analysis of data from the DNS root
nameservers, continuing the series of annual Day-in-the-Life-of-the-Internet (DITL) experiments.

In collaboration with ISC and OARC, we held the fourth large-scale data
collection event on March 30 - April 1, 2009 (DITL 2009). We captured
tcpdump traces at nearly all anycast instances of the A, C, E, F, H, K, L,
and M root servers as well as numerous AS112, gTLD and ccTLD domain servers.
The 2009 collection spans three full days of continuous capture. This unique
dataset again represents the most comprehensive measurements of the
root servers to date, and provides researchers with unprecedented
insight into root server workload characteristics and performance.
OARC published a summary of the collection event. These data are available to the
research community via the DNS-OARC. Academic
researchers can participate in the DNS-OARC for free.

We also capture tcpdump traces of these DNS queries for other potential
annotations and for analysis of EDNS0, DNSSEC, and other emerging protocols.

CAIDA recognizes the UCSD Network Telescope, a passive data collection
system focused on a globally routed /8 network that carries almost
no legitimate traffic, as a unique resource whose data may provide
insights for network security researchers. Because we can easily
separate the legitimate traffic from the incoming packets, the
network telescope provides us with a monitoring point for anomalous
traffic that represents almost 1/256th of all IPv4 destination
addresses on the Internet.

Because a network telescope (also known as a blackhole, an Internet
sink, or a darknet) does not contain any real computers, the monitor
does not capture legitimate traffic, but rather communications that
result from a
wide range of events, including misconfiguration (e.g., a human being
mis-typing an IP address), malicious scanning of address space by
hackers looking for vulnerable targets, backscatter from random
source denial-of-service attacks, and the automated spread of
malicious software (worms).
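These traffic classes suggest simple per-packet heuristics of the kind used in backscatter studies. The sketch below operates on hypothetical decoded-header dicts and is only indicative, not our analysis code.

```python
def classify(pkt):
    """Rough one-packet heuristic for telescope traffic (pkt is a
    hypothetical dict of decoded header fields). Unsolicited SYN/ACKs,
    RSTs, and certain ICMP messages look like backscatter from
    spoofed-source DoS attacks; bare SYNs look like scanning or worms."""
    if pkt["proto"] == "tcp":
        flags = set(pkt["flags"])
        if flags == {"SYN"}:
            return "scan-or-worm"
        if "RST" in flags or flags >= {"SYN", "ACK"}:
            return "backscatter"
    if pkt["proto"] == "icmp" and pkt.get("type") in (0, 3, 11):
        return "backscatter"  # echo reply / unreachable / time exceeded
    return "other"
```

Real classification aggregates over flows and time windows rather than judging single packets, but the flag logic above captures the core intuition.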

To deliver such data to the research community requires technology
to accomplish the data capture and further requires policy
infrastructure to protect the rights of, and avoid risks to, stakeholders.
CAIDA spent much effort in 2009 on building the policy infrastructure
and data sharing framework required to enable the sharing of the data we capture
with the network security researcher community.

Activities

UCSD Network Telescope

In line with our mission to foster a collaborative environment for
data acquisition and sharing, we made two days of data collected in November from our network telescope available
to researchers.

Our dependence on the Internet for our professional, personal, and
political lives has rapidly grown much stronger than our comprehension
of its underlying structure,
performance limits, dynamics, and evolution.
In light of recent milestones in regulatory policy, our understanding
of the underlying economic forces and dynamics of the Internet is of
increasing relevance.

To follow up on our several years of work studying IPv4 exhaustion
and IPv6 deployment (or lack thereof) in response to RIR needs,
in 2009 we offered draft recommendations for an IPv4 exhaustion research agenda
(none of which, so far as we know, has been pursued).
We also responded to requests from government agencies and
policymaking bodies (including the FCC, DHS, and FTC) for comments and
positions intended to inform policy with the best available empirical data.
As society recognizes the need for an equitable way to pay for this
new communications infrastructure, policymakers will need metrics
to more effectively describe, and policies for more transparently
reporting on, infrastructure penetration, performance, peering,
and prices for bit transmission services.

Early in the year, we put forth a proposal for an ICANN/RIR scenario planning exercise to conduct
a more structured conversation according to the established discipline
of scenario planning. While this exercise never happened, later in the year,
on September 23, 2009, CAIDA, in collaboration with Georgia Tech,
hosted the 1st Workshop on Internet Economics via web videoconference. The event
made use of the electronic conference hosting facilities supported
by the California Institute of Technology (CalTech) EVO Collaboration
Network. The goal of this workshop was to bring together researchers,
commercial Internet facilities and service providers, technologists,
theorists, policy makers, RIR stakeholders, and pundits of Internet
economics to try to frame a concrete and useful research agenda for
the emerging but stunted field of Internet infrastructure economics.
We published the final report
in ACM SIGCOMM Computer Communication Review (CCR), April 2010. Vol
40, no. 2, pp. 55-59.

We presented the paper "Evolution of the Internet Ecosystem" at the First International Conference
on Complex Sciences: Theory and Applications (Complex'2009), and
published in the European Physical Journal B, vol. 74, no. 2, March
2010, pp. 271-278.


Ark's unique design considers coordination the fundamental activity
of a measurement infrastructure. Coordination allows the many
pieces of the infrastructure to work together efficiently toward a
common goal and is necessary to enable collaborative use of the
infrastructure by multiple researchers. Archipelago utilizes Marinda,
a coordination facility inspired by David Gelernter's tuple-space
based Linda coordination language. Archipelago extends Gelernter's
tuple space model with features needed to support a globally
distributed measurement infrastructure that hosts heterogeneous
measurements by a community of researchers.
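A toy centralized tuple space conveys the core Linda abstraction; Marinda itself adds distribution, persistence, and richer matching, none of which appear in this sketch.

```python
class TupleSpace:
    """Minimal Linda-style tuple space: write() adds a tuple; read() and
    take() match a template in which None is a wildcard (take() also
    removes the matched tuple). Coordination emerges because producers
    and consumers interact only through the shared space."""
    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def read(self, template):
        for tup in self._tuples:
            if self._match(template, tup):
                return tup
        return None

    def take(self, template):
        for i, tup in enumerate(self._tuples):
            if self._match(template, tup):
                return self._tuples.pop(i)
        return None
```

In a coordination-centric design, a measurement controller might write work tuples that monitors take, without either side knowing the other's identity.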

Activities

The Archipelago (Ark) Project expanded its infrastructure scope
in 2009, from 30 monitors in 21 countries at the end of 2008 to
41 monitors in 25 countries at the end of 2009. We also
implemented IPv6 measurements on 10 Ark boxes, and prototyped
a system-wide topology-measurement-on-demand service.

We improved our infrastructure for meta-data annotations of
Autonomous Systems and IP addresses, augmenting it with DNS data.

Building on our study of existing state-of-the-art IP address alias
resolution technology, we did research, development, and evaluation
of probing and inference algorithms to resolve independent IP
addresses into the same physical device (router). We are planning to
publish the results of this work in 2010.

Our team-probing application uses scamper as its
primary active topology measurement tool. Developed by Matthew
Luckie, scamper supports IPv4 and IPv6; TCP, UDP, and ICMP traceroute;
ping; path MTU discovery; fine-grained multiplexing of destination
lists; and programmatic control via a socket. Its warts output format
records more information than arts++ files, including cycle start and
end markers and measurement metadata (e.g., probing parameters).
We contributed patches to scamper, and several software tools
to make it easier to write measurement tools and servers:
ScamperDataFeed, ScamperIO. We also implemented a derivative
tool based on scamper to enable lighter-weight measurements that
can still benefit from some of scamper's functionality.

We implemented persistence in the Marinda tuple space,
allowing us to transparently checkpoint and restart the global
server without disrupting ongoing experiments. We wrote extensive
Marinda installation and programming guides and shared the
software with collaborators for evaluation and feedback
before we release it more broadly.

In collaboration with Rob Beverly of the Naval Postgraduate
School, we developed software support to enhance the spoofer project,
which used Ark to globally expand its measurement of source address
validation and filtering. Using Ark's distributed infrastructure and
approximately 12,000 active measurement clients, these measurements
revealed little improvement over four years of observation.
80% of the source address filters we observed were implemented a
single IP hop from sources, with over 95% of blocked packets observably
filtered within the source's autonomous system. Our results were
published and presented at IMC2009 in
``Understanding the Efficacy of Deployed Internet Source Address Validation Filtering''.

CAIDA's mission includes providing access to tools for Internet
data collection, analysis and visualization to facilitate network
measurement and management. However, CAIDA does not receive
specific funding for support and maintenance of the tools we
develop. Please check our home page for a complete listing and
taxonomy of CAIDA tools.

2009 Tool Development

CoralReef

The CoralReef software suite, developed by CAIDA, provides a
comprehensive software solution for data collection and analysis
from passive Internet traffic monitors, in real time or from
trace files. Real-time monitoring support includes system
network interfaces (via libpcap) and FreeBSD drivers for a number of
network capture cards, including the popular Endace DAG (10GE/OC192,
POS, and ATM) cards. The package also includes programming
APIs for C and perl, and applications for capture, analysis,
and web report generation. This package is maintained by CAIDA
developers with the support and collaboration of the Internet
measurement community.

We released CoralReef version 3.8.6 in June of 2009.

CAIDA Tools Download Report

The table below lists all CAIDA-developed tools
distributed via our home page at http://www.caida.org/tools/ and the number of
downloads of each version during 2009.

A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available.

In 2009, CAIDA captured and curated data from three primary sources
of network data:

macroscopic topology data with the Archipelago infrastructure,

passive traffic traces collected at Tier-1 OC-192 Internet backbone links,

passive traffic traces from the UCSD Network Telescope

We derived several datasets from this data that we make publicly available to researchers.
These include our
AS Rank,
AS adjacencies,
and
Router adjacencies
datasets. In addition we released a Telescope
Internet "background radiation" dataset
and a Telescope
Conficker dataset.
Some datasets are made publicly available by CAIDA without restrictions to the user,
while access to other datasets is restricted to academic researchers and
CAIDA members, with data access subject to Acceptable Use Policies (AUP)
designed to protect the privacy of monitored communications, to ensure security
of network infrastructure, and to comply with the terms of our agreements with
data providers.

1 The total size represents actual disk space. If data are stored in compressed form, the
uncompressed size is given in brackets.
2 The size of these datasets varies over time as we store and serve a rotating window
of the last 30 days only.

Data Distributed in 2009

We process raw data into specialized datasets to increase its
utility to researchers and to satisfy security and privacy concerns.
In 2009, this resulted in the following datasets:

We count the volume of data downloaded per unique user per unique file,
so even if a user downloads a file 100 times, we only count that file once for that user.
This methodology results in significantly undercounting the total volume of data served through our dataservers
in 2009, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.

* The AS Taxonomy dataset is included in a
mirror of the GA Tech main AS Taxonomy site, and thus these counts do not
represent all access to this data.

Restricted Access Data

These datasets require that users:

be academic or government researchers, or join CAIDA;

request an account and provide a brief description of their
intended use of the data; and


Restricted Access Data Requests

The following table shows some statistics about data requests for CAIDA datasets:
the number of requests received, the number of users whose request was granted, and
the number of users that actually downloaded data.

We received about 4% more requests in 2009 than in 2008, and approved 16% more requests for access to restricted datasets.
Almost 80% of the users granted access actually downloaded data from our webservers.

As part of our mission to investigate both practical and
theoretical aspects of the Internet, CAIDA staff actively attend,
contribute to, and host workshops relevant to research and better
understanding of Internet infrastructure, trends, topology,
routing, and security. Our web site has a complete listing of past and
upcoming CAIDA Workshops.

The 2nd CAIDA/WIDE/CASFI Workshop was held on April 4-5, 2009 in Seoul, South Korea. This workshop continued a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the workshop were Internet measurement projects, analysis of data to reveal current Internet trends, and DNS research. The workshop also covered miscellaneous research and technical topics of mutual interest to CAIDA, WIDE and CASFI participants.

The 1st Workshop on Internet Economics (WIE'09) hosted by CAIDA and Georgia Tech was held on September 23, 2009 by web videoconference. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics.

The table below presents the monthly history of traffic to www.caida.org for 2009. To show a more accurate representation of website traffic, these statistics do not include traffic from spiders, crawlers or other robots.

CAIDA would like to acknowledge the many people who put forth
great effort towards making CAIDA a success in 2009. The image
below shows the functional organization of CAIDA. Please check the
home page for more complete information about CAIDA staff.