Accurate knowledge of AS business relationships is relevant to both
technical and economic aspects of the Internet's inter-domain
structure. Firstly, AS relationships determine routing policies that
introduce a non-trivial set of constraints to paths over which
Internet traffic can flow. This has implications for network
robustness, traffic engineering, macroscopic topology measurement
strategies, and other research and operational considerations.
Secondly, macroscopic analysis of AS relationships not only yields
insight into the economic underpinnings of business realities in the
current Internet, it also provides a solid validation framework for
economy-based modeling of Internet topology evolution. Indeed, the
Internet AS-level topology and its evolutionary dynamics are
consequences of business decisions that Internet players make.
Therefore, the ability to infer AS relationships is a promising
tool for understanding and modeling the economic forces
that drive the evolution of the Internet topology and its hierarchy.

Background

The Internet is composed of thousands of ISPs that operate individual
parts of the Internet infrastructure. ISPs engage in both formal and
informal relationships to collectively and ubiquitously route traffic
in the Internet. These relationships are usually realized in the form
of business agreements that translate into engineering constraints on
traffic flows within and across individual networks participating in
the global Internet routing system.

Accurate data on the structure of actual relationships among ASes is
required for many research efforts concerned with performance,
robustness, and evolution of the global Internet. Examples of both
research and operational tasks that cannot neglect AS relationships
include:

development of better topology generators that account for the topological idiosyncrasies associated with AS relationships;

selection of data centers for server replicas by measuring the origin
of traffic to existing servers and evaluating connectivity and AS relationships of candidate data centers; and

selection of peers or upstream providers based on connectivity and AS relationships of candidate ISPs.

Figure 1. Types of AS relationships. The ASes at the bottom of
the graph, D, E, and F, are customers of those above. ISPs in
the middle, B and C, are both providers of ASes below and
customers of ISPs above. ISPs B and C are also peers of each
other. ISP A at the top is a provider to B and C and a customer
of no one.

Although business agreements between ISPs can be complicated, the
model introduced by Gao [GAO] abstracts business relationships into
the following three most common types:
customer-to-provider (c2p) (or, viewed from the opposite direction,
provider-to-customer (p2c)), peer-to-peer (p2p), and sibling-to-sibling (s2s).

The justification for this classification is that an AS must buy
transit services for any traffic destined to parts of the Internet
that this AS neither owns nor can reach through its customers. In
Figure 1, where arrow directions reflect flows of money, ASes at
lower levels pay ISPs at higher levels in exchange for access to
the rest of the Internet. The customer ISP pays the provider ISP
for transit. Links between a customer and a provider are c2p (p2c)
links. In Figure 1, examples of c2p links are D->B, E->B, F->C,
B->A, and C->A.

A p2p link connects two ISPs who have agreed to
exchange traffic on a quid pro quo basis. Peers exchange traffic
only between each other and each other's customers. This relationship
allows peering ISPs to save money on transit costs they would
otherwise have to pay to their transit providers for such traffic.
In Figure 1, B-C is a p2p link. Note the lack of direction of the
link, indicating that neither B nor C pays the other for
the traffic they exchange.

An s2s link connects two ASes administratively
belonging to the same ISP.
Such links usually appear as a result of mergers and acquisitions, or under
certain network management scenarios.

Figure 2. The top two paths (1 and 2) are valid, while the bottom two
(3 and 4) are invalid.

We also use the notion of money transfers between ASes to define
valid and invalid AS paths. A valid path between source and
destination ASes is one in which every ISP providing transit (a
transit provider) has a payer. The payer must be the transit
provider's immediate neighbor in the path. An invalid path is one
in which at least one transit provider is not paid by a neighbor
in the path.

In Figure 2 the top two examples are valid paths, while the bottom
two are invalid. In Example 1 the transit providers are A, B, and
C. ISPs B and C pay A, D pays B, and F pays C. In Example
2 the transit providers are B and C, and they are paid by D and F,
respectively. In contrast, in Example 3 the transit provider is B;
not only does no one pay B, but B itself pays both A and Z.
Example 4 also illustrates a situation where nobody pays transit
provider B.

We conclude that a valid path must follow this pattern: zero or
more c2p links, followed by zero or one p2p link, followed by zero
or more p2c links. In addition, s2s links can appear in any number
anywhere in the path.
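
The valid path pattern can be checked mechanically. The following is a minimal Python sketch, not code from the cited work; the names build_relations and is_valid_path and the FIGURE_1 link list are illustrative assumptions, with the links taken from the description of Figure 1.

```python
from typing import Dict, List, Tuple

# Label of a link when read in the opposite direction.
REVERSE = {"c2p": "p2c", "p2c": "c2p", "p2p": "p2p", "s2s": "s2s"}


def build_relations(links: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], str]:
    """Store each labeled link in both directions (D->B c2p implies B->D p2c)."""
    rel = {}
    for a, b, kind in links:
        rel[(a, b)] = kind
        rel[(b, a)] = REVERSE[kind]
    return rel


def is_valid_path(path: List[str], rel: Dict[Tuple[str, str], str]) -> bool:
    """True if the path is c2p*, then at most one p2p, then p2c* (s2s anywhere)."""
    descending = False  # set once the path has crossed a p2p or p2c link
    for a, b in zip(path, path[1:]):
        kind = rel[(a, b)]
        if kind == "s2s":
            continue  # sibling links never affect validity
        if kind == "p2c":
            descending = True
        elif descending:
            return False  # a c2p or p2p link after the path started descending
        elif kind == "p2p":
            descending = True
    return True


# Links as described for Figure 1 (hypothetical encoding of the figure).
FIGURE_1 = build_relations([
    ("D", "B", "c2p"), ("E", "B", "c2p"), ("F", "C", "c2p"),
    ("B", "A", "c2p"), ("C", "A", "c2p"), ("B", "C", "p2p"),
])
```

With this encoding, the paths D-B-A-C-F and D-B-C-F are accepted, while A-B-C is rejected because a p2p link follows a p2c link.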

History of inference algorithms

Service providers consider the policy details of their business
relationships as proprietary information and do not generally make
them public. Therefore, Internet researchers have to rely upon
indirect AS relationship inference algorithms in order to build a
picture of Internet business structure.

Gao's pioneering work [GAO] inspired many researchers to seek
approaches to inferring ISP business relationships using information
from publicly available BGP routing tables. Gao used the concept
of valid paths as the basis for her inference heuristic and identified
the top provider in a given path based on AS degree (the number of
ASes connected to a given AS).

Subramanian et al. [SARK] slightly
relaxed the problem by not inferring s2s links, and provided a more
elegant mathematical formulation based on the concept of valid
paths. Assuming maximization of the number of valid paths as a
natural objective, they formulated the AS relationship inference
problem as a combinatorial optimization problem: given an undirected
graph G derived from a set of BGP paths
P, assign the edge type (c2p or p2p)
to every edge in G such that the total
number of valid paths in P is maximized.
[SARK] called the problem the type-of-relationship (ToR) problem,
conjectured that it is NP-complete, and provided a heuristic solution.
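
To make the ToR objective concrete, the sketch below enumerates every labeling of a tiny graph and keeps one that maximizes the number of valid paths. It is an exhaustive search for illustration only, not the heuristic from [SARK]; it is exponential in the number of edges, and the function names are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

REVERSE = {"c2p": "p2c", "p2c": "c2p", "p2p": "p2p"}


def is_valid(path: List[str], label: Dict[Tuple[str, str], str]) -> bool:
    """Valid pattern: c2p links, then at most one p2p link, then p2c links."""
    descending = False
    for a, b in zip(path, path[1:]):
        kind = label[(a, b)]
        if kind == "p2c":
            descending = True
        elif descending:
            return False
        elif kind == "p2p":
            descending = True
    return True


def brute_force_tor(paths: List[List[str]]):
    """Try every edge labeling; return (max valid paths, one best labeling)."""
    # Undirected edges of the graph G derived from the path set P.
    edges = sorted({tuple(sorted(pair)) for p in paths for pair in zip(p, p[1:])})
    best_count, best_label = -1, {}
    for kinds in product(("c2p", "p2c", "p2p"), repeat=len(edges)):
        label = {}
        for (a, b), kind in zip(edges, kinds):
            label[(a, b)] = kind           # relationship read from a to b
            label[(b, a)] = REVERSE[kind]  # same link read from b to a
        count = sum(is_valid(p, label) for p in paths)
        if count > best_count:
            best_count, best_label = count, label
    return best_count, best_label
```

Several labelings can tie for the maximum, and the search simply returns the first one it encounters.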

Di Battista et al. [DPP] and, independently, Erlebach et al. [EHS]
proved that the ToR problem is indeed NP-complete. [EHS] also proved
the stronger result that it is APX-complete. More importantly for
practical purposes, both [DPP] and [EHS] demonstrated that p2p links
cannot be inferred in the ToR problem formulation. They developed
mathematically rigorous approximate solutions to the ToR problem,
but these solutions infer only c2p and p2c links. We note that
neither the [SARK] nor the [GAO] technique reliably identifies p2p
links: both have low accuracy on such links, as demonstrated by
Xia et al. [XG].

In addition to its inability to infer p2p links, there are other
issues with the ToR formulation that we identified in [DKH1]. In
particular, for some links either relationship (c2p or p2c) results
in the same number of invalid paths. As a result, ToR labels such
links randomly, classifying them as c2p or p2c with 50%-50%
probability. In some cases this approach leads to obviously incorrect
inferences, e.g., well-known large providers are inferred as customers
of small ASes. In [DKH1] we resolved this issue by using multiobjective
optimization techniques incorporating both the notion of valid paths
and AS importance as reflected in AS degree. In [DKH2] we introduce
new improved algorithms to determine not only c2p but also p2p
links. These improvements achieve high levels of accuracy of AS
relationship inference as we demonstrate via direct validation with
network administrators of a set of ASes.

One useful view of the resulting relationship graph is to examine
the customer cone -- the set of ASes that can be reached
from each AS following only its customer links. The size of the
customer cone of an AS reflects the number of ASes that pay it, directly
or indirectly, for transit, and provides a better metric of the size
of an AS than its degree. A naive method to maximize the number of
valid paths creates strongly connected components (SCCs) in the
graph [HK]. By definition, an AS i can reach every other AS j in
the SCC by following customer links, and hence it can also reach
every AS in the customer cone of AS j. In all cases we examined
this SCC eventually contains all Tier 1 ASes, making it impossible
to differentiate between them. We are currently working on a new
method which prevents the creation of an SCC, albeit at the cost
of reducing the number of valid paths from 99.6% to around 97%.
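
An SCC of this kind exists exactly when the provider-to-customer links contain a directed cycle. The sketch below, which is an illustration and not the method under development, detects such a cycle with a depth-first search; the customers mapping (AS to its inferred direct customers) and the function name are assumptions.

```python
from typing import Dict, List


def has_customer_cycle(customers: Dict[str, List[str]]) -> bool:
    """True if following provider-to-customer links can return to the start."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS stack, finished
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for customer in customers.get(node, []):
            state = color.get(customer, WHITE)
            if state == GREY:
                return True  # back edge: a cycle of customer links
            if state == WHITE and visit(customer):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(customers))
```

The recursion is adequate for small examples; a graph the size of the full AS topology would call for an iterative traversal.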

We implemented the following methodology for relationship-based AS ranking:

Build an AS-level graph of the Internet from publicly available
BGP table data and annotate links in this graph with inferred
AS relationships.

Define the AS customer cone of an
AS A as the AS A itself plus all the ASes that can be reached
from A following only p2c and s2s links (but not c2p or p2p!).
In other words, A's customer cone is A, plus A's customers,
plus its customers' customers, and so on.

Rank ASes by the following three customer cone size metrics:
the number of ASes in the cone, the number of unique prefixes
advertised by these ASes, and the number of /24 blocks in the
union of these prefixes.

The size of the AS customer cone in terms of the number of ASes in
the cone provides a coarse measure since individual AS sizes can
differ drastically.
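
The customer cone definition and the first ranking metric can be sketched as a breadth-first traversal. This is a minimal illustration under assumed names: down maps each AS to the ASes reachable over one p2c or s2s link, and the example data in the note below encodes Figure 1.

```python
from collections import deque
from typing import Dict, List, Set


def customer_cone(asn: str, down: Dict[str, Set[str]]) -> Set[str]:
    """The AS itself plus everything reachable over p2c and s2s links only."""
    cone = {asn}
    queue = deque([asn])
    while queue:
        current = queue.popleft()
        for neighbor in down.get(current, ()):
            if neighbor not in cone:
                cone.add(neighbor)
                queue.append(neighbor)
    return cone


def rank_by_cone_size(down: Dict[str, Set[str]]) -> List[str]:
    """Order ASes by number of ASes in their cone, largest first, ties by name."""
    ases = set(down)
    for neighbors in down.values():
        ases.update(neighbors)
    return sorted(ases, key=lambda a: (-len(customer_cone(a, down)), a))
```

For Figure 1, down would be {"A": {"B", "C"}, "B": {"D", "E"}, "C": {"F"}}, which puts A first with a cone of six ASes. The prefix-based and /24-based metrics would need per-AS prefix data that this sketch does not model.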

ASes with larger customer cones have an especially important role
in the Internet's capital and governance structure. At the top of
this hierarchy are ISPs commonly known as Tier-1 ISPs. They do not
pay upstream providers for transit at all; instead they
peer with each other to guarantee their connectivity to all
destinations in the Internet. At the bottom of the hierarchy are
customer ASes who do not have their own customers and pay
providers to reach all destinations in the Internet.

ISPs maintain a set of points-of-presence (PoPs) -- locations
where ISPs have routers/servers and related equipment/personnel --
across the world. We refer to the set of these PoP locations as the
ISP's geographic footprint. Geographic footprint is an
important part of peer selection, as an ISP can only peer with other
ISPs at locations where both ISPs have a PoP.

To infer an ISP's geographic footprint, we start with the set of
prefixes in the BGP tables from the Route Views project routeview2 node
and RIPE NCC's rrc12 node. We then break these prefixes down into the
smallest sets of IP addresses that Netacuity maps to the same
geographic location. We treat the resulting number of distinct
locations as a lower bound on the number of metro areas in which the
ISP has a presence.

Although we know of no more rigorous empirical analysis of
macroscopic Internet topology enriched with AS relationships, we
recognize that resource limitations constrain the quality of the
science we can do.

AS relationships are more complex than allowed for in our approach.
The semantics of routing relationships between the same two ASes
can differ by peering location or even by prefix; our model
oversimplifies these cases by assigning a single relationship to
each pair of ASes.

A truly accurate picture of the Internet topology would require
collection of data from every AS, while our automated ranking
procedure is limited to the measurement points publicly available
at Route Views.

As in all analyses of massive datasets, our heuristics have a
number of associated external parameters. We fine-tune the values
of these parameters based on our pre-existing (but limited) notion
of the correct answer as well as experience with the algorithm
that suggests auspicious ranges.

More monitor points, more probing, cross-correlative analysis in
conjunction with other sources of data, and more powerful data
processing techniques to support larger topology samples would
improve the integrity and utility of the relationship-based AS
ranking.

CAIDA provides two AS Relationship datasets: serial-1 and serial-2.
The serial-1 directory contains AS relationships inferred from BGP
using the method described in "AS Relationships, Customer Cones,
and Validation". Serial-2 adds links inferred from BGP communities,
using the method described in "Inferring Multilateral Peering",
and from traceroute.

Serial-1 data is available from 2004 to present, with one file created per
week in 2006 and one per month in prior years. Each file contains a
full AS graph derived from Route Views BGP table snapshots taken at
8-hour intervals over a 5-day period. The AS relationships available
are customer-provider (and provider-customer in the opposite
direction), peer-to-peer, and sibling-to-sibling. See the comments at
the beginning of each file for details of the file format.
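
The header comments in each file remain the authoritative format description. As a hedged illustration only, the sketch below assumes a layout of '#'-prefixed comment lines followed by pipe-separated records of the form AS1|AS2|code, where -1 marks AS1 as a provider of AS2 and 0 marks a peering link. The function name and the REL_NAMES table are assumptions, and any other code is passed through unlabeled.

```python
from typing import Iterable, List, Tuple

# Assumed relationship codes; check the file header before relying on them.
REL_NAMES = {-1: "p2c", 0: "p2p"}


def parse_as_rel(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Parse records into (as1, as2, relationship) tuples, skipping comments."""
    links = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header comment
        fields = line.split("|")
        as1, as2, code = int(fields[0]), int(fields[1]), int(fields[2])
        # Any fields after the third are ignored by this sketch.
        links.append((as1, as2, REL_NAMES.get(code, f"code {code}")))
    return links
```

Because the function accepts any iterable of strings, an open file handle can be passed in directly.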

Serial-2 data is available from October 2015 to the present, with one file
created per week. In addition to the links from the serial-1 graph, we add
AS links inferred from BGP communities collected from IX looking
glass servers on a single day and from traceroute data collected on
the same day from CAIDA's Ark monitors.

To derive AS links from the traceroute data, we first infer which AS
owns each router, independent of the interface addresses observed at
that router. The ownership inferences are based on IP-to-AS mapping
derived from public BGP data, a list of peering prefixes from
PeeringDB, and the previously inferred AS business
relationships. Then we convert the observed IP path into an AS path
using the router ownership information (rather than mapping each
observed IP to AS directly) and retain the first AS link in the
resulting path for the AS graph.
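
The conversion step can be sketched as follows, assuming the router ownership inference has already been done. The two lookup tables (interface address to router, router to owning AS) and both function names are hypothetical stand-ins for that inference, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple


def as_path_from_traceroute(
    ip_path: List[str],
    ip_to_router: Dict[str, str],
    router_owner: Dict[str, int],
) -> List[int]:
    """Map each hop to the AS owning its router, collapsing repeated ASes."""
    as_path: List[int] = []
    for ip in ip_path:
        owner = router_owner.get(ip_to_router.get(ip))
        if owner is None:
            continue  # unmapped hop; skipping it is a simplification
        if not as_path or as_path[-1] != owner:
            as_path.append(owner)
    return as_path


def first_as_link(as_path: List[int]) -> Optional[Tuple[int, int]]:
    """Return the first AS link in the path, or None if there is none."""
    return (as_path[0], as_path[1]) if len(as_path) >= 2 else None
```

Mapping through the router rather than the raw address matters when an interface is numbered from a neighbor's address space: the hop is attributed to the AS that owns the router, not the AS that originates the covering prefix.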

Links discovered in this way are assumed to be peering links, since
customer-provider links are normally visible in the Route Views BGP tables.

The general serial-2 procedure for creating a file is as follows:

Collect BGP communities from IX looking glass servers.

Infer peering links between pairs of ASes that accept routes from
each other.

Collect archived BGP data from Route Views and RIPE RIS.

Infer peering links at points in the observed AS paths that cross a
known IX.

Collect traceroutes from Ark monitors.

Convert the IP path to AS path using inferred ownership and
keep the first AS link in the path.