Demystifying BGP

Many system administrators refer to networking — and the Border Gateway Protocol (BGP) in particular — as black magic or voodoo, a domain not to be trifled with and one best left to the highly-specialized network shaman. Perhaps, but not every village can afford a network shaman; often, it’s up to the local system administrator to perform all the miracles.

If your servers are accessed from the Internet and you have or are considering redundant Internet connections (say, a pair of T1 circuits), understanding BGP — what it does, how it works, and how you can leverage it to your benefit — is a real advantage. With BGP, you can solve network performance problems faster than your ISP, independently work around ISP outages, and increase your overall uptime.

As you’ll see, BGP is critical to the operation of the Internet. And while BGP is a large and complex protocol, it’s not magic. (But don’t tell that to the local villagers. Increase uptime and they’ll come to worship you.)

To connect to an Internet server, your computer must be able to send requests to a host and, reciprocally, that host must be able to send replies back to you. But how does your request reach the remote host, and in the case of the remote host, how does its reply reach you?

First, each computer connected to the Internet has an Internet Protocol (IP) address. The IP address, like a street address, uniquely identifies each machine and describes where the machine can be found. To send a packet to a server, your machine simply addresses the packet to the server’s IP address and sends it off.

If your machine and the server are connected to the same physical network, the packet is delivered directly (via Ethernet or another local-area connection technology). However, if the two machines are connected to different physical networks, a “relay” or router has to direct the traffic from one point to another. For example, if your machine is on one subnet of Ethernet and the server is on a separate subnet, a router acts as a gateway to direct the traffic from one subnet to the other.

Of course, as the gateway, the router has to know about the address spaces of both subnets to do the transfer. And in general, that’s what routers do: routers know how to connect endpoints or at least know how to direct traffic to move it closer to its ultimate destination. Depending on how “distant” two computers are, more than one router may handle the traffic, with each router or “hop” moving the traffic closer and closer to its target.

On some networks, the router’s (or routers’) maps or routing tables can be configured manually at installation and left alone thereafter. More frequently though, static configurations are unworkable because they don’t scale. Instead, routing tables are usually maintained automatically using routing protocols, which allow routers to exchange and share subnet location or topology automatically.

Whenever two routers talk to each other via a routing protocol, they exchange information about the subnets (routes) they know about. As these route advertisements propagate from one router to another, the network collectively learns how to get packets from point A to point B efficiently. In many networks there’s more than one path from A to B. When given many paths between A and B, a router simply picks the path with the best “metric,” whatever that metric may be (typically, based either on path costs or distances).

And if a network link goes down, the routers on each end of the link purposefully “forget” the routes accessible on the other end, as the link is no longer usable. The updated topology information then percolates through the network. When the link comes back up, the routers update each other again, calculating new metrics on the fly based on the new reachability information.

Inside a network, internal routing protocols such as RIP, EIGRP, OSPF, and IS-IS are used to dynamically advertise routes so that local traffic is routed efficiently. (Which internal protocol a site chooses depends on hardware support, staff experience, and internal policy.) However, across the Internet at large, only one external routing protocol is used to exchange route information: the Border Gateway Protocol (BGP). BGP advertises your routes (the subnets that your network delivers traffic to) to other networks.

Or, as network engineer Avi Freedman describes it, “BGP advertises your routes to other networks… When you advertise routes, [think] of those route advertisements as ‘promises’ to carry data to the IP space represented in the route being advertised. For example, if you advertise 192.204.4.0/24 (the ‘Class C’ starting at 192.204.4.0 and ending at 192.204.4.255), you promise that if someone sends you data destined for any address in [that space], you know how to carry that data to its ultimate destination.”
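To make the notation in that quote concrete, here is a small sketch using Python’s standard ipaddress module. It only illustrates what the /24 covers; the variable names and the two test addresses are made up for the example.

```python
import ipaddress

# The prefix from the quote above. A /24 leaves 32 - 24 = 8 host bits,
# so it spans 2**8 = 256 addresses.
prefix = ipaddress.ip_network("192.204.4.0/24")

first_address = prefix.network_address     # 192.204.4.0
last_address = prefix.broadcast_address    # 192.204.4.255
address_count = prefix.num_addresses       # 256

# Advertising the prefix is a "promise" to deliver anything inside it.
# These two destination addresses are arbitrary illustrations.
covered = ipaddress.ip_address("192.204.4.77") in prefix      # True
not_covered = ipaddress.ip_address("192.204.5.1") in prefix   # False

print(first_address, last_address, address_count, covered, not_covered)
```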

As an example, BBN (soon to be part of the Level3 network) owns the huge subnet 4.0.0.0/8. BBN breaks this huge subnet of more than sixteen million addresses down into many thousands of smaller subnets, and all of the routers within BBN learn about each of these smaller subnets via an internal routing protocol. However, other large ISPs like UUNet or Sprint don’t need to know (and don’t want to know) about the internal topology and details of the BBN network. As far as another ISP is concerned, if a packet’s destination falls within the subnet 4.0.0.0/8, it’s content to send the traffic to BBN and let BBN route the traffic the rest of the way.

In fact, Sprint’s and UUNet’s routers would be overwhelmed if they were required to keep track of every route to every subnet on the Internet. Instead, BBN uses BGP to advertise a single 4.0.0.0/8 route to its BGP neighbors (its peers). Route aggregation such as this goes a long way to keeping the Internet’s global routing table manageable. The global routing table is a collection of all routes across the Internet, and currently contains about 123,000 routes (and growing).
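As a rough illustration of aggregation, the following Python sketch checks that a few internal subnets all fall inside the single 4.0.0.0/8 route. The three internal subnets are invented for the example and are not BBN’s real allocations.

```python
import ipaddress

# The single route advertised to external peers.
aggregate = ipaddress.ip_network("4.0.0.0/8")

# Hypothetical internal subnets; real internal allocations are not
# visible from the outside, which is the point of aggregation.
internal_subnets = [
    ipaddress.ip_network("4.2.2.0/24"),
    ipaddress.ip_network("4.17.0.0/16"),
    ipaddress.ip_network("4.200.64.0/20"),
]

# Every internal subnet is covered by the one aggregate route.
all_covered = all(s.subnet_of(aggregate) for s in internal_subnets)

# A /8 spans 2**24 addresses, the "more than sixteen million" in the text.
total_addresses = aggregate.num_addresses

# Adjacent blocks collapse into a shorter prefix: two /9 halves make one /8.
halves = [ipaddress.ip_network("4.0.0.0/9"), ipaddress.ip_network("4.128.0.0/9")]
collapsed = list(ipaddress.collapse_addresses(halves))

print(all_covered, total_addresses, collapsed)
```

Outside networks need only the one collapsed entry; the internal subnets stay in the internal routing protocol.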

Beating Alternate Paths to Your Door

In effect, BGP is like a road map for the Internet: BGP lists all of the major subnets, records the major routes between those subnets, and lets you advertise which network paths provide the best “route” to your destinations. (Once inside your network, internal routing protocols act as street maps to deliver traffic “the last mile” to a specific address.)

Of course, if your site is connected to the Internet via one Internet Service Provider (ISP) — a configuration called a single-homed site — all incoming and outgoing Internet traffic traverses a single path between your network and the network of your ISP. BGP is not particularly useful for single-homed sites since there are no optional routes to advertise.

On the other hand, BGP allows multi-homed sites — those sites with connections to more than one ISP — to immediately direct or redirect traffic through alternate paths or routes without waiting for the assistance of its ISPs. With BGP, a multi-homed site can independently control the route of its inbound and outbound traffic, allowing the site to avoid congested corridors and outages. For example, if your site has two T1 circuits connected to two different ISPs and one T1 fails, BGP allows the routers to detect the failure and automatically update Internet-wide routing tables so that traffic is sent over the remaining circuit. This happens quickly and automatically with no user intervention. When the failed circuit returns, it’s smoothly put back into service. Users within your network or accessing your network from anywhere on the Internet often have no idea that anything was amiss.

In contrast, if your site is single-homed, you’re at the mercy of your ISP: their congestion is your congestion, and their outage is your outage.

Obviously, BGP reduces a site’s dependence on any one ISP, a considerable business continuity risk. But implemented well, BGP also provides for better response times. By choosing shorter and/or faster end-to-end paths, users can enjoy quicker web page loads and faster file transfers, among other direct benefits. And although it’s not an explicit goal of BGP, many multi-homing configurations achieve a kind of “load balancing” that allows simultaneous use of redundant circuits rather than using just one connection with the other connections used only as backups.

Like Digital Breadcrumbs

As we’ve seen, BGP is designed to exchange routing information between multi-homed sites such as ISPs. Here’s how BGP works.

In BGP, a network or group of networks under common administrative control (whether it’s as large as BBN’s or just a small range of addresses) is referred to as an Autonomous System, or AS. Each AS has its own AS Number, or ASN, used to identify that AS to the world. (AS Numbers aren’t hard to get; in fact, any multi-homed site qualifies for one.) Table One shows the ASNs of a number of ISPs.

Table One: Some ASNs and the subnet each represents

ASN     Network
3356    Level3
3549    Global Crossing
2529    Demon UK
4589    Easynet
5459    LINX

ASNs are fundamental to BGP: Every time a route is advertised within BGP, the route is “stamped” with the ASN of the router doing the advertising (propagating). A sequence of ASNs forms an AS-Path, a kind of digital trail that reflects how the route became known to any router.

For example, when your network wants to advertise itself via BGP (a process called originating a route), your BGP router creates a new, empty AS-Path (a “null” path) and advertises your network to each of its external peers. By convention, whenever your BGP router advertises your route to an external BGP router, your BGP router prepends your ASN to the AS-Path. (So the AS-Path of a route you originated, as stored on a remote BGP router, always ends with your ASN.)

Next, just as your router stamped its ASN in the AS-Path when it advertised your route, each of your BGP peer routers prepends its ASN to the AS-Path, and in turn, advertises the route to each of its external BGP peers. This propagating/stamping step continues “ad infinitum” as each AS prepends its ASN to the AS-Path and passes the route along. (To avoid loops, a BGP router ignores any routing advertisement that contains its own ASN anywhere in the AS-Path.)
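The prepend-and-check behavior described above can be sketched in a few lines of Python. This is a toy model, not router code: the function names are invented, and 64512 is a private-use ASN standing in for “your” site. The other ASNs appear in Table One.

```python
def advertise(as_path, local_asn):
    """Return the AS-Path sent to an external peer: our ASN prepended."""
    return [local_asn] + list(as_path)


def accepts(as_path, local_asn):
    """Loop prevention: ignore a route whose AS-Path already holds our ASN."""
    return local_asn not in as_path


ORIGIN_ASN = 64512    # hypothetical private-use ASN for the originating site
ISP_ASN = 3356        # Level3
TRANSIT_ASN = 3549    # Global Crossing

path = advertise([], ORIGIN_ASN)       # originate from an empty path
path = advertise(path, ISP_ASN)        # the first ISP stamps its ASN in front
path = advertise(path, TRANSIT_ASN)    # the next AS does the same

print(path)                      # the originating ASN is last in the list
print(accepts(path, ISP_ASN))    # the first ISP would ignore this route
print(accepts(path, 2529))       # an AS not yet in the path may accept it
```

Because each AS prepends, the list reads from the most recent AS on the left back to the originator on the right.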

Figure One shows a number of AS-Path options for traffic originating at Peak Web Hosting and destined for the London Internet Exchange (LINX). Because Peak Web Hosting connects to eight different ISPs, there are eight paths to LINX, although some paths may converge on the way to the destination (for example, PSI goes to Verio).

Figure One is interesting because it shows how just a handful of Autonomous Systems are interconnected. However, it shows only a small fraction of AS-Paths known to Peak Web Hosting. Interestingly, what AS-Paths a system retains and uses — processes called AS filtering and policy routing, respectively — can affect performance, both in the router itself and globally across the Internet.

Not All Autonomous Systems are Equal

While all multi-homed sites qualify for an ASN, not all ASNs are known to every BGP router. Obviously, not every BGP router is connected to every other BGP router, but to keep the global routing table manageable, many ISPs choose to ignore advertisements from Autonomous Systems with small IP address spaces.

For example, most ISPs filter (ignore) routes with a subnet mask longer than /20 (that is, /21, /22, /23, /24, and so on). So, the size of your address pool can affect how far your AS-Path propagates. In some instances, more efficient routes are ignored because other ISPs simply haven’t “heard” about you.

Figure Two pictures a dual-homed network served by two ISPs. The network, mysite.com, is served by ISP A and ISP B; ISP A is the “primary” ISP since mysite.com’s address space, 10.1.1.0/24 (which represents 256 addresses), is allocated from ISP A’s address space of 10.1.0.0/16 (which represents 65,536 addresses). ISPs C and D filter all route advertisements for address blocks smaller than a /20.

Figure Two: A less efficient route caused by filtering

In the figure, a dual-homed router at mysite.com advertises its /24 address space to both ISP A and ISP B. Because mysite.com’s address space is part of ISP A’s address space, ISP A doesn’t advertise mysite.com’s route explicitly. Instead, it aggregates mysite.com’s address space into its own and advertises that much larger block, a /16 address space. ISP B, however, does not aggregate mysite.com’s addresses into a larger pool because the two address pools are disjoint. ISP B simply propagates mysite.com’s address block as-is.

Since ISPs rarely filter customer advertisements based on the length of the subnet mask, the /24 network is known to both ISPs. However, the filtering policies of other networks further “upstream” may prevent learning about every available routing option.

Both ISP C and D use only ISP A to reach mysite.com because the larger aggregate 10.1.0.0/16, which contains the subnet 10.1.1.0/24, passes the “/20 or larger” filtering criteria. ISP D could reach mysite.com most quickly via D to B, but since mysite.com’s routes are filtered by ISPs C and D due to the size of its address space, mysite.com’s traffic takes the longer D to C to A path.

(By the way, if ISP A goes down, ISPs C and D will not learn about mysite.com via ISP B. And if ISP A’s link to mysite.com goes down, but ISP B is directly connected to ISP A, then ISP A will hand traffic destined to mysite.com to ISP B only if ISP A does not filter in the same manner as ISPs C and D.)
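The filtering in the Figure Two scenario can be modeled with a short Python sketch. It assumes the simplest possible filter, a bare prefix-length check; real ISP filters are usually more elaborate, and the function name is invented for illustration.

```python
import ipaddress

MAX_PREFIX_LEN = 20   # ISPs C and D ignore anything longer than a /20


def passes_filter(prefix, max_len=MAX_PREFIX_LEN):
    """True if the advertised block is a /20 or larger (a mask of /20 or shorter)."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len


# What each upstream ISP announces for mysite.com in the example.
advertisements = {
    "ISP A": "10.1.0.0/16",   # the aggregate that contains mysite.com
    "ISP B": "10.1.1.0/24",   # mysite.com's block, passed along as-is
}

# The advertisements that survive the filter at ISP C or ISP D.
accepted = {isp: p for isp, p in advertisements.items() if passes_filter(p)}

print(accepted)
```

Only the /16 from ISP A survives, which is why ISPs C and D never consider the path through ISP B.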

It is quite common to find sites whose largest assigned subnets are smaller than a /20. While the ISPs for those sites will not filter (their customers’) route advertisements, there is no guarantee that other ISPs will propagate the advertisements.

Ultimately, you may not be able to tell how far your routing advertisements will propagate. In practice, this is not a problem for most sites, but it is important to understand how this BGP “feature” can affect your site. Also, the size of your advertised subnets only affects the path selection for your inbound traffic.

To avoid this problem, a site should have a large enough subnet assigned either from its ISP or its routing registry. While it’s technically possible to have your own address space allocated from a routing registry like ARIN (for the Americas; RIPE services Europe, and APNIC services the Asian Pacific region) and assigned to your network, it may be difficult to justify allocating 2,000 host addresses. Instead, routing registries prefer that companies that need fewer than one thousand addresses or so get a unique address space from an ISP, which allocates it from its own pool.

Tuning BGP

As you’ve seen, all routes in a BGP routing table are summaries of network connections somewhere out on the Internet. And, as shown in the previous section, AS-Paths reflect how each route was advertised, providing a kind of diagnostic of routing on the Internet. Indeed, that topology information is invaluable. You can use information gleaned from AS-Paths to avoid known, poorly performing networks, and use the AS-Path to determine the “shortest” path to a given target.

However, “shortest” doesn’t always mean best. Indeed, the “shortest” AS-Path (the number of Autonomous Systems the route passed through) is a misleading notion, one that often causes confusion because it suggests that BGP has more information than is actually available.

First, BGP lacks any knowledge of a network’s physical topology, so a single AS “hop” to cross an AS or ISP could actually require any number of router hops to cross that network.

In addition, BGP employs no network performance data when making routing decisions. Latency (slowness), congestion (saturation), jitter (wide disparities in latency that cause packets to arrive out of order), and packet loss (oversubscription of circuits) across each route remain unknown to BGP. To BGP, an almost-saturated 1.544 megabit T1 circuit and a completely unused 622 megabit OC-12 appear the same.

The AS-Path is just one of many BGP attributes that accompany each route as it’s advertised across the Internet. While AS-Paths, ASNs, and address spaces provide some helpful hints about route metrics, if BGP is left to its own devices, it would choose sub-optimal paths at least some of the time. For example, without some knobs and dials affecting the BGP decision-making process, BGP would select a two-hop AS-path with 30 individual router hops over a four-hop AS-path with only four router hops.

Policy routing lets the network or system administrator override BGP’s default behavior to leverage “longer,” but better performing paths. For example, BGP can be used to direct traffic down a relatively longer, but higher capacity OC-12 connection rather than saturating a smaller T1 with a shorter path to a given destination. A routing policy matches a given criterion and then performs some action, usually adjusting a BGP route’s attributes. If a particular match is not successful, the routing policy continues until a match succeeds or the policy ends.
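The match-then-act structure of a routing policy can be sketched in Python as an ordered list of terms. Everything here is hypothetical: the route fields, the ASNs (taken from the private-use range), and the local-preference values are chosen only to illustrate the idea, and no vendor’s policy language looks exactly like this.

```python
def apply_policy(route, policy):
    """Try each (match, action) term in order. The first term whose match
    succeeds applies its action; if none match, the route is unchanged."""
    for matches, action in policy:
        if matches(route):
            return action(route)
    return route


# Hypothetical policy: prefer routes learned over the OC-12 circuit, and
# de-prefer anything that passes through a poorly performing AS 64999.
policy = [
    (lambda r: r["learned_from"] == "OC-12", lambda r: {**r, "local_pref": 200}),
    (lambda r: 64999 in r["as_path"], lambda r: {**r, "local_pref": 50}),
]

oc12_route = {"prefix": "198.51.100.0/24", "learned_from": "OC-12",
              "as_path": [64701, 64702, 64703], "local_pref": 100}
t1_route = {"prefix": "198.51.100.0/24", "learned_from": "T1",
            "as_path": [64701], "local_pref": 100}

tuned_oc12 = apply_policy(oc12_route, policy)   # first term matches
tuned_t1 = apply_policy(t1_route, policy)       # no term matches

print(tuned_oc12["local_pref"], tuned_t1["local_pref"])
```

After the policy runs, the longer OC-12 path carries the higher local-preference, so it would be chosen over the shorter T1 path.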

To understand policy routing, one must understand the steps BGP uses for determining the “best” path to reach each destination prefix.

A BGP router’s first criterion for determining the best route is to find which route in its routing table most closely matches the target address. If a BGP router is looking for the best path for the destination 1.2.3.4 and there’s a choice between 1.2.3.0/24 or 1.0.0.0/8, the more specific /24 route is used. This is called a routing table longest match.
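A longest match is easy to demonstrate with Python’s standard ipaddress module. The sketch below uses the two prefixes from the example; the function name is invented, and a real router uses a far more efficient lookup structure than a linear scan.

```python
import ipaddress


def longest_match(destination, prefixes):
    """Return the most specific prefix containing destination, or None."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p) for p in prefixes]
    matching = [n for n in networks if dest in n]
    if not matching:
        return None
    # The longest prefix length is the most specific route.
    return max(matching, key=lambda n: n.prefixlen)


table = ["1.0.0.0/8", "1.2.3.0/24"]

specific = longest_match("1.2.3.4", table)     # both match; the /24 wins
fallback = longest_match("1.9.9.9", table)     # only the /8 matches
no_route = longest_match("203.0.113.1", table) # neither prefix matches

print(specific, fallback, no_route)
```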

There is no way to override BGP’s longest match functionality. No amount of policy routing can change the preference of a /20 route over a more specific /24 route (other than not learning the /24). To get around this “feature,” networks often de-aggregate their advertisements and “leak” more specific routes to influence outside networks’ routing decisions. However, these leaked routes are still subject to the same filtering problems as illustrated in Figure Two, and leaked routes are looked upon as poor “netiquette” because they increase the global routing table size.

Beyond the routing table longest match, different vendors implement the BGP path selection criteria slightly differently. Some add extra knobs to tune the routing policy and each vendor checks BGP attributes in a slightly different order. However, there are some similarities:

The first BGP attribute checked by all vendors is called local-preference, and it’s just that: the preference a site ascribes to a given route that’s local only to that AS. Across external BGP sessions, all local-preference attributes are reset to the default of 100. Internal BGP neighbors do not alter local-preference unless local router policy dictates otherwise. A higher local-preference is considered more preferred. When two different routes to the same destination have the same local-preference, the path selection process continues.

The next BGP attribute considered is AS-Path hop count. Shorter hop counts are preferred over longer AS-Path hop counts. AS-Path padding is a common method to influence inbound path selection. Networks not wanting a particular inbound path used may pad their route advertisement’s AS-Path. AS-Path padding prepends the local AS number multiple times making the route’s AS-Path appear artificially long (see policy routing example #2). Unfortunately, this method can easily be overridden by adjusting the local-preference to select a route with a longer AS-Path.

The BGP route selection process has many more steps (approximately 10), but the most important is often the last one. If a given router learns a prefix from multiple ISPs and each route has the same local-preference, the same length AS-Path, and all other BGP path attributes are identical, then the final BGP selection criterion is the IP address of the external BGP neighbor each route was learned from: the route from the neighbor with the lowest IP address wins. While this may seem an odd candidate for a tie breaker, all BGP routers in the same AS must ultimately make the same routing decision to avoid routing loops.
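The three steps discussed here (local-preference, AS-Path length, lowest neighbor address) can be combined into a simplified Python sketch. It deliberately leaves out the other selection steps that real implementations apply in between, and the neighbor addresses and ASNs are invented, drawn from documentation and private-use ranges.

```python
import ipaddress


def best_path(routes):
    """Pick a route by: highest local-preference, then shortest AS-Path,
    then lowest neighbor IP address (compared numerically, not as text)."""
    return min(
        routes,
        key=lambda r: (
            -r["local_pref"],
            len(r["as_path"]),
            ipaddress.ip_address(r["neighbor"]),
        ),
    )


via_isp_a = {"neighbor": "192.0.2.10", "local_pref": 100, "as_path": [64701, 64710]}
via_isp_b = {"neighbor": "192.0.2.9", "local_pref": 100, "as_path": [64702, 64710]}

# Everything ties, so the lower neighbor address (192.0.2.9) decides.
tie_winner = best_path([via_isp_a, via_isp_b])

# The origin pads its AS-Path toward ISP B, making that path look longer.
padded_b = {**via_isp_b, "as_path": [64702, 64710, 64710, 64710]}
after_padding = best_path([via_isp_a, padded_b])

# A higher local-preference on the padded route overrides the padding.
preferred_b = {**padded_b, "local_pref": 200}
after_override = best_path([via_isp_a, preferred_b])

print(tie_winner["neighbor"], after_padding["neighbor"], after_override["neighbor"])
```

The last case shows why AS-Path padding is only a hint: any network that sets its own local-preference can ignore it.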

Many methods and styles exist for implementing policy routing, and there are many reasons for taking on the effort to tune routing techniques. Policy routing is often used to balance traffic across multiple links. Policy routing is also commonly used to save money. When two alternate paths exist to the same target network, where each has the same relative performance, directing traffic down the cheapest path is to the company’s advantage. To reduce the amount of traffic an ISP has to carry across (pay for) its own network, a variation on this method called hot potato routing directs traffic to the closest egress point off the network even if another, possibly more optimal, path exists but requires traffic remain on the local network longer. Cold potato routing keeps traffic on an ISP’s network as long as possible delivering the traffic to the egress point as close to the target as possible to (hopefully) increase performance.

Deploying BGP

BGP is a wonderful and rich protocol that gives sites with connections to multiple ISPs control over their routing choices and Internet performance. Almost every router manufacturer supports BGP on a wide range of models, and since BGP is an open standard, routers from multiple vendors should interoperate just fine, even within the same network.

Multi-homing is key to providing uptime, and BGP is key to making multi-homing work well.

Configuring BGP Routers

While internal routing protocols auto-discover all of their neighbors and automatically begin exchanging routing information, BGP only connects with peers that are explicitly declared and described in a configuration file. To set up a BGP peer session, you need only two pieces of information: the peer router’s interface IP address and its ASN. Here’s a minimal Juniper Networks BGP router configuration that establishes a peer session: