BGP Best External

BGP Best External is generally used in active-standby BGP topologies, but it is not limited to them. The feature helps BGP converge much faster by advertising an external BGP path that would not normally be sent when it is not the overall BGP best path.

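A BGP speaker can distinguish three kinds of best path: the best internal path (the best among the paths learned over IBGP), the best external path (the best among the paths learned over EBGP), and the overall best path.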
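To make the three terms concrete, here is a minimal Python sketch. The `Path` type, the router names and the two-step tie-break (highest Local Preference, then EBGP over IBGP) are simplifying assumptions for illustration, not a full BGP decision process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str      # hypothetical neighbor name
    local_pref: int
    external: bool     # True if learned over EBGP, False if learned over IBGP


def pick(paths):
    """Simplified tie-break: highest LOCAL_PREF, then prefer EBGP over IBGP."""
    if not paths:
        return None
    return max(paths, key=lambda p: (p.local_pref, p.external))


def classify(paths):
    """Return the overall best, best external and best internal path."""
    return {
        "overall_best": pick(paths),
        "best_external": pick([p for p in paths if p.external]),
        "best_internal": pick([p for p in paths if not p.external]),
    }


paths = [
    Path("CE", 100, external=True),
    Path("PE-A", 200, external=False),
    Path("PE-B", 150, external=False),
]
result = classify(paths)
```

In this example the overall best path and the best internal path are the same IBGP path, while the best external path is a different, less preferred one. That is exactly the situation where best external matters.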

In active-standby scenarios, BGP best external can be used in MPLS VPN, Internet business customer connections, EBGP peering, hierarchical large-scale Service Provider backbones, and many other designs.

But how is an active-standby connection created with BGP? And in which situations do people use active-standby instead of active-active?

Let’s start with the below scenario.

Figure -1 BGP Active-Standby Path Selection Example

The first thing to know is that the common reason for an active-standby (primary-backup) design is that one link is more expensive than the other. Cost does not have to be monetary; it can also be based on latency, performance, and bandwidth.

In Figure-1, IBGP is running in the Service Provider network. R1, R2 and R3 have full-mesh IBGP sessions between them.

R2 and R3 are connected to the customer network, and EBGP is running between them and the customer. Since the BGP Local Preference attribute is set to 200 on R3, R3 is used as the egress point. In this case, the best path in the Service Provider domain for this customer is via R3, and it is advertised to R1 and R2.

Although R2 has a connection to the customer network, its overall best path is the IBGP path, so R2 doesn’t even advertise its own external path to R3 and R1. This is against the BGP RFC, but almost all vendors implemented BGP this way.

Before looking at the impact of BGP best external and how it interacts with BGP PIC, let’s remember how BGP converges when the primary link fails.

If the R3-to-customer link fails, R2 can learn about the failure through the IGP or through BGP. If the BGP next hop is the R3 loopback (which is always the case with MPLS Layer 3 VPN), R2 cannot detect the external link failure from an IGP update, because the loopback is still reachable. In that case R2 waits for the BGP withdrawal and update messages from R3. Once the BGP update is processed, R2 installs the prefixes with its external path into the RIB and FIB.

Now let’s enable BGP best external on R2.

When BGP best external is enabled on R2, R2 advertises its best external path even though its overall best path comes from its IBGP neighbor R3. Since R2 has only one external path, it sends that path to both R3 and R1.
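The difference in R2's advertisement behavior can be sketched in a few lines of Python. This is only an illustration: the `Path` tuple and the function names are hypothetical, and the Local Preference of 100 on R2's own external path is an assumed default. Only the value 200 on R3 comes from the scenario.

```python
from typing import NamedTuple, Optional


class Path(NamedTuple):
    learned_from: str
    session: str       # "ebgp" or "ibgp"
    local_pref: int


def overall_best(paths):
    """Simplified selection: highest LOCAL_PREF, then EBGP over IBGP."""
    return max(paths, key=lambda p: (p.local_pref, p.session == "ebgp"))


def advertised_to_ibgp(paths, best_external: bool) -> Optional[Path]:
    """Which path a full-mesh IBGP speaker sends to its IBGP peers."""
    best = overall_best(paths)
    if best.session == "ebgp":
        return best            # an EBGP-learned best path is always advertised
    if not best_external:
        return None            # an IBGP-learned best path is not re-advertised
    external = [p for p in paths if p.session == "ebgp"]
    return overall_best(external) if external else None


# R2's view in Figure-1 (local_pref 100 on the customer path is an assumption).
r2_paths = [Path("customer", "ebgp", 100), Path("R3", "ibgp", 200)]

without_feature = advertised_to_ibgp(r2_paths, best_external=False)
with_feature = advertised_to_ibgp(r2_paths, best_external=True)
```

Without the feature R2 stays silent. With it, R1 and R3 learn R2's customer path in advance.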

Here is the trick: implementations don’t install the best external path into the RIB and FIB unless BGP PIC is enabled. (Some vendors, for example Cisco, enable BGP PIC by default when BGP best external is enabled.)

Is BGP best external useful without BGP PIC? Yes, actually. R3 already holds the path received from R2 in its BGP table, so when its external link fails it does not have to wait for a BGP update from R2; it only has to install the prefixes into the RIB and FIB.

If both BGP PIC and BGP best external are enabled on R3, then when R3’s external link fails, R3 starts sending traffic towards R2 immediately, because the prefixes are already installed in the RIB and FIB with the backup flag.
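Conceptually, BGP PIC keeps a pre-computed backup next hop next to the primary one. The following Python sketch models that idea from R3's point of view. The `FibEntry` class is hypothetical, and the prefix is taken from the documentation range purely as a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    prefix: str
    primary: str
    backup: Optional[str] = None   # pre-installed only when BGP PIC is enabled
    primary_up: bool = True

    def primary_failed(self) -> None:
        """Local event: the link towards the primary next hop went down."""
        self.primary_up = False

    def active_next_hop(self) -> Optional[str]:
        if self.primary_up:
            return self.primary
        # None means traffic is dropped until BGP reconverges.
        return self.backup


# R3's entry for the customer prefix, with and without a pre-installed backup.
with_pic = FibEntry("203.0.113.0/24", primary="CE", backup="R2")
without_pic = FibEntry("203.0.113.0/24", primary="CE")

with_pic.primary_failed()
without_pic.primary_failed()
```

With the backup flag in place the switchover is a local pointer change. Without it, R3 has nothing to forward on until the control plane catches up.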

You might think this solves the issue, and that when the primary link fails the secondary link is used immediately without packet loss. Actually, no.

In a pure IP network, a microloop occurs. When R3 starts sending traffic towards R2 (BGP PIC is enabled), R2 doesn’t yet know that R3’s external link has failed. R2 sends the traffic back to R3, and R3 sends it to R2 again, because both routers do an IP lookup for the BGP prefix.
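The microloop is easy to reproduce with a toy hop-by-hop trace. In this sketch each router's forwarding decision for the customer prefix is a single dictionary entry. The `trace` helper and the hop limit are illustrative assumptions that stand in for a packet's TTL.

```python
def trace(fib, start, max_hops=4):
    """Follow per-router next hops; return (routers visited, outcome)."""
    node, visited = start, [start]
    for _ in range(max_hops):
        next_hop = fib[node]
        if next_hop == "CE":
            return visited, "delivered"
        visited.append(next_hop)
        node = next_hop
    return visited, "looping"


# Right after R3's external link fails: PIC on R3 has switched to the backup
# path via R2, but R2 still points at its old overall best path via R3.
during_failure = {"R3": "R2", "R2": "R3"}

# After R2 learns about the failure and selects its own external path.
after_convergence = {"R3": "R2", "R2": "CE"}

loop_path, loop_outcome = trace(during_failure, "R3")
ok_path, ok_outcome = trace(after_convergence, "R3")
```

The loop lasts only as long as R2 keeps its stale best path, which is why the speed at which R2 hears about the failure matters so much.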

In MPLS VPN, this is solved if VPN label allocation is done per prefix or per CE: R2 and R3 then do not do an IP lookup but forward the traffic towards the customer based on the inner (VPN) label.

If VPN label allocation is done per VRF and two CEs are connected to R2, R2 has to do an IP lookup to pick the correct CE. Because of that IP lookup, R2 sends the traffic back to R3 and the microloop occurs again.
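The effect of the label allocation mode on R2 can be written down as a small decision function. This is a simplified model: the function name and the mode strings are hypothetical and do not correspond to any vendor's configuration keywords.

```python
def r2_forwards_to(label_mode, r2_ip_best_next_hop):
    """Where R2 sends a VPN packet that R3 redirected to it.

    label_mode: "per-prefix", "per-ce" or "per-vrf".
    r2_ip_best_next_hop: result of R2's IP lookup in the VRF, which follows
    R2's current BGP best path.
    """
    if label_mode in ("per-prefix", "per-ce"):
        # The VPN label already identifies the attached CE; no IP lookup.
        return "CE"
    if label_mode == "per-vrf":
        # The label only identifies the VRF, so R2 falls back to an IP lookup.
        return r2_ip_best_next_hop
    raise ValueError(f"unknown label mode: {label_mode}")


# Before R2 has converged, its IP best path still points at R3.
per_prefix_result = r2_forwards_to("per-prefix", "R3")
per_ce_result = r2_forwards_to("per-ce", "R3")
per_vrf_result = r2_forwards_to("per-vrf", "R3")
```

Only the per-VRF case depends on R2's not-yet-converged best path, which is where the loop comes back.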

So BGP best external and PIC in an IP network will suffer from a microloop. But instead of losing seconds or minutes waiting for BGP to converge, the microloop can be resolved in less than a second when the IGP is tuned, because R2 is notified about R3’s external link failure as fast as possible.

Now let’s look at another example to see how BGP best external works and how it helps convergence. This example also shows that, in specific topologies, you may not need BGP Add-path or BGP Shadow RRs/Shadow Sessions to send more than one path from a Route Reflector.

Figure -2 BGP Hierarchical Service Provider Backbone

The above topology was common in the past and is still used in some Service Provider networks.

It is a POP and Core architecture without MPLS in the core. Each POP has Route Reflectors in the data path (more than one, for redundancy), and routes are summarized at the Core-to-POP boundary.

In Figure-2, for simplicity, there are only 3 POPs connected to the Core network. Each POP has two RRs, and the RRs have full-mesh IBGP sessions between them. In the core, there is a PE which is connected to the customer and an ASBR which is connected to the upstream provider and receives BGP prefixes. Within each POP there are full-mesh IBGP sessions as well.

Note that there could be a second level of hierarchy in the Core as well, because as the number of POP locations grows, the number of full-mesh IBGP sessions required between RRs becomes too large.
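The growth is quadratic: a full mesh of n routers needs n(n-1)/2 sessions. A quick Python check, assuming two RRs per POP as in Figure-2 (the 20-POP case is a made-up example for scale):

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of IBGP sessions in a full mesh of the given number of routers."""
    return routers * (routers - 1) // 2


RRS_PER_POP = 2  # as in Figure-2

sessions_3_pops = full_mesh_sessions(3 * RRS_PER_POP)    # 6 RRs
sessions_20_pops = full_mesh_sessions(20 * RRS_PER_POP)  # 40 RRs, hypothetical
```

With 3 POPs the RR mesh needs 15 sessions; with 20 POPs it already needs 780.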

For a given prefix in this picture, we have two paths: Path 1 from POP1 and Path 2 from POP3.

BGP best external can be enabled in two places in this topology: on the ASBRs and on the Route Reflectors.

Let’s assume Local Preference is set to 200 on the ASBR in POP1 and 100 on the ASBR in POP3. This makes the path through the ASBR in POP1 the overall BGP best path for the prefix.

If BGP best external is enabled only on the ASBRs but not on the Route Reflectors, then the Route Reflectors in POP1 and POP2 do NOT receive the best external path, which is Path 2 from POP3.

But RR3-A and RR3-B in POP3 do receive both the overall best path, which is Path 1, and the best external path, which is Path 2, simply because the ASBR in POP3 sends its best external path to its RRs, RR3-A and RR3-B.

Here, BGP Add-path could be used to send the best external path from RR3-A and RR3-B to the POP1 and POP2 Route Reflectors. But the problem with BGP Add-path is that it requires a software and hardware upgrade on every PE, ASBR and Route Reflector.

Instead, BGP best external is enabled on the Route Reflectors as well. This allows RR3-A and RR3-B to send the best external path, which is Path 2, to the POP1 and POP2 RRs.
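The resulting path visibility can be summarized as a small truth table. The Python sketch below simply encodes the behavior described for Figure-2; the function name and the POP keys are hypothetical, and it does not model real BGP message exchange.

```python
def paths_on_rrs(asbr_best_external: bool, rr_best_external: bool):
    """Paths known to the RRs of each POP for the example prefix."""
    # Path1 is the overall best path, so every RR learns it in any case.
    known = {"POP1": {"Path1"}, "POP2": {"Path1"}, "POP3": {"Path1"}}
    if asbr_best_external:
        # The POP3 ASBR also sends its best external path to RR3-A/RR3-B.
        known["POP3"].add("Path2")
        if rr_best_external:
            # RR3-A/RR3-B pass Path2 on to the RRs in the other POPs.
            known["POP1"].add("Path2")
            known["POP2"].add("Path2")
    return known


asbr_only = paths_on_rrs(asbr_best_external=True, rr_best_external=False)
asbr_and_rr = paths_on_rrs(asbr_best_external=True, rr_best_external=True)
```

Only when the feature is enabled in both places do the POP1 and POP2 RRs hold a second path to fall back on.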

When the RRs hold both the overall best path and the BGP best external path, network convergence is greatly improved if the overall best path goes down, especially when BGP PIC is used together with BGP best external on the ASBRs and RRs.

For example, if traffic enters at POP2, which doesn’t have an ASBR, and needs to reach the prefix, RR2-A and RR2-B will have two paths: the overall best path, which is Path 1, and the best external path, which is Path 2. Both paths are installed in the RRs’ RIB and FIB (BGP PIC is enabled in addition to BGP best external). If Path 1 fails, the best external path is already in the RIB and FIB, so BGP PIC just changes the pointer to the best external BGP path and you wouldn’t even lose a packet.

CONCLUSION :

BGP best external helps BGP convergence in both IP and MPLS networks.

BGP best external is especially useful with BGP PIC, and some vendors enable BGP PIC by default when BGP best external is enabled. If you plan to use BGP best external in your network, test before deployment, because your vendor’s implementation might be slightly different.

BGP best external can be enabled at the edge of the network, such as on the ASBRs, as well as on the RRs.

Depending on the topology, BGP best external and BGP PIC can be enough to send more than one path without BGP Add-path or other mechanisms.

With BGP best external and BGP PIC, you can have sub-second convergence in certain topologies.

BGP best external was already specified in the original BGP RFC but was never implemented by the vendors; now it is popular again.

Cisco CCIE/CCDE Instructor - CCDE #2014:0017, CCIE #26567 (R&S), CCNP, CCDP, JNCIS, JNCIP
Orhan, an IT architect and "hands on" engineer with more than 12 years of experience, has a wide range of technical skill sets and expertise, with large-scale, worldwide design being his most prominent. He has worked in various environments, including enterprise and service provider networks, data center / virtualization environments, and high-end security-centric infrastructures.
He has been teaching Cisco network design concepts, CCDE and Pre-CCDE classes for the past two years. He contributed to the Cisco Press CCDE Study Guide and was the official technical editor/reviewer for the Cisco CCDP Arch, Version 2 book.
In addition, Orhan runs a popular blog located at orhanergun.net. He's also a blogger and podcaster at Packet Pushers as well as an owner of the Google CCDE Group Study.