Migration Strategy: Moving From MPLS/LDP to Segment Routing

21 March 2019

MPLS networks that use the Label Distribution Protocol (LDP) are common in service provider (SP) cores and have served us well. So the thought of pulling the guts out of the core is pretty daunting, and it invites the question of why you would want to perform open-heart surgery on such critical infrastructure. This article explains the benefits of such a move and gives a high-level view of a migration strategy.

Why Do I Need Segment Routing?

Simplicity: LDP was invented as a label distribution protocol for MPLS because nobody wanted to go back to the standards bodies to re-invent OSPF or IS-IS so that they could carry labels. A pragmatic decision, but one that results in networks having to run two protocols. Two protocols means twice the complexity. Segment Routing simplifies things by allowing you to turn off LDP. Instead it carries label (or Segment ID) information in extensions to the IGP. This then leaves you with only IS-IS or OSPF to troubleshoot. As Da Vinci reportedly said, ‘simplicity is the ultimate sophistication’.

Scale: LDP scales, but for fast convergence RSVP-TE is often used to tunnel LDP across a core. RSVP requires core routers to hold state for potentially many thousands of Label Switched Paths (LSPs), and as the number of these tunnels grows, convergence around failures gets slower. Segment Routing requires no per-LSP state in the core routers: stacks of segment IDs (SIDs) are encoded into the packets at the edges of the network, which lets you control traffic flow through the whole infrastructure without a complex signalling protocol like RSVP.
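To make the "no state in the core" point concrete, here is a minimal Python sketch of a packet carrying a SID stack. It is a toy model under stated assumptions: the four-router topology, the router names and the SID values are all invented, and real forwarding details such as penultimate hop popping are ignored. The only table the routers consult is the ordinary IGP next-hop table.

```python
from collections import deque

# Hypothetical node SIDs (label -> router). All values here are invented.
NODE_SIDS = {20001: "PE1", 20002: "P1", 20003: "P2", 20004: "PE2"}

# Ordinary IGP next hops: NEXT_HOP[router][destination] -> neighbour.
# This is the only table a core router needs; there are no per-path entries.
NEXT_HOP = {
    "PE1": {"P1": "P1", "P2": "P1", "PE2": "P1"},
    "P1": {"PE1": "PE1", "P2": "P2", "PE2": "PE2"},
    "P2": {"PE1": "P1", "P1": "P1", "PE2": "PE2"},
    "PE2": {"PE1": "P1", "P1": "P1", "P2": "P2"},
}


def forward(ingress, segment_list):
    """Walk a packet through the network, returning the routers it visits."""
    stack = deque(segment_list)  # the SID stack imposed by the ingress PE
    here = ingress
    path = [here]
    while stack:
        target = NODE_SIDS[stack[0]]  # top SID names the next waypoint
        if here == target:
            stack.popleft()  # segment finished, expose the next SID
        else:
            here = NEXT_HOP[here][target]  # plain shortest-path forwarding
            path.append(here)
    return path


print(forward("PE1", [20004]))         # shortest path to PE2
print(forward("PE1", [20003, 20004]))  # steer via P2, then on to PE2
```

The second call takes a different route purely because the ingress router pushed a different stack. Nothing was signalled to, or stored on, P1 or P2 to make that happen.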

Multipathing: LDP uses the shortest path as calculated by the IGP. RSVP is circuit-oriented in nature, in that a single path is signalled end-to-end. Packets follow that path much as they would follow an ATM or frame-relay virtual circuit (am I showing my age here?). With Segment Routing, multiple paths to the destination can be used, enabling a provider to scale-out – for example, by adding multiple smaller routers using 10G instead of being forced to upgrade to single large routers with expensive 100G interfaces.
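As a rough illustration of the multipath point, the Python sketch below lists every equal-cost shortest path between two routers, which is the set of paths a node SID can load-share across. The topology and metrics are hypothetical, and the code is only a plain Dijkstra that remembers all equally good predecessors, not a model of any vendor's implementation.

```python
import heapq


def equal_cost_paths(graph, src, dst):
    """Return every shortest path from src to dst (all ties included)."""
    dist = {src: 0}
    preds = {src: []}  # for each node, all predecessors on a shortest path
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                preds[nbr] = [node]
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                preds[nbr].append(node)  # an equally good way in

    def walk(node):
        if node == src:
            return [[src]]
        return [p + [node] for pred in preds.get(node, []) for p in walk(pred)]

    return sorted(walk(dst))


# Hypothetical core: two small P routers in parallel, all metrics equal.
CORE = {
    "PE1": {"P1": 10, "P2": 10},
    "P1": {"PE1": 10, "PE2": 10},
    "P2": {"PE1": 10, "PE2": 10},
    "PE2": {"P1": 10, "P2": 10},
}

for path in equal_cost_paths(CORE, "PE1", "PE2"):
    print(" -> ".join(path))
```

Adding a third parallel P router with the same metrics would simply add a third path to the list, which is the scale-out idea in miniature.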

Speed: Fast recovery around a fault in the LDP world relies on Loop Free Alternates (LFA), where a backup next hop is pre-installed in a router’s table in anticipation of a failure. When the failure happens, traffic switches immediately onto the alternate path while the IGP converges. There are two distinct disadvantages to this. Firstly, the backup path may not be what the IGP ends up selecting, so convergence has to happen twice. Secondly, LFA can cause micro-loops temporarily: upon detection of a failure of the primary path, packets are diverted to the backup next-hop but are immediately sent back to the sender because the IGP has not yet converged. Segment Routing has the concept of Topology Independent Loop Free Alternates (TI-LFA), which solves both of these issues. No micro-loops, and no double-convergence – all within 50 milliseconds. Moreover, it can be used to protect IP and LDP traffic as well as SR. What’s not to like about that?
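The loop problem with classic LFA comes down to one inequality: a neighbour N is only a safe backup next hop towards destination D if dist(N, D) < dist(N, S) + dist(S, D), where S is the protecting router. The Python sketch below checks that condition on two small topologies. The router names and metrics are made up for illustration, and the code is a simplified sketch of the basic link-protection check only.

```python
import heapq


def shortest_distances(graph, src):
    """Plain Dijkstra: cost of the shortest path from src to every node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def loop_free_alternates(graph, source, dest):
    """Neighbours of source that will not send traffic for dest back to it."""
    dist = {n: shortest_distances(graph, n) for n in graph}
    primary = min(graph[source], key=lambda n: graph[source][n] + dist[n][dest])
    return sorted(
        n for n in graph[source]
        if n != primary and dist[n][dest] < dist[n][source] + dist[source][dest]
    )


# Hypothetical square: S reaches D via A (primary) or via B (candidate backup).
GOOD = {"S": {"A": 10, "B": 10}, "A": {"S": 10, "D": 10},
        "B": {"S": 10, "D": 15}, "D": {"A": 10, "B": 15}}
# Same square, but B's own link to D is so expensive that B routes via S.
BAD = {"S": {"A": 10, "B": 10}, "A": {"S": 10, "D": 10},
       "B": {"S": 10, "D": 100}, "D": {"A": 10, "B": 100}}

print(loop_free_alternates(GOOD, "S", "D"))  # B qualifies as a backup
print(loop_free_alternates(BAD, "S", "D"))   # no safe neighbour exists
```

In the second topology no neighbour passes the test, so classic LFA offers no protection. TI-LFA closes that gap by using a SID stack to steer the repaired traffic past the point where it would otherwise loop back.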

There are quite a few other benefits, such as anycast SIDs and Flex-Algo, but in the interests of keeping this brief I will leave those for another day, or you can read about them on http://segment-routing.net.

Migration Strategy

Let’s assume a relatively straightforward network topology to keep this illustration simple. In the diagram below we have PE routers at the edge, and P-routers in the core. The core doesn’t run BGP, and a single instance of IS-IS runs across the whole provider network. Multi-protocol BGP (MP-BGP) is used between PE routers to signal VPN membership. Finally, LDP is used to distribute labels throughout the network – these labels are applied to the packets for transport.

LDP end-to-end

We will play it safe and go with a gradual migration strategy. Our core will be migrated first, leaving islands of LDP around the edge:

Islands of LDP around a Segment Routing Core

Finally, we will enable SR at the edge, and remove LDP from the picture altogether.

SR end-to-end

1 – Enable Segment Routing on Core

Firstly, we need to set a few things up on all our routers:

Configure the Segment Routing Global Block (SRGB) – must be the same on all routers

Configure the Node Segment ID (Node SID) – must be unique to each router (or bad things will happen)

This kind of work is ideally suited to automation: an Ansible playbook could be written using a Jinja2 template to push the required configuration to all the core routers. Making the changes this way would ensure consistency where the config needs to be uniform across routers, and uniqueness where the config needs to be different.
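Whatever tool pushes the configuration, it is worth checking the plan before it goes anywhere near a router. The following Python sketch validates the two rules above (same SRGB everywhere, unique node SID per router) against a small inventory. Everything in it is hypothetical: the hostnames, the SRGB range of 20000 to 27999 and the idea of expressing each node SID as an index into the SRGB are assumptions made for the example.

```python
from collections import Counter

# Hypothetical inventory. Hostnames, the SRGB range and the SID indexes are
# invented for illustration; they are not taken from a real network.
INVENTORY = [
    {"host": "p1", "srgb": (20000, 27999), "sid_index": 1},
    {"host": "p2", "srgb": (20000, 27999), "sid_index": 2},
    {"host": "p3", "srgb": (20000, 27999), "sid_index": 3},
]


def validate(inventory):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    if len({r["srgb"] for r in inventory}) > 1:
        problems.append("SRGB differs between routers")
    usage = Counter(r["sid_index"] for r in inventory)
    for index, users in sorted(usage.items()):
        if users > 1:
            problems.append(f"SID index {index} is used by {users} routers")
    for r in inventory:
        low, high = r["srgb"]
        if not 0 <= r["sid_index"] <= high - low:
            problems.append(f"{r['host']}: SID index falls outside the SRGB")
    return problems


def node_sid_label(router):
    """Label value for a node SID given as an index into the SRGB."""
    return router["srgb"][0] + router["sid_index"]


print(validate(INVENTORY))
print({r["host"]: node_sid_label(r) for r in INVENTORY})
```

With an SRGB starting at 20000, index 1 works out to label 20001. A check like this could run as a pre-flight step in the same pipeline that renders the templates.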

In IOS, the required configuration for this would be as follows. This router would get a node-SID of 20001:

Once this is done, the IGP on the P-routers will advertise SIDs, but Segment Routing won’t be in use yet. LDP still exists everywhere, remember, and in both IOS and Junos LDP is preferred over the SR labels carried in IS-IS by default.

So your configuration can now be verified at your leisure. Check the routing table and look for SIDs in the IS-IS routes to make sure everything is working as expected.

2 – Enable TI-LFA

Next, we need to enable TI-LFA on the P-routers. This protects all traffic types running across the core against both link and node failure. Naturally, you would probably have Bidirectional Forwarding Detection (BFD) configured on all your core links too, so that failure detection is rapid, but that is outside the scope of this document.

3 – Enable LDP and SR Interworking

At the moment, we have SR on the P-routers, and LDP is still running network-wide. When we turn off LDP on the P-routers, we will create islands of LDP around the edge of the network. We need a way for these islands to communicate until the time when we have SR running everywhere.

To do this we enable something called a Segment Routing Mapping Server. There are two components and you need both:

Segment Routing Mapping Server (SRMS): Allows SR-only routers to reach LDP-only routers. The SRMS generates node-SIDs on behalf of the LDP-only routers and advertises them into the SR domain. A different router with a leg in both SR and LDP then has to be configured to stitch the SR and LDP labels together. You would probably have two SRMSs to ensure resilience in the event of failure.

SR mapping client: Allows LDP-only routers to reach SR-only routers. The client is a router with a leg in both the SR and LDP domains, which allocates LDP labels to all the SR node-SIDs it knows about and advertises them into the LDP domain. Again, a pair of these is recommended at each SR/LDP boundary.
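The two directions of stitching are easier to see as lookup tables. The Python sketch below builds both for a border router: SR label in, LDP label out, and the reverse. It is a conceptual model only. The prefixes, the SRGB base of 20000, the SID indexes and the LDP label values are all hypothetical, and real implementations build these tables from IGP and LDP advertisements rather than from function arguments.

```python
from itertools import count

SRGB_BASE = 20000  # assumed SRGB start, identical on every SR router


def srms_mappings(ldp_only_prefixes, first_index):
    """SRMS role: assign a SID index to each LDP-only router's loopback."""
    return {p: first_index + i for i, p in enumerate(sorted(ldp_only_prefixes))}


def sr_to_ldp_table(mappings, ldp_labels_from_neighbour):
    """Border router: incoming SR label -> outgoing LDP label (SR to LDP)."""
    return {SRGB_BASE + index: ldp_labels_from_neighbour[prefix]
            for prefix, index in mappings.items()}


def ldp_to_sr_table(sr_prefix_indexes, first_local_label):
    """Mapping client: local LDP label it advertises -> outgoing SR label."""
    local_labels = count(first_local_label)
    return {next(local_labels): SRGB_BASE + index
            for _prefix, index in sorted(sr_prefix_indexes.items())}


# Hypothetical LDP-only PE loopbacks and the labels an LDP neighbour offered.
mappings = srms_mappings({"10.0.0.11/32", "10.0.0.12/32"}, first_index=101)
print(sr_to_ldp_table(mappings, {"10.0.0.11/32": 300, "10.0.0.12/32": 301}))

# Hypothetical SR-only P loopbacks with their node-SID indexes.
print(ldp_to_sr_table({"10.0.0.1/32": 1, "10.0.0.2/32": 2}, 24000))
```

Each table entry is a plain label swap, which is why traffic can cross the SR/LDP boundary without the routers on either side knowing that the other domain exists.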

4 – Remove Protocols

Once the inter-working tasks are complete, LDP can be removed from the P-routers on a link-by-link basis. Working one link at a time, remove LDP from both ends – of course the routers on each end must have SR enabled on them before you do this.
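If the removal is scripted, the "SR on both ends first" rule is simple to enforce. Here is a small Python sketch that splits a list of links into those that are safe to touch and those that must wait. The link list and the set of SR-enabled routers are hypothetical inputs; in practice they would come from your inventory or from the routers themselves.

```python
def split_links(links, sr_enabled):
    """Separate links whose two ends both run SR from those that do not."""
    ready, blocked = [], []
    for a, b in links:
        if a in sr_enabled and b in sr_enabled:
            ready.append((a, b))
        else:
            blocked.append((a, b))
    return ready, blocked


# Hypothetical core links and SR rollout status.
LINKS = [("p1", "p2"), ("p2", "p3"), ("p3", "pe1")]
SR_ENABLED = {"p1", "p2", "p3"}

ready, blocked = split_links(LINKS, SR_ENABLED)
print("remove LDP on:", ready)
print("leave alone:  ", blocked)
```

The link towards pe1 stays in the blocked list, which matches the plan of leaving LDP in place at the edge for now.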

By the end of this, we should have achieved our “SR core and LDP islands” topology.

Pushing SR to the Edge

Of course, not all edge devices are going to support SR at this stage, so there may well be a case where an island of LDP needs to continue to exist. However, our goal is to remove complexity, and SR has been around for a while so let’s assume our edge supports it.

Pushing SR out to the edge is a two-step process, and is similar to that used when migrating between IGPs:

1 – Enabling SR at the Edge

The same process can be followed for the PE routers as was used on the P-routers. Once the SRGB has been defined and node-SIDs are configured, we should start seeing IS-IS prefixes carrying Segment Routing information in the PE routers’ tables. At this stage Junos still prefers the LDP prefixes, and the situation is the same on IOS-XR. So you can verify that end-to-end SR reachability is in place while the transport of packets remains with LDP.

2 – Turning On SR at the Edge

Two alternatives exist here. The first is simply to disable LDP on a link-by-link basis. The LDP entries disappear from the tables and are replaced by SR ones.

Instead of doing this, it may be preferable to migrate an entire PE router at once. In this case, IOS-XR offers the ‘sr-prefer’ command, which is an easy way to make the router prefer SR labels over LDP labels for the same prefix. In Junos, you simply change the preference of the LDP protocol so that the SR routes win.
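The Junos approach is just a tie-break on a number, which the Python sketch below imitates: when two protocols offer a route to the same prefix, the lower preference value wins. The default values used here (9 for LDP, 14 for SR-labelled IS-IS routes) and the new value of 20 are assumptions for illustration, so check them against your own platform and release before relying on them.

```python
def active_protocol(preferences):
    """Pick the protocol whose route becomes active: lowest preference wins."""
    return min(preferences, key=preferences.get)


# Assumed defaults: LDP beats the SR routes learned through IS-IS.
before = {"LDP": 9, "L-ISIS": 14}
print(active_protocol(before))

# Raise the LDP preference value above the SR one and SR takes over.
after = dict(before, LDP=20)
print(active_protocol(after))
```

Because both sets of labels stay installed as candidates, backing out is just a matter of restoring the original preference value.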

Once validation is complete, and assuming no islands of LDP remain, the protocol and the SRMS configuration can be removed from the network. Bada-bing! You’ve got yourself a Segment Routing network from end to end.