Inter-AS MPLS Option B - MPLS Label Usage

When building MPLS Inter-AS Option B interconnects, the number of local labels allocated on the ASBRs doubles, consuming LFIB space more quickly than expected. Label exhaustion is therefore easily reached, especially if a couple of link flaps occur. This is demonstrated below on ME3600s (22,000 label limit by default, usable labels are 16 to 21999) and ME3800s (30,000 label limit by default, usable labels are 16 to 29499).

Topology:

PE1
|
ASBR2/PE2 == PE3
|
ASBR1

When a PE (for example PE2) receives routes from an iBGP VPNv4 neighbour (for example PE1), the routes arrive with an MPLS label value. This doesn’t use any local label space (LFIB) on PE2 unless PE2 is advertising those routes on to another iBGP peer PE (such as PE3) or a CE. In that case PE2 needs to allocate a local label, and it advertises those routes to PE3 with its local label value in the BGP update (assuming next-hop-self etc.).

The same applies to VPNv4/6 and LDP learnt routes on PE2: it won’t use any of its 22,000 label limit unless it has to allocate a local label (such as when it advertises the routes to another PE).

In the case of eBGP VPNv4 (Inter-AS MPLS Option B), PE2 receives 100 routes from PE1 and uses 100 labels because it is advertising the 100 routes to PE3. PE2 now uses another 100 labels as well, because it generates different labels to send to the eBGP ASBR peer. So it is now using 200 labels for the 100 routes it has received from PE1.
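
The doubling described above can be written as a small piece of arithmetic. This is only a sketch of the counting model in the text, not of how IOS allocates labels internally; the function name and its parameters are made up for illustration.

```python
def local_labels_needed(prefixes, readvertised_to_ibgp=True, option_b_peer=False):
    """Estimate local labels used for a set of received VPNv4 prefixes.

    Counting model from the text (an assumption, not platform internals):
    - one local label per prefix when the routes are advertised on to
      another iBGP PE (or a CE);
    - one additional label per prefix when they are also advertised to an
      eBGP VPNv4 (Option B) ASBR peer.
    """
    labels = 0
    if readvertised_to_ibgp:
        labels += prefixes
    if option_b_peer:
        labels += prefixes
    return labels


# 100 routes from PE1, advertised on to PE3 only
print(local_labels_needed(100))                      # 100
# The same 100 routes, also sent to the eBGP ASBR peer
print(local_labels_needed(100, option_b_peer=True))  # 200
```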

Labels are not released for re-use for 5 minutes after they are cleared (a holddown timer). For example, if the VPNv4 session to PE1 drops, PE2 will continue to label switch traffic it receives from either PE3 or ASBR1 for a further 5 minutes, until those labels time out from the LFIB.

Suppose PE2 has 12,000 routes and 12,000 local labels in use for those routes. If the BGP session to PE1 flaps (goes down and up again within the 5 minute LFIB timeout window), PE2 relearns those routes and allocates new labels. At this point it tries to use 24,000 labels (until the first 12,000 time out) and runs out of local label space.
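
The flap scenario can be modelled roughly as follows. This is a simplified sketch: it assumes every flap re-allocates a complete new set of labels and that all of the old labels stay held for the full 5 minutes. The function names are hypothetical; the label range is the ME3600 default quoted earlier.

```python
LABEL_MIN = 16          # ME3600 default range, from the text
LABEL_MAX = 21999
HOLDDOWN_SECONDS = 300  # 5 minute holddown described above


def usable_labels(label_min=LABEL_MIN, label_max=LABEL_MAX):
    """Number of label values in the inclusive range."""
    return label_max - label_min + 1


def labels_in_use(active, flap_times, now, holddown=HOLDDOWN_SECONDS):
    """Active labels plus one stale copy per flap still inside holddown.

    Assumption: each flap leaves `active` old labels held until the
    holddown expires, while `active` new labels are allocated.
    """
    stale = sum(active for t in flap_times if 0 <= now - t < holddown)
    return active + stale


# One flap at t=0, checked one minute later: the old labels are still held
demand = labels_in_use(12000, [0.0], now=60.0)
print(demand, demand > usable_labels())  # 24000 True

# After the holddown has expired only the new labels remain
print(labels_in_use(12000, [0.0], now=301.0))  # 12000
```

With 21,984 usable values in the 16 to 21999 range, a single flap of a 12,000 route session is enough to exceed the local label space in this model.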

One can check the label usage on an ME3600/ME3800 switch using the following:

! These are the messages in the syslog:
swi1.core#show logging | i mpls|label
Jan 20 13:30:58.653 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:30:58.653 UTC: label allocation failed for fib 172.16.250.112/30 Tbl:34 label val 1054
Jan 20 13:30:58.657 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:30:58.657 UTC: label allocation failed for fib 10.228.254.142/31 Tbl:19 label val 5732
Jan 20 13:31:14.285 UTC: nmpls_next_label_check: Label allocation Failed
Jan 20 13:31:14.285 UTC: label allocation failed for fib 172.16.242.32/29 Tbl:34 label val 4459
! Check the MPLS limits inside the NILE TCAM:
swi1.core#show platform aspdma template | i MPLS
NILE_NUM_EOMPLS_TUNNELS = 512
NILE_NUM_ROUTED_EOMPLS_TUNNELS = 128
NILE_NUM_MPLS_VPN = 128
NILE_NUM_MPLS_SERVICES = 512
NILE_NUM_MPLS_INGRESS_LABELS = 22000
NILE_NUM_MPLS_EGRESS_LABELS = 28500
MPLSD_TABLE = 34816
EMPLS3LD_TABLE = 28672 ! << MAX
! In the last line above, 28672 is max usable label count for L3 VPNs
! Below it can be seen that the label usage fluctuates each time the command is run. The 28670 labels used in the 2nd output is just 2 short of the maximum, so usage is likely climbing to the maximum and dropping again as routes come and go:
swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230743 Total Free:11202080 Usage:28663
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17
swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230813 Total Free:11202143 Usage:28670
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17
swi1.core#show platform nile adjmgr all | i EMPLS
EMPLS3LD Total Alloc:11230830 Total Free:11202164 Usage:28666
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17
! 20K out of 20,480 IPv4 unicast routes are used and this device is using per-prefix labeling for most if not all VRFs:
swi1.core#show platform tcam utilization ucastv4
Nile Tcam Utilization per Application & Region:
ES == Entry size == Number of 80 bit TCAM words
==================================================================
App/Region Start Num Avail ES Used Range Num Used
==================================================================
UCASTV4 0 20480 1
nile0 20000
nile1 20000
! The local label range of the switch is 16-22k so just under 22K usable local label values:
swi1.core#show mpls label range
Downstream Generic label region: Min/Max label: 16/21999
! 18K local labels are used:
swi1.core#show mpls forwarding-table summary
18029 total labels
! In the above command "show platform aspdma template | i MPLS" it can be seen that there is space for 28K labels, and the command "show platform nile adjmgr all | i EMPLS" shows that 28k are in use, not 18k as per "show mpls forw summ". Why the difference?
! MPLS Option B sessions double allocate labels. The 18k labels used above are the 18k local labels assigned by this PE (for prefixes it is advertising on to other MPLS enabled devices, so it needs to assign a local label to create the end to end LSP). The 28k labels used are the 18k local labels plus (28k-18k=10k) ~10k labels not advertised on (because this is the LER for an LSP, for example). So the switch can hold slightly more labels than it can locally assign.

! Looking back at this command it can be seen that the switch can store 22000 ingress labels or 28500 egress labels, and it has a total shared ingress+egress storage space of 28,672 labels:

swi1.core#show platform aspdma template | i MPLS
NILE_NUM_MPLS_INGRESS_LABELS = 22000
NILE_NUM_MPLS_EGRESS_LABELS = 28500
...
EMPLS3LD_TABLE = 28672
! Even though there are 480 unicast IPv4 routes left, this switch is out of label space. Since per-prefix labelling is used here, this is likely because the extensive use of MPLS Option B interconnects on this switch double allocates labels, so in this example the labels are exhausted just before the prefix space.
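
To watch how close the switch is to the limit, the "show platform nile adjmgr all | i EMPLS" output can be parsed and compared against the EMPLS3LD_TABLE size. The following is a minimal sketch that assumes the line format stays exactly as captured above; the function names are hypothetical, and collecting the output from the device is left out.

```python
import re

# Shared label table size, from "EMPLS3LD_TABLE = 28672" in the output above
EMPLS3LD_CAPACITY = 28672

# Matches lines such as:
#   EMPLS3LD Total Alloc:11230813 Total Free:11202143 Usage:28670
ADJMGR_LINE = re.compile(
    r"^(?P<table>\S+)\s+Total Alloc:(?P<alloc>\d+)\s+"
    r"Total Free:(?P<free>\d+)\s+Usage:(?P<usage>\d+)\s*$"
)


def parse_adjmgr(text):
    """Return {table_name: {"alloc": int, "free": int, "usage": int}}."""
    rows = {}
    for line in text.splitlines():
        match = ADJMGR_LINE.match(line.strip())
        if match:
            rows[match["table"]] = {
                key: int(match[key]) for key in ("alloc", "free", "usage")
            }
    return rows


def headroom(usage, capacity=EMPLS3LD_CAPACITY):
    """Entries left before the table is full."""
    return capacity - usage


sample = """\
EMPLS3LD Total Alloc:11230813 Total Free:11202143 Usage:28670
EMPLSINTD Total Alloc:411 Total Free:394 Usage:17
"""
rows = parse_adjmgr(sample)
print(headroom(rows["EMPLS3LD"]["usage"]))  # 2
```

In the captured outputs the Usage value equals Total Alloc minus Total Free, so the growth of the two counters between runs also gives a rough idea of how much label churn is happening.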