6500/7600 Forwarding Debug

In this example MPLS is being run all the way to the CPE (peering in the global routing table: 10.5.4.82) using an eBGP VPNv4 session, and pinging the CPE from any PE router other than the one the customer is connected to is failing inside one MPLS L3 VPN (a specific VRF, CUST1). On the PE the customer is connected to everything is fine: labelled traffic is flowing as expected. So this is packet loss in one single VRF/VPN from all PEs except the customer-connected one; traffic inside all other VRFs/VPNs is fine.
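
What the failure looks like, as a sketch from a remote PE (pe2 is an assumed hostname, not taken from the original output):

! From any PE other than the customer-connected one, the ping fails completely
pe2#ping vrf CUST1 10.254.253.70
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.254.253.70, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)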

On the customer-connecting PE (abr1) we can see that all the routing information is present and correct: the CPE has advertised the route 10.254.253.70/32 inside the VRF CUST1, a label is advertised with that route, both the route and label are in the FIB and LFIB, and the MLS and CEF adjacencies (which should then be programmed down into hardware by the PFC) are present and valid.
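
The checks behind that statement would look something like the following (commands only, output elided; these are standard IOS/PFC commands rather than a verbatim capture from abr1):

! Control plane: route and VPN label learnt from the CPE inside the VRF
abr1#show ip route vrf CUST1 10.254.253.70
abr1#show ip bgp vpnv4 vrf CUST1 10.254.253.70
! Software forwarding: FIB and LFIB entries
abr1#show ip cef vrf CUST1 10.254.253.70 detail
abr1#show mpls forwarding-table vrf CUST1 10.254.253.70 detail
! Hardware forwarding on the PFC: MLS CEF entry and adjacency
abr1#show mls cef vrf CUST1 10.254.253.70 detail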

Everything looks fine from this PE (abr1): we can even ping the CPE inside the CUST1 VRF. From no other PE, however, can 10.254.253.70 inside the CUST1 VRF be pinged, even though all other PEs have the same information: a valid route (via this PE's, abr1's, loopback0, due to next-hop-self on the iBGP VPNv4 peerings) and a valid label. Traffic inside all other VRFs towards this CPE is fine from all PEs.
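
The same checks from any other PE (again pe2 as an assumed hostname) show a route via abr1's loopback0 and a valid label:

! Route and VPN label for the prefix, learnt via abr1's loopback0
pe2#show ip bgp vpnv4 vrf CUST1 10.254.253.70
pe2#show mpls forwarding-table vrf CUST1 10.254.253.70
! The transport LSP towards abr1's loopback0 must also be intact
pe2#show mpls forwarding-table <abr1-loopback0-ip>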

At this point one needs to investigate a hardware programming issue. Traffic from other PEs will be arriving label switched: abr1 is advertising label 91953 to the other PEs, so traffic originated from other PEs must come into abr1 with label 91953, which is then swapped for 150 (as above) and forwarded on to the MPLS-enabled CPE. For traffic originating from abr1 itself, label 150 is simply pushed on; there is no pop-and-push (swap). So perhaps something about incoming traffic labelled with 91953 is wrong.
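
The software side of that swap can be confirmed from the LFIB; a minimal check, assuming the label values above:

! LFIB entry for abr1's locally advertised label: incoming 91953
! should show outgoing label 150 towards the CPE
abr1#show mpls forwarding-table labels 91953 detail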

Everything seems fine in software, but in hardware packets from other PEs are being sent to a bit-bucket destination interface index (0x7FFF). In the end, hard clearing the BGP session to the CPE fixed the issue: this caused the route to be dropped and re-learnt, a new label was associated with it, and a hardware update was triggered that pushed the new forwarding and adjacency information into hardware. A fresh ping from another PE, and an ELAM capture, showed everything working as expected once again.
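
A sketch of the fix, assuming 10.5.4.82 is the CPE's eBGP peering address as above (a hard reset, i.e. without the soft keyword):

! Hard clear of the eBGP session to the CPE: the route and label are
! withdrawn and re-learnt, forcing the hardware to be reprogrammed
abr1#clear ip bgp 10.5.4.82
! Once the session is re-established, re-test from another PE
pe2#ping vrf CUST1 10.254.253.70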

For packets being dropped we can actually update those register values to point at the interface index of another interface and run a packet capture on that interface:

! To redirect packets out of interface Te3/2, we must first find its index;
! one can do this by looking at the index of the internal VLAN for that interface
abr1#show vlan internal usage | i 3/2
4027 TenGigabitEthernet3/2
abr1#remote command switch test mcast ltl-info vlan 4027
routed interface
src index 0x81 contain ports 3/2
multicast flood index 0xCFBB for vlan 4027 contain ports 3/2, 6/R
! 0x81 is the Te3/2 index. If the interface index is already known, one can double-check it with the following
abr1#remote command switch test mcast ltl-info index 81
index 0x81 contain ports 3/2
! Now the register pointing at a drop (0x7FFF) or HWRL index can be updated to point at the real interface (RED_MPLS_ERR_IDX from above)
abr1#remote command switch show platform hardware tycho poke 45c 81
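
To actually capture the redirected traffic one could SPAN the target port; a sketch, assuming Gi1/1 is a spare port with a sniffer attached, and remembering to write the original value (0x7FFF in this case) back once done:

! Local SPAN of Te3/2 egress towards a sniffer on Gi1/1 (hypothetical port)
abr1(config)#monitor session 1 source interface TenGigabitEthernet3/2 tx
abr1(config)#monitor session 1 destination interface GigabitEthernet1/1
! Afterwards, restore the register to its original (drop) value
abr1#remote command switch show platform hardware tycho poke 45c 7fff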