Five Functional Facts about FabricPath

FabricPath is Cisco’s proprietary, TRILL-based technology for encapsulating Ethernet frames across a routed network. Its goal is to combine the best aspects of a Layer 2 network with the best aspects of a Layer 3 network.

Layer 2 plug and play characteristics

Layer 2 adjacency between devices

Layer 3 routing and path selection

Layer 3 scalability

Layer 3 fast convergence

Layer 3 Time To Live field to drop looping packets

Layer 3 failure domain isolation

An article on FabricPath could go into a lot of detail and be many pages long but I’m going to concentrate on five facts that I found particularly interesting as I’ve learned more about FabricPath.

#1 – FabricPath is not a network topology

When I first started learning about FabricPath, I believed that it came with a requirement that your network topology conform to certain rules. While I now know that is not true, there is a common topology that is discussed when talking about network fabrics. It’s called the spine+leaf topology.

This is similar to a traditional collapsed core design with a few differences.

When we’re talking about a fabric, all links in the network are forwarding. So unlike a traditional network that is running Spanning Tree Protocol, each switch has multiple active paths to every other switch.

Because all of the links are forwarding, there are real benefits to scaling the network horizontally. Consider if the example topology above only showed (2) spine switches instead of (3). That would give each leaf switch (2) active paths to reach other parts of the network. By adding a third spine switch, not only is the bandwidth scaled but so is the resiliency of the network. The network can lose any spine switch and only drop 1/3rd of its bandwidth. In a traditional network that runs Spanning Tree Protocol, there is no benefit to scaling horizontally like this because STP will only allow (1) link to be forwarding at a time. The investment in an extra switch, transceivers, cables, etc, is just sitting idle waiting for a failure before it can start forwarding packets.

So while the spine+leaf topology is commonly used when discussing FabricPath, it is not a requirement. In fact, even having full-mesh connectivity between spine and leaf nodes as shown in the drawing is not a requirement. You could connect each spine to every other leaf. You could connect spines to other spines or a leaf to a leaf.

According to Cisco, there is a lot of interest from customers about using FabricPath for connecting sites together (ie, as a data center interconnect or for connecting buildings in a campus). An example of that might be a ring topology that connects each of the sites.

The drawing shows FabricPath being used between the switches that connect to the fiber ring. This is obviously a very different topology than spine+leaf and yet perfectly reasonable as far as FabricPath is concerned.

FabricPath is a method for encapsulating Layer 2 traffic across the network. It does not define or require a specific network topology. The rule of thumb is: if the topology makes sense for regular old IP routing, then it makes sense for FabricPath.

#2 – FabricPath introduces its own unique data plane

In order to achieve the benefits that FabricPath brings over Classical Ethernet, some significant changes needed to be implemented in the data plane of the network. These changes include:

The introduction of a Time To Live field in the frame header which is decremented at each FabricPath hop

A unique addressing scheme consisting of a 12-bit switch ID which is used to switch frames through the fabric

A Reverse Path Forwarding check on each frame as it enters a FabricPath port (another loop prevention mechanism)

A new frame header format with these new fields
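Pulling those pieces together, here's a rough Python sketch of the per-hop checks. To be clear, the class, field names, and RPF table structure are purely my own illustration of the rules above, not actual NX-OS internals:

```python
from dataclasses import dataclass

@dataclass
class FPFrame:
    src_switch_id: int   # 12-bit switch ID of the ingress FP switch
    dst_switch_id: int   # 12-bit switch ID of the egress FP switch
    ttl: int             # decremented at each FabricPath hop
    payload: bytes       # the original, encapsulated Ethernet frame

def forward_ok(frame: FPFrame, ingress_port: str, rpf_table: dict) -> bool:
    """Return True if this hop may forward the frame, False if it drops it."""
    # TTL check: a looping frame is eventually decremented to zero and dropped.
    frame.ttl -= 1
    if frame.ttl <= 0:
        return False
    # RPF check: the frame must arrive on the port this switch itself uses
    # to reach the source switch ID; anything else is dropped.
    if rpf_table.get(frame.src_switch_id) != ingress_port:
        return False
    return True
```

Notice that both checks work purely on FabricPath header fields; the encapsulated Ethernet frame just goes along for the ride.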

In order for the hardware platform to switch FabricPath frames without any slowdown, new ASICs are required in the network. On the Nexus 7000, these ASICs are present on the F series I/O modules. It’s important to understand that not only do the FabricPath core ports need to be on an F series module but so do the Classic Ethernet edge ports which carry traffic belonging to FabricPath VLANs. This last requirement may impact certain existing environments where downstream devices are connected on M1 or M2 I/O modules.

FabricPath is also supported on the Nexus 5500 running NX-OS 5.1(3)N1(1) or higher. Cisco’s documentation isn’t exactly clear on how FabricPath is implemented on the 5500 series, but I’ve been told 55xx boxes do it in hardware (the original 50xx boxes do not support FabricPath).

#3 – FabricPath does not unconditionally learn every MAC in the network

One of the key issues with scaling modern data centers is that the number of MAC addresses each switch needs to learn is growing all the time. The explosion in growth is due mostly to the increase in virtualization. Consider a top-of-rack, 48-port Classical Ethernet switch that connects to 48 servers. That’s 48 MAC addresses that this switch and all the other switches in the network need to learn to send frames to those servers. Now consider that those 48 servers are really VMware vSphere hosts and that each host has 20 virtual machines (an average number, probably low for some environments). That’s 960 MAC addresses. Quite an increase. Now multiply that out by however many additional ToR switches are also servicing vSphere hosts. All of a sudden your switches’ TCAM doesn’t look so big any more.
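The arithmetic is easy to play with. The VM-per-host average comes from the paragraph above; the ToR switch count is a made-up example number:

```python
# Example numbers from the paragraph above; the ToR switch count is made up.
ports_per_tor = 48    # servers per top-of-rack switch
vms_per_host = 20     # assumed average VMs per vSphere host
tor_switches = 10     # hypothetical number of ToR switches in the row

bare_metal_macs = ports_per_tor * tor_switches                  # 480 MACs
virtualized_macs = ports_per_tor * vms_per_host * tor_switches  # 9,600 MACs

print(f"bare metal: {bare_metal_macs}, virtualized: {virtualized_macs}")
```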

Since FabricPath continues the Layer 2 adjacency that Classical Ethernet has, it must also rely on MAC address learning to make forwarding decisions. The difference, however, is that FabricPath does not unconditionally learn the MAC addresses it sees on the wire. Instead it does “conversational learning” which means that for MACs that are reachable through the fabric, a FabricPath switch will only learn that MAC if it’s actively conversing with a MAC that is already present in the MAC forwarding table.

Consider Switch 2 in this example. Host A is reachable through the fabric while B and C are reachable via Classic Ethernet ports. The MACs of B and C are learned on Switch 2 using Classic Ethernet rules which is to say that they are learned as soon as they each send frames into the network. The MAC for A is only learned at Switch 2 if A is sending a unicast packet to B or C and their MAC is already in Switch 2’s forwarding table. If A sends a broadcast frame into the network (such as when A is sending an ARP ‘who-has’ request looking for B’s MAC), Switch 2 will not learn A’s MAC (because the frame from A was not addressed to B, it was a broadcast). Also if A sends a unicast frame for Host D, a host that Switch 2 knows nothing about, Switch 2 will not learn A’s MAC (destination MAC must be in the forwarding table to learn the source MAC).

The conversational learning mechanism ensures that switches only learn relevant MACs, not every MAC in the entire domain, thus easing the pressure on the finite amount of TCAM in the switch.
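A rough model of that learning rule, assuming a simple dict-based MAC table (the function and its arguments are illustrative, not how NX-OS actually stores things):

```python
def learn_source(mac_table: dict, src_mac: str, dst_mac: str,
                 port: str, is_fabric_port: bool) -> None:
    """Apply FabricPath MAC learning rules to one received frame."""
    if not is_fabric_port:
        # Classic Ethernet edge port: learn the source unconditionally.
        mac_table[src_mac] = port
        return
    # Fabric port: conversational learning. Learn the source only when the
    # destination is already known; a broadcast or unknown DA means the
    # source is not learned.
    if dst_mac in mac_table:
        mac_table[src_mac] = port
```

Walking the Host A and Host B example through this: B's first frame on an edge port is learned unconditionally; A's broadcast ARP arriving on a fabric port does not get A learned; a subsequent unicast from A to the now-known B does.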

#4 – FabricPath ports do not have IP addresses

One area where FabricPath gets confusing is when it’s referred to as “routing MAC addresses” or “Layer 2 over Layer 3”. It’s easy to hear terms like “routing” and “Layer 3” and associate that with the most common Layer 3 protocol on the planet — IP — and assume that IP must play a role in the FabricPath data plane. However, as outlined in #2 above, FabricPath employs its own unique data plane and has been engineered to take on the best characteristics of Ethernet at Layer 2 and IP at Layer 3 without actually using either of those protocols. Below is a capture of a FabricPath frame showing that neither Ethernet nor IP are in play.

Instead of using IP addresses, an address — called the “switch ID” — is automatically assigned to every switch on the fabric. This ID is used as the source and destination address for FabricPath frames destined to and sourced from the switch. Other fields such as the TTL can also be seen in the capture.

#5 – FabricPath employs Equal Cost Multipath packet forwarding

In Classic Ethernet networks that utilize Spanning Tree Protocol, it’s no secret that the bandwidth that’s been cabled up in the network is not used efficiently. STP’s only purpose in life is to make sure that redundant links in the network are not used during steady-state operation. That’s a poor ROI on the cost to put in those links, and from a scaling/capacity perspective it’s equally poor since the network is limited to the capacity of that one link and cannot employ multiple parallel links. (OK, technically you can by using etherchannel, but you understand the point I’m trying to make.)

Since FabricPath doesn’t use STP in the fabric and because the fabric ports are routed interfaces and therefore have loop prevention mechanisms built-in, all of the fabric interfaces will be in a forwarding state capable of sending and receiving packets. Since all interfaces are forwarding it’s possible that there are equal cost paths to a particular destination switch ID. FabricPath switches can employ Equal Cost Multipathing (ECMP) to utilize all equal cost paths.

Here S100 has (3) equal cost paths to S300: A path to each of S10, S20, and S30 via the orange links and then from each of those switches to S300 via the purple links.

Much like a regular etherchannel or a CEF multipathing situation, FabricPath ECMP utilizes a hashing algorithm to determine which link a particular traffic flow should be put on. By default the inputs to the hash are:

Source and destination Layer 3 address

Source and destination Layer 4 ports (if present)

802.1Q VLAN tag

These values are all taken from the original, encapsulated Ethernet frame.

An interesting value-add that FabricPath does is to use the switch’s own MAC address as a key for shifting the hashed bits. This shifting prevents polarization of the traffic as it passes through the fabric (ie, prevents every switch from choosing “link #1” all the way through the network due to their hash outputs all being exactly the same). The benefit of this is only realized if there’s more than (2) hops between source and destination FabricPath switch.
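Here's a toy version of that idea. SHA-256 here is just a stand-in so the sketch runs; Cisco's actual hash is computed in hardware and is not this algorithm:

```python
import hashlib

def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              vlan: int, switch_mac: str, num_links: int) -> int:
    """Choose an equal-cost link for a flow, mixing in the switch's own MAC."""
    # The flow fields come from the original, encapsulated Ethernet frame;
    # the switch MAC perturbs the result so each hop hashes differently.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{vlan}|{switch_mac}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# Without switch_mac in the key, every switch along the path would compute
# the identical value for a given flow and pick the same link index.
```

The key design point is that the per-flow inputs keep a flow on one link (no reordering) while the per-switch key de-correlates the choices made at successive hops.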

So there you have it. Are you currently using or planning a FabricPath deployment? Please share your thoughts in the comments below.

Nice summary. Can you think of any use cases for running both LAG and ECMP concurrently between a leaf and spine pair (other than the obvious of high capacity)? I’m wondering if there are operational benefits, allowing you to seamlessly move a physical port between LAGs and FP uplinks to achieve better load balancing?

Would I be able to place a regular switch in the FabricPath core and it be able to transparently forward the frames? Looking at the various information on the net, Cisco claim that the frame is a regular mac-in-mac encapsulation so it shouldn’t be a problem. I can see however the header uses different fields in the DA, SA and .1q such as FTAG and TTL. As the TTL will always be changing and I don’t know what the FTAG is, I guess I will need to use Q-in-Q to properly forward the frames or configure and trunk all VLANs. How will MAC learning be affected, would this create a problem?

This is really just academic, I’m interested to know whether I could combine regular switches in the FP core and it seamlessly transport the traffic without the FP switches realizing.

Keith, great question. I spent some time this morning pondering and this is what I’ve come up with. I think there are two things to keep in mind here.

1. FP expects that an FP-enabled port connects in a point-to-point fashion with its neighbor port. On the wire, an FP frame has a source address of the ingress FP switch and a destination address of the egress FP switch. The SA and DA do not change as the frame moves through the fabric. So under this premise, when an FP frame arrives on an FP interface, the switch knows it must act on it because it’s the only possible receiver of that frame on that network segment (it acts, despite its own switch ID potentially being absent from the SA/DA fields). Having multiple FP interfaces on the same Layer 2 segment would break this logic.

2. Although the FP frame header matches up with the 802.3 header, the SA and DA fields do not act like Layer 2 MAC addresses. The SA and DA fields contain switch IDs which is more akin to a loopback IP than a MAC address. I can see a scenario where the regular switch that’s in between two FP ports would continually learn the source “MAC” address on multiple ports (MAC flapping) and where it might not have learned the dest “MAC” and would constantly flood the FP frames. I suppose if you had exactly two FP ports plugged into this switch then neither of these would be an issue. On top of that, there’s also the ethertype field which in an FP frame is set to 0x8903. The switch in the middle would have to accept frames with this foreign ethertype value.

On a side note, the FP header fields do line up perfectly with the 802.3 Ethernet header but not so much with the 802.1q header so I believe a regular ethernet switch would be able to interpret the frame just fine (it would be a baby giant though because of the FP header).

Since FabricPath continues the Layer 2 adjacency that Classical Ethernet has, it must also rely on MAC address learning to make forwarding decisions. The difference, however, is that FabricPath does not unconditionally learn the MAC addresses it sees on the wire. Instead it does “conversational learning” which means that for MACs that are reachable through the fabric, a FabricPath switch will only learn that MAC if it’s actively conversing with a MAC that is already present in the MAC forwarding table.

I don’t understand this part. Why will a FabricPath switch not learn a MAC unless the destination MAC is already in the forwarding table? Why does the switch need the destination to already be known before it learns the source?

Imagine a Fabric Path (FP) switch that receives a frame on an FP interface. The switch needs to decide whether to learn the source address (SA) of that frame or not. What it does is look up the destination address (DA) in its forwarding table. If the DA already exists in the forwarding table, the FP switch will store the SA in the table.

Now imagine an FP switch receives a frame on an FP interface with a DA of all ones (ie, broadcast). This frame is not part of a two-way conversation — it’s a one-way flow of data from the source to “all”. The FP switch will not learn the SA when it receives this frame because the DA — the broadcast address — is not present in its forwarding table.

An interesting value-add that FabricPath does is to use the switch’s own MAC address as a key for shifting the hashed bits. This shifting prevents polarization of the traffic as it passes through the fabric (ie, prevents every switch from choosing “link #1” all the way through the network due to their hash outputs all being exactly the same). The benefit of this is only realized if there’s more than (2) hops between source and destination FabricPath switch.

I don’t understand this part. How does using the FabricPath switch’s own MAC address produce a different hashing result, and therefore a different link choice?

Polarization happens when all the switches (or routers) in a network use the same inputs to the hash algorithm and therefore always end up with the same hash result. You can end up with your switches/routers all choosing the “left-hand” link (as an example) between a given source and destination. That makes the traffic flow sticky to all of the “left-hand” links through the network. In FabricPath, when a switch includes its own MAC address as part of the input to the hash, it’s injecting some randomness which should ensure that each switch along the way will choose a different path — some will choose “left”, some will choose “right”.

This document clearly explains polarization in the context of Cisco Express Forwarding. It’s equally applicable to FabricPath (the description is, not the solution).

I think this is a great blog but you may confuse people by stating that Fabric Path is a way of encapsulating Ethernet over a routed network. I am no expert in DC technologies but you don’t even need a routed network for Fabric Path. It is strictly a Control Plane replacement for STP by doing MAC-in-MAC encapsulation. OTV is a way of encapsulating Ethernet over a routed network.

Part of the confusion might be the association of “routed network” with “IP network”. In the case of FP, there is no underlying IP transport inside the FP fabric. However that doesn’t mean it’s not a routed network. It’s routed in the sense that each FP node builds a forwarding table, that table is built using a control plane protocol (IS-IS) (unlike switching which does data plane learning), and the table is used to deterministically forward packets towards the destination. The format for these routed datagrams is defined by the FP protocol.

As far as it being strictly a control plane, it is more than that. As I just described above and also as point #2 explains, FP is definitely present in the data plane as well. The only major element in the control plane is the use of IS-IS to learn the topology of the fabric itself. The learning of end device addresses is all data plane driven.

So if you compare OTV and FP, OTV uses IP to provide the transport (the “fabric”, if you will) and just rides on top of that while FP mandates its own transport and packet format to create the underlay network.