DCI: Using FabricPath for Interconnecting Data Centers

Here’s a topic that comes up more and more now that FabricPath is getting more exposure and people are getting more familiar with the technology: Can FabricPath be used to interconnect data centers?

FabricPath has some characteristics that make it appealing for DCI. Namely, it extends Layer 2 domains while maintaining Layer 3 (i.e., routing) semantics. Switch reachability is learned via a control plane, edge switches learn end host MAC addresses conversationally rather than unconditionally, FP frames contain a Time To Live (TTL) field that purges looping packets from the network, and there is no such thing as a blocked link: all links are forwarding and Equal Cost Multi-Pathing (ECMP) is used within the fabric. In addition, since FabricPath does not mandate a particular physical network topology, it can be used in spine/leaf architectures within the data center or point-to-point connections between data centers.

Sounds great. Now what are the caveats?

Layer 1 Dependency

As my previous article on FabricPath explains, FabricPath is not an overlay. It has its own unique data plane that operates right on the wire. It does not ride within Ethernet frames and does not ride above IP. It has its own Layer 2 frame format that must be put right onto the wire. Additionally, the control plane protocol – a modified version of Intermediate System to Intermediate System (ISIS) – requires that FP nodes be connected using point-to-point links (no intermediate devices).

Given this requirement, certain WAN technologies preclude the use of FabricPath. For example, transparent LAN services (TLS), Layer 2 extensions, and VPLS are all unsuitable. Layer 1 technologies such as DWDM or dark fiber are ideal and even managed Ethernet over MPLS (EoMPLS) services will work.

Finally, keep in mind that FabricPath adds an additional 16 bytes onto every frame sent through the fabric, so the MTU on your DCI links needs to accommodate frames of at least 1516 bytes.
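To make the MTU arithmetic concrete, here is a minimal Python sketch. The 16-byte overhead comes from the paragraph above; the function names and the 9000-byte jumbo value are my own illustrative assumptions, not anything taken from NX-OS.

```python
# Illustrative arithmetic only. The 16-byte figure is the FabricPath
# overhead quoted in the text; everything else here is a hypothetical helper.
FABRICPATH_OVERHEAD_BYTES = 16


def required_dci_mtu(edge_mtu: int, overhead: int = FABRICPATH_OVERHEAD_BYTES) -> int:
    """Return the minimum DCI link MTU needed to carry an edge frame plus the FP header."""
    if edge_mtu <= 0:
        raise ValueError("edge_mtu must be positive")
    return edge_mtu + overhead


def link_is_sufficient(link_mtu: int, edge_mtu: int) -> bool:
    """True if a DCI link with link_mtu can carry FP-encapsulated frames of edge_mtu."""
    return link_mtu >= required_dci_mtu(edge_mtu)


if __name__ == "__main__":
    # 1500 is the standard MTU from the text; 9000 is an assumed jumbo setting.
    for edge_mtu in (1500, 9000):
        print(edge_mtu, "->", required_dci_mtu(edge_mtu))
```

For the standard 1500-byte case this gives the 1516 bytes mentioned above. If your servers send jumbo frames, the DCI links need the same 16 bytes of headroom on top of that larger size.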

First Hop Routing Protocol Localization

I first introduced the concept of FHRP localization in my post about Overlay Transport Virtualization. In a nutshell, FHRP localization optimizes traffic that is sourced from a server in the data center and is destined for something that lives outside the data center. It allows an active HSRP/VRRP gateway in each data center so that the server’s traffic doesn’t have to cross the DCI link to find its gateway.

FHRP Localization (Active/Active)

In the OTV article, I talk about how FHRP localization is part of OTV. As of this writing it’s not automated — you have to create port-based ACLs to filter the FHRP hello packets on the DCI ports, but active/active gateways can be achieved (a CLI command to automate the PACLs is roadmapped).

The short explanation when it comes to FabricPath: today, there is no FHRP localization capability.

The longer answer is that this is a roadmap item. Cisco documentation calls this feature “Anycast FHRP” which indicates that it’ll be possible to configure multiple switches within the fabric to act as gateways, all of them will be active, and ECMP will be used to spread flows amongst all of them.

Until Anycast FHRP arrives, there is a workaround and that is to configure the same HSRP group ID on both pairs of HSRP switches (thereby giving each pair the same VIP and VMAC) but configure a different authentication string. This will prevent the pair in Data Center Left from becoming HSRP peers with the pair in Data Center Right. When you do this, however, the pairs will still detect that their VIP is in use (by the other pair). This detection can be disabled in NX-OS with the “no ip arp gratuitous hsrp enable” command.
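The logic of the workaround can be sketched with a toy Python model. This is not HSRP itself: the `Gateway` class, the priorities, the VIP, and the authentication strings are all invented for illustration, and real HSRP election has more rules (tie-breaks, preemption, timers) than this sketch covers. It only shows that gateways which reject each other's hellos end up in separate election domains, each with its own active gateway.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gateway:
    """Hypothetical, simplified view of one HSRP-speaking switch."""
    name: str
    group: int
    vip: str
    auth: str
    priority: int


def elect_active(gateways: List[Gateway]) -> Dict[Tuple[int, str], str]:
    """Bucket gateways that would accept each other's hellos (same group
    and same authentication string) and pick the highest priority in each."""
    domains: Dict[Tuple[int, str], List[Gateway]] = defaultdict(list)
    for gw in gateways:
        domains[(gw.group, gw.auth)].append(gw)
    return {
        key: max(members, key=lambda g: g.priority).name
        for key, members in domains.items()
    }


# Same group and VIP everywhere, but a different auth string per site.
WORKAROUND = [
    Gateway("left-a", 10, "10.0.0.1", "DC-LEFT", 120),
    Gateway("left-b", 10, "10.0.0.1", "DC-LEFT", 100),
    Gateway("right-a", 10, "10.0.0.1", "DC-RIGHT", 110),
    Gateway("right-b", 10, "10.0.0.1", "DC-RIGHT", 90),
]

if __name__ == "__main__":
    for domain, active in elect_active(WORKAROUND).items():
        print(domain, "active:", active)
```

With a different string per site, the model yields one active gateway in each data center. Give all four switches the same string and they collapse into a single domain with a single active gateway, which is exactly the behavior the workaround avoids.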

FHRP localization is important from a DCI perspective because, just like in the OTV diagram above, it’s inefficient to have inter-VLAN traffic traverse the DCI link to hit the server’s gateway. It might be fine if the destination is a server that lives in Data Center Left, but what if that server is in DC Right, or is a client machine reachable via the WAN? It’s more efficient to have the gateway local to the server routing the traffic so an intelligent forwarding decision can be made.

Multidestination Traffic

Multidestination traffic refers to packets that are broadcast, unknown unicast, or multicast (so-called “BUM” traffic). Since FabricPath follows the rules of routing and not bridging, BUM traffic is not flooded on the network. Instead, it follows what’s called a Multidestination Tree (MDT). A FabricPath MDT works very much like a traditional multicast tree where a root switch is elected/configured and loop-free branches out from the root are calculated.

When BUM packets enter the fabric, they are sent along the tree. This is a very important part to understand: an FP switch that receives, say, a broadcast frame from an end station must send this frame on the tree which means that ultimately, that frame has to traverse the root switch in order to reach the whole network. Placement of the root switch becomes key in a DCI scenario. In an intra-DC scenario, it’s less of an issue because the network topology is likely very symmetric — it’s easier to place the root more in the “middle” of the network. In a DCI scenario, the root can only be at one site which means the other site(s) need to traverse the DCI for all BUM traffic.

But wait, it can actually be worse. It’s possible that switches in site(s) without a local root might have to forward all BUM traffic across the DCI link even for VLANs/hosts/receivers located in the same site as the ingress FabricPath switch. This happens when a Virtual Port Channel+ (vPC+) is configured between the FabricPath edge switches and a non-FP device (a Classic Ethernet switch, server, etc.). With vPC+, the links between the FP switches and the non-FP device are both forwarding. So how do we prevent BUM traffic from ingressing on FP switch #1, being sent around through the fabric, hitting FP switch #2 and being sent back to the end station that originated the packet? The answer is the vPC+ Designated Forwarder (DF). In a vPC+ pair, one of the switches is elected/configured as the DF. The DF is the only switch that will forward BUM traffic southbound on the vPC+ ports towards end stations.

So here’s the scenario. BUM traffic enters the fabric (1) on a switch that is NOT the DF for the vPC+ where H2 is connected. The only place that switch can send this traffic is along the multidestination tree (2). In this scenario the BUM traffic leaves the data center (3), hits the MDT root (4), follows the tree back into the data center (5), and hits the vPC+ member switch that is the DF for the vPC+ towards H2 (6), which then forwards it towards H2 (7).
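The effect of root placement can be illustrated with a small Python sketch. The two trees below are hypothetical (the switch names and the topology are mine, not taken from any diagram), and the code only traces the copy of a frame that travels from the ingress switch to the DF; in a real fabric the frame is delivered along the whole tree. The point is simply to count how many times that copy crosses the DCI depending on where the root sits.

```python
from typing import Dict, List, Optional

# Hypothetical multidestination trees, stored as child -> parent.
# Names are invented; the part before "-" is the site.
ROOT_IN_RIGHT: Dict[str, Optional[str]] = {
    "right-root": None,
    "left-sw1": "right-root",  # ingress switch, not the DF
    "left-sw2": "right-root",  # DF for the vPC+ towards H2
}
ROOT_IN_LEFT: Dict[str, Optional[str]] = {
    "left-root": None,
    "left-sw1": "left-root",
    "left-sw2": "left-root",
    "right-sw1": "left-root",
}


def path_to_root(tree: Dict[str, Optional[str]], node: str) -> List[str]:
    """Walk parent pointers from node up to the tree root."""
    path = [node]
    parent = tree[node]
    while parent is not None:
        path.append(parent)
        parent = tree[parent]
    return path


def tree_path(tree: Dict[str, Optional[str]], src: str, dst: str) -> List[str]:
    """Return the unique path between two switches along the tree."""
    up = path_to_root(tree, src)
    down = path_to_root(tree, dst)
    index_in_down = {name: i for i, name in enumerate(down)}
    for i, name in enumerate(up):
        if name in index_in_down:  # first common ancestor
            return up[: i + 1] + down[: index_in_down[name]][::-1]
    raise ValueError("switches are not on the same tree")


def dci_crossings(path: List[str]) -> int:
    """Count hops whose two ends sit in different sites."""
    sites = [name.split("-")[0] for name in path]
    return sum(1 for a, b in zip(sites, sites[1:]) if a != b)


if __name__ == "__main__":
    for label, tree in (("root in DC Right", ROOT_IN_RIGHT), ("root in DC Left", ROOT_IN_LEFT)):
        path = tree_path(tree, "left-sw1", "left-sw2")
        print(label, ":", " -> ".join(path), "| DCI crossings:", dci_crossings(path))
```

With the root in the remote site, the traced frame crosses the DCI twice to get between two switches that sit side by side in the same data center. With a local root it never leaves the site, but the same penalty then applies to the other data center.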

Because of the importance of the MDT root, its placement in the network should be considered carefully. This is especially true if there is a lot of multicast traffic within the data centers. Failure scenarios need to be analyzed as does bandwidth capacity feeding the root switch.

Closing Thoughts

The three points above are some of the considerations for using FabricPath for DCI. The point about MDT root placement is probably the biggest one. As was told to me, a failure in the site that contains the root switch can ripple outward and affect traffic flows in other sites. This breaks the principle of failure domain isolation. On the other hand, FP provides Layer 3 semantics, fast convergence, and very high scalability which are all desirable qualities for DCI.

There certainly is a use case for using FabricPath as a DCI technology. People are doing this today. Is it better than another technology? It’s all relative to the specific environment. As should be done with any technology: understand the requirements, understand the constraints and boundaries that need to be respected in the environment, and then line up the various technology options against these things. Use the right tool for the job.

Check out the other articles in my series on DCI and please feel free to leave a comment or question below.

Disclaimer:
The opinions and information expressed in this blog article are my own and not necessarily those of Cisco Systems.

15 thoughts on “DCI: Using FabricPath for Interconnecting Data Centers”

Thanks Joel for a very nice, easy to read writing.
While FHRP localization has now been addressed by the Anycast HSRP feature for FabricPath, there are still gaps such as the lack of Proxy ARP support (at the DC edge) and of an unknown unicast flooding control mechanism, to name a couple. Taking those into consideration, it’s not any better than having vPC or plain EtherChannel bundled DCI links.

The one big difference (to me) with FP vs etherchannel is that FP will (natively) partition the Spanning Tree domain. It also removes the need for STP on the DCI links. With that, we can eliminate STP topology change notifications moving between the sites and the subsequent MAC table flushing.

Another difference that comes to mind is that FP doesn’t unconditionally learn MAC addresses. Switches in Site 1 don’t have to learn the MAC addresses of devices in Site 2, if those devices are not talking to anything in Site 1. This frees up TCAM space in the switches in Site 1 and can be a good thing in really dense environments.

I think there are definitely differences between using FP and EC for DCI. Your comment did get me thinking quite a bit though and re-evaluating whether the gap is really all that big in reality.

Hi Joel, thanks a lot for your blog posts.
This one in particular raised a doubt regarding the workaround for FHRP localisation and FabricPath.
Is it supported by Cisco? Is there any reference for it?
Thanks

Great post! You mentioned above that FabricPath will work over EoMPLS. Is there any special config that needs to be done for this to work? It was my understanding that the ethertype for FP is different and that it therefore needed a dark fibre or DWDM type of service.

Hi Robert. I’m quite certain that the 9300s don’t support Fabric Path. I’m not even sure if the hardware is capable of doing FP forwarding. Make sure you check that out before getting too far down the road of using FP.

Thanks for the great article, certainly brought up some concerns I haven’t thought about.
I’m currently planning an addition of a third DC to two existing ones in my topology. They’re currently connected via vPC, but the addition of the third one will definitely force me into some sort of more complex DCI solution.
My current dilemma is between OTV and FP. On one hand, OTV was designed specifically for that feature. But it has some major drawbacks for my scenario. For starters I’m running on 7K’s with F2 cards, so OTV will force me to add at least a couple of M cards on each site. The crazy encap (compared to FP) makes me think troubleshooting this baby won’t be a piece of cake.
On the other hand, FP has a lot of question marks too. A Cisco Live presentation I watched the other day said that FP as DCI wasn’t tested on links longer than 25km. I’m looking at around 60-70km. Also, it said that FP is not recommended for DCI of more than 2 DC’s, without explanation. Will I be able to do local FHRP on all 3 sites?

I’m digging the simplicity of FP, but as much as I rely on my own reasoning above anybody else’s, it takes some balls to go against the vendor’s recommendations.

I think the reason that FP isn’t recommended for DCI of > 2 sites is because when you grade FP on its ability to meet the requirements that we desire from a DCI technology, it doesn’t score as well as other technologies. And some of these shortcomings are made worse when there’s more than just a point-to-point interconnect between two sites.

Have you read Yves Louis’ blog on DCI? Yves is an architect at Cisco (and if you’re going to Cisco Live again, I highly recommend you book a Meet the Engineer session with him and spend some time with him and a whiteboard) and one of Cisco’s lead DCI experts.

One of Yves’ posts in particular talks about the desirable features in a DCI technology: http://yves-louis.com/DCI/?p=648 (this post mostly talks about VXLAN, and I’m not trying to muddy the waters for you. It just happens this article also outlines the desirable DCI features).

I recommend you give Yves’ blog post a read and then get a hold of your Cisco SE and figure out what’s right for you. You can even ask to do a virtual whiteboard session with Yves or one of his peers.

As far as FP having distance limitations, I’m not aware of any at the technical level. However I’m also not aware of any “wide area” DCI builds with FP, and that might be the root of the 25km comment. The ones I’m familiar with are within a metro area.

On the point about OTV’s levels of encap, I don’t believe that would make troubleshooting any harder and I’ve not heard that feedback from my customers. At the end of the day it’s an IP packet so you’re troubleshooting IP. The gobbledygook above the IP layer really only matters to the OTV speakers. And if that’s really a barrier to you, with the right hardware, you can do straight UDP encap.

Thanks for the reply.
I gave Yves’ blog a read. He did a great job of generalizing the requirements for DCI for any case. I however run a specific network and don’t necessarily have all of these concerns to worry about. For example, I don’t need to think about non-DWDM links, since I’m using DWDM only with the optical equipment on-prem, and there’s no way it’ll change in the foreseeable future. I also don’t run any multicast traffic between sites, and don’t have too much BUM traffic in general.

I understand your argument regarding the encap, but I don’t necessarily agree with it. I worked in network QA before and I’ve seen what a huge mess of code is hiding behind the pretty CLI. No such thing as bugless code, especially when it comes to Cisco. So for me running a simple control plane protocol means running less code, which translates into less chance for software bugs. The fact that the CLI hides all the complexity for me doesn’t mean I don’t have to understand what’s going on under the hood.

My SE is pushing me towards a solution he knows, and since FP hasn’t been deployed at all in the country I’m currently in (Israel), it’s not an option as far as he’s concerned.

I don’t think your SE is unique in his guidance. Cisco’s lead DCI technology — or even its second or third — is not FP. It can and has been done (as Robert commented, below), but it’s not the preferred method because, as I said, it doesn’t score as well as the other options available.

As far as the comment about software quality, I don’t really want to go there :-) I can think of more than one counter point but I don’t think it helps you to debate any of that.

A general piece of advice: the architects at Cisco that come up with DCI solutions work on networks that are smaller, larger, more complex, and less complex than yours. Their experience is broader than any one environment. Don’t be too quick to discount that experience and their guidance.

Alex,
We have been running our FP environment for … 4 months now.
No functional problems at all. We have had some delays in getting all of the legacy connections into FP.

You can do local FHRP on all three sites – I do it with 40+ vlans via Anycast HSRP.

The equipment we picked – PacketLight – for our DWDM wan transport can go to 80Km and since the underlying concept of conversational learning populates the “who is where” tables – you can see this when you ping cross data centers, it is kind of cool – we decided that in our scenario we were quite safe.

I went to CiscoLive when the 7Ks were being introduced, bought them, wanted some ASRs for OTV too … but boy … FP is soooo much easier in my case that I hope to have 2 7Ks pulled out in 2 months.