CiscoLive 2012 and the FCoE Groundswell

As the Product Manager for Fibre Channel over Ethernet (FCoE), I often get asked some of the hard questions about how the technology works. Sometimes I get asked the easy questions. Sometimes – like two nights ago – I get asked if the standards for FCoE are done.

I’m not kidding.

My own expectations for discussing FCoE were focused around the topics and conversations that we’ve been seeing over the last year, since the last Cisco Live in 2011.

Since that time, Cisco has made some tremendous progress in offering numerous FCoE-based solutions on several of our products, from Multihop FCoE for Nexus 5000, Nexus 7000 and MDS Fibre Channel switches, to improvements in the management tools (Data Center Network Manager) and expanding the number of possible options for using the technology.

Who’s Using FCoE?

I confess I was a bit concerned going into this year’s Cisco Live because I recognize that I have a skewed view of the world of convergence. I mean, each week (on average) I give about 3-5 presentations on converged networks, FCoE, multiprotocol storage, etc., to partners and customers. When you do that many for months on end, you tend to think that everyone is as aware of what’s happening as you are.

The other side of the coin, though, could be that I’m not necessarily talking with a representative sample of customers. After all, the questions can often be what I consider “first step in understanding” questions. So maybe, possibly, I was just getting the early adopters in my world, not the bulk of the population.

Pleasant Surprises

Boy was I wrong. I had completely underestimated the desire to learn more about FCoE at CiscoLive.

In my first session – an 8-hour techtorial designed to fire-hose the audience with the most technical networking implementations – I asked the audience of about 60 people how many of them were getting pressure from their storage teams to prepare for handling converged storage over their Ethernet networks.

90% of my audience had already implemented or were preparing for converged networks.

There were two reasons why this was a huge deal for me.

1. They were asking the right questions. Attendees who were concerned about how changes in their networks would affect storage asked all the right questions about what kind of unintended consequences those changes could have.

2. This meant that the “Layer 8” issue was being addressed. People are trying to figure out how to handle the basic trust issues of guaranteeing bandwidth for various traffic types with the appropriate prioritization. The questions asked throughout the day (not just of me, but of the other speakers) were all spot-on. They wanted to know how multipathing configurations affect storage, and what kind of bandwidth they need to plan for if they get sudden bursty traffic from, say, streaming or even Hadoop-like workloads.

Awesome!

In another session, the speaker asked how many of the 100+ attendees were already using FCoE in their Data Centers. More than 75% of the attendees raised their hands.

Perhaps it wasn’t exactly a scientific sample, but that percentage of people in a room of networking experts still stunned me.

But Does It Count?

Whenever someone says “nobody I know is using FCoE” I’m a bit surprised. There are more than 13,000 UCS customers who are using FCoE as the foundational basis for their storage system.

“But that doesn’t count, because it’s only UCS.”

I confess I’ve never understood this argument. There is no difference between the FCoE that runs on a UCS system and the FCoE that runs on, say, anyone else’s servers. There’s no “FCoE-C” version or “FCoE-I” version, etc. It’s standards-based FCoE.

“Nobody’s using FCoE beyond the access layer.”

Let us suppose for the moment that this statement were true (it’s not, but let’s just suppose that it is). The only response I could come up with would be, “So what?”

Ultimately, you want to use the right tool at the right time, and in the right place. When you’re looking to deal with access-layer issues, there are several compelling reasons why converged multiprotocol access makes sense, and I’m sure you’ve heard about them all: cable reduction, lower power & cooling, better use of underutilized links, etc.

When we look beyond the access layer, we have different challenges. We have issues about maintaining oversubscription ratios. We have issues about obtaining enough bandwidth between chassis.

Let me give you an example. A large Accounts Payable company in the United States was facing oversubscription issues on its SANs. It was also having major trouble getting the space, power, and cooling to handle additional traffic, which made planning for growth extremely difficult.

As I mentioned in my 3-for-2 Bandwidth Bonus blog, the improved encoding mechanism with FCoE helped a great deal without forcing anyone to go “end-to-end” FCoE.

In this case, the servers were all FC, so they were able to connect their FC-based servers to a Unified Port-capable Nexus 5500 switch, and then use FCoE ISLs to an MDS 9500, which then connected to traditional Fibre Channel storage.

Yes, that’s right. It was a FC -> FCoE -> FC topology.

Why?

Because if they used 8G links, they were really only going to get 6.8 Gbps of actual throughput per link (after FC encoding). That meant that even if they maxed out their 16-link port-channeled connections, they’d get less than 110 Gbps of actual inter-switch bandwidth:

Maximum throughput per ISL

Even if they had decided to use 16G links, current implementations restrict a bundle to only 8 links:

Same maximum throughput as 8G

However, if we take all 16 links as a 10G FCoE port channel, we can get roughly 50% more bandwidth between switches (16 × 10 Gbps = 160 Gbps of actual throughput):

50% More ISL Throughput
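For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three scenarios above. The per-link throughput figures come from the encoding discussion in the 3-for-2 Bandwidth Bonus blog; the link counts are the port-channel limits described here.

```python
# Actual per-link throughput after physical-layer encoding (Gbps),
# per the 3-for-2 Bandwidth Bonus figures.
PER_LINK_GBPS = {
    "8G FC":    6.8,   # 8.5 Gbps clock * 8/10 (8b/10b encoding)
    "16G FC":   13.6,  # 14.025 Gbps clock * 64/66 (64b/66b encoding)
    "10G FCoE": 10.0,  # 10.3125 Gbps clock * 64/66 (64b/66b encoding)
}

def bundle_throughput(link_type, links):
    """Aggregate actual throughput of a port channel of `links` members."""
    return PER_LINK_GBPS[link_type] * links

print(bundle_throughput("8G FC", 16))    # 16-link max: ~108.8 Gbps
print(bundle_throughput("16G FC", 8))    # 8-link limit: ~108.8 Gbps
print(bundle_throughput("10G FCoE", 16)) # 16 links: 160.0 Gbps
```

Note that 8 × 16G FC lands at the same ~108.8 Gbps as 16 × 8G FC, which is exactly why the 10G FCoE bundle comes out roughly 50% ahead.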

For this company, the big problem was maintaining the oversubscription ratios between switches, not necessarily the need for addressing the access layer:

More than just convergence – Performance!

Now, of course, does this count as Multihop FCoE? Does it really matter?

The point is that the Data Center has another tool that can be put into the appropriate place in order to solve a particular problem. In this case, the customer was able to maintain oversubscription ratios for future storage growth, regardless of whether it was FC (native) or FCoE traffic. Moreover, by having Unified Ports on the Nexus 5500, they had the flexibility of being able to use any type of storage as required.

From the customer’s perspective, “this positions us for now and in the future. It doesn’t make sense not to move in this direction.”

Unorthodox? Perhaps. Unanticipated? Definitely. But it does speak to the point that FCoE can be a very powerful tool when used to solve well-understood problems inside of the data center.

The CiscoLive Effect

I gave several presentations this week with numerous partners, and talked to dozens of people who wanted to know the hows, whys, and whens of doing convergence on their networks. Overall, the level of sophistication of the questions has grown exponentially, and I’m absolutely convinced that by next year’s Cisco Live many of these partners and customers are going to be pushing the envelope in ways that we can only imagine right now.

For now, though, CiscoLive was a fantastic week for me, personally. It is hard to describe the feeling of watching all the hard work begin to pay off for customers.

Hi,
Interesting maths.
I'm not a storage person, and you indicate that there is bandwidth loss due to FC encoding overhead reducing throughput to 6.8 and 13.6 Gbps - no issue with this, but you do not apply the same logic to FCoE - surely there is FC plus Ethernet encoding to occur; would this not reduce throughput? Plus it is a shared medium. Just curious.
Thanks
Mark

Hi Mark,
Thanks for joining the conversation. If you take a look at the 3-for-2 Bandwidth Bonus link that I enclosed in the blog, you'll find out about how the encoding for 10G Ethernet (including FCoE) is far more efficient than 8G FC.
The issue about the fact that this is a shared link is actually a really good point, but winds up providing FCoE with an even greater advantage using a shared link.
Let's say we want to take a 50/50 split on a 10G link, where you're going to be getting a minimum of 5G of actual throughput. However, if there isn't a lot of Ethernet traffic, ETS allows you to use additional bandwidth as it's available. That means that in real-world use, you can get better bandwidth using FCoE on a shared link than on an 8G HBA which maxes out at 6.8.
Moreover, with the 8G HBA, if you're not pushing all of that FC traffic, you can't simply 'borrow' it for Ethernet traffic. In the long run, that's underutilized bandwidth.
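To make that borrowing behavior concrete, here's an illustrative sketch. This is a simplification of how an ETS scheduler behaves (a guaranteed minimum plus borrowing of idle bandwidth), not the actual hardware algorithm:

```python
def ets_share(link_gbps, fcoe_min_pct, fcoe_demand, ether_demand):
    """Simplified ETS-style allocation: FCoE is guaranteed its minimum
    share of the link, and may borrow bandwidth Ethernet isn't using."""
    guarantee = link_gbps * fcoe_min_pct / 100.0
    leftover = max(link_gbps - guarantee - ether_demand, 0.0)
    return min(fcoe_demand, guarantee + leftover)

# 50/50 split on a 10G link: FCoE always gets its 5G minimum,
# even when Ethernet traffic is heavy.
print(ets_share(10.0, 50, fcoe_demand=9.0, ether_demand=8.0))  # 5.0

# When Ethernet is quiet, FCoE borrows the idle bandwidth and can
# exceed what an 8G FC HBA (~6.8 Gbps actual) could ever deliver.
print(ets_share(10.0, 50, fcoe_demand=9.0, ether_demand=1.0))  # 9.0
```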
Finally, "surely there is FC plus ethernet encoding." No, because encoding happens on the physical layer, not at the protocol layer. You don't run two different physical encoding mechanisms on the same physical layer.
Thanks again,
J

First, great article, lots of good information, and most of all great discussion around the Nexus 5000. What a versatile switch for so many great customer scenarios.
However, I do have a question regarding your discussion of oversubscription. I understand that FCoE Encoding is far more efficient than FC encoding, I also understand how using the FCoE ISLs over FC ISLs will provide higher throughput. Now, what happens though once we connect those switches to the FC Storage arrays? Don't we just re-introduce the same oversubscription we had in the previous FC-Only topology?
Thanks for the great article!

Hi Andy,
That's an excellent question, and I'm glad that you're thinking in terms of oversubscription. With FC networks we calculate the oversubscription ratios between our hosts and our storage, which makes it very important that we allocate bandwidth in line with what the storage manufacturers suggest.
In general (of course, mileage may vary), the oversubscription between switches is 1:1, and so when we architect our SAN design we look to maintain the fan-in ratio of hosts-to-storage without adding additional stress on the ISLs. In this case, as they added VMs with storage connectivity, the balance began to tip against them on the storage side, which is why using FCoE to the 9513s allowed them to scale to their existing storage better.
If you have a mixed environment where you are going to be, say, attaching 10G FCoE storage to MDS Fibre Channel directors with FC links back to the hosts, you raise an excellent point: make sure that you take any difference in actual throughput into account for those ratios.
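One quick way to sanity-check those ratios is to compute fan-in from actual per-link throughput rather than nominal interface speed. A minimal sketch (the host and storage port counts below are made up purely for illustration):

```python
def fan_in_ratio(host_links, host_gbps_actual, storage_links, storage_gbps_actual):
    """Host-to-storage oversubscription based on actual throughput,
    not nominal interface speed."""
    return (host_links * host_gbps_actual) / (storage_links * storage_gbps_actual)

# Hypothetical example: 40 hosts on 8G FC (~6.8 Gbps actual) fanning
# in to 4 x 10G FCoE storage ports gives a 6.8:1 oversubscription.
print(round(fan_in_ratio(40, 6.8, 4, 10.0), 1))
```

The point is simply that mixing 8G FC and 10G FCoE links means the nominal-speed ratio and the actual-throughput ratio diverge, so calculate with the actual numbers.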

J, wouldn't it be more accurate if you kept the comparison to actual throughput (800MBps, 1050MBps, 1600MBps) vs. data rates with different encoding (8b/10b and 64b/66b)? The data rate for 8GFC is 8.5 Gbps with the 8b/10b encoding on an 8GFC link. It would obviously be less (your 6.8-7 Gbps) if you compared it to 8 Gbps on a 64b/66b link (10GbE or 16GFC).
Brocade (aka "another generic brand") is the only shipping solution for 16GFC. Your math for Brocade "bundling" is way off for 8 and 16GFC. Brocade "bundles" at both a frame-level (Frame-Based Trunking, Cisco can't do) and an exchange-level (Dynamic Path Selection, like Cisco port channels). Brocade's frame-based trunking is limited to 8 ports in a trunk. However, you can have up to 8 trunks balanced with DPS (i.e. 64 ports). So for 8GFC we can use DPS on trunks for up to 512 Gbps (or 435.2 Gbps using your math). For 16GFC it's 1024 Gbps (or 870.4 Gbps using your math). Regardless of how you calculate it, it's well beyond 160 Gbps. Brocade 16GFC ISLs will provide the same throughput in 4 fewer ports (12 vs. 16) (12x1600MBps vs. 16x1200MBps).

Scott,
You bring up a lot of points here and I'll try to address them individually.
Interface speed vs. Data Rate:
Interfaces are typically known by their interface speed. For example, when you say 8G Fibre Channel, people think 8G is the data rate. Same for 10 Gigabit Ethernet: people assume the data rate is 10G. The reality is that there are two factors here, the clocking/encoding and the data rate. The clocking is the rate at which bits are put on the line. On top of that you have the encoding, which is how the data is put on the line. For example, 8G FC is encoded using 8b/10b encoding. That means for every 8 bits of data, you actually put 10 bits on the line. 10GE and 16G FC use 64b/66b encoding, which means for every 64 bits of data you actually put 66 bits on the line. It's the combination of these two factors that gives you the effective data rate, in other words, the actual user data going across the line. The extra bits for encoding are not user data and should not be counted when comparing data rates.
So how does this translate to data rate for the different interfaces? Take the clock rate, multiply it by the encoding efficiency, and that will give you the data rate. For reference, 8G FC is clocked at 8.5 Gbps, 10GE is clocked at 10.3125 Gbps, and 16G FC is clocked at 14.025 Gbps. So that means that the data rates are:
8G FC = 8.5 * 8/10 = 6.8 Gbps data rate
10 GE = 10.3125 * 64/66 = 10 Gbps data rate
16G FC = 14.025 * 64/66 = 13.6 Gbps data rate
If you want these in MB/s, divide by 8 (bits to bytes):
8G FC = 6.8 Gbps / 8 = 850 MB/s
10GE = 10 Gbps / 8 = 1250 MB/s
16G FC = 13.6 Gbps / 8 = 1700 MB/s
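Those calculations can be expressed in a few lines of Python, if you want to verify them yourself:

```python
def data_rate_gbps(clock_gbps, payload_bits, total_bits):
    """Effective data rate = clock rate * encoding efficiency."""
    return clock_gbps * payload_bits / total_bits

rates = {
    "8G FC":  data_rate_gbps(8.5, 8, 10),       # 8b/10b  -> 6.8 Gbps
    "10GE":   data_rate_gbps(10.3125, 64, 66),  # 64b/66b -> 10.0 Gbps
    "16G FC": data_rate_gbps(14.025, 64, 66),   # 64b/66b -> 13.6 Gbps
}
for name, gbps in rates.items():
    # Divide Gbps by 8 (bits to bytes) to get MB/s
    print(f"{name}: {gbps:.1f} Gbps = {gbps / 8 * 1000:.0f} MB/s")
```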
As to Brocade's trunking / DPS and Cisco's port channels, what I think you're trying to point out is how many interswitch links can be bundled. As you said, Brocade can balance 8 trunks by 8 DPS paths, for a total of 64 ISLs. So for 8G, that would give you 64 x 850 MB/s or 54,400 MB/s between switches. For 16G FC, that would be 64 x 1700 MB/s or 108,800 MB/s.
However, Cisco can load balance up to 16 port channels of 16 members, so that gives you a theoretical limit of 16 x 16 or 256 ISLs between switches. So for 10GE, that's 256 x 1250 MB/s or 320,000 MB/s of bandwidth.
But let's be serious, who's going to put 64 or 128 or 256 ISLs between two switches? The real story goes back to the data rate of the ISLs and that 10GE FCoE is 47% more efficient per ISL than 8G FC and that 16G FC is 36% more efficient per ISL than 10GE FCoE.

Yes I confused some of interface speeds, line rate, and actual throughput numbers. Even your data rate numbers and actual throughput numbers are different (e.g. 850 MBps vs. 800MBps from the previous blog), but I get your point.
The only reason I brought up the trunking was that your charts and images implied there were some limitations with 8Gb and 16 Gb technology that limited the "bundles" to 16x8GFC and 8x16GFC. I just wanted to clarify that it's not a technology issue, it must be something on the customer's side.
