Categories

Meta

Update on the 8Gb FC vs. 10Gb FCoE Discussion

By far, the most popular post on this blog has been my discussion of the various protocol efficiencies between native 8 Gb/s Fibre Channel and 10 Gb/s Ethernet using Fibre Channel over Ethernet encapsulation. I wrote the original post as much as an exercise in logic as an attempt to educate. I find that if I can’t explain a subject well, I don’t yet understand it. Well, as has been pointed out in the comments of that post, there were some things that I missed or simply had wrong. That’s cool – so let’s take another stab at this. While I may have been wrong on a few points, the original premise still stands – on a per Gb/s basis, 10Gb FCoE is still more efficient than 8Gb FC. In fact, it’s even better than I’d originally contended.

One of the mistakes I made in my original post was to start throwing around numbers without setting any sort of baseline for comparison. Technology vendors have played sleight-of-hand games with units of measure and data rates for years – think of how hard drive manufacturers prefer to define a megabyte (1 million bytes) versus how the rest of the world defines a megabyte (2^20 bytes, or 1,048,576 bytes).
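The gap between the two definitions is easy to quantify. Here’s a quick Python sketch (the “500 GB” drive is just a hypothetical example, not a figure from this post):

```python
# Decimal ("drive-maker") megabyte vs. binary ("computing") megabyte.
decimal_mb = 10**6   # 1,000,000 bytes
binary_mb = 2**20    # 1,048,576 bytes

# The binary unit is ~4.9% larger, so capacities quoted in decimal
# units always look bigger than what a binary-units OS reports.
print(f"{binary_mb / decimal_mb - 1:.1%}")   # difference per megabyte

# Example: a drive sold as "500 GB" (decimal), shown by an OS in binary units:
drive_bytes = 500 * 10**9
print(f"{drive_bytes / 2**30:.1f}")          # what the OS calls "GB"
```

The mismatch compounds with the prefix: about 2.4% at kilo, 4.9% at mega, 7.4% at giga.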

It’s important that if we’re going to compare the speed of two different network technologies, we establish where we’re taking the measurement. Is it, as with 10GE, measured as bandwidth available at the MAC layer (in other words, after encoding overhead), or, as I perhaps erroneously did with FC, at the physical layer (in other words, before encoding overhead)? I also incorrectly stated, unequivocally, that 10GE used 64b/66b encoding, when in fact 10GE can use 8b/10b, 64b/66b, or other encoding mechanisms – what’s important is not what is used at the physical layer, but rather what is available at the MAC layer.

In the case of 10GE, 10Gb/s is available at the MAC layer, regardless of the encoding mechanism, transceivers, etc used at the physical layer.

The Fibre Channel physical layer, on the other hand, sets its targets in terms of MB/s available to the Fibre Channel protocol (FC-2 and above). This is the logical equivalent of Ethernet’s MAC layer – after any encoding overhead. 1Gb Fibre Channel (hereafter FC), as the story goes, was designed to provide a usable data rate of 100 MB/s.

If we’re truly going to take an objective look at the two protocols and how much bandwidth they provide at the MAC (or equivalent) layer and above, we have to pick one method and stick with it. Since the subject is storage-focused (and frankly, most of the objections come from storage folks), let’s agree to use the storage method – measuring in MB/s available to the protocol. As long as we use that measurement, any differences in encoding mechanism become moot.

So back to 1Gb/s FC, with its usable data rate of 100 MB/s. The underlying physical layer of 1Gb/s FC uses a 1.0625 Gb/s data rate, along with 8b/10b encoding.

Now, this is where most of the confusion and debate seems to have crept into the conversation. I’ve been attacked by a number of folks (not on this site) for suggesting that 1Gb FC has a 20% encoding overhead, dismissing it as long-standing FUD – created by whom and for what purpose, I’ve yet to discover. No matter how you slice it, a 1.0625 Gb/s physical layer using 8b/10b encoding results in 0.85 Gb/s available to the next layer – in this case, FC-2. Conveniently enough, as there are 8 bits in a byte, 100MB/s can be achieved over a link providing approximately 800Mb/s, or 0.8Gb/s.
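The 8b/10b arithmetic above fits in a few lines of Python (a sketch; the 1.0625 Gb/s signaling rate comes from the FC physical-layer spec, and the MB/s conversion here uses decimal units):

```python
line_rate_gbps = 1.0625       # 1GFC physical-layer signaling rate
encoding_efficiency = 8 / 10  # 8b/10b: every 10 line bits carry 8 data bits

usable_gbps = line_rate_gbps * encoding_efficiency
print(usable_gbps)             # 0.85 Gb/s available to FC-2
print(usable_gbps * 1000 / 8)  # 106.25 MB/s -> comfortably over the 100 MB/s target
```

So the 20% encoding overhead is real, but the line rate was chosen high enough that the 100 MB/s target survives it.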

Now, who doesn’t like nice round numbers? Who cares what the underlying physical layer is doing, as long as it meets your needs/requirements/targets at the next layer up?

If the goal is 100MB/s, 1Gb/s FC absolutely meets it. Does 1Gb/s FC have a 20% encoding overhead? Yes. Is that FUD? No. Do we care? Not really.

As each generation of FC was released, the same physical layer was multiplied, without changing the encoding mechanism. So 8Gb/s FC is eight times as fast as 1Gb/s FC. The math is pretty simple: (1.0625 × 8) × 0.8 = 6.8 Gb/s available to the next layer. Before my storage folks (by the way – my background is storage, not Ethernet) cry foul, let’s look at what 6.8 Gb/s provides in terms of MB/s. A quick check of Google Calculator tells me that 6.8 Gb/s is 870 MB/s – well over the 800 MB/s we’d need if we were looking to maintain the same target of 100MB/s per 1 Gb/s of link. So again, who cares that there’s a 20% encoding overhead? If you’re meeting your target, it doesn’t matter. Normalized per Gb/s, that’s about 108 MB/s for every Gb/s of link speed.
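The same calculation works for every 8b/10b FC generation. A sketch, using the binary Gb-to-MB conversion (1 Gb = 1024 Mb) that reproduces the 870 MB/s figure above:

```python
def fc_usable(multiplier):
    """Usable bandwidth for an 8b/10b FC generation (1G/2G/4G/8G)."""
    line_rate_gbps = 1.0625 * multiplier
    usable_gbps = line_rate_gbps * 0.8      # 8b/10b encoding overhead
    usable_MBps = usable_gbps * 1024 / 8    # binary Gb -> MB, as in the post
    return usable_gbps, usable_MBps

for mult in (1, 2, 4, 8):
    gbps, MBps = fc_usable(mult)
    print(f"{mult}G FC: {gbps:.2f} Gb/s usable, ~{MBps:.0f} MB/s "
          f"({MBps / mult:.1f} MB/s per nominal Gb/s)")
# The 8G row gives 6.80 Gb/s and ~870 MB/s, matching the figures above.
```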

At this point, you’re probably thinking – if we don’t care, why are you writing this? Well, in a converged network, I don’t really care what the historical target was for a particular protocol or link speed. I care about what I can use.

Given my newly discovered understanding of 10Gb Ethernet, and how it provides 10 Gb/s to the MAC layer, you can already see the difference. At the MAC layer or equivalent, 10GE provides 10 Gb/s, or 1,280 MB/s. 8G FC provides 6.8 Gb/s, or 870 MB/s. For the Fibre Channel protocol, native FC requires no additional overhead, while FCoE does require that the native FC frame (2148 bytes, maximum) be encapsulated to traverse an Ethernet MAC layer. This creates a total frame size of 2188 bytes maximum, which is about a 2% overhead incurred by FCoE as compared to native FC. Assuming that the full bandwidth of a 10Gb Ethernet link was being used to carry the Fibre Channel protocol, we’re looking at an effective bandwidth of (1280 MB/s × 0.98) ≈ 1254 MB/s. Normalized per Gb/s, that’s about 125 MB/s for every Gb/s of link speed.
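That comparison can be sketched directly from the frame sizes (using the 2148/2188-byte maximums above; note the exact ratio gives a slightly smaller overhead than the rounded 2%):

```python
fc_frame_max = 2148    # maximum native FC frame, bytes
fcoe_frame_max = 2188  # the same frame after FCoE/Ethernet encapsulation, bytes

encap_overhead = 1 - fc_frame_max / fcoe_frame_max
mac_MBps_10ge = 10 * 1024 / 8   # 10 Gb/s at the MAC layer -> 1280 MB/s (binary Gb)
effective_MBps = mac_MBps_10ge * (fc_frame_max / fcoe_frame_max)

print(f"{encap_overhead:.1%}")  # ~1.8% encapsulation overhead
print(f"{effective_MBps:.0f}")  # ~1257 MB/s; rounding the overhead to 2% gives 1254
```

Either way, normalized per nominal Gb/s, FCoE on 10GE lands around 125 MB/s versus roughly 108 MB/s for 8G FC.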

The whole idea of FCoE was not to replace traditional FC. It was to provide a single network that can carry any kind of traffic – storage, application, etc, without needing to have protocol-specific adapters, cabling, switching, etc.

Given that VERY few servers will ever utilize 8Gb/s of Fibre Channel bandwidth (regardless of how or where you measure it), why on earth would you invest in that much bandwidth and the cables, HBAs, and switches to support it? Why wouldn’t you look for a solution where you have burst capabilities that meet (or in this case, exceed) any possible expectation you have, while providing flexibility to handle other protocols?

I don’t see traditional FC disappearing any time soon – but I do think its days are numbered at the access layer. Sure, there are niche server cases that will need lots of dedicated storage bandwidth, but the vast majority of servers will be better served by a flexible topology that provides better efficiencies in moving data around the data center. Even at the storage arrays themselves, why wouldn’t I use 10GE FCoE (1254 MB/s usable) instead of 8Gb FC (870 MB/s usable)?

Now, when 16Gb FC hits the market, it will be using 64b/66b encoding. The odd thing, however, is that based on the data I’ve turned up from FCIA, it’s actually only going to use a line rate of 14.025 Gb/s and, after encoding overheads, supply 1600 MB/s usable (though my math shows it to be more like 1700 MB/s) – in keeping with the 1Gb/s = 100MB/s target that FC has maintained since inception.
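The straight arithmetic on those published 16GFC numbers looks like this (a sketch using the 14.025 Gb/s line rate and 64b/66b efficiency; decimal MB here):

```python
line_rate_gbps = 14.025  # 16GFC serial line rate, per FCIA
efficiency = 64 / 66     # 64b/66b: 66 line bits carry 64 data bits

usable_gbps = line_rate_gbps * efficiency
print(f"{usable_gbps:.1f}")             # 13.6 Gb/s to the next layer
print(f"{usable_gbps * 1000 / 8:.0f}")  # 1700 MB/s - above the stated 1600 MB/s
```

Which is why the raw math comes out closer to 1700 MB/s than the 1600 MB/s figure FCIA quotes.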

Sometime after 16Gb FC is released will come 40GE, followed by 32Gb FC, and again followed by 100GE. It’s clear that these technologies will continue to leapfrog each other for some time. My only question is, why would you continue to invest in a protocol-specific architecture when you can instead have a flexible one? Even if you want the isolation of physically separate networks (and there’s still justification for that), why not use the one that’s demonstrably more efficient? FCoE hasn’t yet reached feature parity with FC – there’s no dispute there. It will, and when it does, I just can’t fathom keeping legacy FC around as a physical layer. The protocol is rock solid – I can’t see it disappearing in the foreseeable future. The biggest benefits of FCoE come at the access layer, and we have all the features we need there today.

If you’d like to post a comment, all I ask is that you keep it professional. If you want to challenge my numbers, please, by all means do so – but please provide your math, references for your numbers, and make sure you compare both sides. Simply stating that one side or the other has characteristic X doesn’t help the discussion, nor does it help me or my readers learn if I’m in error.

Finally, for those who have asked (or wondered in silence) – I don’t work for Cisco, or any hardware manufacturer for that matter. My company is a consulting and educational organization focused on data center technologies. I don’t have any particular axe to grind with regards to protocols, vendors, or specific technologies. I blog about the things I find interesting, for the benefit of my colleagues, customers, and ultimately myself. Have a better mousetrap? Excellent. That’s not going to hurt my feelings one bit. 🙂

Some comments follow with the goal of more illumination on some of your points.

1. You say “Technology vendors have played sleight-of-hand games with units of measure and data rates for years – think of how hard drive manufacturers prefer to define a megabyte (1 million bytes) versus how the rest of the world defines a megabyte (2^20 bytes, or 1,048,576 bytes).”

I think you show your ignoranace here regarding prefixes and their origin. To be clear, Kilo, Mega, Giga, etc., were originally defined (long before computing) to apply to powers of 10, aka they were used with “base 10” arithmetic. Since computing relies on “base 2” arithmetic, it applied the original prefixes to base 2, resulting in potential confusion. No sleight of hand here at all by the disk drive manufacturers or anyone else. Note there is an attempt to remove this historical ambiguity, http://en.wikipedia.org/wiki/Data_rate_units, but as with most “de facto” standards, not much traction has been seen yet.

3. There is much dispute regarding the future market for FCoE vs. Fibre Channel. This is common when a new method of doing the same thing challenges an existing method of doing it. It was true in the early days of Fibre Channel competing with DAS. Time will tell what customers will buy. Stay tuned 🙂

4. Customer adoption of converged networks vs converged technology is, I believe, an important consideration. FCoE offers the choice of pursuing either, as deploying FCoE does NOT require a converged network. Historically, two physically separate “air gap” networks for storage have been best practice. Changing the transport (from native Fibre Channel to Ethernet) doesn’t affect that best practice, as it has nothing to do with the transport technology, but much to do with Murphy’s Law. I think this is the “lie” of the FCoE story. I suspect much of the low market adoption rate is explained by customers who see the lie of converged networks and are avoiding it. My sense is those who wish to see FCoE succeed should abandon the “converged network” mantra and find other significant value propositions.

5. To the observation that you make on link rates of Fibre Channel and Ethernet, “Sometime after 16Gb FC is released, will come 40GE, followed by 32Gb FC, and again followed by 100GE.”, I wanted to clarify a couple of things. (1) 100 GbE is in the market today, so it’s not “after” 16 Gbps FC, it’s before. (2) FC link rate increases are now closely following Moore’s Law, twice the bandwidth every 2 years, while Ethernet has historically provided 10x the bandwidth every 8-10 years. So, based on history, FC will increase by 16x in 8 years while Ethernet will increase by 10x in 8 years. But, that’s just an interesting factoid with little customer value. Bandwidth increases are driven by market needs, not just Moore’s Law. It’s not clear if the data center market demand for storage bandwidth will be similar to the demand for Ethernet bandwidth. Time will tell.

You should be careful when tossing around aspersions such as “ignoranace” (beautiful that the one word out of your entire post that you happen to misspell is “ignorance”). The fact that I worked in the hard drive industry (Conner Peripherals/Seagate Technology, 1996-1998) for a few years suggests that I might actually know a thing or two about hard drives. I’m well aware of the origins of the various prefixes. You yourself point out that in computing we use powers of two, and yet the hard drive manufacturers choose to use the historical, non-computing definition in order to make their hard drives look larger. Both uses are perfectly valid – however, the choice of hard drive manufacturers to use the “other” standard was one of marketing, not computing. In fact, prior to the mid-nineties, some hard drive manufacturers did report their capacities in base 2 values, in keeping with the computing standard. Trust me, I spent many hours explaining to angry Enterprise customers who wanted to know why their 1.2GB hard drive was reported by their operating system to be 1.0GB… seems that operating system vendors kept to the computing standard, but hard drive manufacturers used the more convenient historical usage – why do you think that is?

I don’t have access to the particular book you reference, but I do have access to Storage Networking Protocol Fundamentals, Cisco Press, James Long. Pages 168-173 document the Fibre Channel frame format as being a word-oriented structure, where FC words are 4 bytes long. A maximum-sized FC frame is 537 words – (537 × 4) = 2148 bytes. Are you really going to argue down to a potential (though apparently misleading or inaccurate) 20-byte – less than 1% – difference, when the scale we’re discussing is in the 20% range?

No question that the future is uncertain. Nothing is certain but death, taxes, and uncertainty.

As I noted, converged networks are not the only benefit of FCoE. I even pointed out that there was still justification for “air-gap” networks, and that FCoE was still relevant as an all-storage (non-converged) technology. Many (but not all) of the historical reasons for a separate storage network (protocol differences, SCSI expectations) have faded. Not saying they’re gone completely, but there are certainly other ways to solve those problems now. Find the solution that works best for a given customer, and use it.

My point regarding 40G and 100G Ethernet was one of wide availability and adoption in the data center space, not necessarily niche availability.

No one disputes that 8 is less than 10, but you seem to conveniently and consistently ignore (or dispute, I’m not sure) that 108 is less than 125 – and that’s the number that’s really important.

You seem overly emotional about facts, despite your request to provide corrections with references. I was trying to be accurate, which is a virtue in most technical areas. If you are designing networking protocols, a bit or two is pretty important 🙂

Chill… its only information we are exchanging. It’s not a religious war.

I don’t think I attacked you. If I had called your comment stupid, that would be an attack. I used the word “ignorant” meaning “unaware of”, as you seemed unaware of some of the facts I provided. If you are aware of them, great. Perhaps others who read your blog aren’t. That’s what comments to blogs are good for, adding something the poster didn’t include. That’s not attacking the poster, is it?

There is no emotion in the word ignorant, or at least there shouldn’t be. I’m often ignorant, and as you point out, I’m ignorant of how to spell ignorant 🙂 Thanks for catching my typo … much appreciated my friend. I’ll try to do better next time.

I was agreeing with your observation on the value of air-gap networks for storage IO. Many of our customers are not going to abandon that “best practice”, so they rejected FCoE out of hand since it got wrapped with the hype over “converged networks”. Hence my comments about another way to show FCoE economic value. I think it’s still early days and FCoE is interesting.

I agreed with your point, on 40 GbE and 100 GbE, but I was clearing up the facts about 100 GbE already being available.

Glad we agree on the 8 &lt; 10. I also agree that 108 is less than 125. 🙂

Well, I found my mistake on the frame size. The material I cited included the inter-frame idle words for link synchronization and the Class 2 ACK as that was part of the computation of the data rate. (So for Class 3, the data rate would be a bit more as there would be no ACK and associated inter-frame idle words). Of course, those aren’t part of the frame size computation. Sorry about that.

Total overall bandwidth is one thing… like a Lamborghini being able to do 200mph… but how about latency? Is the Lamborghini stuck in traffic? Is FC essentially SCSI commands inside an FC packet, and FCoE essentially SCSI commands inside an FC packet inside an FCoE packet? Do we have any benchmarks to compare? When vendors look for a high-speed fabric to do a benchmark, what do they use? When I look at the top TPC results, I don’t see any mention of FCoE.

FC is more than just “SCSI inside an FC [frame]”, since FC provides routing, name services, zoning (access control), etc. FCoE is, as you said, an FC [frame] inside an Ethernet [frame].

If I’m trying to build a lab-queen configuration with the lowest measurable latency, yes, I’d probably go with native and dedicated FC HBAs. For real world performance with real-world workloads, I’d challenge anyone to notice a difference.

If you’re trying to build lowest latency and all, you’d go for SCSI over Infiniband (SRP, whatever’s the current name).

Funny thing is that there’s still a legacy requirement that “switch is allowed to hold the frame for 500ms during congestion, before dropping” (yep, FC switches can drop things, just it takes them ages to do that :).
That officially limits your network to well-known 3 hops (or 2, for FICON).
If you’d have 4 hops in your network – well, you’re asking for trouble; legally, officially, the network can time out your frames (ED_TOV of 2s).
Thing is, that 500ms rule was created when dinosaurs were walking the Earth. Now 500ms is kind of an “eternity” thing. And it limits distance/scalability. Remember that even if you’re an AccessGateway/NPV device, you still have the right to hold that frame for 500ms (so basically you count as a hop).
Why wouldn’t they change it to, say, 300ms? Still a lot of time for the switch, but allows realistically 5-6 switches in the path?

(Yes, I’m fully aware that generally vendors do allow for more than 3 hops, Brcd 6-or-7, Cisco doesn’t really say; my point is – standard says “you’re allowed to hold it for 500ms”)

Hi Dave,
Just wanted to let you know that I found your article quite educational and humbling. I’ve been involved in the storage industry for over 10 years, and due to my ignorance (or technology vendors’ sleight-of-hand techniques, as you pointed out), I was never able to fully appreciate why 1Gb/s translated to ~100MB/s in the FC world until now. I was aware of 8b/10b encoding but never understood (or had explained to me) its impact on overall data transport efficiency.

I also appreciated some other user comments (e.g. ‘K’) with their insight into multi-hop considerations and why you rarely see anything of four (4) hops or more, although Brocade has been publishing 6-7 forever now.

As a solutions architect who of late has been tasked with nailing down optimal technology for both throughput and low latency (I needed a converged network that can handle some ultra-high-performance back-end storage), I believe that I will be using one of those new niche solutions. I’m sure you saw Brocade’s announcement of their 16Gb FC switches, but I’m more interested in the new 40GbE switches from Mellanox, using their Unified Fabric Manager and the ConnectX-2 dual-port 40Gb HCAs running the Ethernet drivers. This promises to be really nice on the latency front as well, at 230ns, and power efficient, at less than 1.3W per port.

It is either this or the IB flavor as an alternative, but the UFM for their scale out ethernet has some nice features for VMware that I’m interested in.

The main driver for this is being able to put a lot of vm’s on a dense compute stack and not have I/O be the bottleneck. My goal is to have multiple 6GBytes/s 500,000 IOPS storage be effectively delivered to a boatload of vm’s and virtual desktop systems.

Have you looked at any of the Voltaire/Mellanox products enough to have an opinion on whether there are any ‘cons’ to this approach?

Latency is key in clustering filesystems such as VMware VMFS, where clustered LUNs service exponentially higher IO than traditional LUNs that are attached to a single operating system’s workload.

Pursuant to VMware’s seminal “Scalable Storage Performance”, the magic number for acceptable latency within clustered VMFS environments is 50ms, hence the advent of VMware’s Storage I/O Control in vSphere 4.1, which guarantees latency thresholds for “protected” VM workloads in saturated storage fabrics. This probably indicates that even with the use of more recent generation 8Gb FC and converged 10Gb FCoE storage topologies, contention for IO in clustered filesystem environments is becoming an issue that increased storage throughput doesn’t necessarily address or fix.

The 500ms factoid with conventional FC is interesting. In theory FCoE would have a distinct advantage in the latency department based upon its reliance on lossless Ethernet for the transport…?

I have a question – I am trying to “size” some theoretical server configurations. I have read the posts here rather quickly. But it seems to me that the math is based on 100% utilization of the paths – and this is never a good idea. Would you please confirm or comment on the implied utilization in these calculations?
Thanks
JH

It’s not really a discussion of utilization, rather just on what’s available on the wire. Of course you should do your sizing based on your tolerance for utilization, including acceptable performance in typical or expected failure modes.

Assuming well-architected solutions on traditional FC and 10Gb FCoE, I wouldn’t expect any significant differences in latency. Both transports have the ability to be very low latency on a per-switching-decision basis – more important is how you design your network, Ethernet or FC.

Just wanted to mention to people out there: one of our customers sent us your link thinking you were comparing 10Gb/s iSCSI with 8Gb/s FC. This is NOT what you are comparing, and when comparing those two technologies, 8Gb/s FC is, in our experience, more often than not faster and more efficient than 10Gb/s iSCSI via TCP/IP protocols. People wanting comparisons of these technologies should perhaps read this article as a starting block:

1. It’s provided by Emulex, a well-established FC HBA vendor
2. They connected initiators directly to targets, with no switching whatsoever. This may work for testing, but doesn’t reflect actual workload environments.

I doubt anyone will notice this considering the age of this post, but I felt compelled to mention these points.

I ran across a 10Gb Ethernet vs 4Gb FC test by IBM, and they performed about the same. IBM said the test goal was to, “…compare the performance of 10Gb Fibre Channel over Ethernet (FCoE) and iSCSI protocols with 4Gb Fibre Channel (FC), using a Microsoft SQL Server 2008….” Their results were, “The average throughput for the FC [4Gb], FCoE [10Gb] and iSCSI [10Gb] tests was virtually identical. The FCoE test showed a slightly higher average throughput than either FC or iSCSI. The average bandwidth used for FCoE was 2.82GBps; FC was 2% slower at 2.76GBps, and iSCSI was 4% slower at 2.66GBps.” If someone finds a 10GbE vs 8Gb FC test somewhere, please post the results! Here is the source/link: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101686

Hi. You say “bandwidth available at the MAC layer”. What I think you mean is “bandwidth available at the physical layer”. 1 Gbps Ethernet, for example, is referring to the physical line offering 1 Gbps. Going from the physical layer to the MAC layer (data-link or link layer), there is bandwidth lost due to insertion of the inter-packet gap (IPG). It is at the physical layer that the 12-byte (a.k.a. 12-octet) inter-packet gap (IPG) is inserted. The other 8 bytes of the standard-required 20-byte gap are the 7-byte preamble and 1-byte start frame delimiter, both added at the MAC layer.

So, whatever percentage 12 bytes is of the average packet size, that’s the percentage of bandwidth lost at the physical layer. What’s left is the bandwidth available at the MAC layer.
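For a full-size frame, that percentage works out like this (a quick Python sketch, assuming the standard 1518-byte maximum Ethernet frame; smaller frames lose proportionally more):

```python
frame_bytes = 1518  # maximum standard Ethernet frame (headers + 1500 payload + FCS)
ipg_bytes = 12      # inter-packet gap, inserted at the physical layer
preamble_sfd = 8    # 7-byte preamble + 1-byte start frame delimiter

on_wire = frame_bytes + ipg_bytes + preamble_sfd  # bytes consumed per frame on the wire
print(f"{ipg_bytes / on_wire:.2%}")                   # ~0.78% lost to the IPG alone
print(f"{(ipg_bytes + preamble_sfd) / on_wire:.2%}")  # ~1.30% for the full 20-byte gap
```

So at maximum frame size, the IPG and preamble cost is small; it only becomes significant with streams of minimum-size frames.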