Does standardized interoperability in data center fabrics matter?

Despite interoperability trials and demonstrations involving alternative data center fabric standards, proponents say a non-standard fabric technology is at the front of the pack.

Multichassis Link Aggregation (MLAG) and two percolating fabric standards, TRILL and Shortest Path Bridging (SPB), are designed to create multiple active paths to work around the limitations of Spanning Tree, reduce latency and facilitate more of an "East-West" traffic flow between server racks.

All three technologies are designed to extend or replace Spanning Tree in Ethernet networks for data centers. Spanning Tree allows for one active path, which can induce latency and traffic flows unsuitable for Ethernet-based data center and cloud infrastructures.
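The single-active-path limitation can be made concrete with a minimal sketch. It assumes a hypothetical fabric of two spine switches and four leaf switches (the names and topology are illustrative, not from any vendor), and it stands in for Spanning Tree with a simple breadth-first tree from a chosen root. Real Spanning Tree elects its root and paths from bridge IDs and port costs, so treat this only as an approximation of the outcome: every link outside the tree is blocked.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the links kept active by a breadth-first tree rooted at `root`.

    Simplified stand-in for Spanning Tree: any link not in the tree is blocked.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in visited:
                visited.add(peer)
                active.add(frozenset((node, peer)))
                queue.append(peer)
    return active

# Hypothetical topology: every leaf is wired to both spines (8 links in total).
spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]
links = [(leaf, spine) for leaf in leaves for spine in spines]

active = spanning_tree_links(links, root="spine1")
blocked = [link for link in links if frozenset(link) not in active]

print(f"{len(links)} physical links, {len(active)} active, {len(blocked)} blocked")
for link in blocked:
    print("blocked:", link)
```

In this toy fabric, three of the eight uplinks carry no traffic, and all leaf-to-leaf traffic funnels through one spine. The multipath approaches discussed here aim to put those idle links to work.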

MLAG adds node-level redundancy to the link-level redundancy provided by the Link Aggregation Group (LAG) specification, defined in IEEE 802.1AX-2008. But even though LAG is a standard, MLAG is not: its implementation varies by vendor, and most vendors support some flavor of it in their data center switches.

As a result, the two switches in an MLAG group have to be from the same vendor. An MLAG pair can work with most virtual and network switches, and with network appliances, including those running standard LAG, but the paired switches themselves cannot be mixed.

So is MLAG interoperable?

It might not matter. Users don't build MLAG groups with multivendor switches, says Doug Gourlay, vice president of marketing at Arista Networks, a proponent of MLAG for Layer 2 fabrics.

"I've never once seen a customer do that in a production environment," Gourlay says. "How many times have you seen a customer say, 'In this aggregation or distribution tier of my network, I want Nortel on the left and Force 10 on the right'? I've never seen it. People don't deploy networks that way."

Networks can be constructed with multivendor interoperability between tiers, Gourlay says. But within tiers, it's usually a single-vendor environment.

"If people were already trying to deploy two different devices from two different vendors in the same location, I would say (MLAG) does limit you," he says. "But I've never seen that. The de facto deployment model we have today is people using two boxes, almost always identical, from the same vendor, and frankly almost always the same software version unless they're in the middle of a change control."

And that may be the key: multivendor devices in the same MLAG group requiring a software change or update would require a network outage, whereas devices from the same vendor running the same version of software could undergo an in-service software upgrade if it is supported by the vendor's MLAG implementation.

"There's no way that I can seamlessly do that upgrade from one distribution pair to another without causing at any point in time some wiring closet to be a little bit disconnected," Gourlay says. "So I kind of struggle with the benefit of interoperability because I can't see how it would enable me to do a seamless cutover; and if it doesn't enable me to do a seamless cutover then I might as well just take a four hour change control on a Saturday evening, get my new distribution tier up and running perfectly, and then just re-do my patches."

Those two MLAG switches in the same network tier - access/top-of-rack, aggregation/distribution or core - have to be from the same vendor because they exchange state information, says Shehzad Merchant, senior director of strategy at Extreme Networks.

"Those two switches today at any given tier have to come from one vendor because they exchange state between them," Merchant says. "The way MLAG works, it creates the persona of a single switch. And in doing that it synchronizes some state between the two switches."

So single vendor switches are required to construct MLAG groups within a tier; but between tiers, the switches can be multivendor, Merchant says. That's because they are running standard LAG between tiers, he says.
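A toy model may help show what "exchanging state" means in practice. The source says only that the two switches synchronize "some state" to present the persona of a single switch; the specifics below (a shared system ID and a mirrored MAC table) are assumptions for illustration, and the `MlagPeer` class is hypothetical, not any vendor's protocol.

```python
class MlagPeer:
    """Hypothetical MLAG peer: shares an identity and mirrors learned MACs."""

    def __init__(self, name, shared_system_id):
        self.name = name
        # Assumption: both peers advertise the same system ID downstream,
        # so an attached device sees one logical switch on its LAG.
        self.system_id = shared_system_id
        self.mac_table = {}
        self.peer = None

    def pair_with(self, other):
        """Link two peers so that learned state flows in both directions."""
        self.peer = other
        other.peer = self

    def learn(self, mac, port, from_peer=False):
        """Record a MAC locally and push it to the peer exactly once."""
        self.mac_table[mac] = port
        if not from_peer and self.peer is not None:
            self.peer.learn(mac, port, from_peer=True)


left = MlagPeer("left", shared_system_id="02:00:00:00:00:01")
right = MlagPeer("right", shared_system_id="02:00:00:00:00:01")
left.pair_with(right)

# A server's MAC is learned on the left switch and mirrored to the right.
left.learn("aa:bb:cc:dd:ee:ff", "mlag-port-1")
print(right.mac_table)
```

Because the format of that synchronization is not standardized, each peer can only interpret state sent by a switch running a compatible implementation, which is consistent with the same-vendor requirement Merchant describes. The downstream device, by contrast, only needs to speak standard LAG.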

So taken in totality, a data center network from server to core to storage can be LAGged and MLAGged for complete active-active redundancy in the fabric, Merchant says. TRILL and SPB are still developing this capability and cannot accomplish it today, he says.

"I perceive MLAG to be far more interoperable than TRILL or SPB," Merchant says. "If you look at it from a complete data center network perspective, MLAG allows you to interoperate between servers, appliances, switches, storage - all of those can be dual-homed using MLAG. If I want to deploy this today, and how do I make this work today with active-active redundancy all the way across my data center, MLAG is the only way you can make that happen today."

Even if fabric technologies are mixed and matched together -- if top-of-rack and core switches support SPB or TRILL, but servers and storage do not support them -- you'd still have to MLAG those for active-active redundancy, Merchant says.

"I would make the argument to keep it simple and go MLAG all the way," he says.

The debate between MLAG/TRILL/SPB and proprietary fabric offerings like Cisco's FabricPath and Juniper's QFabric is just another day in the networking industry, says Jim Metzler, vice president at consultancy Ashton, Metzler & Associates. Vendors will embrace standards but then add proprietary extensions to differentiate their offerings, and default back to the stripped-down standard as the lowest common denominator for multivendor interoperability.

"There's no standard for the hashing algorithm for how to put traffic over the MLAG links," Metzler says. "You'll give up some functionality to make it work" between multivendor switches.

"But I never heard anybody say it made a big difference," he says. "It's just a fact of life - a typical network challenge."

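Metzler's point about hashing can be illustrated with a short sketch. The two functions below are hypothetical stand-ins for two different implementations' link-selection algorithms (one CRC-based, one XOR-based); neither is taken from a real product, and the flow tuple of source IP, destination IP, source port and destination port is an assumed hash input.

```python
import zlib

def pick_link_crc(flow, num_links):
    """Hypothetical algorithm A: CRC32 over the flow fields, modulo link count."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links

def pick_link_xor(flow, num_links):
    """Hypothetical algorithm B: XOR of last IP octets and ports, modulo link count."""
    src_ip, dst_ip, src_port, dst_port = flow
    octets = int(src_ip.split(".")[-1]) ^ int(dst_ip.split(".")[-1])
    return (octets ^ src_port ^ dst_port) % num_links

# Illustrative flows: (src_ip, dst_ip, src_port, dst_port)
flows = [
    ("10.0.0.1", "10.0.0.2", 49152, 443),
    ("10.0.0.3", "10.0.0.2", 49153, 443),
    ("10.0.0.4", "10.0.0.9", 50000, 80),
]

for flow in flows:
    a = pick_link_crc(flow, num_links=2)
    b = pick_link_xor(flow, num_links=2)
    print(flow, "-> algorithm A: link", a, "| algorithm B: link", b)
```

Each function always maps a given flow to the same member link, which keeps packets in order, but the two functions need not agree with each other on which link that is. That kind of difference is one reason traffic distribution can vary across implementations even when the links themselves interoperate.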
Copyright 2018 IDG Communications. ABN 14 001 592 650. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.