In Defense Of VMware NSX And The Overlay Approach

By using a network overlay architecture, NSX cleanly segments SDN approaches into two realms: the physical and virtual. The overlay strategy has been criticized, but it makes the most sense.

At the heart of the debate about how to best build software-defined networks is the issue of decoupling the physical and logical realms. While many network equipment vendors try to conflate the two, ultimate success is likely to hinge on how well SDN and network virtualization architectures separate physical resources like switch ports, network links and L2 traffic flows from logical abstractions like virtual interfaces and networks.

In making the case for decoupling, Martin Casado, VMware's chief network architect and key developer of the Nicira virtualization technology underpinning NSX, points out that physical and logical networks address very different sets of problems with, if not divergent at least non-overlapping, sets of requirements. In an interview at VMworld, Casado noted that it's just good engineering practice to segment logically independent problems to allow optimizing solutions for each domain.

Clarifying his thoughts via email, Casado wrote, "Good engineering practice is based around modularization, where you decouple functionality independent problems so each subcomponent can evolve independently." He added, " If we clearly identify the interface and leave the rest decoupled, each can evolve at their own pace (the overlay providing a better service model to end systems and the underlay more efficient transport) while still maintaining correctness."

Casado believes the network overlay/underlay dichotomy is a good example of system modularization, adding that it's conceptually similar to the way routers and switches are designed, with a modular backplane handling traffic and line cards delivering various network services and protocols. "The overlay provides applications with network services and a virtual operations and management interface. The physical network is responsible for providing efficient transport," Casado wrote.

For example, physical networks manage traffic flows between switch ports. Traditionally, there has been a one-to-one mapping between network ports and IP addresses, which has been upset by server virtualization. This might imply a need to bridge the physical and virtual worlds; however, they can operate independently with tunneling protocols like VXLAN and NVGRE that facilitate building overlay networks. Indeed, NSX is just such a network overlay.

Visibility, Scalability Concerns

Common objections to network overlays are that they don't scale, don't provide management visibility to both physical and virtual networks and can't optimize performance for changing traffic patterns due to poor (or immature) interfaces between the two. The scalability issue has been the subject of much, sometimes impassioned, discussion in the blogosphere. Ivan Pepeljnak, chief technology advisor at NIL Data Communications, argued in a blog post that tunneling does scale, as used by schemes like NSX: "Will they scale? Short summary: Yes. The real scalability bottleneck is the controller and the number of hypervisor hosts it can manage."

But what about a virtual overlay's dependence on the performance of the physical network and the inability to control L2 packet flows? A lengthy blog post by Dimitri Stiliadis, co-founder and chief architect at Nuage Networks, makes the point that for a properly designed network--that is, flat CLOS or fat tree--with ample, low-latency bandwidth throughout the data center, there's no need to micromanage local flow switching since any potential improvements using centralized intelligence are negligible.

As Stiliadis aptly puts it: "If we consider a standard leaf/spine architecture, the question that one has to answer is whether any form of traffic engineering, or QoS, at the spine or leaf layers can make any difference in the quality of service or availability of flows. In other words, could we do something intelligent in this fabric to improve network utilization by routing traffic on non-shortest paths? Well, the answer ends up being an astounding NO, provided that the fabric is well engineered."

[Get up to speed on the unfamiliar protocols and technologies NSX introduces to the data center network in "Inside VMware NSX."]

Casado largely agrees with this conclusion, at least for enterprise data centers; cloud service providers are another story. However, he recognizes there must be a control interface between the physical and virtual worlds to handle cases where the fast, wide, flat network fabric still runs into problems. He wrote in his email: "Of course there is an interface between these two [the physical and virtual], for example, QoS bits to enforce SLAs and drop policies, multicast bits to offload packet replication, flow-level entropy to aid multipathing, etc."

In a press Q&A at VMworld last month, Casado pointed out that one reason cloud services like Google were so enamored with OpenFlow, which he originally developed as part of his doctorate research at Stanford, was its ability to handle extreme conditions like elephant flows creating bottlenecks on specific network segments or multiple, real-time streams with time-sensitive delivery requirements. He argues that such conditions are uncommon in enterprise data centers, or at least easily handled by well-designed switch fabrics. Where bridging the physical-virtual divide is necessary, OpenFlow is the answer; indeed, it's the protocol the NSX controller uses to manage virtual switch instances.

NEXT: Cisco's View

Kurt Marko is an InformationWeek and Network Computing contributor and IT industry veteran, pursuing his passion for communications after a varied career that has spanned virtually the entire high-tech food chain from chips to systems. Upon graduating from Stanford University ... View Full Bio

Respondents are on a roll: 53% brought their private clouds from concept to production in less than one year, and 60% ­extend their clouds across multiple datacenters. But expertise is scarce, with 51% saying acquiring skilled employees is a roadblock.