Reading a bit of networking stuff, which is new to me, as I am trying to understand and appreciate NSX (instead of already diving into it). Hence a few of these TIL posts like this one and the previous.

Two common terms I read in the context of NSX, or SDN (Software Defined Networking) in general, are “control plane” and “data plane” (a.k.a. “forwarding” plane).

This forum post is a good intro. Basically, when it comes to networking, your network equipment does two sorts of things. One is the actual pushing of packets that come to it on to others. The other is figuring out which packets need to go where. The latter is where routing protocols like RIP and EIGRP come in. Control plane traffic is used to update a network device’s routing tables or configuration state, and its processing happens on the network device itself. Data plane traffic passes through the router. The control plane determines what should be done with the data plane traffic. Another way of thinking about the control and data planes is in terms of where the traffic originates from or is destined to. Control plane traffic is sent to/ from the network devices themselves to control them (e.g. RIP, EIGRP), while data plane traffic is what passes through a network device.
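To make the split concrete for myself, here is a minimal sketch in Python. The `Router` class and the next-hop names are made up for illustration and this is not how any real router is implemented. `learn_route` stands in for the control plane (what a protocol like RIP would trigger), while `forward` stands in for the data plane (a longest-prefix lookup against whatever table the control plane left behind).

```python
import ipaddress


class Router:
    """Toy router: the control plane edits the table, the data plane only reads it."""

    def __init__(self):
        # Maps an ip_network prefix to a (hypothetical) next-hop name.
        self.routes = {}

    def learn_route(self, prefix, next_hop):
        # Control plane: called when a routing update arrives for this device.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def forward(self, dst_ip):
        # Data plane: called for every transit packet; the longest matching prefix wins.
        dst = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.routes if dst in net]
        if not matches:
            return None  # no route known, so the packet is dropped
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


router = Router()
router.learn_route("10.0.0.0/8", "core-1")
router.learn_route("10.1.0.0/16", "edge-7")

print(router.forward("10.1.2.3"))   # edge-7
print(router.forward("10.9.9.9"))   # core-1
print(router.forward("192.0.2.1"))  # None
```

Note that `forward` never changes the table and `learn_route` never touches a packet, which is really all the two planes mean at this level.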

( Traffic that controls a network device isn’t necessarily control plane traffic. For example, SSH or Telnet could be used to connect to a network device and control it, but that’s not really in the control plane. These come more under a “management” plane – which may or may not be considered as a separate plane. )

Once you think of network devices along these lines, you can see that a device’s actual work is in the data plane. How fast can it push packets through. Yes, it needs to know where to push packets through to, but the two aren’t tied together. It’s sort of like how one might think of a computer as being hardware (CPU) + software (OS) tied together. If we imagine the two as tied together, then we are limiting ourselves on how much each of these can be pushed. If improvements in the OS require improvements in the CPU then we limit ourselves – the two can only be improved in-step. But if the OS improvements can happen independent of the underlying CPU (yes, a newer CPU might help the OS take advantage of newer features or perform better, but it isn’t a requirement) then OS developers can keep innovating on the OS irrespective of CPU manufacturers. In fact, OS developers can use any CPU as long as there are clearly defined interfaces between the OS and the CPU. Similarly, CPU manufacturers can innovate independent of the OS. Ultimately if we think (very simply) of CPUs as having a function of quickly processing data, and OS as a platform that can make use of a CPU to do various processing tasks, we can see that the two are independent and all that’s required is a set of interfaces between them. This is how things already are with computers so what I mentioned just now doesn’t sound so grand or new, but this wasn’t always the case.

With SDN we try to decouple the control and data planes. The data plane then is the physical layer comprising network devices or servers. They are programmable and expose a set of interfaces. The control plane can now be a VM or something independent of the physical hardware of the data plane. It is no longer limited to what a single network device sees. The control plane is aware of the whole infrastructure and accordingly informs/ configures the data plane devices.

If you want a better explanation of what I was trying to convey above, this article might help.

In the context of NSX, its data plane would be the VXLAN based Logical Switches and the ESXi hosts that make them up. Its control plane would be the NSX Controllers. It’s the NSX Controllers that take care of knowing what to do with the network traffic. They work all this out, inform the hosts that are part of the data plane accordingly, and let them do the needful. The NSX Controller VMs are deployed in odd numbers (preferably 3 or higher, though you could get away with 1 too) for HA and cluster quorum (that’s why odd numbers) but they are independent of the data plane. Even if all the NSX Controllers are down the data flow would not be affected.
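Here is a toy model of that independence, as I understand it. The `Host` and `Controller` classes are hypothetical and have nothing to do with the actual NSX APIs. The only point is that hosts forward using a local copy of the state the controller pushed to them.

```python
class Host:
    """Hypothetical data plane node: forwards using whatever table it last received."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}

    def install(self, table):
        # Keep a private copy so forwarding never depends on the controller object.
        self.mac_table = dict(table)

    def forward(self, dst_mac):
        # Returns the name of the host where dst_mac lives, or None if unknown.
        return self.mac_table.get(dst_mac)


class Controller:
    """Hypothetical control plane: knows where every VM is and tells the hosts."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.vm_locations = {}
        self.up = True

    def vm_powered_on(self, mac, host_name):
        if not self.up:
            raise RuntimeError("controller unavailable")
        self.vm_locations[mac] = host_name
        for host in self.hosts:
            host.install(self.vm_locations)


esx1, esx2 = Host("esx1"), Host("esx2")
controller = Controller([esx1, esx2])
controller.vm_powered_on("00:50:56:aa:00:01", "esx2")

controller.up = False  # simulate losing the control plane
print(esx1.forward("00:50:56:aa:00:01"))  # esx2
print(esx1.forward("00:50:56:aa:00:02"))  # None
```

In this toy model, traffic to already-known VMs keeps flowing with the controller down; what stops is learning about anything new.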

I saw a video from Scott Shenker on the future of networking and the past of protocols. Here’s a link to the slides, and here’s a link to the video on YouTube. I think the video is a must watch. Here’s some of the salient points from the video+slides though – mainly as a reminder to myself (note: since I am not a networking person I am vague at many places as I don’t understand it myself):

Layering is a useful thing. Layering is what made networking successful. The TCP/IP model, the OSI model. Basically you don’t try and think of the “networking problem” as a big composite thing, but you break it down into layers with each layer doing one task and the layer above it assuming that the layer below it has somehow solved that problem. It’s similar to Unix pipes and all that. Break the problem into discrete parts with interfaces, and each part does what it does best and assumes the part below it is taking care of what it needs to do.

This layering was useful when it came to the data plane mentioned above. That’s what TCP/IP is all about anyways – getting stuff from one point to another.

The control plane used to be simple. It was just about the L2 or L3 tables – where to send a frame to, or where to send a packet to. Then the control plane got complicated by way of ACLs and all that (I don’t know what all to be honest as I am not a networking person :)). There was no “academic” approach to solving this problem similar to how the data plane was tackled; so we just kept adding more and more protocols to the mix to simply solve each issue as it came along. This made things even more complicated, but that’s OK as the people who manage all these liked the complexity and it worked after all.

A good quote (from Don Norman) – “The ability to master complexity is not the same as the ability to extract simplicity“. Well said! So simple and prescient.

It’s OK if you are only good at mastering complexity. But be aware of that. Don’t be under a misconception that just because you are good at mastering the complexity you can also extract simplicity out of it. That’s the key thing. Don’t fool yourself. :)

In the context of the control plane, the thing is we have learnt to master its complexity but not learnt to extract simplicity from it. That’s the key problem.

To give an analogy with programming, we no longer think of programming in terms of machine language or registers or memory spaces. All these are abstracted away. This abstraction means a programmer can focus on tackling the problem in a totally different way compared to how he/ she would have had to approach it if they had to take care of all the underlying issues and figure it out. Abstraction is a very useful tool. E.g. Object Oriented Programming, Garbage Collection. Extract simplicity!

Another good quote (from Barbara Liskov) – “Modularity based on abstraction is the way things get done“.

Or put another way :) Abstractions -> Interfaces -> Modularity (you abstract away stuff; provide interfaces between them; and that leads to modularity).

As mentioned earlier the data plane has good abstraction, interfaces, and modularity (the layers). Each layer has well defined interfaces and the actual implementation of how a particular layer gets things done is down to the protocols used in that layer or their implementations. The layers above and below do not care. E.g. Layer 3 (IP) expects Layer 2 to somehow get its stuff done. The fact that it uses Ethernet and frames etc. is of no concern to IP.

So, what are the control plane problems in networking?

We need to be able to compute the configuration state of each network device. As in what ACLs it is supposed to be applying, what its forwarding tables are like …

We need to be able to do this while operating without communication guarantees. So we have to deal with communication delays or packet drops etc as changes are pushed out.

We also need to be able to do this while operating within the limitations of the protocol we are using (e.g. IP).

Anyone trying to master the control plane has to deal with all three. To give an analogy with programming, it is as though a programmer had to worry about where data is placed in RAM, take care of memory management and process communication etc. No one does that now. It is all magically taken care of by the underlying system (like the OS or the programming language itself). The programmer merely focuses on what they need to do. Something similar is required for the control plane.

What is needed?

We need an abstraction for computing the configuration state of each device. [Specification Abstraction]

Instead of thinking of how to compute the configuration state of a device or how to change a configuration state, we just declare what we want and it is magically taken care of. You declare how things should be, and the underlying system takes care of making it so.

We think in terms of specifications. If the intention is that Device A should not have access to Device B, we simply specify that in the language of our model without thinking of the how in terms of the underlying physical model. The shift in thinking here is that we view each thing as a layer and only focus on that. To implement a policy that Device A should not have access to Device B we do not need to think of the network structure or the devices in between – all that is just taken care of (by the Network Operating System, so to speak).

This layer is Network Virtualization. We have a simplified model of the network that we work with and which we specify how it should be, and the Network Virtualization takes care of actually implementing it.
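A rough sketch of what “declare what you want” could look like, assuming a made-up `compile_policy` function and a made-up topology. The operator only writes the `policy` list; the knowledge of which switch each device hangs off is kept out of it and supplied separately.

```python
def compile_policy(denied_pairs, attached_to):
    """Translate 'src must not reach dst' statements into per-switch drop rules.

    denied_pairs: iterable of (src, dst) device names (the specification).
    attached_to:  dict mapping each device to its edge switch (the physical detail).
    """
    rules = {}
    for src, dst in denied_pairs:
        # Enforce at the switch the source is attached to (an arbitrary choice here).
        edge_switch = attached_to[src]
        rules.setdefault(edge_switch, []).append(
            {"src": src, "dst": dst, "action": "drop"}
        )
    return rules


policy = [("device-a", "device-b")]                 # what the operator declares
topology = {"device-a": "sw1", "device-b": "sw3"}   # what the layer below knows

print(compile_policy(policy, topology))
```

If `device-a` moves to a different switch, only `topology` changes; the policy stays as it was written.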

We need an abstraction that captures the lack of communication guarantees- i.e. the distributed state of the system. [Distributed State Abstraction]

Instead of thinking how to deal with the distributed network we abstract it away and assume that it is magically taken care of.

Each device has access to an annotated network graph that they can query for whatever info they want. A global network view, so to say.

There is some layer that gathers an overall picture of the network from all the devices and presents this global view to the devices. (We can think of this layer as being a central source of information, but it can be decentralized too. Point is that’s an implementation problem for whoever designs that layer). This layer is the Network Operating System, so to speak.
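As an illustration of querying such a global view, here is a small sketch where the view is just a dictionary of switches and annotated links (the switch names and the `mbps` annotation are invented). A plain breadth-first search then answers “what is a shortest path from X to Y?” without caring how the view was gathered.

```python
from collections import deque

# Hypothetical annotated network graph: node -> {neighbor: link annotations}.
network_view = {
    "sw1": {"sw2": {"mbps": 1000}, "sw3": {"mbps": 10000}},
    "sw2": {"sw1": {"mbps": 1000}, "sw4": {"mbps": 1000}},
    "sw3": {"sw1": {"mbps": 10000}, "sw4": {"mbps": 10000}},
    "sw4": {"sw2": {"mbps": 1000}, "sw3": {"mbps": 10000}, "sw5": {"mbps": 1000}},
    "sw5": {"sw4": {"mbps": 1000}},
}


def shortest_path(view, src, dst):
    """Breadth-first search over the view; returns a list of nodes or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in view.get(node, {}):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(network_view, "sw1", "sw5"))  # ['sw1', 'sw2', 'sw4', 'sw5']
```

This ignores the link annotations and just counts hops; a real control program would presumably weigh them.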

We need an abstraction of the underlying protocol so we don’t have to deal with it directly. [Forwarding Abstraction]

Network devices have a Management CPU and a Forwarding ASIC. We need an abstraction for both.

The Management CPU abstraction can be anything. The ASIC abstraction is OpenFlow.

This is the layer that is closest to the hardware.

SDN abstracts these three things – distribution, forwarding, and configuration.

You have a Control Program that configures an abstract network view based on the operator requirements (note: this doesn’t deal with the underlying hardware at all) ->

You have a Network Virtualization layer that takes this abstract network view and maps it to a global view based on the underlying physical hardware (the specification abstraction) ->

You have a Network OS that communicates this global network view to all the physical devices to make it happen (the distributed state abstraction (for disseminating the information) and the forwarding abstraction (for configuring the hardware)).

Very important: Each piece of the above architecture has a very limited job that doesn’t involve the overall picture.

SDN has three layers: (1) an Application layer, (2) a Control layer (the Control Program mentioned above), and (3) an Infrastructure layer (the network devices).

The Application layer is where business applications reside. These talk to the Control Program in the Control layer via APIs. This way applications can program their network requirements directly.

OpenFlow (mentioned in Scott’s talk under the ASIC abstraction) is the interface between the control plane and the data/ forwarding plane. Rather than paraphrase, let me quote from that whitepaper for my own reference:

OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based). It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.

OpenFlow can be compared to the instruction set of a CPU. The protocol specifies basic primitives that can be used by an external software application to program the forwarding plane of network devices, just like the instruction set of a CPU would program a computer system.

OpenFlow uses the concept of flows to identify network traffic based on pre-defined match rules that can be statically or dynamically programmed by the SDN control software. It also allows IT to define how traffic should flow through network devices based on parameters such as usage patterns, applications, and cloud resources. Since OpenFlow allows the network to be programmed on a per-flow basis, an OpenFlow-based SDN architecture provides extremely granular control, enabling the network to respond to real-time changes at the application, user, and session levels. Current IP-based routing does not provide this level of control, as all flows between two endpoints must follow the same path through the network, regardless of their different requirements.
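To get a feel for what “per-flow” means, here is a toy flow table. This is only loosely modelled on the idea of match rules with priorities and actions; the field names, action strings, and the choice to punt unmatched packets to the controller are my assumptions and not the OpenFlow wire format.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    priority: int
    match: dict   # field name -> required value; missing fields are wildcards
    action: str


def lookup(flow_table, packet):
    """Return the action of the highest-priority entry whose match fits the packet."""
    candidates = [
        entry for entry in flow_table
        if all(packet.get(field) == value for field, value in entry.match.items())
    ]
    if not candidates:
        return "send_to_controller"  # assumed table-miss behaviour for this sketch
    return max(candidates, key=lambda entry: entry.priority).action


flow_table = [
    FlowEntry(priority=200, match={"ip_dst": "10.0.0.5", "tcp_dst": 443}, action="output:2"),
    FlowEntry(priority=100, match={"ip_dst": "10.0.0.5"}, action="output:1"),
]

print(lookup(flow_table, {"ip_dst": "10.0.0.5", "tcp_dst": 443}))  # output:2
print(lookup(flow_table, {"ip_dst": "10.0.0.5", "tcp_dst": 22}))   # output:1
print(lookup(flow_table, {"ip_dst": "10.0.0.9", "tcp_dst": 22}))   # send_to_controller
```

Two flows to the same destination IP leave via different ports here, which is the kind of control that destination-based IP routing alone does not give you.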

I don’t think OpenFlow is used by NSX though. It is used by Open vSwitch and was used by NVP (Nicira Virtualization Platform – the predecessor of NSX).

Speaking of NVP and NSX: NSX came to VMware through its acquisition of Nicira (which was a company founded by Martin Casado, Nick McKeown and Scott Shenker – the same Scott Shenker whose video I was watching above). The product was called NVP back then and primarily ran on the Xen hypervisor. VMware renamed it to NSX and it has two flavors. NSX-V is the version that runs on the VMware ESXi hypervisor, and is in active development. There’s also NSX-MH which is a “multi-hypervisor” version that’s supposed to be able to run on Xen, KVM, etc. but I couldn’t find much information on it. There are some presentation slides in case anyone’s interested.

Before I conclude here’s some more blog posts related to all this. They are in order of publishing so we get a feel of how things have progressed. I am starting to get a headache reading all this network stuff, most of which is going above my head, so I am going to take a break here and simply link to the articles (with minimal/ half info) and not go much into it. :)

This one talks about how the VXLAN specification doesn’t specify any control plane.

There is no way for hosts participating in a VXLAN network to know the MAC addresses of other hosts or VMs in the VXLAN so we need some way of achieving that.

Nicira NVP uses OpenFlow as a control-plane protocol.

This one talks about how OpenFlow is used by Nicira NVP. Some points of interest:

Each Open vSwitch (OVS) implementation has 1) a flow-based forwarding module loaded in the kernel; 2) an agent that communicates with the Controller; and 3) an OVS DB daemon that keeps track of the local configuration.

Read that post on how the forwarding tables and tunnel interfaces are modified as new devices join the overlay network.

Broadcast traffic, unknown Unicast traffic, and Multicast traffic (a.k.a. BUM traffic) can be handled in two ways – either by sending these to an extra server that replicates these to all devices in the overlay network; or the source hypervisor/ physical device can encapsulate the BUM frame and send it as unicast to all the other devices in that overlay.
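The second option (the source doing the replication itself) is easy to picture in code. A minimal sketch, with a made-up function name and made-up hypervisor addresses: one BUM frame goes in, one unicast copy per other member of the overlay comes out.

```python
def head_end_replicate(frame, source, members):
    """Return one (outer_src, outer_dst, frame) unicast copy per remote member."""
    return [
        (source, remote, frame)
        for remote in members
        if remote != source  # never send a copy back to ourselves
    ]


overlay_members = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
copies = head_end_replicate(b"arp-request", "10.0.0.1", overlay_members)

for outer_src, outer_dst, _ in copies:
    print(outer_src, "->", outer_dst)
```

The cost is visible straight away: the source sends N-1 copies, which is presumably why the alternative of offloading replication to a separate server exists.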

This one talks about how Nicira NVP seems to be moving away from OpenFlow or supplementing it with something (I am not entirely clear).

This is a good read though just that I was lost by this point coz I have been doing this reading for nearly 2 days and it’s starting to get tiring.

One more post from the author of the three posts above. It’s a good read. Kind of obvious stuff, but good to see in pictures. That author has some informative posts – wish I was more brainy! :)

While reading about NSX I was under the impression VXLAN is something VMware cooked up and owns (possibly via Nicira, which is where NSX came from). But it turns out that isn’t the case. It was originally created by VMware & Cisco (check out this Register article – a good read) and is actually documented in RFC 7348. The encapsulation mechanism is standardized, and so is the UDP port used for communication (port number 4789 by the way). A lot of vendors now support VXLAN, and just as NSX implements VXLAN, so does Open vSwitch. Nice!

(Note to self: got to read more about Open vSwitch. It’s used in XenServer and is a part of Linux. The *BSDs too support it).

VXLAN is meant to both virtualize Layer 2 and also replace VLANs. You can have up to 16 million VXLANs (the NSX Logical Switches I mentioned earlier). In contrast you are limited to 4094 VLANs. I like the analogy that VXLAN is to IP addresses what cell phones are to telephone numbers. Prior to cell phones, when everyone had landline numbers, your phone number was tied to your location. If you shifted houses/ locations you got a new phone number. In contrast, with cell phone numbers it doesn’t matter where you are as the number is linked to you, not your location. Similarly with VXLAN your VM IP address is linked to the VM, not its location.

Update:

Found a good whitepaper by Arista on VXLANs. Something I hadn’t realized earlier was that the 24-bit VXLAN Network Identifier is called the VNI (this is what lets you have 16 million VXLAN segments/ NSX Logical Switches) and that a VM’s MAC is combined with its VNI – thus allowing multiple VMs with the same MAC address to exist across the network (as long as they are on separate VXLAN segments).
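The numbers and the MAC + VNI idea can be checked with a few lines of Python. The lookup table below is a made-up illustration (host names and VNIs are invented), keyed on the (VNI, MAC) pair rather than on the MAC alone.

```python
VNI_BITS = 24      # size of the VXLAN Network Identifier
VLAN_ID_BITS = 12  # size of the 802.1Q VLAN ID

print(2 ** VNI_BITS)          # 16777216 possible VNIs
print(2 ** VLAN_ID_BITS - 2)  # 4094 usable VLANs (IDs 0 and 4095 are reserved)

# Hypothetical forwarding table keyed on (VNI, MAC) instead of MAC alone.
forwarding = {
    (5001, "00:50:56:aa:bb:cc"): "host-1",
    (5002, "00:50:56:aa:bb:cc"): "host-2",
}


def locate(vni, mac):
    """Return the host behind this MAC within the given VXLAN segment, if known."""
    return forwarding.get((vni, mac))


print(locate(5001, "00:50:56:aa:bb:cc"))  # host-1
print(locate(5002, "00:50:56:aa:bb:cc"))  # host-2
```

The same MAC resolves to two different places because the segment is part of the key.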

Also, while I am noting acronyms I might as well also mention VTEPs. These stand for VXLAN Tunnel End Points (the term RFC 7348 uses; you will also see it expanded as Virtual Tunnel End Points). This is the “thing” that encapsulates/ decapsulates packets for VXLAN. This can be virtual bridges in the hypervisor (ESXi or any other); or even VXLAN aware VM applications or VXLAN capable switching hardware (wasn’t aware of this until I read the Arista whitepaper).

VTEPs communicate over UDP. The port number is 4789 (NSX 6.2.3 and later) or 8472 (pre-NSX 6.2.3).
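For reference, here is a small sketch of the 8-byte VXLAN header a VTEP puts in front of the original frame, following the layout in RFC 7348 (8 flag bits with the “VNI valid” bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits). The function names are mine, and this only builds the header; it does not send anything over UDP.

```python
import struct

VXLAN_UDP_PORT = 4789        # standard destination port
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in the first byte


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN packet payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = pack_vxlan_header(5001)
print(header.hex())        # 0800000000138900
print(unpack_vni(header))  # 5001
```

A receiving VTEP would read the VNI out like this and then look up the inner destination MAC within that segment.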

A post by Duncan Epping on VXLAN use cases. Probably dated in terms of the VXLAN issues it mentions (traffic tromboning) but I wanted to link it here as (a) it’s a good read and (b) it’s good to know such issues as that will help me better understand why things might be a certain way now (because they are designed to work around such issues).