Optimizing East-West network traffic with NSX

VMware has done an excellent job of publicizing the features of its new NSX network virtualization platform, at least to vGeeks like myself. However, I am finding that among the IT professionals I interact with on a day-to-day basis, familiarity with it is still quite limited. I suppose that is to be expected: the product was only announced at VMworld last year, and unless you were part of the beta, that announcement would have been your first look at it.

SDN in general, and NSX specifically, completely changes the paradigm of how we think of (and interact with) network services. So this is really a lot to chew on. It’s not something you are going to read a white paper on, deploy, and call it a day. It will fundamentally change many of your business processes. And this is not necessarily a bad thing.

What I would like to do is publish a few blog posts on specific features of the NSX platform, starting with this one. I intend to keep them short (well, maybe not that short), simple, and focused on one very specific aspect of the NSX product.

One of the great features of NSX is how its distributed logical router (DLR) optimizes the east-west traffic in your environment. If you have practical experience designing or operating large cloud deployments, you certainly understand the complications of virtual networking and how hard it is to implement an architecture that scales appropriately. A common issue when you use something like a virtual edge device as the first-hop gateway for your workloads is that it handles not only all of your ingress/egress traffic, but also all of your east-west traffic, which hair-pins through that device. This can cause scalability problems if the design does not account for it. It is also just horribly inefficient.
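To make the hair-pin concrete, here is a minimal sketch that lists the hops a routed packet takes when a single edge VM is the first-hop gateway. The `VM` class, the host names, and the single shared `"ToR"` hop are hypothetical simplifications for illustration, not anything NSX exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host: str    # hypervisor the VM currently runs on (hypothetical names)
    subnet: str  # layer-2 segment / subnet the VM sits on


def centralized_path(src: VM, dst: VM, edge_host: str) -> list:
    """Hops taken by a packet when one edge VM on edge_host is the first-hop gateway."""
    if src.subnet == dst.subnet:
        # Same segment: switched directly, no gateway involved.
        return [src.host] if src.host == dst.host else [src.host, "ToR", dst.host]
    path = [src.host]
    if src.host != edge_host:
        path += ["ToR", edge_host]  # haul the packet up to the edge...
    if dst.host != edge_host:
        path += ["ToR", dst.host]   # ...then back down to the destination host
    return path


def wire_crossings(path: list) -> int:
    """How many times the packet crosses the physical ToR switch."""
    return path.count("ToR")


web = VM("web01", "esx-a", "10.0.1.0/24")
app = VM("app01", "esx-a", "10.0.2.0/24")
print(centralized_path(web, app, edge_host="esx-c"))  # ['esx-a', 'ToR', 'esx-c', 'ToR', 'esx-a']
```

Even though both VMs sit on the same hypervisor, the routed packet crosses the physical network twice just to reach the gateway and come back.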

NSX has a very elegant way of handling that. The “Distributed Logical Router” is essentially like the vSphere Distributed Switch you know and love, but on NSX steroids: it now handles layers 2 through 4. That means you have a router running as a kernel module, so the first-hop gateway for your VM is literally in the kernel of the hypervisor the VM is currently sitting on. Furthermore, that router participates in the entire routing domain and has MAC information and ARP entries for everything in the environment.
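A rough way to picture the per-host router is as an identical routing table plus neighbor table on every hypervisor, so the forwarding decision is made locally. The sketch below is a hypothetical model (the `DlrInstance` class, interface names, and addresses are made up, and how entries get populated is assumed rather than shown); it only illustrates a connected-route longest-prefix match followed by an ARP lookup.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class DlrInstance:
    """Hypothetical model of the DLR copy running in one hypervisor's kernel.

    Every participating host is assumed to hold the same interfaces and neighbor
    entries, so routing decisions are made locally wherever the VM runs.
    """
    routes: dict = field(default_factory=dict)  # connected network -> interface name
    arp: dict = field(default_factory=dict)     # IP address -> MAC address

    def add_interface(self, name: str, cidr: str) -> None:
        # "10.0.1.1/24" -> connected route 10.0.1.0/24 on this interface
        self.routes[ipaddress.ip_interface(cidr).network] = name

    def learn_arp(self, ip: str, mac: str) -> None:
        self.arp[ipaddress.ip_address(ip)] = mac

    def route(self, dst_ip: str) -> tuple:
        """Longest-prefix match, then ARP lookup: returns (interface, destination MAC)."""
        addr = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            raise LookupError(f"no route to {dst_ip}")
        best = max(matches, key=lambda net: net.prefixlen)
        if addr not in self.arp:
            raise LookupError(f"no ARP entry for {dst_ip}")
        return self.routes[best], self.arp[addr]


dlr = DlrInstance()
dlr.add_interface("lif-web", "10.0.1.1/24")
dlr.add_interface("lif-app", "10.0.2.1/24")
dlr.learn_arp("10.0.2.10", "00:50:56:aa:00:02")
print(dlr.route("10.0.2.10"))  # ('lif-app', '00:50:56:aa:00:02')
```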

Sounds cool, right? For east-west traffic (traffic from my “Web” tier to my “App” tier, for example), this means that even though the two tiers are on separate layer-2 segments, the packet flows to the distributed logical router and is then delivered directly to the destination MAC address. Physically speaking, if the two workloads are on the same hypervisor, the packet never hits the wire. If they are on separate hypervisors, the packet egresses to your top-of-rack (ToR) switch and then goes straight to the destination hypervisor (assuming both hosts are connected to the same ToR switch stack).
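The same idea expressed as a path calculation: with the first-hop router in the local kernel, a routed packet either stays on the host or makes exactly one trip across the fabric. This is a minimal sketch; the `VM` class, host names, and the single `"ToR"` hop are hypothetical, and it assumes both hosts share one ToR switch stack as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host: str  # hypervisor the VM currently runs on (hypothetical names)


def dlr_path(src: VM, dst: VM) -> list:
    """Hops for a routed east-west packet when the first-hop router is in the local kernel.

    Routing happens in the source host's DLR instance; the packet then goes
    straight to the destination. Assumes both hosts share one ToR switch stack.
    """
    path = [f"{src.host}:dlr"]      # routed locally, in the kernel
    if src.host != dst.host:
        path += ["ToR", dst.host]   # one trip across the physical fabric
    path.append(dst.name)           # delivered to the destination MAC
    return path


def hits_wire(path: list) -> bool:
    return "ToR" in path


web = VM("web01", "esx-a")
print(dlr_path(web, VM("app01", "esx-a")))  # ['esx-a:dlr', 'app01']
print(dlr_path(web, VM("app02", "esx-b")))  # ['esx-a:dlr', 'ToR', 'esx-b', 'app02']
```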

What I am saying here is that we are essentially eliminating hair-pinning in the virtual data center. This has even cooler implications if you are using a converged infrastructure like Cisco UCS, as Brad Hedlund brilliantly illustrates here.

This ability is extremely exciting for those of us tasked with building virtual networks that can scale (and scale quickly). Setting it up comes down to a few steps:

1) Create your logical switches (formerly called VXLAN virtual wires).
2) Attach machines to those logical switches.
3) Create the DLR and assign IP addresses to its interfaces (these will be the default gateways for your VMs). You will also select the logical switches you created earlier that sit on the ingress side of the DLR, where your VMs live.
4) At this point you would normally also create an uplink interface on a dedicated transit logical switch that carries traffic to the next hop (i.e., the NSX Edge), but I’ll cover that one later when we get to route peering.
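Steps 1 through 3 can be sketched as a small in-memory model that enforces the same ordering: a VM or DLR interface can only be placed on a logical switch that already exists. Everything here (`Topology`, its method names, switch names, and addresses) is hypothetical and is not the NSX API or UI.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical in-memory model of steps 1-3; this is not the NSX API."""
    switches: set = field(default_factory=set)
    attachments: dict = field(default_factory=dict)     # VM name -> logical switch
    dlr_interfaces: dict = field(default_factory=dict)  # logical switch -> interface address

    def _require(self, switch: str) -> None:
        if switch not in self.switches:
            raise ValueError(f"unknown logical switch {switch!r}")

    def create_logical_switch(self, name: str) -> None:           # step 1
        self.switches.add(name)

    def attach_vm(self, vm: str, switch: str) -> None:            # step 2
        self._require(switch)
        self.attachments[vm] = switch

    def add_dlr_interface(self, switch: str, cidr: str) -> None:  # step 3
        self._require(switch)
        self.dlr_interfaces[switch] = ipaddress.ip_interface(cidr)


topo = Topology()
for ls in ("web-ls", "app-ls", "db-ls"):
    topo.create_logical_switch(ls)
topo.attach_vm("web01", "web-ls")
topo.attach_vm("app01", "app-ls")
topo.add_dlr_interface("web-ls", "10.0.1.1/24")
topo.add_dlr_interface("app-ls", "10.0.2.1/24")
```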

In the GUI, it will look something like this:

You simply need to attach the interfaces to the logical switches you created earlier, and give those interfaces an IP address on that network. This address will be the default gateway for the machines on that network segment.
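Because each DLR interface owns one address on its segment, a VM's default gateway is simply the interface address whose subnet contains the VM's IP. A minimal sketch, with hypothetical addresses:

```python
import ipaddress


def default_gateway(vm_ip: str, dlr_interfaces: list) -> str:
    """Return the DLR interface address that sits on the same subnet as vm_ip.

    dlr_interfaces holds interface addresses in CIDR form, e.g. "10.0.1.1/24".
    """
    addr = ipaddress.ip_address(vm_ip)
    for cidr in dlr_interfaces:
        iface = ipaddress.ip_interface(cidr)
        if addr in iface.network:
            return str(iface.ip)
    raise LookupError(f"{vm_ip} is not on any segment attached to the DLR")


lifs = ["10.0.1.1/24", "10.0.2.1/24", "10.0.3.1/24"]
print(default_gateway("10.0.2.15", lifs))  # 10.0.2.1
```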

You can also see here that I have created a transit network for uplinking to the NSX Edge. Later I will explain how we can actually run a routing protocol (OSPF or BGP) over that network to advertise the routes owned by the DLR to the NSX Edge router.
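The end result of that route peering can be previewed without modeling OSPF or BGP at all: the edge ends up with each DLR-connected subnet pointing at the DLR's address on the transit network. The sketch below only shows that outcome. The transit address `192.168.10.2`, the upstream next hop `203.0.113.1`, and both function names are hypothetical.

```python
import ipaddress


def advertised_routes(internal_interfaces: list, dlr_uplink_ip: str) -> dict:
    """Routes the edge would learn from the DLR: each connected subnet via the DLR uplink.

    Stand-in for the outcome of an OSPF/BGP exchange; no protocol is implemented here.
    """
    return {str(ipaddress.ip_interface(c).network): dlr_uplink_ip for c in internal_interfaces}


def edge_next_hop(table: dict, dst_ip: str) -> str:
    """Longest-prefix match against the edge's routing table (prefix -> next hop)."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [ipaddress.ip_network(p) for p in table]
    best = max((n for n in candidates if addr in n), key=lambda n: n.prefixlen)
    return table[str(best)]


learned = advertised_routes(["10.0.1.1/24", "10.0.2.1/24", "10.0.3.1/24"], "192.168.10.2")
edge_table = {"0.0.0.0/0": "203.0.113.1", **learned}  # default route toward the upstream network
print(edge_next_hop(edge_table, "10.0.2.15"))  # 192.168.10.2
```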

Physically, the internal IP and MAC for these interfaces exist on every hypervisor participating in the DLR. The uplink interface also has the same IP on each hypervisor, but a unique MAC.
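One way to visualize this is to expand a single logical DLR into the interfaces each host actually carries. In this sketch the MAC values are hypothetical placeholders (not the values NSX uses), the per-host uplink MAC is derived from the host's position in the list purely for illustration, and the names `HostInterface` and `instantiate_dlr` are made up.

```python
from dataclasses import dataclass

# Hypothetical placeholder MAC shared by every internal DLR interface on every host.
SHARED_INTERNAL_MAC = "02:00:00:00:00:01"


@dataclass(frozen=True)
class HostInterface:
    host: str
    name: str
    ip: str
    mac: str


def instantiate_dlr(hosts: list, internal: dict, uplink: tuple) -> list:
    """Expand one logical DLR into per-host interfaces.

    internal maps interface name -> IP; uplink is (name, IP). Internal interfaces
    get the same IP and MAC everywhere; the uplink keeps its IP but gets a
    per-host MAC (derived from the host index, which assumes at most 256 hosts).
    """
    up_name, up_ip = uplink
    result = []
    for index, host in enumerate(hosts):
        for name, ip in internal.items():
            result.append(HostInterface(host, name, ip, SHARED_INTERNAL_MAC))
        result.append(HostInterface(host, up_name, up_ip, f"02:00:00:00:01:{index:02x}"))
    return result


ifaces = instantiate_dlr(
    ["esx-a", "esx-b", "esx-c"],
    {"lif-web": "10.0.1.1", "lif-app": "10.0.2.1"},
    ("uplink", "192.168.10.2"),
)
```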

Logically it looks like this:

All three tiers are connected to the same distributed router, so they can pass traffic among themselves without having to bounce out to an edge gateway or some northbound physical router. (Their northbound router is the DLR itself, running in the kernel.)

This has other interesting implications, because we can now enforce policy-based firewall controls on every vNIC in the entire environment (illustrated above), and those controls also run in the kernel. This is important to understand: egress traffic from a VM is inspected by your firewall policy before it even hits the wire (virtual or physical). This is what security guys like to call “zero trust,” and it is awesome.
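A simplified picture of per-vNIC enforcement is a first-match rule check that runs before the packet leaves the sending host. This is a minimal sketch, not the NSX rule engine: the `Rule` fields, tier names, and port numbers are hypothetical, and the catch-all deny at the end is an assumption reflecting a zero-trust posture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src: str             # source tier, or "any"
    dst: str             # destination tier, or "any"
    port: Optional[int]  # destination port, None matches any port
    action: str          # "allow" or "deny"


def evaluate_egress(rules: list, src_tier: str, dst_tier: str, port: int) -> str:
    """First-match evaluation at the sending VM's vNIC, before the packet leaves the host.

    Anything not explicitly allowed is dropped (assumed default deny).
    """
    for rule in rules:
        if (rule.src in ("any", src_tier)
                and rule.dst in ("any", dst_tier)
                and rule.port in (None, port)):
            return rule.action
    return "deny"


policy = [
    Rule("web", "app", 8443, "allow"),
    Rule("app", "db", 3306, "allow"),
]
print(evaluate_egress(policy, "web", "app", 8443))  # allow
print(evaluate_egress(policy, "web", "db", 3306))   # deny (web may not reach db directly)
```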

Lastly, we can also leverage layer-2 bridging from the DLR, which provides layer-2 adjacency between the physical and virtual worlds for things like P2V. Again, I’ll get into that in another post, but basically it lets you connect physical hosts on VLAN segments to this router and place them in the same layer-2 segment as the virtual machines (running on VXLAN).
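Conceptually, a bridge like this behaves as a two-sided learning switch: one side is the VXLAN logical switch, the other is the VLAN. The sketch below is a generic MAC-learning model under that assumption, not the NSX implementation; the class name, VNI `5001`, VLAN `100`, and MAC addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class L2Bridge:
    """Hypothetical model of a bridge joining one VXLAN logical switch to one VLAN."""
    vni: int   # VXLAN segment ID (illustrative value)
    vlan: int  # VLAN ID on the physical side (illustrative value)
    mac_table: dict = field(default_factory=dict)  # MAC -> "vxlan" or "vlan"

    def forward(self, src_mac: str, dst_mac: str, ingress: str) -> Optional[str]:
        """Learn where src_mac lives, then return the side to send the frame out of.

        Returns None when the destination is already known to be on the ingress
        side, so there is nothing to bridge.
        """
        if ingress not in ("vxlan", "vlan"):
            raise ValueError(f"unknown side {ingress!r}")
        self.mac_table[src_mac] = ingress
        if self.mac_table.get(dst_mac) == ingress:
            return None
        # Known on the other side, or unknown (flooded): either way it crosses the bridge.
        return "vlan" if ingress == "vxlan" else "vxlan"


bridge = L2Bridge(vni=5001, vlan=100)
vm_mac, server_mac = "00:50:56:aa:00:10", "00:1b:21:cc:00:20"
print(bridge.forward(vm_mac, server_mac, "vxlan"))  # vlan (unknown destination, flooded across)
print(bridge.forward(server_mac, vm_mac, "vlan"))   # vxlan (VM's MAC was learned on that side)
```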

I hope my little bite-sized snippets of NSXness prove to be useful to someone. Please feel free to comment, and I am certainly open to suggestions for further topics.