I am putting a link to the official VMware documentation on this as I Googled it just to confirm to myself I am not doing anything wrong! What I need to do is migrate the physical NICs and the Management/VM Network VMkernel NIC from a standard switch to a distributed switch. The process is simple and straightforward, and one I have done numerous times; yet it fails for me now!

Here’s a copy paste from the documentation:

Navigate to Home > Inventory > Networking.

Right-click the dVswitch.

If the host is already added to the dVswitch, click Manage Hosts; else click Add Host.

Select the host(s), click Next.

Select the physical adapters (vmnic) to use for the vmkernel, click Next.

Select the virtual adapter (vmk) to migrate and click the Destination port group field. For each adapter, select the correct port group from the dropdown, then click Next.

Click Next to omit virtual machine networking migration.

Click Finish after reviewing the new vmkernel and Uplink assignment.

The wizard and the job complete, moving both the vmk interface and the vmnic to the dVswitch.

Basically add physical NICs to the distributed switch & migrate vmk NICs as part of the process. For good measure I usually migrate only one physical NIC from the standard switch to the distributed switch, and then separately migrate the vmk NICs.

Here’s what happens when I do the above now. (Note: now. I never had an issue with this earlier. I’m guessing it must be some bug in a newer 5.5 update, or something’s wrong in the underlying network at my firm. I don’t think it’s the network, because I had my network admins take a look, and I verified that all NICs on the host have connectivity to the outside world – I did this by making each NIC the active one and disabling the others.)

First it’s stuck in progress:

And then vCenter cannot see the host any more:

Oddly I can still ping the host on the vmk NIC IP address. However I can’t SSH into it, so the Management bits are what seem to be down. The host has connectivity to the outside world because it passes the Management network tests from DCUI (which I can connect to via iLO). I restarted the Management agents too, but nope – cannot SSH or get vCenter to see the host. Something in the migration step breaks things. Only solution is to reboot and then vCenter can see the host.
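For reference, the agent restarts I tried from the console (the standard ESXi management-agent restart commands; they didn’t fix things in my case, but they’re the first thing to try before a full reboot):

```shell
# From the DCUI or an iLO console session on the ESXi host:
/etc/init.d/hostd restart   # host agent (direct client / API access)
/etc/init.d/vpxa restart    # vCenter agent
# or restart the whole set of management agents in one go:
services.sh restart
```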

Here’s what I did to work around it anyway.

First I moved one physical NIC to the distributed switch.

Then I created a new management portgroup and VMkernel NIC on that for management traffic. Assigned it a temporary IP.

Next I opened a console to the host. Here’s the current config on the host:


~# esxcli network ip interface ipv4 get
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  10.xxx.xx.30  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk1  10.xxx.xx.24  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk2  10.xxx.xx.25  255.255.255.0  10.xxx.xx.255   STATIC        false
vmk3  1.1.1.1       255.255.255.0  1.1.1.255       STATIC        false
vmk4  10.xxx.xx.23  255.255.255.0  10.xxx.xx.255   STATIC        false

The interface vmk0 (or its IPv4 address rather) is what I wanted to migrate. The interface vmk4 is what I created temporarily.

I now removed the IPv4 address of the existing vmk NIC and assigned it to the new one, confirming the changes just to be sure. As soon as I did so, vCenter picked up the changes. I then tried to move the remaining physical NIC over to the distributed switch, but that failed with an error that the existing connection was forcibly closed by the host. So I rebooted the host. Post-reboot I found that the host now thought it had no IP, even though it was responding to the old IP via the new vmk. So this approach was a no-go (but I’m still leaving it here as a reminder to myself that it does not work).

I now migrated vmk0 from the standard switch to the distributed switch. As before, this fails – vCenter loses connectivity to the ESXi host. But that’s why I have a console open. As expected, the output of esxcli network ip interface list shows me that vmk0 hasn’t moved to the distributed switch:

So now I go ahead and remove the IPv4 address of vmk0 and assign that to vmk4 (the new one). Also confirmed the changes.
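In esxcli terms, the swap looks something like this (interface names are from the output above; the management IP stays masked as in the rest of this post, so substitute your own):

```shell
# Clear the management IP from vmk0, then assign it to the new vmk4
esxcli network ip interface ipv4 set -i vmk0 -t none
esxcli network ip interface ipv4 set -i vmk4 -t static -I 10.xxx.xx.30 -N 255.255.255.0

# Confirm the change took effect
esxcli network ip interface ipv4 get
```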

Finally continuing with my NSX adventures … some two weeks have passed since my last post. During this time I moved everything from VMware Workstation to ESXi.

Initially I tried doing a lift and shift from Workstation to ESXi. Actually, initially I went with ESXi 6.5, and that kept crashing. Then I learnt it was because I was using the HPE-customized version of ESXi 6.5; since the server model I was using isn’t supported by ESXi 6.5, it has a tendency to PSOD (strangely, the non-HPE-customized version has no such issues). After trying the HPE version and failing a couple of times, I gave up and went with ESXi 5.5. Set it up, tried exporting from VMware Workstation to ESXi 5.5, and that failed as the VM hardware level on Workstation was newer than what ESXi 5.5 supports.

Not an issue – I fired up VMware Converter and converted each VM from Workstation to ESXi.

Then I thought hmm, maybe the MAC addresses will change and cause an issue, so I SSH’ed into the ESXi host and manually changed the MAC addresses of all my VMs to whatever they were in Workstation. Also changed the adapters to VMXNET3 wherever they weren’t. Reloaded the VMs in ESXi, created all the networks (portgroups) etc., hooked up the VMs to these, and fired them up. That failed because the MAC addresses were in VMware Workstation’s range and ESXi refuses to work with those! *grr* Not a problem – change the config files again to add a parameter asking ESXi to ignore the MAC address problem – and finally it all loaded.
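The config-file edits can be sketched like this. The filename and MAC value below are placeholders, not from my actual VMs, and `checkMACAddress` is the “ignore this MAC address problem” knob as best I recall it:

```shell
# Hypothetical example .vmx fragment, as exported from Workstation
VMX=nested-esxi-01.vmx
cat > "$VMX" <<'EOF'
ethernet0.virtualDev = "e1000"
ethernet0.addressType = "static"
ethernet0.address = "00:50:56:3f:aa:01"
EOF

# Swap the emulated e1000 adapter for the paravirtual VMXNET3 one
sed -i 's/"e1000"/"vmxnet3"/' "$VMX"

# Ask ESXi to accept the Workstation-range static MAC instead of rejecting it
echo 'ethernet0.checkMACAddress = "false"' >> "$VMX"

cat "$VMX"
```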

But all my Windows VMs had their adapters reset to a default state. Not sure why – maybe the drivers are different? I don’t know. I had to reconfigure all of them again. Then I turned to OpnSense – that too had reset all its network settings, so I had to configure those too – and finally to nested ESXi hosts. For whatever reason none of them were reachable; and worse, my vCenter VM was just a pain in the a$$. The web client kept throwing some errors and simply refused to open.

That was the final straw. So in frustration I deleted it all and decided to give up.

But then …

I decided to start afresh.

Installed ESXi 6.5 (the VMware version, non-HPE) on the host. Created a bunch of nested ESXi VMs in that from scratch. Added a Windows Server 2012R2 as the shared iSCSI storage and router. Created all the switches and port groups etc., hooked them up. Ran into some funny business with the Windows Firewall (I wanted to assign some interfaces as Private, others as Public, and enable the firewall only on the Public ones – but after each reboot Windows kept resetting this). So I added OpnSense into the mix as my DMZ firewall.

So essentially you have my ESXi host -> which hooks into an internal vSwitch portgroup that has the OpnSense VM -> which hooks into another vSwitch portgroup where my Server 2012R2 is connected, and that in turn connects to another vSwitch portgroup (a couple of them actually) where my nested ESXi hosts are connected (I need a couple of portgroups as my ESXi hosts have to be in separate L3 networks so I can actually see the benefit of VXLANs). OpnSense provides NAT and firewalling so none of my VMs are exposed to the outside network, yet they can connect to the outside network if needed. (I really love OpnSense by the way! An amazing product.)

Then I got to the task of setting these all up. Create the clusters, shared storage, DVS networks; install my OpenBSD VMs inside these nested ESXi hosts. Then install NSX Manager, deploy controllers, configure the ESXi hosts for NSX, set up VXLANs, segment IDs, transport zones, and finally create the Logical Switches! :) I was pissed off initially at having to do all this again, but on the whole it was good as I am now comfortable setting these up. Practice makes perfect, and doing this all again was like revision. Ran into problems at each step – small niggles, but it was frustrating. Along the way I found that my (virtual) network still did not seem to support large MTU sizes – but then I realized it’s because my Server 2012R2 VM (which is the router) wasn’t set up with the large MTU size. Changed that, and that took care of the MTU issue. Now both Web UI and CLI tests for VXLAN succeed. Finally!

Third time lucky hopefully. Above are my two OpenBSD VMs on the same VXLAN, able to ping each other. They are actually on separate L3 ESXi hosts so without NSX they won’t be able to see each other.

Not sure why there are duplicate packets being received.

Next I went ahead and set up a DLR so there’s communication between VXLANs.

Yeah baby! :o)

Finally I spent some time setting up an ESG and got these OpenBSD VMs talking to my external network (and vice versa).

The two command prompt windows are my Server 2012R2 on the LAN. It is able to ping the OpenBSD VMs and vice versa. This took a bit more time – not on the NSX side – as I forgot to add the routing info on the ESG for my two internal networks (192.168.1.0/24 and 192.168.2.0/24), as well as on the Server 2012R2 (192.168.0.0/16). Once I did that, routing worked as above.

I am aware this is more of a screenshots plus talking post rather than any techie details, but I wanted to post this here as a record for myself. I finally got this working! Yay! Now to read the docs and see what I missed out and what I can customize. Time to break some stuff finally (intentionally).

In my previous post I said the following (in gray). Here I’d like to add on:

A VDS uses VMKernel ports (vmk ports) to carry out the actual traffic. These are virtual ports bound to the physical NICs on an ESXi host, and there can be multiple vmk ports per VDS for various tasks (vMotion, FT, etc). Similar to this we need to create a new vmk port for the host to connect into the VTEP used by the VXLAN.

Unlike regular vmk ports though we don’t create and assign IP addresses manually. Instead we either use DHCP or create an IP pool when configuring the VXLAN for a cluster. (It is possible to specify a static IP either via DHCP reservation or as mentioned in the install guide).

The number of vmk ports (and hence IP addresses) corresponds to the number of uplinks. So a host with 2 uplinks will have two VTEP vmk ports, hence two IP addresses taken from the pool. Bear that in mind when creating the pool.
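As a back-of-the-envelope example (the host and uplink counts below are made up for illustration, not my lab’s actual numbers):

```shell
# One VTEP vmk port -- and hence one IP from the pool -- per uplink per host
hosts=4
uplinks_per_host=2
echo $((hosts * uplinks_per_host))   # minimum pool size: 8 addresses
```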

Each cluster uses one VDS for its VXLAN traffic. This can be a pre-existing VDS – there’s nothing special about it just that you point to it when enabling VXLAN on a cluster; and the vmk port is created on this VDS. NSX automatically creates another portgroup, which is where the vmk port is assigned to.

VXLANs are created on this VDS – they are basically portgroups in the VDS. Each VXLAN has an ID – the VXLAN Network Identifier (VNI) – which NSX refers to as segment IDs.

Before creating VXLANS we have to allocate a pool of segment IDs (the VNIs) taking into account any VNIs that may already be in use in the environment.

The web UI only allows us to configure a single segment ID range, but multiple ranges can be configured via the NSX API.

Logical Switch == VXLAN -> which has an ID (called segment ID or VNI) == Portgroup. All of this is in a VDS.

While installing NSX I came across “Transport Zones”.

Remember ESXi hosts are part of a VDS. VXLANs are created on a VDS. Each VXLAN is a portgroup on this VDS. However, not all hosts need be part of the same VXLANs; but since all hosts are part of the same VDS and hence have visibility to all the VXLANs, we need some way of marking which hosts are part of a VXLAN. We also need some place to identify whether a VXLAN is in unicast, multicast, or hybrid mode. This is where Transport Zones come in.

If all your VXLANs are going to behave the same way (multicast etc) and have the same hosts, then you just need one transport zone. Else you would create separate zones based on your requirement. (That said, when you create a Logical Switch/ VXLAN you have an option to specify the control plane mode (multicast mode etc). Am guessing that overrides the zone setting, so you don’t need to create separate zones just to specify different modes).

Note: I keep saying hosts above (last two paragraphs) but that’s not correct. It’s actually clusters. I keep forgetting, so thought I should note it separately here rather than correct my mistake above. 1) VXLANs are configured on clusters, not hosts. 2) All hosts within a cluster must be connected to a common VDS (at least one common VDS, for VXLAN purposes). 3) NSX Controllers are optional and can be skipped if you are using multicast replication? 4) Transport Zones are made up of clusters (i.e. all hosts in a cluster; you cannot pick & choose just some hosts – this makes sense when you consider that a cluster is for HA and DRS, so naturally you wouldn’t want to exclude some hosts from where a VM can vMotion to, as this would make things difficult).

Worth keeping in mind: 1) A cluster can belong to multiple transport zones. 2) A logical switch can belong to only one transport zone. 3) A VM cannot be connected to logical switches in different transport zones. 4) A DLR (Distributed Logical Router) cannot connect to logical switches in multiple transport zones. Ditto for an ESG (Edge Services Gateway).

After creating a transport zone, we can create a Logical Switch. This assigns a segment ID from the pool automatically and this (finally!!) is your VXLAN. Each logical switch creates yet another portgroup. Once you create a logical switch you can assign VMs to it – that basically changes their port group to the one created by the logical switch. Now your VMs will have connectivity to each other even if they are on hosts in separate L3 networks.

Something I hadn’t realized: 1) Logical Switches are created on Transport Zones. 2) Transport Zones are made up of / can span clusters. 3) Within a cluster the logical switches (VXLANs) are created on the VDS that’s common to the cluster. 4) What I hadn’t realized was this: nowhere in the previous statements is it implied that transport zones are limited to a single VDS. So if a transport zone is made up of multiple clusters, each/some of which have their own common VDS, any logical switch I create will be created on all these VDSes.

Sadly, I don’t feel like saying yay at this point, unlike before. I am too tired. :(

Which also brings me to the question of how I got this working with VMware Workstation.

By default VMware Workstation emulates an e1000 NIC in the VMs, and this doesn’t support an MTU larger than 1500 bytes. We can edit the .VMX file of a VM and replace “e1000” with “vmxnet3” to swap the emulated Intel 82545EM Gigabit Ethernet NIC for a paravirtual VMXNET3 NIC. This NIC supports an MTU larger than 1500 bytes, and VXLAN will begin working. One thing though: a quick way of testing whether the VTEP VMkernel NICs are able to talk to each other with a larger MTU is via a command such as ping ++netstack=vxlan -I vmk3 -d -s 1600 xxx.xxx.xxx.xxx. If you do this once you add a VMXNET3 NIC though, it crashes the ESXi host. I don’t know why. It only crashes when using the VXLAN network stack; the same command with any other VMkernel NIC works fine (so I know the MTU part is OK). Also, when testing the Logical Switch connectivity via the Web UI (see example here) there’s no crash with a VXLAN standard test packet – maybe that doesn’t use the VXLAN network stack? I spent a fair bit of time chasing after the ping ++netstack command until I realized that even though it was crashing my host, the VXLAN was actually working!
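For reference, the MTU test command in full (vmk3 is my VTEP vmk; the target address is the other host’s VTEP, masked here as in the rest of the post):

```shell
# Ping across the VXLAN network stack with don't-fragment set and an
# oversized payload, to prove the underlay passes frames larger than 1500 bytes
ping ++netstack=vxlan -I vmk3 -d -s 1600 xxx.xxx.xxx.xxx
# -d : set the "don't fragment" bit
# -s : payload size in bytes
```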

Before I conclude a hat-tip to this post for the Web UI test method and also for generally posting how the author set up his NSX test lab. That’s an example of how to post something like this properly, instead of the stream of thoughts my few posts have been. :)

Update: Short-lived happiness. The next step was to create an Edge Services Gateway (ESG), and there I bumped into the MTU issues again. This time when I ran the test via the Web UI, it failed and crashed the hosts. Disappointed, I decided it was time to move on from VMware Workstation. :-/

I decided to take a break from my NSX reading and just go ahead and set up a VXLAN in my test lab. Just go with a hunch of what I think the options should be based on what the menus ask me and what I have read so far. Take a leap! :)

*Ahem* The above is actually incorrect, and I am an idiot. A super huge idiot! Each VM is actually just pinging itself and not the other. Unbelievable! And to think that I got all excited thinking I managed to do something without reading the docs etc. The steps below are incomplete. I should just delete this post, but I wrote this much and had a moment of excitement that day … so am just leaving it as it is with this note.

Above we have two OpenBSD VMs running in my nested ESXi hypervisors.

obsd-01 is running on host 1, which is on network 10.10.3.0/24.

obsd-02 is running on host 2, which is on network 10.10.4.0/24.

Note that each host is on a separate L3 network.

Each host is in a cluster of its own (doesn’t matter but just mentioning) and they connect to the same VDS.

In that VDS there’s a port group for VMs and that’s where obsd-01 and obsd-02 connect to.

Without NSX, since the hosts are on separate networks, the two VMs wouldn’t be able to see each other.

With NSX, I am able to create a VXLAN network on the VDS such that both VMs are now on the same network.

I put the VMs on a 192.168.0.0/24 network so that’s my overlay network.

VXLANs are basically port groups within your NSX enhanced VDS. The same way you don’t specify IP/ network information on the VMware side when creating a regular portgroup, you don’t do anything when creating the VXLAN portgroup either. All that is within the VMs on the portgroup.

A VDS uses VMKernel ports (vmk ports) to carry out the actual traffic. These are virtual ports bound to the physical NICs on an ESXi host, and there can be multiple vmk ports per VDS for various tasks (vMotion, FT, etc). Similar to this we need to create a new vmk port for the host to connect into the VTEP used by the VXLAN.

Unlike regular vmk ports though we don’t create and assign IP addresses manually. Instead we either use DHCP or create an IP pool when configuring the VXLAN for a cluster. (It is possible to specify a static IP either via DHCP reservation or as mentioned in the install guide).

Each cluster uses one VDS for its VXLAN traffic. This can be a pre-existing VDS – there’s nothing special about it just that you point to it when enabling VXLAN on a cluster; and the vmk port is created on this VDS. NSX automatically creates another portgroup, which is where the vmk port is assigned to.

And that’s where I am so far. After doing this I went through the chapter for configuring VXLAN in the install guide and I was pretty much on the right track. Take a look at that chapter for more screenshots and info.

Yay, my first VXLAN! :o)

p.s. I went ahead with OpenBSD in my nested environment coz (a) I like OpenBSD (though I have never got to play around much with it); (b) it has a simple & fast install process and I am familiar with it; (c) the ISO file is small, so doesn’t take much space in my ISO library; (d) OpenBSD comes with VMware tools as part of the kernel, so nothing additional to install; (e) I so love that it still has a simple rc based system and none of that systemd stuff that newer Linux distributions have (not that there’s anything wrong with systemd just that I am unfamiliar with it and rc is way simpler for my needs); (f) the base install has manpages for all the commands unlike minimal Linux ISOs that usually seem to skip these; (g) take a look at this memory usage! :o)

p.p.s. Remember to disable the PF firewall via pfctl -d.
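PF ships enabled on OpenBSD, so test pings get silently dropped until you disable it:

```shell
pfctl -d        # disable the PF firewall
pfctl -s info   # verify -- should report "Status: Disabled"
# pfctl -e      # re-enable it when done
```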

Yay again! :o)

Update: Short-lived excitement sadly. A while later the VMs stopped communicating. Turns out VMware Workstation doesn’t support an MTU larger than 1500 bytes, and VXLAN requires 1600 bytes. So the VTEP interfaces of both ESXi hosts are unable to talk to each other. Bummer!

Update 2: I finally got this working. Turns out I had missed some stuff; and also I had to make some changes to allow VMware Workstation to work with larger MTU sizes. I’ll blog this in a later post.

This is a very basic post. I was trying to read up on NSX, and before I could appreciate it I wanted to go down and explore how things are without NSX, so I can better understand what NSX is trying to do. I wanted to put it down in writing as I spent some time on it, but there’s nothing new or grand here.

So. vSphere Distributed Switches (VDS). These are Layer 2 switches that exist on each ESXi host and contain port groups that you can connect the VMs running on a host to. Since these are Layer 2, all the hosts connecting to a particular Distributed Switch must be on the same Layer 2 network. Once you create a Distributed Switch and add ESXi hosts and their physical NICs to it, you can create VMkernel ports for Management, vMotion, Fault Tolerance, etc., but these VMkernel ports aren’t used by the port groups you create on the Distributed Switch. The port groups are just like Layer 2 switches – they communicate by broadcasting over the underlying physical NICs assigned to the Distributed Switch; and since there’s no IP address as such assigned to a port group, there’s no routing involved. (This is an obvious point but I keep forgetting it.)

For example, say you have two ESXi hosts – HostA and HostB – and these are on two separate physical networks (i.e. separated by a router). You create a new Distributed Switch comprising a physical NIC from each host. Then you make a port group on this switch and put VM-A on HostA and VM-B on HostB. When creating the Distributed Switch and adding physical NICs to it, VMware doesn’t care if the physical NICs aren’t in the same Layer 2 domain. It will happily add the NICs, but when you try to send traffic from VM-A to VM-B it will fail. That’s because when VM-A tries to communicate with VM-B (let’s assume these two VMs know each other’s MAC addresses so there’s no need for ARP first), VM-A will send Ethernet frames to the Distributed Switch on HostA, which will then broadcast them to the Layer 2 network its assigned physical NIC is connected to. Since these broadcast frames won’t reach the physical NIC of HostB, VM-B never sees them, and so the two VMs cannot communicate with each other.

So – keep in mind that all physical NICs connecting to the same Distributed Switch must be on the same Layer 2. If the underlying physical NICs are on separate Layer 3 networks, and these Layer 3 networks have connectivity to each other, it doesn’t matter – the VMs in the port groups will not be able to communicate.

And this is where NSX comes in. Using the concept of VXLANs, NSX stretches a Layer 2 network across Layer 3. Basically it encapsulates the Layer 2 traffic within Layer 3 packets and gives the illusion of all VMs being on the same Layer 2 network – but this illusion is what Network Virtualization is all about, right? :) VXLAN is an overlay.

VXLAN encapsulates Layer 2 frames in UDP packets. The VXLAN is like a tunnel into which all the hosts connecting to this VXLAN hook. On each host there’s something called a VXLAN Tunnel End Point (VTEP), which is the “thing” that actually hooks into the VXLAN. If a VXLAN is a Distributed Switch made up of physical NICs from the hosts, the VTEP is the VMkernel port of this Distributed Switch that does the actual communication (like how vMotion traffic between two hosts happens via the VMkernel ports you assign for vMotion). In fact, during an NSX install you install three VIBs on the ESXi hosts – one of these enhances the existing Distributed Switch with VXLAN capabilities (the encapsulation stuff I mentioned above).
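That encapsulation is also why the underlay needs a larger MTU (the 1600-byte requirement I kept running into): each guest frame gains a set of outer headers. Summing the standard header sizes shows the overhead:

```shell
# VXLAN overhead per frame: outer Ethernet (14) + outer IPv4 (20) + UDP (8)
# + VXLAN header (8) = 50 bytes on top of the original guest frame
inner=1500
overhead=$((14 + 20 + 8 + 8))
echo $((inner + overhead))   # 1550 -- hence the usual 1600-byte underlay MTU
```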

Once you have NSX you can create multiple Logical Switches. These are basically VXLAN switches that operate like Layer 2 switches but can actually stretch multiple Layer 3 networks. Logical Switches are overlay switches. ;o) Each Logical Switch corresponds to one VXLAN.

ps. VXLAN is one of the cool features of NSX. The other cool features are the Distributed Logical Router (DLR) and the Distributed Firewall (DFW). Remember I mentioned that an ESXi host has 3 VIBs installed as part of NSX, and that one of them is the VXLAN functionality? Well, the other two are the DLR and the DFW (god, so many acronyms!). Prior to DLR, if an ESXi host had two VMs connected to different Distributed Switches, and these two VMs wanted to talk to each other, the traffic would go down from one of the VMs, to the host, to the underlying physical network router, and back to the host and up to the VM on the other Distributed Switch. But with DLR, the ESXi hypervisor kernel can do Layer 3 routing too, so it will simply send traffic directly to the VM on the other Distributed Switch.

Similarly, DFW just means each ESXi hypervisor can also apply firewall rules to packets, so you don’t need one centralized firewall any more. You simply create rules and push them out to the ESXi hosts, and they can do firewalling between VMs. Everything is virtual! :)

pps. Some other jargon. East-West traffic means network traffic that’s usually within or between servers (ESXi hosts in our case). North-South traffic means any other network traffic – basically, traffic that goes out of this layer of ESXi hosts. With NSX you try and have more traffic East-West rather than North-South.