Complex Nested Lab Running in vCloud Air

My colleague George Kobar recently wrote a blog post about running nested ESXi on vCloud Air. The gist of the article is how to solve the networking issues around nested virtualization, which usually require promiscuous mode to be enabled on the virtual distributed switch – something that is not possible in a public cloud environment. The trick is to give the nested objects MAC addresses identical to those assigned by vCloud Air to the virtual ESXi hosts.

I had the opportunity to test the practical viability of the approach when I needed to test a fairly complex architecture involving NSX, multiple vSphere 6 racks in a spine-and-leaf network architecture, and vCloud Director version 8 (currently in public beta). If you read my previous article about vCloud Director and NSX you will recognize the topology.

I have three racks, each with two ESXi hosts running in different subnets, and all communication between racks is routed over the spine router. For nested storage I am using HP VSA, which provides iSCSI storage, and then there are some management components (Domain Controller, Windows-based vCenter Server with an MS SQL database, NSX Manager and a vCloud Director cell). The virtual ESXi hosts run nested VMs – an NSX Controller, Edge Gateways, a DLR control VM and some Linux test VMs.

Here is how I set up the whole nested lab:

One vCloud Air Edge Gateway simulates the spine router. It has 8 Org VDC networks attached: for each rack there is an ESXi management network (used for ESXi management, vMotion and iSCSI traffic) and a transport network used for the VXLAN fabric. Then there is one management network for management workloads and one Internet network, which is also routed to the Internet to allow out-of-band RDP access to the vCenter Server; it acts as a jump host so I do not need to use the vCloud Air console.

Another vCloud Air Edge Gateway simulates a WAN router providing external access for virtual workloads via an Internet Org VDC network. As my vCloud Air VDC is in the Advanced Networking Services beta, I could even leverage OSPF peering between this Edge and my nested Edges.

The Edge rack virtual ESXi hosts have plenty of NICs in order to have enough MAC addresses that can be used by the nested objects (VMkernel ports and VMs). I used the following strategy:

The first NIC is used for ESXi management. ESXi uses the ‘physical’ MAC address for vmk0, so management, vMotion and iSCSI networking work out of the box (and if not, check this KB). A standard switch is used.

The second NIC is used for the VXLAN VTEP. When you prepare the VXLAN fabric on the nested hosts, random MACs are assigned to the VTEP vmk1 interfaces. These need to be changed after the preparation so that they are identical to the MACs assigned to the virtual ESXi hosts. This is done by editing /etc/vmware/esx.conf on each host and rebooting. Do VXLAN pings to check that your fabric is set up properly. Any VM connected to a VXLAN network will have its traffic encapsulated by the VTEPs and thus work properly in the nested environment.
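The esx.conf edit above can be sketched as follows – a minimal, hypothetical Python helper that rewrites the MAC of a vmkernelnic entry. The /net/vmkernelnic/child[NNNN]/mac key path mirrors the line format found in /etc/vmware/esx.conf, but the child index and MAC values here are made up; always work on a copy of the file, replace the MAC with the one vCloud Air assigned to the host's second NIC, and reboot afterwards:

```python
import re

def set_vmk_mac(conf_text: str, child_index: str, new_mac: str) -> str:
    """Replace the MAC of one vmkernelnic child entry in esx.conf-style text.

    Only the entry whose key is /net/vmkernelnic/child[<child_index>]/mac
    is touched; all other lines are returned unchanged.
    """
    pattern = re.compile(
        r'(/net/vmkernelnic/child\[' + re.escape(child_index) + r'\]/mac = ")'
        r'[0-9a-fA-F:]+(")'
    )
    # Keep the key and quotes (captured groups), swap only the MAC itself.
    return pattern.sub(lambda m: m.group(1) + new_mac + m.group(2), conf_text)

# Hypothetical example entry – indices and MACs are illustrative only.
conf = '/net/vmkernelnic/child[0001]/mac = "00:50:56:6b:aa:bb"\n'
print(set_vmk_mac(conf, "0001", "00:50:56:01:02:03"))
```

In the lab you would copy /etc/vmware/esx.conf off the host, rewrite the vmk1 entry like this (or simply edit the line with vi), copy it back and reboot the host so the VTEP comes up with the matching MAC.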

The ECMP Edges have their uplink interface connected to a VLAN network. I used the third ESXi NIC on each Edge rack host for one Edge. When you deploy the Edge via NSX Manager, you specify the corresponding MAC of the virtual ESXi host. As the Edge cannot be vMotioned (its MAC would no longer match the ESXi host MAC), I deployed it to the local storage of the virtual ESXi host and the VLAN was presented on a local standard switch. BTW, tying ECMP Edges to a particular host is recommended practice anyway.

The last challenge is the deployment of the NSX Controller (I deployed only one, which is enough for testing needs). The NSX Controller must be deployed to a regular non-VXLAN portgroup, and it is not possible to set its MAC address because it is deployed by NSX Manager in an automated fashion. So I deployed another Edge (called the Controller router) on one of the Edge rack ESXi hosts, again with its uplink MAC set to the MAC of the fourth NIC of that host. Then I created a portgroup for the NSX Controller deployment and deployed the Controller there. As long as the NSX Controller VM stays on the same host as the Controller router (both deployed to local storage), their connectivity is local, and traffic is then routed to the outside world with the Controller router’s uplink MAC rather than the Controller VM’s MAC.

Once the base of the nested lab is set up, it can be used for many different things – vCloud Director or vRealize Automation testing, or testing upgrades (I personally upgraded the lab from vSphere 5.5 to 6).

Can you detail the part below? I am trying to create a vApp in vCloud which involves NSX and nested ESXi hosts. I can do normal pings but cannot do VTEP pings. Then I stumbled on this blog and I’m finally seeing solutions, but I’m not really sure how to proceed with this part.

“Second NIC is used for VXLAN VTEP. When you prepare VXLAN fabric on nested hosts random MACs are assigned to VTEP vmk1. These need to be changed after the preparation so they are identical to MACs assigned to virtual ESXi hosts. This is done by editing /etc/vmware/esx.conf on each host and rebooting”