Building a Nested ESXi Lab For Your Company

Let’s face it: most companies don’t have labs for you to learn new software and operating systems, let alone test upgrades and changes to your existing environments. If you do have some sort of lab, it’s probably hand-me-down gear that’s in shambles, with so many problems you can’t even attempt to mimic anything in production. If you do have a nice lab, you’re certainly in the minority – hug it and squeeze it and pet it and call it George! Never let go.

The problem with labs is that companies typically don’t see the value. Us nerds just want extra servers to play around with, I know it! Managers don’t see the ROI and don’t want to make room in the budget because they think it will cost a fortune. I’ve spent the bulk of my career building labs out of old equipment and hustling for extra RAM and storage. The struggle is real. But you don’t need to match production; you just need something relatively modern, and then you can take full advantage of that sweet, sweet magic called virtualization and build a nested lab. Yep, ESXi on top of ESXi – let’s virtualize your virtualization environment! You can get away with a couple of blades in an existing chassis, or a couple of older rack-mount servers. Maybe the gear is a little older, maybe it’s not quite as powerful; the trick is getting hardware that isn’t ancient. The easiest business case to make is one that is virtually free, or severely discounted. Yeah, I can hear you now: “but the security team…the network team…corporate policy…blah blah blah”. I know, and I’m not saying you should go all shadow IT. There are workarounds, and you might not make any new friends, but hey, this lab is important!

So why am I writing about this? It’s no secret that every IT person struggles with this topic. I work on a delivery team where we design, build and implement solutions for customers – simple or complex, it doesn’t matter. So when something new drops, like the next release of vSphere, our team needs to be on top of it. I wanted our team to be able to jump into the lab and deploy a nested environment to test upgrades, or deploy the newest version of product X, etc. This is what every IT professional needs. Let’s dive into the physical setup.

Physical Hardware

Servers

I touched on this briefly above: we need some modern hardware. It doesn’t have to be anything earth-shattering or beefy. We just need some resources. I managed to find two blades that were shut down and not in use in a UCS chassis. YES! Two dual-socket, 8-core B200 M3s with 96GB of RAM each. Nothing crazy, but it will get the job done. You can get these servers (or something similar) refurbished all over the internet at a substantial discount.

Storage

I can set up these two blades to boot from SD or SAN, and since the chassis is already connected, I’ll boot from SAN. However, if I didn’t have connectivity, I would have gone with the extremely cheap alternative of booting from SD.

I was able to get the storage guys to give me a couple of 10GB boot LUNs for my hosts, so we’re in business there. This environment also has NFS storage available. NFS = thin provisioned = cheap. The storage guys were generous here too and carved out some storage for me. Now, if you aren’t lucky enough to have extra storage available, there are options. You could acquire a third server and set up FreeNAS, or even a Windows server, as an NFS or iSCSI target. You can even use vSAN if you can manage to meet the HCL requirements. The goal of the lab is to get creative and use what’s available to you.

Network

Network is always tricky. Those guys hoard IP addresses, subnets and VLANs like you’re stealing from them personally. Here I was able to get a new subnet (/24) that was all mine. Luckily I have super cool teammates, but if you aren’t so lucky you’ll need to beg, or bribe them with their favorite adult beverage.

Now it’s just a matter of putting it all together. Get the new VLAN added to the port channels, create the service profiles in UCS for my two blades, get the boot LUNs zoned, and done! I have a two node cluster ready to go.

Building the Physical vSphere Lab

I’m going to assume you have a fundamental understanding of vSphere and ESXi, but there are a couple of extra things we’ll want to do. After you install ESXi on each host (use the appropriate vendor ISO), install any extra VIBs necessary for your environment (storage, network, etc.). Since we’re using Tintri for our NFS storage, I installed the Tintri VAAI VIBs so I could enable Storage I/O Control and Hardware Acceleration appropriately. This will be instrumental for lightning-fast clones, among other things.

Once you have your hosts ready with ESXi and the appropriate drivers, install the ESXi MAC Learning dvFilter, since we plan on running nested ESXi VMs. This is a very important requirement, so don’t skip it. I’m putting it here now and will explain in a minute. I didn’t do this at first, and all my networking broke down.

Now that we have our hosts prepped and ready, you can make the necessary tweaks (add storage, configure your vSwitch(es), NTP, security profile, etc.) and finally move on to setting up vCenter for your new lab cluster. The simplest route here is deploying a vCSA with an embedded PSC. A few things I did to the vCenter after it was deployed:

Enabled HA and disabled Admission Control. I don’t strictly need HA in the lab, but I set up the vCenter VM with a high restart priority. We want that coming back online in a hurry.

Enabled DRS.

Enabled vMotion. I created an additional VMkernel port for this, but it’s not necessary.

Migrated from VSS to VDS for ease of management in the future.

What we have here is a failure to communicate!

Remember when I said I didn’t do the above steps first and my networking broke? Turns out networking in a nested environment can be a bit problematic. Once I deployed the labs and started doing a bit of testing I started seeing a lot of network weirdness, dropped packets, etc. It took me a hot minute (ok, maybe a night of drinking at the bar with other nerds and laptops) to sort this out. Here is where your portgroup security comes in and why I created a virtual portgroup specifically for our nested ESXi hosts (oh yeah, create a new portgroup)!

The reason for the special portgroup is that we need to set Promiscuous Mode and Forged Transmits to ACCEPT to ensure proper network connectivity for the nested ESXi hosts. This also works together with the dvFilter we installed previously. The filter allows ESXi to support MAC learning on your vSwitch ports, so the host won’t drop the packets it sees coming from multiple MAC addresses.

Promiscuous Mode: Think destination. The packets going to guest VMs inside the nested ESXi hosts are addressed to the guest VMs’ MACs, not to the MACs of the nested ESXi hosts’ vNICs, which are the only MACs the physical hosts know about. With this enabled, the nested ESXi hosts can see all of the traffic on the portgroup, so the packets can reach their destination (the guest VMs inside the nested ESXi VMs). Additionally, the MAC learning dvFilter requires this to function.

MAC Address Changes: You don’t necessarily need this one enabled, and in most environments I have it disabled (along with the other two security options). A VM’s MAC address is automatically assigned and recorded in the VMX file (or VM settings); however, the guest can change its MAC in the OS. If it does, the VM is spoofing its MAC. With this security option disabled (Reject), the frames will be dropped and the port blocked. With it enabled (Accept), we let the traffic pass despite the MAC address discrepancy. I left it enabled here because we’re doing a lot of funky stuff and I wanted to absolutely ensure that my traffic has the freedom to do what it pleases, because ‘Murica.

Forged Transmits: Think source. The packets coming from our guest VMs inside the nested ESXi hosts will have their own MACs, while the physical ESXi hosts will be expecting the MACs of the nested ESXi hosts. Since the host sees a discrepancy here, it will drop the packets. Enabling this lets the vSwitch accept traffic that comes from a MAC address other than the one the host is anticipating.
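To make the "think destination / think source" idea concrete, here is a deliberately simplified Python model of the two checks. It is only a sketch for reasoning about the settings, not how the vSwitch is actually implemented: it ignores broadcast traffic, VLANs, MAC Address Changes and the MAC learning filter, and the MAC addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortPolicy:
    """Portgroup security policy (True = Accept, False = Reject)."""
    promiscuous: bool = False
    forged_transmits: bool = False


def allows_outbound(policy: PortPolicy, port_mac: str, src_mac: str) -> bool:
    """Forged Transmits: may a frame with this source MAC leave the port?"""
    return policy.forged_transmits or src_mac.lower() == port_mac.lower()


def delivers_inbound(policy: PortPolicy, port_mac: str, dst_mac: str) -> bool:
    """Promiscuous Mode: is a frame with this destination MAC handed to the port?"""
    return policy.promiscuous or dst_mac.lower() == port_mac.lower()


# Hypothetical MACs: the nested ESXi VM's vNIC, and a guest VM running inside it.
NESTED_ESXI_VNIC = "00:50:56:aa:00:01"
INNER_GUEST = "00:50:56:bb:00:02"

default_policy = PortPolicy()
nested_policy = PortPolicy(promiscuous=True, forged_transmits=True)

for name, policy in (("default", default_policy), ("nested", nested_policy)):
    can_send = allows_outbound(policy, NESTED_ESXI_VNIC, INNER_GUEST)
    can_receive = delivers_inbound(policy, NESTED_ESXI_VNIC, INNER_GUEST)
    print(f"{name}: inner guest can send={can_send}, can receive={can_receive}")
```

With the default (Reject) policy the inner guest can neither send nor receive through the nested host's port; with both options on Accept it can do both, which matches the behavior described above.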

I also changed the port binding to Ephemeral so we can deploy additional templated vCSAs to the same network. You can do all of this with VSS, and then you won’t need to change the port binding; that part applies only to VDS, which is my preference here.

So, at this point we have a two-node ESXi cluster set up, with vCenter configured for HA, vMotion, shared storage, and a network prepped for nested virtualization. We are officially ready to start building VMs and testing things in the lab.

Building the Nested vSphere Lab

Here’s where this lab will shine. I wanted a way for folks to be able to deploy a nested ESXi lab, complete with vCenter. This serves multiple purposes:

Those who are just diving into VMware can have a full lab to play with and learn in.

Folks who just need a quick lab to test out a feature or config change can do so without worry.

Admittedly, the nested lab ended up being a little beefier than I intended, but it worked out for the best, even if I had to beg for additional servers. This lab is comprised of the following components:

I opted to use external versus embedded PSC so that folks could test re-pointing the vCenter or adding additional PSCs for various experiments.

Three ESXi Servers: ESXi 6.0 U2, 2 vCPU, 4GB RAM, 2GB HDD

I’m going to assume that everyone is familiar with the basic installation of Windows, ESXi, the PSC and vCenter. However, if you need help with the PSC/vCenter install, check out VMware’s feature walkthrough.

Building the Windows servers is easy enough – create your VMs, run through your standard Windows installation and install Windows updates, VMware Tools, and the extra tools and utilities you require. For the AD-DNS server, install the Active Directory, DNS and Storage/NFS features and roles, then create forward and reverse DNS records for your 7 virtual machines. I also created a CNAME for NFS to point back to the AD server just for fun. Lastly, format your secondary HDD on the AD-DNS box and create your NFS or iSCSI target. Speaking of storage, let’s quickly set up NFS on the AD server.

Add the “File and Storage Services” server role and be sure that “Server for NFS” is checked within the role.

Once you’ve added the role, go ahead and create an NFS share. I added a 2TB (thin provisioned!) Z: drive to my ad-dns VM, on which to place this share. I called the share datastore01.

Once the Windows VMs are good to go and you’ve verified that DNS works, we can move on to the vSphere components.
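Before checking DNS, it can help to write down exactly which records you expect to exist. The sketch below builds the forward (A) and reverse (PTR) record list for the seven lab VMs using Python’s standard ipaddress module. The domain name and every IP except the AD-DNS address are hypothetical placeholders, so substitute your own.

```python
import ipaddress

DOMAIN = "lab.local"  # hypothetical domain name

# Hypothetical addresses, apart from AD-DNS (192.168.254.12 in this lab).
HOSTS = {
    "jumpbox": "192.168.254.11",
    "ad-dns": "192.168.254.12",
    "psc": "192.168.254.13",
    "vcsa": "192.168.254.14",
    "esxi-01": "192.168.254.21",
    "esxi-02": "192.168.254.22",
    "esxi-03": "192.168.254.23",
}


def dns_records(hosts, domain):
    """Return (name, type, value) tuples: one A and one PTR record per host."""
    records = []
    for short_name, ip in hosts.items():
        fqdn = f"{short_name}.{domain}"
        # reverse_pointer yields e.g. "12.254.168.192.in-addr.arpa"
        ptr_name = ipaddress.ip_address(ip).reverse_pointer
        records.append((fqdn, "A", ip))
        records.append((ptr_name, "PTR", fqdn))
    return records


for name, rtype, value in dns_records(HOSTS, DOMAIN):
    print(f"{name:35} {rtype:4} {value}")
```

Each line of output is one record to look up (forward and reverse) from the jumpbox before moving on.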

Choose the correct version of ESXi for the Guest OS Family (ignore the warning about it being unsupported).

Place all three vNICs on the special portgroup you created.

In the Customize Hardware section, expand CPU, and check the box for “Hardware virtualization” so that it can run a hypervisor.

Remember earlier, when we installed the dvFilter on the physical hosts? For this to work we need to specify some advanced VM settings. Edit the settings of the VM, click VM Options and then Configuration Parameters. Add the following lines for your NICs (we add two settings for each vNIC):
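As a sketch, assuming the setting names documented with the MAC Learning dvFilter Fling (ethernetN.filter4.name and ethernetN.filter4.onFailure; please check them against the Fling’s own instructions for your version), the two parameters per vNIC can be generated for a VM with three vNICs like this:

```python
def maclearn_params(nic_count: int, filter_slot: int = 4) -> dict:
    """Build the two per-vNIC configuration parameters for the MAC learning filter.

    The key names and values are assumptions taken from the Fling's
    documentation; verify them before applying them to a VM.
    """
    params = {}
    for nic in range(nic_count):
        prefix = f"ethernet{nic}.filter{filter_slot}"
        params[f"{prefix}.name"] = "dvfilter-maclearn"
        params[f"{prefix}.onFailure"] = "failOpen"
    return params


# Three vNICs on the nested ESXi VM: ethernet0, ethernet1 and ethernet2.
for key, value in maclearn_params(3).items():
    print(f'{key} = "{value}"')
```

Each printed line corresponds to one name/value pair to enter under Configuration Parameters.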

Now, power up your VM and install ESXi. Once installed, perform the following:

Under Troubleshooting Options, enable ESXi Shell and SSH

Disable IPv6

Save changes and reboot

When the VM boots back up, press ALT+F1, and login to the ESXi shell and perform the following:

Apply an advanced configuration option to force the management vmkernel interface (vmk0) MAC to always follow the hardware MAC. We do this because when we convert this VM to a template and deploy new VMs from it, the MAC addresses on the NICs will change. vmk0, however, normally holds on to the MAC it picked up at install time, so every clone would share it and we’d get duplicate MACs and other goofiness. This setting tells the ESXi host to force vmk0 to use the new MAC address when the VM boots after we deploy it.

Run the command: esxcli system settings advanced set -o /Net/FollowHardwareMac -i 1

Remove the system UUID. This is just a safeguard to ensure that each ESXi host is truly unique, including its system UUID. A new one is generated automatically when the VM boots up after deployment.

Edit /etc/vmware/esx.conf, remove the line that starts with “/system/uuid”, and save the changes.

Ensure the new config changes persist by running /sbin/auto-backup.sh
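If you would rather script the UUID cleanup than edit the file by hand, the logic is just "drop every line that starts with /system/uuid". Here is a minimal Python sketch of that filter, run against made-up sample content rather than a real esx.conf; try it on a copy of the file first.

```python
def strip_system_uuid(conf_text: str) -> str:
    """Return the config text without any line that starts with /system/uuid."""
    kept = [
        line
        for line in conf_text.splitlines(keepends=True)
        if not line.lstrip().startswith("/system/uuid")
    ]
    return "".join(kept)


# Made-up sample content, only to show the effect of the filter.
sample = (
    '/system/uuid = "00000000-0000-0000-0000-000000000000"\n'
    '/some/other/setting = "1"\n'
)

print(strip_system_uuid(sample), end="")
```

Only the second sample line is printed; every other line in the file passes through untouched.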

Exit the shell and shut down the ESXi host. Once it’s shut down, convert the VM to a template. You now have a fresh, reusable ESXi template.

From the template, deploy three new ESXi VMs. Once deployed, configure their hostnames and IP information accordingly. Next, deploy the PSC appliance and then the vCSA, and log in to vCenter. Once you are logged in, you can create a Datacenter and Cluster and add your three ESXi hosts.

We’re almost done! Go ahead and mount the shared storage from your Windows VM, and configure your vMotion ports. In my lab, I created a secondary vSwitch for vMotion, but you can piggyback on the management interface if you desire. I also used VSS in the nested lab, and left a third NIC unused for playing around with VDS migrations and other options. Here’s how my virtual networking turned out:

Now that the networking is situated, let’s configure the cluster settings. I enabled DRS and HA but specified some advanced settings to clear up the alerts and errors. After all, this is a nested lab, we just want to be able to test features and play around with the good stuff!

HA Advanced Options are as follows:

We only have one datastore, but HA wants two:

das.ignoreInsufficientHbDatastore=true

Our network isn’t redundant:

das.ignoreRedundantNetWarning=true

We don’t want to use the default gateway for an isolation address; we want to use the AD-DNS VM:

das.usedefaultisolationaddress=false

das.isolationaddress0=192.168.254.12

Nested ESXi Advanced Host Settings to get rid of those pesky warnings on the host:

Suppress the SSH shell warning:

UserVars.SuppressShellWarning=1

Get rid of the syslog error:

Syslog.global.logDir=[Datastore01]/syslog

You now have a nested ESXi Lab Cluster!

Templatizing the Nested vSphere Lab

Now, before we get carried away and start playing around, let’s turn this entire lab into a template so we can deploy it over and over. Why, you ask? Well, if you break it, you can just redeploy it. Or you can have multiple labs spun up at the same time (more on this sexiness later). Log out of your nested environment, and then gracefully shut down your VMs from your physical lab vCenter in the following order:

ESXi-03

ESXi-02

ESXi-01

vCSA

PSC

AD-DNS

Jumpbox
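The order above shuts down the things that depend on everything else first and the core services last. Assuming you want a deployed lab to come back up in the opposite order (my assumption, so adjust it to your own dependencies), the power-on sequence is simply the reverse:

```python
# Shutdown order from the list above.
SHUTDOWN_ORDER = ["ESXi-03", "ESXi-02", "ESXi-01", "vCSA", "PSC", "AD-DNS", "Jumpbox"]


def startup_order(shutdown_order):
    """Return the power-on order as the reverse of the shutdown order."""
    return list(reversed(shutdown_order))


for step, vm in enumerate(startup_order(SHUTDOWN_ORDER), start=1):
    print(f"{step}. power on {vm}")
```

This way AD-DNS (and with it DNS and the NFS share) is running before the PSC, the vCSA and the nested ESXi hosts need it.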

Once they’re shut down, right-click each VM and convert it to a template. Then save the templates in a specific folder for later.

Now you have a fresh set of templates at your disposal. This should get anyone up and running with a nested lab, and while it’s not a hard process, it is slightly time-consuming. Once you’re at this point, however, you have a lot more options available to you, such as testing other products like SRM, NSX, vRA, etc. The cool thing about creating this lab template is that you can now deploy a lab, mess with it, break it, whatever… then destroy it and start fresh with minimal effort.

This wasn’t my end game. In fact, if this had been my goal, I did it the hard way, because there are much easier, automated ways to accomplish it. Don’t get me wrong, it’s still solid and fun, and it was quite the learning experience. But I hinted at using other products like NSX and vRA, and this, my internet friends, is where the real fun and magic happens! I had a bigger goal in mind here: to make on-demand labs with a self-service catalog! Basically like VMware’s Hands-on Labs (HOL), but internal, with our specific purposes in mind. So, how do we accomplish this? It requires both NSX and vRA. NSX lets us take advantage of VXLAN so we can do things like overlap IPs, so every lab is identical but the labs won’t conflict with each other. vRA lets us create blueprints to automatically provision the VMs, networks and settings so we can rapidly deploy labs.

If you don’t care about rapidly deploying multiple labs, that’s totally cool, you still have a sweet nested environment. But if you want to take this a step further, go ahead and get started installing NSX and vRA. I’d walk you through that, but so many people have written step-by-step guides it doesn’t make sense to duplicate the work. Check these out:

Thanks for this write up. I am studying for my VCP and have a similar setup. Currently I am running just one bare metal esxi host. It’s an HP Proliant G6 with 2 E5645 (total 12 cores and 24 threads with Hypthreading) and 48gb of RAM. In it I have setup the following:

I plan on doing all my studying with this setup and I am hoping it will suffice. Only question during your write up, I noticed you said shut down all the VMs and copy vm to template. If shutdown my VCSA i can’t make templates, unless you have a separate VCSA running outside of the nested environment. Is that the case?

This was exactly what I was looking for. I created a nested esxi environment but had issue getting the nested esxi storage vmkernel ip to communicate with the storage vm. Enable Promiscuous Mode solved this issue. I also have issue getting nested esxi host to work. I try what you have in this article later. Great read. Excellent article!!!