VMware NIC Re-enumeration

I ran a BIOS/firmware update on my ESX host and now none of the networking works! This happened to me on our Sun X4600 M2 servers. I don’t know if it was something in these particular updates, something with these servers, something with Sun servers in general, etc. It happened, and here’s how I fixed it. Note: this was all done on ESX 3.5 Update 2.

What happened and how do I know?

Apparently something in a BIOS/firmware update can, on the next boot, cause ESX to think that the network adapters are new, so it gives them all new vmnicN names. From other research on this, it sounds like it can also happen when you add additional network adapters to an ESX host. Inside ESX, the virtual switches that connect all of the networking (including the service console and VMkernel ports) to the outside world are configured based on the vmnic names. Once all of the NICs have been renamed, the vswitches are routing packets to interfaces that no longer exist.

The way to tell that this is the problem is to log in as root (at the physical console, unless you got lucky and this didn’t affect the vmnic that your service console port was using). Running /usr/sbin/esxcfg-nics -l will list the vmnic information, including their names. Then run /usr/sbin/esxcfg-vswitch -l to list your vswitch configuration. The key is to check what’s listed under “Uplinks” and see whether those vmnics still appear in your vmnic list. If not, you’ve run into this problem.
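To make the cross-check concrete, here’s a sketch of comparing the two listings. The file contents below are made-up sample data standing in for (heavily simplified) output of esxcfg-nics -l and the Uplinks column of esxcfg-vswitch -l; on a real host you’d capture the actual command output instead:

```shell
# Made-up sample: vmnic names as reported by `esxcfg-nics -l` after
# the renaming (one name per line, other columns stripped).
cat > /tmp/nics.txt <<'EOF'
vmnic6
vmnic7
vmnic8
EOF

# Made-up sample: uplink names still referenced by the vswitches,
# as shown under "Uplinks" in `esxcfg-vswitch -l`.
cat > /tmp/uplinks.txt <<'EOF'
vmnic0
vmnic1
EOF

# comm needs sorted input; -23 prints lines only present in the
# uplinks file, i.e. uplinks that no longer match any real nic.
sort -o /tmp/nics.txt /tmp/nics.txt
sort -o /tmp/uplinks.txt /tmp/uplinks.txt
comm -23 /tmp/uplinks.txt /tmp/nics.txt
```

If that last command prints anything, your vswitches are bound to vmnic names that no longer exist, which is exactly this problem.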

How to fix it

Using the /usr/sbin/esxcfg-vswitch command, you could simply adjust your vswitches to use the new vmnic names and solve the problem that way. I prefer to keep all of my ESX hosts as similar as possible, though, and wanted to get the vmnic names back to their originals. That proves to be a bit more complicated.

Either way, you’re going to need a mapping of the new names back to the old names. In the case of re-enumeration, hopefully they’re just shifted up numerically; i.e. if you had vmnic0-vmnic5 before, they’re now vmnic6-vmnic11. It’s also possible that this happened repeatedly, increasing the vmnic numbers by a multiple of the number of NICs. In one instance, ESX seemed to re-enumerate them repeatedly, so I had numbers up in the 30s, but it was still six NICs numbered in order. If you added new NICs, they may be in a new order now. ESX enumerates the NICs in PCI address order, so if the new NICs were inserted in between the previous ones (in bus addressing), some of the vmnic names may be correct and others not. You may have to resort to unplugging a network cable and then using /usr/sbin/esxcfg-nics -l to see which vmnic went down in order to work out your mapping. Alternatively, you can probably extract this mapping from the /etc/vmware/esx.conf file by comparing the MAC addresses of the pnic child entries. See more information on that below.
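As a sketch of the MAC-comparison approach: the esx.conf excerpt below is invented sample data in the general shape of the pnic child entries (MAC addresses, child indices, and the virtualMac field are all assumptions; your file will differ), and the awk one-liner pairs each child’s MAC with its name so duplicates line up:

```shell
# Invented sample of the pnic child section of /etc/vmware/esx.conf.
# After the renaming, the old and new entries share a MAC address.
# The virtualMac line is a guess at the third field per child.
cat > /tmp/esx.conf.sample <<'EOF'
/net/pnic/child[0000]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0000]/name = "vmnic0"
/net/pnic/child[0000]/virtualMac = "00:50:56:50:aa:00"
/net/pnic/child[0001]/mac = "00:14:4f:aa:bb:01"
/net/pnic/child[0001]/name = "vmnic1"
/net/pnic/child[0001]/virtualMac = "00:50:56:50:aa:01"
/net/pnic/child[0002]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0002]/name = "vmnic6"
/net/pnic/child[0002]/virtualMac = "00:50:56:50:aa:02"
/net/pnic/child[0003]/mac = "00:14:4f:aa:bb:01"
/net/pnic/child[0003]/name = "vmnic7"
/net/pnic/child[0003]/virtualMac = "00:50:56:50:aa:03"
EOF

# Collect mac and name per child index, then sort by MAC so each
# old/new pair lands on adjacent lines.
awk -F' = ' '
  /\/mac = /  { split($1, p, "/"); mac[p[4]]  = $2 }
  /\/name = / { split($1, p, "/"); name[p[4]] = $2 }
  END { for (c in mac) print mac[c], name[c] }
' /tmp/esx.conf.sample | sort
# "00:14:4f:aa:bb:00" "vmnic0"
# "00:14:4f:aa:bb:00" "vmnic6"   <- same MAC: vmnic0 is now vmnic6
# "00:14:4f:aa:bb:01" "vmnic1"
# "00:14:4f:aa:bb:01" "vmnic7"
```

Adjacent lines with the same MAC give you the old-to-new mapping directly.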

ESX has a huge config file that covers all sorts of things: hardware, vswitches, firewall rules, etc.

As a rule, VMware advises against editing this file. I used vi.

For those uninitiated in vi, nano is available on ESX as well. The file is at /etc/vmware/esx.conf, and the usual precaution of making a copy before you edit applies. We need to do two things in this file for each vmnic: change the device vmkname back to the old name, and remove the duplicate entries in the pnic list. Both changes can be made in a single editing pass, but I’ve split them into separate headings to keep the concepts distinct.

Fixing the device name

While editing /etc/vmware/esx.conf, search for vmnic. You’re looking for a line like /device/005:01.0/vmkname = “vmnic6”, where vmnic6 is hopefully one of the new vmnic names. This is the area of the config file dealing with the device information and settings. The device description and other nearby information may help you confirm that this is the correct NIC. Change the vmkname on that line back to the old vmnic name, then repeat this for the other vmnics that aren’t named properly. Now, if the vmnic name here is still the old name, read the next section and then come back here.

Fixing the pnic list

Once you’ve fixed the device names, if you continue searching down the file you’ll get to a section that looks something like this:
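The original snippet isn’t reproduced here, but the section in question is the /net/pnic portion of esx.conf. As an illustration only (the MAC addresses and child indices are invented, and the third field, shown here as virtualMac, is an assumption about which three lines appear per child), it looks roughly like:

```
/net/pnic/child[0000]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0000]/name = "vmnic0"
/net/pnic/child[0000]/virtualMac = "00:50:56:50:aa:00"
/net/pnic/child[0001]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0001]/name = "vmnic6"
/net/pnic/child[0001]/virtualMac = "00:50:56:50:aa:01"
```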

Notice that there are three lines for each pnic child, enumerated inside the brackets. Here you should find a pnic child entry for each of the old names and each of the new names, and the old name and new name will share the same mac value. Delete the entire pnic child entry (all three lines) for the new name, and repeat that for each of the misnamed vmnics.
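Deleting each entry by hand in vi works fine; as a scripted sketch of the same edit (sample file, child index, and the virtualMac field are all invented for illustration), sed can drop every line belonging to one pnic child:

```shell
# Invented sample: the old entry (vmnic0) and its renamed duplicate
# (vmnic6) share a MAC but live under separate child indices.
cat > /tmp/esx.conf.sample <<'EOF'
/net/pnic/child[0000]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0000]/name = "vmnic0"
/net/pnic/child[0000]/virtualMac = "00:50:56:50:aa:00"
/net/pnic/child[0001]/mac = "00:14:4f:aa:bb:00"
/net/pnic/child[0001]/name = "vmnic6"
/net/pnic/child[0001]/virtualMac = "00:50:56:50:aa:01"
EOF

# Delete all three lines of child[0001] (the new, unwanted name),
# writing the result to a separate file rather than editing in place.
sed '/\/net\/pnic\/child\[0001\]\//d' /tmp/esx.conf.sample > /tmp/esx.conf.fixed

cat /tmp/esx.conf.fixed
```

Only the vmnic0 entry survives; on a real host you’d of course verify the result before copying it back over esx.conf.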

Once you’ve fixed both the device names and the pnic list, reboot the box for the esx.conf changes to take effect:

/sbin/shutdown -r now

You should be back in business now.

But the device lines in my esx.conf file have the correct (old) name

If your networking is busted as described above and this is the case, then your esx.conf file is mucked up and lying to you. This happened on one of our ESX hosts. I ended up calling VMware support, and this is how we got it straightened out. They had me run these commands in this order. Whether they’re all needed or not (the man page seems to imply that some of these switches implicitly force some of the others), I don’t know, but I ran them and it worked. You can experiment with trimming the list if you want. Basically, they rebuild/update your esx.conf file.

/usr/sbin/esxcfg-boot -p
/usr/sbin/esxcfg-boot -b
/usr/sbin/esxcfg-boot -rg

Now reboot the host with /sbin/shutdown -r now and check your esx.conf file. The device lines should now have the new (still wrong) vmnic names, and you can change them as instructed above. Don’t forget to make the pnic child changes as well.

Further Notes

The BIOS/firmware patch may also affect the enumeration of your HBAs. However, that name change is less likely to make your host unusable. I haven’t (yet) tried correcting it. We’ll see how much it annoys me going forward.