Saturday, 14 January 2017

Recently I ran into an issue while upgrading a customer's environment from ESXi 5.5 to 6.0.

This was a very sensitive vSAN cluster with numerous issues, so I had to upgrade the hosts manually.
One of the hosts failed during the upgrade process with the error "[Errno 28] No space left on device".

After some troubleshooting I found that the /locker/packages folder contained both a 5.5.0 and a 6.0.0 package folder, so I moved both folders to a shared datastore to free up some space.
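To see which folders are filling the locker before moving anything, a small script can total the size of each folder under /locker/packages. This is a minimal sketch rather than what I actually ran: it assumes a Python interpreter is available in the ESXi shell and is written to work under Python 2.7 or 3. The helper names tree_size, package_folder_sizes and free_bytes are hypothetical.

```python
import os


def tree_size(path):
    """Return the total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so their targets are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def package_folder_sizes(packages_dir="/locker/packages"):
    """Map each subfolder of packages_dir (e.g. 5.5.0, 6.0.0) to its size in bytes."""
    sizes = {}
    for entry in sorted(os.listdir(packages_dir)):
        full = os.path.join(packages_dir, entry)
        if os.path.isdir(full):
            sizes[entry] = tree_size(full)
    return sizes


def free_bytes(path):
    """Bytes available (f_bavail) on the filesystem that holds path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    target = "/locker/packages"
    # Only report when the path exists, i.e. when run on an ESXi host.
    if os.path.isdir(target):
        print("free bytes: {0}".format(free_bytes(target)))
        for name, size in package_folder_sizes(target).items():
            print("{0}: {1} bytes".format(name, size))
```

Files sitting directly in /locker/packages are counted by tree_size but not listed per folder, since the goal here is to spot large version folders such as 5.5.0 and 6.0.0.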

However, when I ran the upgrade a second time, the installer no longer offered the Upgrade option. If you open the details of the disk where ESXi is installed (an SD card in my case), you will see that the installer cannot find an ESXi installation there.

Even so, the ESXi host still booted just fine.

The reason is that ESXi has two boot partitions, referenced by two symbolic links: /bootbank and /altbootbank.

When ESXi is updated or upgraded, the new files are written to the /altbootbank partition, and the symlinks are then updated so that the /altbootbank partition becomes /bootbank and vice versa.

This makes it possible to roll back an ESXi update or upgrade if something goes wrong with the /bootbank.
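The two-partition scheme can be illustrated with a toy model. This is a deliberately simplified sketch, not how ESXi implements it internally: the two partitions are plain dictionaries, and "updating the symlinks" is just flipping an index. The class name BootBanks and the fail_after parameter are hypothetical.

```python
class BootBanks(object):
    """Toy model of ESXi's two boot partitions and the /bootbank, /altbootbank links."""

    def __init__(self, files):
        # Partition 0 holds the installed image; partition 1 starts empty.
        self.partitions = [dict(files), {}]
        self.active = 0  # index of the partition that /bootbank points to

    @property
    def bootbank(self):
        return self.partitions[self.active]

    @property
    def altbootbank(self):
        return self.partitions[1 - self.active]

    def upgrade(self, new_files, fail_after=None):
        """Write new_files to /altbootbank, then swap the links if every file landed.

        fail_after simulates an interrupted upgrade (e.g. a full disk) by
        stopping after that many files have been written.
        """
        alt = self.altbootbank
        alt.clear()
        for written, name in enumerate(sorted(new_files)):
            if fail_after is not None and written >= fail_after:
                # Interrupted: /altbootbank is left partial, /bootbank is untouched.
                return False
            alt[name] = new_files[name]
        # Success: the alternate partition becomes /bootbank and vice versa.
        self.active = 1 - self.active
        return True

    def rollback(self):
        """Point /bootbank back at the previous image held in /altbootbank."""
        if not self.altbootbank:
            raise RuntimeError("no previous image to roll back to")
        self.active = 1 - self.active
```

The interrupted case mirrors this post: the host keeps booting from the intact /bootbank, while /altbootbank is left half-written.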

In my case, /altbootbank had not been fully updated because of the failed upgrade process: it did not contain the state.tgz file, which is an archive of the host's configuration files. Some other files were missing too, and the sizes of the two partitions differed significantly.

So it appears that when /altbootbank is corrupted and does not contain all the expected files, the installer refuses to recognize the installed ESXi.
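The comparison that exposed the problem (missing state.tgz, different partition sizes) can be scripted. This is a minimal sketch, assuming a Python interpreter is available in the ESXi shell and that both banks are flat directories of files (subdirectories are ignored). The function names bank_listing and compare_banks are hypothetical.

```python
import os


def bank_listing(path):
    """Return {file name: size in bytes} for the regular files in one boot bank."""
    listing = {}
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            listing[name] = os.path.getsize(full)
    return listing


def compare_banks(boot="/bootbank", alt="/altbootbank"):
    """Report files missing from alt, files whose sizes differ, and total sizes."""
    good = bank_listing(boot)
    other = bank_listing(alt)
    missing = sorted(set(good) - set(other))
    size_mismatch = sorted(n for n in set(good) & set(other) if good[n] != other[n])
    return {
        "missing": missing,
        "size_mismatch": size_mismatch,
        "total_boot": sum(good.values()),
        "total_alt": sum(other.values()),
    }


if __name__ == "__main__":
    # Only run the real check when both links exist, i.e. on an ESXi host.
    if os.path.isdir("/bootbank") and os.path.isdir("/altbootbank"):
        report = compare_banks()
        for key in ("missing", "size_mismatch", "total_boot", "total_alt"):
            print("{0}: {1}".format(key, report[key]))
```

A non-empty "missing" list, or totals that differ by a large margin, points to a half-written /altbootbank like the one on this host.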

Therefore, I deleted all files from the /altbootbank partition and copied the contents of /bootbank over. On the next attempt, the installer offered to upgrade the ESXi host.
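The fix amounts to replacing the contents of /altbootbank with a copy of /bootbank. The post does not say which commands were used, so the following is only a hedged sketch of the same two steps in Python, assuming flat directories and that /bootbank holds the healthy, currently booted image. The function name mirror_bank is hypothetical; because it deletes files, check which partition each link resolves to before running it on a real host.

```python
import os
import shutil


def mirror_bank(src="/bootbank", dst="/altbootbank"):
    """Replace the files in dst with copies of the files in src.

    Assumes both banks are flat directories; subdirectories are left alone.
    Returns the sorted list of copied file names.
    """
    # Resolve the symlinks so we operate on the actual partitions.
    src_real = os.path.realpath(src)
    dst_real = os.path.realpath(dst)
    if src_real == dst_real:
        raise ValueError("source and destination resolve to the same partition")

    # Step 1: delete the (partial) files left in the alternate bank.
    for name in os.listdir(dst_real):
        path = os.path.join(dst_real, name)
        if os.path.isfile(path) or os.path.islink(path):
            os.remove(path)

    # Step 2: copy every regular file from the healthy bank, keeping timestamps.
    copied = []
    for name in sorted(os.listdir(src_real)):
        path = os.path.join(src_real, name)
        if os.path.isfile(path):
            shutil.copy2(path, os.path.join(dst_real, name))
            copied.append(name)
    return copied
```

Resolving both links first guards against the one mistake that would be fatal here: wiping the partition the host is actually booting from.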