vMotion operation fails with the error, “The VM failed to resume on the destination during early power on”.

I had an interesting issue to deal with today, one that I haven’t seen before and one that took a bit of digging to resolve. The problem related to migrating some Exchange mailbox servers from a legacy ESXi 4.1 host onto a new ESXi 5.1 host.

This should have been a simple vMotion operation, but the task failed repeatedly at approximately 65% complete. I tried both high and standard priority migrations, but it failed every time, simply reporting “The VM failed to resume on the destination during early power on”.

The first thing I did was check the host log files (vmkwarning and vmkernel), as well as the virtual machine log file (vmware.log) located in the virtual machine folder on the datastore.

Reading through the host log files, it looks like there was a problem reserving enough memory resources on the destination host, and the operation timed out. This sounds plausible, but exactly the same result was observed when trying to migrate the VM onto an empty host.

Interestingly, here we start getting some hints that a file lock may be involved, and we also see the same error message that was reported in the vSphere client: “The VM failed to resume on the destination during early power on.”

I decided to have a look at the contents of the virtual machine folder, and found a number of suspicious-looking “-ctk.vmdk” files, mostly timestamped more than two years ago.
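As a rough illustration of that check, the sketch below lists “-ctk.vmdk” files in a folder together with their age in days. It assumes the virtual machine folder is reachable as an ordinary filesystem path from wherever Python runs; the function name `find_stale_ctk_files` and the one-year threshold are hypothetical choices for this example, not part of any VMware tooling.

```python
import time
from pathlib import Path


def find_stale_ctk_files(vm_dir, min_age_days=365, now=None):
    """Return (file name, age in whole days) for old *-ctk.vmdk files.

    vm_dir is assumed to be a normal filesystem path to the VM folder.
    """
    now = time.time() if now is None else now
    stale = []
    # Non-recursive: only files directly inside the VM folder are checked.
    for path in sorted(Path(vm_dir).glob("*-ctk.vmdk")):
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days >= min_age_days:
            stale.append((path.name, int(age_days)))
    return stale
```

Calling `find_stale_ctk_files(vm_dir, min_age_days=730)` would narrow the list to files older than roughly two years, which matches what I saw here.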

For ESX/ESXi 3.x/4.x and ESXi 5.0, the lock status of these “-ctk.vmdk” files can be obtained using the vmkfstools command. The process and syntax are explained in detail in KB1003397, titled “Unable to perform operations on a virtual machine with a locked disk.”

## START ##
mode 0 = no lock
mode 1 = exclusive lock (vmx file of a powered on VM, the currently used disk (flat or delta), *vswp, etc.)
mode 2 = read-only lock (e.g. on the ..-flat.vmdk of a running VM with snapshots)
mode 3 = multi-writer lock (e.g. used for MSCS cluster disks or FT VMs)
## END ##
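To make the mode table easier to apply, here is a minimal sketch that pulls the `mode N` field out of lock information text and maps it to the descriptions above. It assumes the text comes from something like `vmkfstools -D` run against the file and that it contains a `mode N` field; the helper name `describe_lock_mode` is hypothetical, and the exact output layout should be checked against the KB article for your ESXi version.

```python
import re

# Lock modes exactly as listed in the table above.
LOCK_MODES = {
    0: "no lock",
    1: "exclusive lock",
    2: "read-only lock",
    3: "multi-writer lock",
}


def describe_lock_mode(lock_text):
    """Return (mode, description) for the first 'mode N' field, or None.

    lock_text is assumed to be lock information pasted from the host.
    """
    match = re.search(r"\bmode (\d+)\b", lock_text)
    if match is None:
        return None
    mode = int(match.group(1))
    return mode, LOCK_MODES.get(mode, "unknown mode")
```

For a pasted fragment such as `"gen 9, mode 1, owner"` (an illustrative string, not captured from this environment), the helper returns `(1, "exclusive lock")`.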

At this stage I created a “tmp” directory in the virtual machine folder and moved all the “-ctk.vmdk” files into it. Since this was a live, powered on VM, I felt more comfortable doing this with a GUI than using the shell, so I used WinSCP to move the files.

I then confirmed there were no longer any “-ctk.vmdk” files in the virtual machine folder, and that they were all in the newly created “tmp” folder.
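I did the move by hand with WinSCP, but the same move-and-verify steps could be sketched in a script. The example below is only a sketch under assumptions: it treats the virtual machine folder as an ordinary filesystem path, the function name `move_ctk_files` is hypothetical, and it defaults to a dry run so that nothing is moved until you have reviewed the list.

```python
import shutil
from pathlib import Path


def move_ctk_files(vm_dir, subdir="tmp", dry_run=True):
    """Move *-ctk.vmdk files from vm_dir into vm_dir/subdir.

    Returns the names of the files found. With dry_run=True (the
    default) the files are only listed and nothing is changed.
    """
    vm_dir = Path(vm_dir)
    target = vm_dir / subdir
    ctk_files = sorted(vm_dir.glob("*-ctk.vmdk"))
    if not dry_run:
        target.mkdir(exist_ok=True)
        for path in ctk_files:
            shutil.move(str(path), str(target / path.name))
    return [path.name for path in ctk_files]


def remaining_ctk_files(vm_dir):
    """Names of *-ctk.vmdk files still directly inside vm_dir."""
    return sorted(path.name for path in Path(vm_dir).glob("*-ctk.vmdk"))
```

After a real run (`dry_run=False`), `remaining_ctk_files(vm_dir)` should return an empty list, which is the same check I did by eye.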

I’m not sure where these “-ctk.vmdk” files came from, but I suspect they originated from a legacy backup process from before my time. At least for now the issue is resolved, and we know what to look for the next time this happens.

Credits:

Thanks to Jakob Fabritius Nørregaard for posting this blog article which helped identify and resolve this issue.

Comments

This issue has been resolved in ESXi 5.5 Build 2143827. I know you aren’t running that version, but another remedy I’ve come across is to create a snapshot and then delete it. This consolidates the delta files and deletes the ctk files.

In some cases that does not work either, so the ultimate fix I’ve found is to clone the VM completely.