Friday, 8 July 2016

Backup verification jobs check the consistency of your restore points. A verification job can be either automatic (Automatic Backup Verification, ABV) or manual. At a high level, the verification flow is:

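>> Restore: Restore the restore point as a temporary VM on the ESXi host and datastore defined in the backup verification job.

>> Power On: Power on the temporary VM.

>> Verify heartbeat: Check that the running VM responds with a heartbeat.

>> Power Off: Power off the temporary VM.

>> Delete VM: Remove the temporary VM from the inventory and delete it from disk.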

The issue discussed here is not a general one; it was caused by a very specific condition. However, the troubleshooting steps apply more broadly, and your verification jobs may fail for similar reasons.

All the verification job logs are present under the following directory:

Here, the restore completed successfully. The network adapter of the verification VM is always disconnected to avoid an IP conflict.

Then several attempts were made to power on the virtual machine, and all of them failed. Since the power-on never completed, the power-off failed as well.

The heartbeat verification step was skipped because the virtual machine never powered on. This led to the final step, deleting the VM, which completed successfully.
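To make the order of events in the logs easier to follow, here is a minimal Python sketch that models the flow as a sequence of steps. It is only an illustration: the step names, the `run_verification` function, the `power_on` callable and the retry count of 3 are all hypothetical assumptions, not the actual implementation or log format of the verification job.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VerificationResult:
    # Ordered (step, outcome) pairs; step names are hypothetical labels.
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, step: str, outcome: str) -> None:
        self.steps.append((step, outcome))

    @property
    def failed(self) -> bool:
        # The job is reported as failed if any step failed.
        return any(outcome == "failed" for _, outcome in self.steps)


def run_verification(power_on: Callable[[], bool],
                     max_power_on_attempts: int = 3) -> VerificationResult:
    """Simulate one verification run (hypothetical model, not the real job).

    power_on: returns True if the temporary VM powered on.
    """
    result = VerificationResult()
    result.record("restore", "ok")         # restored as a temporary VM
    result.record("disconnect_nic", "ok")  # NIC disconnected to avoid an IP conflict

    powered_on = False
    for attempt in range(1, max_power_on_attempts + 1):
        if power_on():
            powered_on = True
            result.record(f"power_on#{attempt}", "ok")
            break
        result.record(f"power_on#{attempt}", "failed")

    if powered_on:
        result.record("verify_heartbeat", "ok")
        result.record("power_off", "ok")
    else:
        # No running VM: the heartbeat check is skipped and power-off fails.
        result.record("verify_heartbeat", "skipped")
        result.record("power_off", "failed")

    # Cleanup runs regardless of what happened above.
    result.record("delete_vm", "ok")
    return result
```

Calling `run_verification(lambda: False)` reproduces the pattern seen in the logs: every power-on attempt fails, the heartbeat check is skipped, power-off fails, and the delete still succeeds.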

That is all the verification logs show. This was not enough to find the cause, so I ran the following tests:

1. For this verification job, I changed the destination host and datastore, so the restore ran on a different host and a different datastore. This time the job completed successfully, which meant something was wrong with either the original host or the original datastore.

2. I then changed the datastore back to the original one while keeping the new host. The verification job completed successfully again. When I edited the job back to the original host, it failed with the same error.

So something is going on with this host! We need to troubleshoot at the host level.
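The elimination in the two tests above can be written as a small Python sketch: any host or datastore that appears in a passing run is cleared, and whatever is common to every failing run and never cleared remains a suspect. This assumes a single faulty component; the `suspects` function and the names `old-host`, `new-host`, `old-ds` and `new-ds` are hypothetical placeholders, not values from the original environment.

```python
from typing import Iterable, Set, Tuple

Run = Tuple[str, str, bool]      # (host, datastore, passed)
Component = Tuple[str, str]      # ("host" | "datastore", name)


def suspects(runs: Iterable[Run]) -> Set[Component]:
    """Return components present in every failing run and in no passing run.

    Assumes a single faulty component (hypothetical simplification).
    """
    cleared: Set[Component] = set()
    failing = []
    for host, datastore, passed in runs:
        components = {("host", host), ("datastore", datastore)}
        if passed:
            cleared |= components   # worked at least once, so not the cause
        else:
            failing.append(components)
    if not failing:
        return set()
    return set.intersection(*failing) - cleared


# Hypothetical names mirroring the tests described above.
runs = [
    ("old-host", "old-ds", False),  # original job: fails
    ("new-host", "new-ds", True),   # test 1: new host and new datastore
    ("new-host", "old-ds", True),   # test 2: old datastore, new host
    ("old-host", "old-ds", False),  # back to the old host: fails again
]
print(suspects(runs))
```

With these runs, the old datastore is cleared by test 2, leaving only the original host as the suspect.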