We recently upgraded ESXi on all the servers in our cluster from ESXi 6.5 to ESXi 6.7 Update 1 (EP7 build), and since then we have seen multiple Linux VMs randomly restarting. The reboots started with the upgrade, including during vMotion of VMs to other hosts while the upgrade was in progress. Even today, 4 days after the upgrade, some VMs are still rebooting with the error below.

After applying the drivers and upgrading the firmware on the hosts, we are still seeing VMs sporadically restarted by HA. The case has been escalated to P1 with GSS, and after a couple of phone calls GSS confirmed that 5 other customers are reporting the same issue on other hardware vendors. We have other clusters in the same environment that are not impacted, and there are other customers running vSphere 6.7 U1 who are not affected by this issue either.

GSS is still working on the root cause, but it looks like it could be related to VMs being migrated from ESXi 6.5 to ESXi 6.7.

An update on this issue: Engineering has an ESXi patch that provides additional debugging, which they would like us to apply. Of the 6 SRs open for this issue, one customer has applied the patch and 4 have reverted their environments to 6.5.

The restart occurs only once per VM. So far 90+ VMs have restarted, but not a single VM has restarted more than once.

VMware engineering has confirmed that the crash backtrace is the same for all reported cases, all showing the same memory fault, and that the issue has nothing to do with the hardware version.

VMware GSS has also provided an ESXi patch, which we will be rolling out to the cluster tonight. If the pattern continues, VMs will keep getting reset, which will let us capture the additional information engineering needs to investigate further.

The patch provided by VMware GSS has been applied to the cluster, and we are still seeing VMs being reset by HA. Logs for the affected VMs and hosts have been uploaded to the engineering team, who are currently reviewing them.

We hope to hear back from them soon, and I will post an update as soon as I do.

Just an update on this case: the customer asked us to roll back half the cluster to 6.5, as they could not tolerate any more VMs crashing.

VMware GSS provided us with a debug patch for 6.7 with extra logging they need to investigate the issue further. However, all our attempts to install it on one host failed, and the build number on the host did not change. After multiple phone calls with GSS to resolve this, VMware engineering supplied another image, which did work, and the host now shows the updated build number. Unfortunately, so much time has passed that the customer has not seen any VM HA reset events in the last 2 weeks.

We are still planning to roll this out to all 6.7 hosts this week, and if the issue recurs we will upload logs to GSS.