Troubleshooting vSphere Fault Tolerance in Network Partitions

<

When a vSphere HA cluster experiences a failure of the network that vSphere uses for inter-agent communication (the management network), a subset of the cluster's hosts might be unable to communicate with other cluster hosts. In this case, the set of hosts that can communicate with each other are considered to be in a network partition.

A cluster partition impedes cluster management functions such as vMotion and can impact vSphere HA’s ability to monitor and restart virtual machines after a failure. This condition must be corrected as soon as possible.

Network partitions also degrade the functionality of vSphere Fault Tolerance. For example, in a partitioned cluster, a Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a Secondary VM must be restarted, vSphere HA does so only if the Primary VM is in a partition managed by the master host responsible for it. Ultimately, you must correct the network partition, but until that is possible, you must troubleshoot and correct any problems that arise with your fault-tolerant virtual machines to ensure that they are properly protected.