vSphere 5 High Availability

VMware High Availability (HA, codenamed Fault Domain Manager during its development) has been completely rewritten for vSphere 5. Previous versions of vSphere used a technology from Legato called AAM to cluster ESX(i) hosts. It was not always simple to configure, produced many different log files, and was not always reliable.

vSphere 5 no longer has primary nodes. One host in the cluster is the master node; all other nodes are slave nodes. When the host holding the master role fails, the remaining slave nodes immediately elect a new master. One of the great things about not having 5 primary nodes is that you no longer have to worry about which blades in a chassis hold the primary node role. Until vSphere 5, HA could fail to work if all 5 primary nodes were running on blades in the same failed blade enclosure.

An HA cluster can be deployed much more quickly than before. In vSphere 5 the cluster is configured in 20 seconds, no matter how many nodes are in it. The agent is deployed to all hosts in parallel.

The node holding the master role is responsible for monitoring whether the other nodes and the virtual machines are alive. The master communicates with vCenter Server about the status of the cluster. The election of the master is done over UDP, but all other communication is done over SSL. An administrator cannot select the master; the election is automatic. One of the factors taken into account in the election is the number of datastores a host has access to. Redundant NICs and other items are not considered in the election process.
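The election rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual HA agent logic: the `Host` class, the `elect_master` function and the host names are made up for the example, and breaking ties on the highest host identifier is an assumption added so that the result is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    host_id: str          # hypothetical identifier, used only as a tie-breaker
    datastore_count: int  # number of datastores the host has access to


def elect_master(hosts):
    """Return the host with the most datastores; ties go to the highest host_id."""
    if not hosts:
        raise ValueError("cannot elect a master from an empty cluster")
    return max(hosts, key=lambda h: (h.datastore_count, h.host_id))


cluster = [
    Host("host-10", 4),
    Host("host-11", 6),
    Host("host-12", 6),
]
master = elect_master(cluster)
print(master.host_id)  # host-12

# If the master fails, the remaining hosts simply run the election again.
survivors = [h for h in cluster if h != master]
print(elect_master(survivors).host_id)  # host-11
```

Note that nothing about NIC redundancy appears in the sort key, which mirrors the point that such items play no role in the election.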

Dependencies on infrastructure components like DNS have also been removed. For example, when DNS runs in a VM, you do not want HA to rely on DNS while that VM is down because of a host failure. This prevents catch-22 situations. IP addresses are used to communicate with the ESXi hosts.

HA uses two channels for agent-to-agent communication (heartbeats): the management network and storage. Datastores used for heartbeats are called heartbeat datastores. Each host has a single log file for HA errors and status messages. These log files can be sent to a central syslog server. vCenter Server 5 can be used as a syslog server.
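To see why a second heartbeat channel helps, consider how a master could interpret the two signals for one slave host. The sketch below is an assumption about the reasoning, not the real agent code; the function name `classify_host` and the three state labels are invented for the example.

```python
def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Interpret the two heartbeat channels for a single slave host."""
    if network_heartbeat:
        # The management network works, so the host is reachable as usual.
        return "alive"
    if datastore_heartbeat:
        # No network heartbeat, but the host still updates its heartbeat
        # datastore: it is running, just unreachable over the network.
        return "isolated-or-partitioned"
    # Silent on both channels: treat the host as failed.
    return "failed"


print(classify_host(True, True))    # alive
print(classify_host(False, True))   # isolated-or-partitioned
print(classify_host(False, False))  # failed
```

With only the network channel, the last two cases would look identical to the master.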

HA has been designed for higher scalability, although the current release supports the same number of hosts in a cluster as vSphere 4 did.

New features are:
–management network partition (new). This is for clusters that span multiple subnets, such as an 8 node cluster where 4 nodes are in subnet A and 4 are in subnet B. If the link between the two subnets goes down, the 4 nodes in each subnet are cut off from the other 4.
–single HA log file per host and syslog integration (new)
–host isolation response (improved). This is done by adding a second channel for agent communication over the storage layer.
–admission control (improved)
–agent error reporting (improved). The messages will be much more detailed, stating why an agent does not start
–more alarms and events (improved)
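
The management network partition example from the list above (8 nodes split over subnet A and subnet B) can be sketched as follows. The `partitions` function and the `esx1` to `esx8` host names are hypothetical, and the sketch assumes that hosts inside one subnet can always reach each other.

```python
from collections import defaultdict


def partitions(host_subnets, link_up):
    """Group hosts by which other hosts they can still reach.

    host_subnets: dict mapping host name -> subnet name
    link_up: True if the link between the subnets works
    """
    if link_up:
        # Everyone can reach everyone: a single group.
        return [sorted(host_subnets)]
    groups = defaultdict(list)
    for host, subnet in host_subnets.items():
        groups[subnet].append(host)
    return [sorted(members) for _, members in sorted(groups.items())]


# 8 node cluster: esx1-esx4 in subnet A, esx5-esx8 in subnet B.
hosts = {f"esx{i}": ("A" if i <= 4 else "B") for i in range(1, 9)}

print(partitions(hosts, link_up=True))
print(partitions(hosts, link_up=False))
```

With the link down the function returns two groups of 4 hosts, one per subnet.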

The user interface has changed to show which node is the master and which datastores are used for the storage channel. It also shows whether a VM is protected.

Besides memory and CPU reservations, admission control has a new item called Failover hosts. These hosts are not part of DRS, so no VMs are moved to them, and they do not run VMs either. They are held purely as a reserve.
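
The effect of failover hosts on placement can be illustrated with a small sketch. It is a simplification under stated assumptions: the `placement_targets` function and the host names are invented, and the rule that the reserve hosts become the restart targets during a failover is an assumption drawn from their role as a reserve.

```python
def placement_targets(hosts, failover_hosts, failover_in_progress=False):
    """Return the hosts that may receive VMs, in sorted order."""
    reserve = set(failover_hosts)
    if failover_in_progress:
        # Assumption: after a host failure the reserve hosts take the VMs.
        return sorted(h for h in hosts if h in reserve)
    # Normal operation: the reserve hosts stay empty.
    return sorted(h for h in hosts if h not in reserve)


cluster = ["esx1", "esx2", "esx3", "esx4"]
reserve = ["esx4"]

print(placement_targets(cluster, reserve))                             # ['esx1', 'esx2', 'esx3']
print(placement_targets(cluster, reserve, failover_in_progress=True))  # ['esx4']
```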

In vSphere 4, HA could monitor failures of the ESX(i) host and of the VMs running on top of it. If a virtual machine failed (a blue screen of death in Windows), VMware HA could restart the VM. However, a failed application could not be detected. This has changed in vSphere 5. During development VMware made an application-aware API available. A script or program running in the guest OS can detect an application failure and signal it to the API. This can result in a restart of the VM, so that the application is operational again after the restart. Symantec and Neverfail were the only two companies that got access to the API during the beta. Symantec delivers a product called Application HA.

At the release of vSphere 5 the application-aware API will be made publicly available.
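
The guest-side pattern described above (check the application, signal a failure) could look roughly like the sketch below. It does not use the real API: `monitor_once`, `check_app` and `signal_failure` are hypothetical names, and `signal_failure` is only a stand-in for whatever call the application-aware API actually provides.

```python
def monitor_once(check_app, signal_failure):
    """Run one health check and report a failure through the given callback."""
    try:
        healthy = bool(check_app())
    except Exception:
        # A health check that crashes counts as a failed application.
        healthy = False
    if not healthy:
        signal_failure()  # stand-in for the real API call
    return healthy


signals = []
monitor_once(lambda: True, lambda: signals.append("app-failed"))
monitor_once(lambda: False, lambda: signals.append("app-failed"))
print(signals)  # ['app-failed']
```

In a real guest the check would run on a timer and test something concrete, such as whether a service process still responds.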

If you want to know more about clustering, buy the book by Duncan Epping and Frank Denneman, which was released on July 12. It is available for the Kindle platform and in print, in colour or black and white.