Veritas Global Cluster Solution vs. VMware ESXi with SRM

To my knowledge, two ESXi hosts with a SAN attached can provide local failover of VMs from one ESXi host to the other. If we add the VMware SRM (Site Recovery Manager) product between these two ESXi hosts (PRIMARY) and one ESXi host (DR), we can establish a Global Cluster (in Symantec terms), and with this solution all VMs at the PRIMARY site can fail over to the DR site if the PRIMARY site goes down.

To establish this environment we need licenses for three ESXi hosts, two vCenter Servers and one SRM.

This solution also has a feature similar to the fire drill in SFHA.

This solution provides failover on ESXi host failure, OS failure or any kind of hardware component failure. It does not have a feature that can detect application failure. For this, Symantec has ApplicationHA. ApplicationHA integrates with the above solution and provides failover on application failure as well.

So to my knowledge, in the above virtualization scenario the only product we are in a position to propose is ApplicationHA, as the above solution already gives the best response for hardware/OS failure.

Correct me if I am wrong on any point mentioned above. Suggestions are welcome if we can improve the above solution with the help of Symantec product(s).

Another option is to install VCS in the virtual nodes. The advantage of this is that you can fail an application over from one virtual node to another virtual node that is already running, rather than starting up the same virtual node as with ApplicationHA. Let me give an example (I know mainly about VCS, so some details about ApplicationHA and ESX may not be 100% correct):

You have applications A and B running in virtual node V1, and you have another virtual node V2.

In ApplicationHA you would have A and B running in V1, and V2 would not be running. If app A dies, then I believe V1 is shut down and started somewhere else (or with SRM maybe you can also start up V2, which runs apps A and B too). This results in an outage for app B, which didn't have an issue, and as well as starting apps A and B you also have to start V1 (or V2).

In VCS you would have A and B running in V1, and V2 WOULD be running. If app A dies, then app A is started in V2 and app B remains in V1. The failover is usually quicker than with ApplicationHA because V2 is already running, so you don't have to start V2. With this solution, VCS controls failover, not VMware, but you can still vMotion virtual nodes, and as well as virtual-to-virtual failover you can also configure virtual-to-physical and physical-to-virtual.
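To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is only a toy model, not VCS or ApplicationHA code: the function name `apps_interrupted` and the labels `"vm_restart"` and `"in_guest_cluster"` are made up for illustration, and the whole-VM restart behaviour is taken from the description above rather than from product documentation.

```python
def apps_interrupted(model, failed_app, apps_on_vm):
    """Return the set of apps that see an outage when failed_app dies.

    model is a hypothetical label:
      "vm_restart"       - the whole VM is restarted, as described above
                           for ApplicationHA (an assumption of this sketch)
      "in_guest_cluster" - only the failed app moves to a VM that is
                           already running, as described above for VCS
    """
    if failed_app not in apps_on_vm:
        raise ValueError(f"{failed_app} is not running on this VM")
    if model == "vm_restart":
        # Every app on the VM goes down with it, healthy or not.
        return set(apps_on_vm)
    if model == "in_guest_cluster":
        # Only the failed app is restarted elsewhere.
        return {failed_app}
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    v1 = ["A", "B"]
    print(sorted(apps_interrupted("vm_restart", "A", v1)))
    print(sorted(apps_interrupted("in_guest_cluster", "A", v1)))
```

In this model app B is only interrupted in the whole-VM restart case, which is the point of the example with V1 and V2.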

As you said: "Another option is to install VCS in the virtual nodes. The advantage of this is you can fail an application from one virtual node to another virtual node that is already running."

This will really increase cost if we install Global Cluster in the virtual nodes. So it is up to the client whether he can bear the downtime while the virtual machine is brought up via ApplicationHA, or cannot bear any downtime of our services.

I think the time saving is quite small - it is more that VCS is more flexible. Suppose you have 4 SQL instances. With ApplicationHA, if you put them all in the same VM, then if one SQL instance fails, all the SQL instances will be failed over. So you would probably use 4 VMs instead and install each SQL instance in its own VM, but then you have 4 O/S's to look after. With VCS you move the individual SQL instance, so you don't have to move the whole VM, and this gives you more flexibility. For instance, if the physical node dies and you lose all 4 SQL instances, you could fail the SQL instances over to different VMs if you wanted to, for example if the VMs you were failing over to were already running apps and you wanted to share the load.
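As a rough illustration of "sharing the load" when the 4 SQL instances fail over, the sketch below places each failed instance on whichever surviving VM currently runs the fewest apps. This is plain Python, not VCS configuration: the function `spread_failover`, the VM names and the load counts are all hypothetical, and a real cluster would express this through its own failover settings.

```python
def spread_failover(failed_instances, vm_load):
    """Assign each failed instance to the least-loaded surviving VM.

    failed_instances: names of the instances that lost their node
    vm_load: dict of surviving VM name -> number of apps already running
    Returns a dict of instance name -> chosen VM.
    """
    load = dict(vm_load)  # work on a copy, leave the caller's dict alone
    placement = {}
    for instance in failed_instances:
        # Pick the VM with the fewest apps; ties are broken by VM name.
        target = min(load, key=lambda vm: (load[vm], vm))
        placement[instance] = target
        load[target] += 1
    return placement


if __name__ == "__main__":
    survivors = {"VM2": 2, "VM3": 0, "VM4": 1}
    plan = spread_failover(["SQL1", "SQL2", "SQL3", "SQL4"], survivors)
    for instance, vm in plan.items():
        print(instance, "->", vm)
```

With whole-VM failover there is no such choice, because all instances inside the VM move together.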

With the flexibility that VCS brings, if your apps run just as well at the DR site as at the primary site, then I recommend customers spread their apps across nodes at both sites. That way you are making use of the hardware at the DR site all the time, and when you have a disaster (at the Primary site OR the DR site) you don't have an outage for ALL your apps, as some will already be running on the site that remains up.
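The effect of spreading apps across both sites can be shown with a small Python sketch. The app names, site names and the helper `apps_down_after_site_loss` are invented for the example; it only counts which apps were running on the site that was lost and says nothing about how long the failover takes.

```python
def apps_down_after_site_loss(placement, lost_site):
    """Return the apps that were running on the site that went down.

    placement: dict of app name -> site where it normally runs
    """
    return sorted(app for app, site in placement.items() if site == lost_site)


if __name__ == "__main__":
    # Hypothetical layouts: everything on primary vs. spread over both sites.
    all_on_primary = {"app1": "primary", "app2": "primary",
                      "app3": "primary", "app4": "primary"}
    spread = {"app1": "primary", "app2": "dr",
              "app3": "primary", "app4": "dr"}

    for name, layout in (("all on primary", all_on_primary), ("spread", spread)):
        down = apps_down_after_site_loss(layout, "primary")
        print(name, "-> apps needing failover:", down)
```

In the spread layout only half of the apps need to fail over when the primary site is lost, while the others keep running at DR.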

The link is correct, but the product for installing VCS in the virtual node will be VCS for Windows or Linux (depending on the O/S in the virtual host), not VCS for VMware. There used to be a VCS for VMware where you would install VCS in the VMware hypervisor, but VMware stopped supporting VCS in the hypervisor, so this product is no longer available.

Up until 6.0.1 you could install VCS in a virtual host, but VCS would be unaware it was being installed in a virtual host, and as there was no communication with the hypervisor you had to use raw disks (RDM), so vMotion was not supported. Now VCS supports communication with vSphere, has a vCenter plugin and comes with a "VMwareDisks" agent so you can use VMDK disks, which means vMotion, Distributed Resource Scheduler (DRS) and snapshots are supported.

@Zahid: if the OS is corrupted, I don't think VMware offers any help to start it. So the only solution is to start from an earlier snapshot. If the configuration present in the OS has not changed since the snapshot was taken, it should be okay to use the previously stored VMDKs for the OS.

If the ESXi host is down, I think VMware will take the failover action. With in-guest VCS clustering your applications will fail over instantly (*), and VMware will bring up the VMs eventually, so the application can be up and running sooner.

Thanks and Warm Regards,

Amit Rangari

I think what Zahid was pointing out is that if the O/S is corrupted, then with VMware, with or without ApplicationHA, the whole virtual node is failed over and you lose your app. With VCS you just fail over the app to another node (whose O/S is not corrupted), so your app can remain highly available.

Likewise with patching the O/S: if you use VCS inside the virtual node, you can patch the inactive node and then fail over the app to the patched node, but with VMware (with or without ApplicationHA) you have to take down the app to patch the O/S.
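The patching order described above can be written down as a short Python sketch. The function `rolling_patch_plan` and the node names are hypothetical, and the third step (patching the original node once the app has moved) is my assumption about how the cycle would normally be completed; it only lists the steps and does not call any VCS command.

```python
def rolling_patch_plan(active, standby, app):
    """Return the ordered steps for patching both nodes of a two-node cluster.

    active:  node currently running the app
    standby: node that is up but not running the app
    """
    return [
        f"patch {standby} while {app} keeps running on {active}",
        f"fail over {app} from {active} to {standby}",
        # Assumed follow-up step, not stated in the post above:
        f"patch {active} while {app} keeps running on {standby}",
    ]


if __name__ == "__main__":
    for number, step in enumerate(rolling_patch_plan("V1", "V2", "app A"), start=1):
        print(number, step)
```

The only interruption for the app in this plan is the failover in step 2, not the patching itself.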