Quorum Arbitration in a Geographically Dispersed Cluster

Thanks to Oliver Simpson for his comments and his question: how does quorum arbitration work in a geographically dispersed cluster, and how does it differ between SRDF/CE and MirrorView? This is a great question and I hope I don’t bore you with too many details here.

There are many storage vendors out there and I’m sure that they all have their own method of performing box-to-box replication and controlling access to these mirrors. With EMC SRDF, the source device is labeled an R1 device and the remote device is considered an R2 device. SRDF pairs can be manipulated by users with the Solutions Enabler software to make either (or both) mirrors read/write enabled or write disabled. MirrorView labels one mirror the primary mirror and the remote one the secondary mirror. MirrorView volumes can be manipulated using the Navicli software to promote or fracture mirrors, which controls read/write access to the volumes. So far, the two are pretty much equal in their capabilities.

In a geographically dispersed cluster, we’re going to have one node of the cluster connecting to the R1 or primary mirror, while the other node attempts to access the R2 or secondary mirror. Typically, the R2/secondary mirror is either write disabled or not ready to the host while the cluster owns the resources on the R1/primary mirror. When the cluster needs to fail over to the remote site, you will need to issue commands to make the R2/secondary mirror read/write enabled before MSCS attempts to bring the disk resource online, and also make the R1 mirror write disabled/not ready.
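To make the failover sequence above concrete, here is a minimal Python sketch of the access states involved. It is only a toy model: the class and method names are hypothetical, it does not call Solutions Enabler or Navicli, and the choice to write disable the old source before enabling the remote mirror is my assumption for the sketch rather than something either product mandates.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ_WRITE = "read/write enabled"
    WRITE_DISABLED = "write disabled"


@dataclass
class Mirror:
    name: str
    access: Access


class MirroredPair:
    """Toy model of one replicated volume (hypothetical, not a vendor API)."""

    def __init__(self):
        self.r1 = Mirror("R1/primary", Access.READ_WRITE)
        self.r2 = Mirror("R2/secondary", Access.WRITE_DISABLED)

    def fail_over(self):
        # Assumed ordering: write disable the old source first so that
        # both mirrors are never writable at the same moment.
        self.r1.access = Access.WRITE_DISABLED
        self.r2.access = Access.READ_WRITE

    @staticmethod
    def can_bring_online(mirror):
        # The cluster disk resource should only come online on a
        # mirror that the host can write to.
        return mirror.access is Access.READ_WRITE


pair = MirroredPair()
pair.fail_over()
print(pair.r1, pair.r2)
```

The point of the model is simply that the disk resource on the remote node must not be brought online until `fail_over()` has completed.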

For all other Physical Disk resources in the cluster (other than the quorum), we can use cluster resource dependencies so that the disks do not attempt to come online until the storage has been read/write enabled to the host. With SRDF/CE, we’ve created our own resource type to control these actions for all non-quorum resources. As I’ve described in a previous blog entry, you could create a generic application or script resource to control this behavior with MirrorView using Navicli commands.
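The effect of such a dependency can be sketched as an ordering problem. The example below uses Python's standard `graphlib` module (Python 3.9 or later) to compute an online order. The resource names are made up for illustration and this is not how MSCS is implemented internally; it only shows that a disk which depends on a storage-enabling resource always comes online after it.

```python
from graphlib import TopologicalSorter

# Map each resource to the set of resources it depends on.
# All names here are hypothetical examples.
dependencies = {
    "Storage enabler": set(),
    "Physical Disk S:": {"Storage enabler"},
    "Application": {"Physical Disk S:"},
}


def online_order(deps):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = online_order(dependencies)
print(order)
```

Because the quorum disk cannot be given a dependency like this, it needs different handling.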

The big problem comes in when we need to deal with a shared quorum disk in a Microsoft cluster. The quorum disk CANNOT be made to depend on any other resources. Microsoft has done this so customers cannot shoot themselves in the foot by accidentally making their quorum disk depend on a faulty or misconfigured resource. When you attempt to make the quorum depend on a resource, you will receive an error message.

Additionally, if you do find a way to overcome this issue, you will next need to contend with the dreaded “split brain” syndrome. What happens if you have a network failure between sites and MSCS attempts to arbitrate for the quorum disk in a geographically dispersed cluster? You are accessing the quorum disk over two separate SCSI buses using two separate physical disk drives. When MSCS issues a SCSI bus/target reset against the local quorum disk, this has no effect on the remote device, so we’ve basically broken the quorum arbitration process. If some sort of mechanism is not in place, MSCS would be able to bring both copies online, and this will likely cause a split brain.
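A small simulation may help show why two separate devices break arbitration. This is a deliberately simplified sketch: the class is hypothetical, it ignores the timing of the real reset and challenge/defense sequence, and it assumes the remote copy has somehow been made writable. It only models the fact that a reservation lives on one physical device.

```python
class QuorumDevice:
    """A disk that tracks a single reservation holder (toy model)."""

    def __init__(self, label):
        self.label = label
        self.owner = None

    def reserve(self, node):
        # Succeeds only if the device is free or already held by this node.
        if self.owner in (None, node):
            self.owner = node
            return True
        return False


def nodes_that_win(node_to_device):
    """Return the nodes whose reservation on their own device succeeds."""
    return [n for n, dev in node_to_device.items() if dev.reserve(n)]


# Single site: both nodes contend for the same physical disk.
shared = QuorumDevice("local quorum")
single_site = nodes_that_win({"NodeA": shared, "NodeB": shared})

# Dispersed: each node only ever sees its own mirror of the quorum.
dispersed = nodes_that_win(
    {"NodeA": QuorumDevice("R1"), "NodeB": QuorumDevice("R2")}
)

print(single_site)  # one winner
print(dispersed)    # two winners, i.e. split brain
```

With one shared device there is exactly one winner. With two devices, each node wins on its own copy and both believe they own the quorum.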

Because of these challenges, it is difficult to find a workaround and have a shared quorum disk in a geographically dispersed cluster. Microsoft has given customers a way around this by introducing the MNS quorum, which is certainly a doable option, though it does have its limitations. See my previous entry for more information on this topic.
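One limitation that commonly comes up with a majority node set is the majority rule itself: the surviving side must hold more than half of the configured nodes. The sketch below assumes that plain majority rule and uses hypothetical function names. It is an illustration of the arithmetic, not of how the cluster service computes it.

```python
def has_majority(nodes_present: int, total_nodes: int) -> bool:
    """True when the surviving partition holds more than half of all nodes."""
    return nodes_present > total_nodes // 2


def survives_site_loss(nodes_site_a: int, nodes_site_b: int) -> dict:
    """Report whether the cluster keeps quorum after losing either site."""
    total = nodes_site_a + nodes_site_b
    return {
        "lose_site_a": has_majority(nodes_site_b, total),
        "lose_site_b": has_majority(nodes_site_a, total),
    }


# One node per site: neither side has a majority on its own.
print(survives_site_loss(1, 1))
# Two nodes at site A, one at site B: only the loss of site B is survivable.
print(survives_site_loss(2, 1))
```

Under this rule, an even split of nodes across two sites cannot survive the loss of either site without some additional vote or manual intervention.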

With SRDF/CE, we support a shared quorum disk by adding a secondary arbitration process whenever a standard Microsoft quorum arbitration event occurs. SRDF/CE adds a filter driver into the stack to detect quorum arbitration events. When an arbitration event occurs, the SRDF/CE filter driver halts the standard quorum arbitration process and completes its own arbitration before allowing the standard quorum arbitration event to continue.

The SRDF/CE arbitration process uses Solutions Enabler APIs to acquire Symmetrix locks to simulate a persistent reservation across the SRDF link. We release and acquire these locks during quorum arbitration in a fashion similar to Microsoft’s challenge/defense arbitration process. When a bus reset occurs from the challenger, we release the locks and wait to see whether the defender re-acquires them. If the defender successfully defends, the challenger’s arbitration fails. Once a node has successfully acquired these Symmetrix locks, we issue the appropriate Solutions Enabler commands to make the quorum disk resource read/write enabled to the host. Once the quorum is read/write enabled, the cluster quorum arbitration process continues and MSCS decides whether or not to allow the quorum to come online on the node.
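The challenge/defense flow described above can be sketched as follows. To be clear, this is a simplified simulation and not the SRDF/CE implementation: `ArrayLock` is a hypothetical stand-in for a lock visible from both sites, no Solutions Enabler call is used, and the defender's wait window is reduced to a single boolean.

```python
class ArrayLock:
    """Hypothetical stand-in for a lock that both sites can see."""

    def __init__(self, holder=None):
        self.holder = holder

    def release(self):
        self.holder = None

    def acquire(self, node):
        # Succeeds only if the lock is free or already held by this node.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False


def challenge(lock, challenger, defender, defender_alive):
    """Return the node that ends up owning the lock after one challenge."""
    # 1. The challenger's reset causes the lock to be released.
    lock.release()
    # 2. During the wait window, a live defender re-acquires the lock.
    if defender_alive:
        lock.acquire(defender)
    # 3. The challenger tries to take the lock; it fails if defended.
    won = lock.acquire(challenger)
    return challenger if won else defender


# Defender is healthy: the challenge fails and ownership stays put.
print(challenge(ArrayLock("NodeA"), "NodeB", "NodeA", defender_alive=True))
# Defender is down: the challenger takes the lock.
print(challenge(ArrayLock("NodeA"), "NodeB", "NodeA", defender_alive=False))
```

In this model, only the node returned by `challenge()` would go on to have its copy of the quorum made read/write enabled, which is what keeps both copies from being brought online at once.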

In the CLARiiON environment, the CX arrays do not have these locking mechanisms and the API is not as robust, so we do not have the same capabilities. Therefore, using an MNS quorum is the only available option in the MirrorView environment.

I hope this helps to give some insight about the way that SRDF/CE handles quorum arbitration. Feel free to leave a comment if anything is unclear.

3 thoughts on “Quorum Arbitration in a Geographically Dispersed Cluster”

Thanks very much for taking the time to post this. I’ve found it very useful and I now understand the quorum arbitration problem on MirrorView.

I am looking into using software mirroring (configured on the hosts) as a possible alternative to MirrorView. This would also be an MNS cluster.

Roughly speaking, this will involve using a product such as Veritas Storage Foundation (Volume Manager) to set up mirrored volumes (except for the quorum) using disks from both SANs.

If one site is lost then half of the mirror is preserved and operation can continue. This assumes both sites are active and all LUNs in both sites are presented to all hosts in both sites.

Obviously, this adds workload for the hosts (and presumably more traffic over the VSAN) but removes the need for MirrorView and associated failover scripts.

Given that the volumes of data involved in my specific case would be relatively small, would you say that avoiding the extra complexity of using MirrorView is worth the overhead inherent in a solution like this?

VSF mirroring means you would need to present both the source and the target volumes to all nodes in the cluster in order to form the mirror. Sounds like a pretty expensive use of your bandwidth. If you wanted to go this route, I would consider using something like NSI’s Double-Take instead, as IP bandwidth is generally less expensive than SAN bandwidth.

Drop me an email at jtoner@mvps.org if you want to discuss this one further.