Sunday, February 9, 2014

Windows Server Failover Clustering Quorum Behavior Guide

“A Republic Quorum, if you can keep it.” - Ben Franklin.

Your WSFC Quorum is like a Republic, or more accurately, a Democracy. There are many articles out there on Quorum voting logic, but most are somewhat lengthy. I set out to see how few words I could use to explain the Quorum rules effectively, so here we go. Don't count this part. Or this. Wait... er... start counting.... NOW.

Windows 2008 and higher:

A majority means more than 50% of the votes. (A tie is not a majority.)

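Fractions (only) are rounded up to the next integer. Half of 5 votes is 2.5, which rounds up to 3, so 3 votes are a majority. Half of 4 votes is exactly 2, which would be a tie, so a majority again takes 3.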
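The two rules above reduce to a little arithmetic. Here is a minimal Python sketch of it; `votes_needed` and `has_majority` are hypothetical helper names chosen for illustration, not part of any Windows API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest vote count that is strictly more than 50% of total_votes."""
    # Odd totals: half is a fraction (5 / 2 = 2.5), rounded up to 3.
    # Even totals: half is whole (4 / 2 = 2), but a tie is not a majority, so 3.
    # Integer division plus one covers both cases.
    return total_votes // 2 + 1


def has_majority(votes_present: int, total_votes: int) -> bool:
    """True when the votes present are more than half of all votes."""
    return votes_present * 2 > total_votes


for total in (2, 3, 4, 5):
    print(total, "votes ->", votes_needed(total), "needed")
```

Both 4 and 5 total votes print "3 needed", which is why adding a fifth vote to a four-vote cluster buys you tolerance for one more failure without raising the bar.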

There is a legacy quorum method called "disk only" in which one designated quorum disk holds the only vote. It is considered obsolete because that disk is a single point of failure: if it goes offline, the cluster stops even when every node is healthy.
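A simplified Python model makes the single point of failure easy to see. This is a sketch, assuming one vote per node for the majority case; the function names are hypothetical and the real cluster service checks more than this.

```python
def disk_only_is_up(disk_online: bool, nodes_online: int) -> bool:
    """Disk only: the quorum disk holds the only vote.

    The cluster can run on any number of nodes (even one) as long as the
    disk is available, and stops whenever the disk is lost.
    """
    return disk_online and nodes_online >= 1


def node_majority_is_up(nodes_online: int, total_nodes: int) -> bool:
    """Node majority: more than half of the node votes must be present."""
    return nodes_online * 2 > total_nodes


# Four healthy nodes, but the quorum disk has failed.
print(disk_only_is_up(disk_online=False, nodes_online=4))  # False
print(node_majority_is_up(nodes_online=4, total_nodes=4))  # True
```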

Windows 2008/R2 with Hotfix 2494036 or higher:

"NodeWeight" was added to let you revoke a node's vote (NodeWeight=0). It is generally used in cross-site clusters, either to remove the votes of nodes in the secondary site or to ensure that the shared disk or file share witness casts the deciding vote.
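Here is a minimal Python sketch of how a weight of 0 changes the count, under stated assumptions: the `Voter` class, `has_quorum` helper, and node names are hypothetical, and the witness is modeled as a single vote. Only weighted votes count toward both the total and the majority.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    online: bool
    weight: int = 1  # mirrors NodeWeight: 1 = has a vote, 0 = vote revoked


def has_quorum(voters: list[Voter]) -> bool:
    """Majority of the weighted votes; weight-0 voters never count."""
    total = sum(v.weight for v in voters)
    present = sum(v.weight for v in voters if v.online)
    return present * 2 > total


# Hypothetical two-site cluster: two voting nodes in the primary site,
# one DR node with NodeWeight=0, and a file share witness.
cluster = [
    Voter("NodeA", online=True),
    Voter("NodeB", online=False),  # NodeB has failed
    Voter("NodeC-DR", online=True, weight=0),
    Voter("FileShareWitness", online=True),
]
print(has_quorum(cluster))  # True: NodeA + witness = 2 of 3 votes

cluster[3].online = False  # now the witness is lost too
print(has_quorum(cluster))  # False: 1 of 3 votes; NodeC-DR being up does not help
```

With NodeC voting there would be 4 votes in total, an even number that allows a tie; revoking its vote leaves 3, so the witness decides.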

Windows 2012 or higher:

Dynamic Quorum

"Dynamic Quorum" removes the vote of a downed cluster member (the node's DynamicWeight drops to 0 while its configured NodeWeight is unchanged), effectively reducing the number of voting nodes by one. This works only under the following conditions:

Before the outage, the cluster must have quorum (the normal state for a running cluster).

Nodes must go down one at a time so the remaining nodes can agree to remove the downed member. If multiple nodes go down simultaneously, the dynamic removal will not take place.
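To see why the order of failures matters, here is a simplified Python model of the two conditions above. It is a sketch under stated assumptions: `simulate` is a hypothetical helper, every node starts with one vote, there is no witness, and only the rules described in this post are modeled (the real cluster service has additional logic).

```python
from __future__ import annotations


def simulate(total_nodes: int, failure_events: list[int]) -> tuple[bool, int]:
    """Return (cluster still running, votes the cluster currently counts).

    failure_events lists how many nodes fail at the same moment in each event.
    """
    votes = total_nodes   # votes the cluster currently counts
    online = total_nodes  # nodes still running
    for failed in failure_events:
        online -= failed
        if online * 2 <= votes:
            return False, votes  # survivors lack a majority: quorum lost
        if failed == 1:
            votes -= 1  # single failure: survivors remove the downed node's vote
        # Simultaneous failures: the cluster may survive, but no votes are removed.
    return True, votes


print(simulate(5, [1, 1, 1]))  # (True, 2)
print(simulate(5, [2, 1]))     # (False, 5)
```

Both runs lose three of five nodes. One at a time, the vote count shrinks 5 to 4 to 3 to 2 and the last two nodes still hold a majority; with two simultaneous failures first, the count stays at 5 and the next failure leaves only 2 of 5 votes.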

Examples/Illustration:

Closing

Windows Server Failover Clustering is an excellent option for SQL Server, Hyper-V, and other services. Hopefully this overview of quorum behavior helps you design solutions that better meet your clients' needs.

Note: Consider how Quorum is formed when planning your patching strategy, since every node you reboot takes its vote offline while it restarts.