Friday, July 28, 2017

The Setup

The DBAs where I work were having a discussion this week about how we should set up quorum on clusters that have different sets of nodes. We looked at our current setups and noticed some inconsistencies. Some have a File Share Witness and some do not; sometimes the nodes in the BDC (DR) have votes and sometimes they do not. We would like to have a standard configuration for all clusters. A standard makes clusters easier to set up and makes it easier to spot when something is not correct.

In our environment we have three basic types of SQL clusters.

Two node cluster - One node in the PDC and one node in the BDC

Three node cluster - Two nodes in the PDC and one node in the BDC

Four node cluster - Two nodes in the PDC and two nodes in the BDC

We had some concerns over what happens when you use a File Share Witness in a two-node cluster configuration, so I created a cluster in our lab and tested some scenarios. In Windows Server 2012 R2, Microsoft introduced the concepts of Dynamic Quorum and Dynamic Witness. I am not going to rehash what can be found using Google, but here are a couple of articles I used when running this test:

The Testing

I set up a cluster named ASTCL-DUO with two members, Batman and Robin. I added a File Share Witness and gave both nodes a vote.

Then I checked the cluster configuration using PowerShell

You can see that the cluster dynamically gave the witness 1 vote (WitnessDynamicWeight) so that we have an odd number of votes.

Remove One Node

Next I removed one of the nodes from the cluster to see what would happen to the assigned votes and the witness. I did this by shutting down the Cluster service on Robin.

I was very surprised by what I saw when I checked the cluster quorum settings.

Not only did the File Share Witness keep its vote, but the NodeWeight for the cluster node that was shut down stayed at 1. It still has a vote assigned. Dynamic Quorum did nothing. I thought about this and it makes sense: there are three votes and only two are needed for quorum, so there is no need to change anything. My only concern is what will happen if the File Share Witness goes down as well. I test that next.
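The vote arithmetic behind that reasoning can be written out as a small Python sketch. This is only a model of the majority rule, not a call into any cluster API; the function name `votes_needed` and the voter label `FileShareWitness` are made up for illustration.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority of total_votes."""
    return total_votes // 2 + 1


# Votes assigned in this test: both nodes and the File Share Witness.
assigned = {"Batman": 1, "Robin": 1, "FileShareWitness": 1}
total = sum(assigned.values())

# Robin's cluster service is stopped, so only the other two voters respond.
online = {"Batman", "FileShareWitness"}
available = sum(weight for name, weight in assigned.items() if name in online)

print(f"total={total} needed={votes_needed(total)} available={available}")
print("quorum held:", available >= votes_needed(total))
```

With two of the three votes still reachable, the majority is met, which matches the cluster staying online in this step.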

Remove One Node and File Share Witness

Now we are going to leave the Robin node down and take the File Share Witness offline as well. Since I am using my DC as the File Share Witness, I am just going to turn off sharing for the folder that backs the witness.

Our node status doesn't change but now we can see that the File Share Witness has failed

Using our PowerShell Script we can see that nothing has changed with the Quorum

The kicker is that since there are three voters in the quorum but only one is available, the cluster shuts down.

I would have thought that Dynamic Quorum would have removed the votes from the secondary node and the witness when the secondary node went down, to prevent this from happening.
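The sequence of failures in this first setup can be replayed with a short Python model. It is a sketch under one loud assumption taken from the observations above: the assigned votes stay fixed at 1 throughout, so no dynamic rebalancing is modelled. The `Voter` class and `has_quorum` function are hypothetical names, not part of any Windows tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voter:
    name: str
    weight: int          # assigned vote; assumed fixed, as observed in this test
    online: bool = True


def has_quorum(voters: List[Voter]) -> bool:
    """True when the online voters hold a strict majority of all assigned votes."""
    total = sum(v.weight for v in voters)
    available = sum(v.weight for v in voters if v.online)
    return 2 * available > total


batman = Voter("Batman", 1)
robin = Voter("Robin", 1)
witness = Voter("FileShareWitness", 1)
cluster = [batman, robin, witness]

results = [("all voters up", has_quorum(cluster))]

robin.online = False      # cluster service stopped on Robin
results.append(("Robin down", has_quorum(cluster)))

witness.online = False    # sharing turned off on the witness folder
results.append(("Robin and witness down", has_quorum(cluster)))

for step, ok in results:
    print(f"{step}: quorum = {ok}")
```

In this model the second failure leaves one vote out of three, which is below the majority, so quorum is lost just as it was in the lab.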

Two Node Cluster with File Share Witness and One Voting Node

Initial Setup

For this series of tests I started the cluster service back up on Robin and removed its vote to see what would happen to the cluster when we started removing things.

I ran the cluster quorum configuration script

Since we started out with an odd number of votes (1), the cluster decided it did not need the File Share Witness, so it set the WitnessDynamicWeight to 0.

Remove One Node

I shut down the cluster service on Robin again.

The quorum configuration stays the same, since the node didn't have a vote.

Remove One Node and File Share Witness

Leaving the Robin node down, I next turned off sharing for the File Share Witness folder.

The cluster quorum configuration again stays the same.

The interesting thing here is that the cluster does not go down like it did in the first setup. I let it sit in this state for around 5 minutes to be sure.
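One way to see why the two setups behave differently is to apply the same majority rule to both under the same failure. The Python sketch below assumes the witness carries no vote in the second setup (a dynamic weight of 0), which is consistent with the cluster staying up; the `survives` function and the dictionary names are illustrative only.

```python
def survives(weights: dict, online: set) -> bool:
    """True when the online voters hold a strict majority of all assigned votes."""
    total = sum(weights.values())
    available = sum(w for name, w in weights.items() if name in online)
    return total > 0 and 2 * available > total


# First setup: both nodes and the witness each carried a vote.
three_voters = {"Batman": 1, "Robin": 1, "FileShareWitness": 1}

# Second setup: Robin's vote removed; witness weight assumed to be 0.
one_voter = {"Batman": 1, "Robin": 0, "FileShareWitness": 0}

# Same failure in both cases: Robin and the file share are unavailable.
online = {"Batman"}
for label, weights in (("three voters", three_voters), ("one voter", one_voter)):
    print(f"{label}: cluster stays up = {survives(weights, online)}")
```

The trade-off is visible in the same model: with a single voter, `survives(one_voter, {"Robin", "FileShareWitness"})` is false, so the cluster depends entirely on Batman being available.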

Conclusions

If you read through any cluster documentation found on the web regarding using a File Share Witness (see my two links above), it seems to be best practice to use a witness even if you have an even number of nodes. If you try to configure your cluster Quorum without one you even get an error message about it during setup.

Since we currently are not running an active/active setup and DR failover is a manual event, we also do not need to have votes on the nodes in the BDC.

Going forward (and possibly backward, since we might need to fix some current clusters), it seems like the best setup is to:

Always have a File Share Witness

Always remove the votes from DR (and AG read servers)

This will prevent the cluster from coming down as long as the primary node is available: it can survive the loss of both the secondary node and the File Share Witness.
This might change if we move to an active/active setup or the FSW moves to a third-party site in the future, but with our current setup I think this is best.