Testing SQL Server 2000 Clusters

While the first two tests were performed from the Cluster Administrator, the next three tests are closer to real-world failures. In this test, you will first need to ensure that all of the default groups are located on the primary node. Then you will physically turn off (flip the switch) the primary node.

If you watch the cluster groups in the Cluster Administrator on the secondary node after turning off the primary node, you should see all of the resources fail over automatically to the secondary node. Check the Event Log for any error messages after this occurs.
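If you would rather confirm group ownership from a script than by eye, the check can be sketched roughly as follows. This is only a sketch under assumptions: it assumes you have captured a fixed-width Group / Node / Status listing as text (for example from the cluster.exe command-line tool), that the listing has a dashed ruler line under its header, and that names fit within their columns. The sample text, the node names NODE1 and NODE2, and both function names are hypothetical.

```python
def parse_group_listing(text):
    """Parse a fixed-width 'Group / Node / Status' listing into dicts.

    Assumption: a ruler line of dashes separates the header from the rows,
    and each run of dashes marks the start of one of three columns.
    """
    lines = [ln.rstrip() for ln in text.splitlines()]
    # Find the ruler line (only dashes and spaces, and not empty).
    ruler_idx = next(i for i, ln in enumerate(lines)
                     if ln.strip() and set(ln.strip()) <= {"-", " "})
    ruler = lines[ruler_idx]
    # A column starts wherever a run of dashes begins.
    starts = [i for i, ch in enumerate(ruler)
              if ch == "-" and (i == 0 or ruler[i - 1] == " ")]
    bounds = list(zip(starts, starts[1:] + [None]))
    groups = []
    for ln in lines[ruler_idx + 1:]:
        if not ln.strip():
            continue
        name, node, status = (ln[a:b].strip() for a, b in bounds)
        groups.append({"group": name, "node": node, "status": status})
    return groups


def misplaced_groups(groups, expected_node):
    """Return names of groups that are not Online on expected_node."""
    return [g["group"] for g in groups
            if g["node"].upper() != expected_node.upper()
            or g["status"] != "Online"]


# Hypothetical captured listing, taken after powering off NODE1.
SAMPLE = """\
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        NODE2           Online
SQL Server Group     NODE1           Online
"""

if __name__ == "__main__":
    # Prints ['SQL Server Group']: that group has not moved to NODE2.
    print(misplaced_groups(parse_group_listing(SAMPLE), "NODE2"))
```

An empty list means every group is online on the node you expected. Anything else is worth looking up in the Event Log before you continue.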

Once you have checked for any potential problems, turn on the primary node and wait until it fully boots. You will note that turning on the primary node does not cause the cluster to fail back. The cluster resources will remain on the secondary node until you force them to return to the primary node.
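Because the groups stay where they are until you move them, you may want to script the move back rather than drag each group by hand. The sketch below only builds the command lines and does not run them. It assumes the cluster.exe tool accepts the form `cluster group "name" /moveto:node`; confirm the exact syntax with `cluster group /?` on your own nodes before relying on it. The group names, the node name NODE1, and the function name are hypothetical.

```python
def build_move_commands(group_names, target_node):
    """Return one argument list per group, in a form subprocess.run() accepts.

    Assumption: cluster.exe supports 'cluster group <name> /moveto:<node>'.
    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [["cluster", "group", name, "/moveto:" + target_node]
            for name in group_names]


if __name__ == "__main__":
    # Hypothetical group and node names; print the commands for review.
    for cmd in build_move_commands(["Cluster Group", "SQL Server Group"], "NODE1"):
        print(cmd)
```

Keeping the command construction separate from execution lets you review exactly what would be run, which is a sensible precaution on a cluster that is about to go into production.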

Now turn off the secondary node, repeating what you did earlier with the primary node. As before, you can use the Cluster Administrator from the primary node to watch the groups fail over to the primary node. Check the Event Log for any potential error messages.

Once the groups have failed over to the primary node, turn the secondary node back on, and wait until it boots up fully.

This is a very good test to see if failover will work in the real world. If no problems arose from this test, then you are ready for the next test.

Test Number 4: Break Network Connectivity

This test is similar in concept to the previous test. Again, we want to force a failover. But instead of simulating a computer failure, we will be simulating a network-related failure.

From the primary node, remove the network cable from the public network card. This simulates a loss of public network connectivity on the primary node, and should initiate a failover to the secondary node.

If you watch the cluster groups in the Cluster Administrator on the secondary node, you should see the resources fail over automatically. Check the Event Log for any error messages.
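It can also be useful to know how long clients lose their connection while the failover takes place. One rough way to measure this, run from a client machine on the public network, is to poll the SQL Server virtual server until it stops answering and then starts answering again. The sketch below makes several assumptions: the virtual server name SQLVS1 is hypothetical, port 1433 is assumed because it is the usual port for a default instance, and a successful TCP connection is treated as "up" even though SQL Server may need a little longer before it accepts logins.

```python
import socket
import time


def can_connect(host, port=1433, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(probe, interval=1.0, max_wait=300.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it fails and then succeeds again.

    Returns the seconds between the first failed probe and the first
    successful probe after it, or None if that never happens within max_wait.
    """
    start = clock()
    down_at = None
    while clock() - start < max_wait:
        up = probe()
        now = clock()
        if not up and down_at is None:
            down_at = now          # first failed probe: outage begins
        elif up and down_at is not None:
            return now - down_at   # service is answering again
        sleep(interval)
    return None


if __name__ == "__main__":
    # Start this, then pull the cable. SQLVS1 is a hypothetical name.
    outage = measure_outage(lambda: can_connect("SQLVS1"))
    print("No complete outage observed" if outage is None
          else "Outage lasted about %.0f seconds" % outage)
```

The result is only as precise as the polling interval and the connection timeout, so treat it as an estimate rather than an exact failover time.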

Once you have checked for any potential problems, plug the network cable back into the primary node, and then remove the network cable from the public network card on the secondary node. As before, you can use the Cluster Administrator to watch the groups fail over to the primary node. Check the Event Log for any potential error messages. Once you are done, plug the network cable back into the public network card on the secondary node.

If no problems arose from this test, then you are ready for the next test.

Test Number 5: Break Shared Array Connectivity

This test is designed to help uncover potential issues with the shared disk array. I have seen clusters pass all four of the above tests, but fail this one because the shared disk array was not configured correctly. This test simulates what would happen if the controller card or the cable connecting a node to the shared disk array fails.

From the primary node, remove the cable from the card used to connect to the shared array. This simulates the primary node losing its connection to the shared array, and should initiate a failover to the secondary node.

If you watch the cluster groups in the Cluster Administrator on the secondary node, you should see the resources fail over automatically. Check the Event Log for any error messages.

Once you have checked for any potential problems, plug the cable back into the primary node, and then remove the cable from the card used to connect to the shared array on the secondary node. As before, you can use the Cluster Administrator to watch the groups fail over to the primary node. Check the Event Log for any potential error messages. Once you are done, plug the cable back into the appropriate card on the secondary node.

Once you have successfully completed your testing, you can be fairly confident that your SQL Server 2000 cluster can now be put into production.