Windows Server 2008 Failover Clustering

Executive Summary: Microsoft Windows Server 2008’s failover clustering provides increased hardware support, a single quorum model that can operate in one of four modes, and a simplified cluster creation process.

One of Windows Server 2008's advantages is high availability, including failover clustering. Specifically, Server 2008's failover clustering provides increased hardware support, a single quorum model that can operate in one of four modes, and a simplified cluster creation process. (For a basic overview of failover clustering, see the Learning Path.)

Other changes in Server 2008 that improve failover clustering include the configuration flexibility to cater for higher latencies between locations than is typically acceptable for cluster heartbeats, as well as multi-site clustering, which allows a cluster to have IP addresses on multiple subnets and therefore nodes in multiple sites. In addition, Server 2008 failover clustering provides greater scalability because it allows 16-node clusters on a 64-bit architecture, compared with the 8-node maximum in Server 2003 (and Server 2008 32-bit). I'll touch on some of these enhancements as I discuss the main improvements in Server 2008 failover clustering.

Hardware Support

Windows Server 2003 has a dedicated cluster-certified hardware list. For Server 2008, you must purchase logo-certified components, then run a cluster validation process to ensure that your configuration is supported. If your configuration passes the validation test, then your cluster is supported by Microsoft. You can also run the validation process on a configured cluster to check for problems that might be causing the cluster to fail.

The cluster validation process isn't a new tool; it was formerly called ClusPrep and was downloadable from Microsoft. ClusPrep was enhanced in version 2.0, which is included in Server 2008. Four types of tests are included in the cluster validation process: Inventory, Network, Storage, and System Configuration.

Network. The Network test checks the cluster network configuration. This check verifies the IP and subnet information on adapters and validates the IP configuration to ensure that addresses are unique, multiple adapters aren't connected to the same subnet, a default gateway is configured on one adapter, and no duplicate MAC addresses exist (which is important if you're using virtual machines, or VMs). Network communication between the nodes is checked, and the firewall rules are verified to ensure that cluster communication won't be interrupted.

Storage. The Storage test lists all the disks, particularly shared disks, that are visible to all nodes. Shared disks are then tested for failover suitability, including data being kept intact.

System Configuration. The System Configuration test verifies that Active Directory (AD) is configured correctly and that all the nodes are in the same domain and ideally the same organizational unit (OU), which is important to ensure consistent Group Policy application. Note that the same OU isn't mandatory and will generate only a warning. Nor do nodes need to be in the same subnet or AD site. The System Configuration test also verifies that all the drivers are signed and that the same OS version, service packs, and software updates are installed. Required clustering services (e.g., RpcSs, Remote Registry, LanmanServer, WinMgmt) are checked to ensure they're running. Finally, all the nodes are checked to ensure that they're running the same architecture, which is necessary in a cluster.
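To make the Network test more concrete, here is a minimal Python sketch of three of the checks it describes: duplicate MAC addresses, duplicate IP addresses, and multiple adapters on one node connected to the same subnet. This is an illustration only, not how the validation wizard is implemented; the check_network function and the adapter record layout (node, mac, ip, prefix) are hypothetical names chosen for the example.

```python
import ipaddress
from collections import Counter, defaultdict


def check_network(adapters):
    """Return a list of problem descriptions for a set of adapter records.

    Each record is a dict with the hypothetical keys
    'node', 'mac', 'ip', and 'prefix' (subnet prefix length).
    """
    problems = []

    # No two adapters in the cluster should share a MAC address.
    mac_counts = Counter(a["mac"].lower() for a in adapters)
    for mac, count in sorted(mac_counts.items()):
        if count > 1:
            problems.append(f"duplicate MAC {mac}")

    # Every IP address must be unique.
    ip_counts = Counter(a["ip"] for a in adapters)
    for ip, count in sorted(ip_counts.items()):
        if count > 1:
            problems.append(f"duplicate IP {ip}")

    # A node shouldn't have more than one adapter on the same subnet.
    subnets_seen = defaultdict(set)
    for a in adapters:
        network = ipaddress.ip_interface(f"{a['ip']}/{a['prefix']}").network
        if network in subnets_seen[a["node"]]:
            problems.append(f"{a['node']} has multiple adapters on {network}")
        subnets_seen[a["node"]].add(network)

    return problems


if __name__ == "__main__":
    sample = [
        {"node": "node1", "mac": "00-15-5D-00-00-01", "ip": "10.0.0.1", "prefix": 24},
        {"node": "node1", "mac": "00-15-5D-00-00-02", "ip": "10.0.0.2", "prefix": 24},
        {"node": "node2", "mac": "00-15-5d-00-00-01", "ip": "192.168.1.2", "prefix": 24},
    ]
    for problem in check_network(sample):
        print(problem)
```

In the sample data, the cloned MAC address on node2 is the kind of problem that can appear when VMs are copied, which is why the article singles it out.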

The cluster validation process is initiated from the Microsoft Management Console (MMC) Failover Cluster Management snap-in. Select the Validate a Configuration action, enter the names of the nodes that will be part of the cluster, and select the tests to run. Figure 1 shows the cluster validation tests running. When the tests are complete, a summary display shows the status of each test and highlights areas requiring attention or components that aren't suitable for the cluster. For a more detailed report, click the View Report button or go to the %windir%\Cluster\Reports folder.

To prevent the problem of buying and installing components that aren't supported, Microsoft offers the Failover Cluster Configuration Program (FCCP). To find cluster configurations that have been previously validated by Microsoft or its partners, go to Microsoft's Windows Server Cluster Solutions website.

Other fundamental changes in the hardware that Server 2008 clusters support involve SCSI and iSCSI. The good news is that the SCSI bus resets that plagued us in Server 2003 are gone; however, Server 2008 requires persistent reservations, which means that supported storage options are now different. Parallel SCSI, with its cable length limitations and cross-over challenges, is no longer supported; instead, you must use Fibre Channel, SAS, or iSCSI. If you use iSCSI, you should use a separate network for connectivity to the iSCSI SAN to avoid bandwidth conflicts between storage access and other client/cluster communications. You also need to ensure that your storage hardware supports SCSI-3, particularly the SPC-3 iteration that documents persistent reservations.

Quorum Model

The quorum is essential in failover clustering: It ensures that a consistent view of the cluster is maintained, and it protects the cluster if the nodes become partitioned (i.e., the cluster splits into multiple groups of nodes that aren't communicating with each other). A quorum ensures that only one partition of the cluster can offer a service. Corruption is likely to occur if more than one of a cluster's partitions offers the same service.

Server 2008 has only one quorum model, compared with Server 2003, which had multiple quorum models. Server 2008's Majority Quorum Model can operate in one of four modes depending on how you allocate the available votes. The four modes include Node Majority, Node and Disk Majority, Node and File Share Majority, and No Majority: Disk Only.

The following Server 2008 cluster components can have a vote:

Cluster nodes

Shared disks (also known as a disk witness)

File share witnesses, which are simply file shares on a Server 2008 or Server 2003 server that aren't part of the cluster

A cluster can't have both a disk witness and a file share witness; they are mutually exclusive. As I already mentioned, how you allocate the available votes influences the mode the quorum uses. Each quorum mode supports different combinations of node and disk or file share witness combinations to make the quorum. As Figure 2 shows, a cluster partition must have more than half the number of votes for the partition to make quorum.
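The "more than half the votes" rule is simple arithmetic, and a short Python sketch can make it explicit. The has_quorum function and its parameter names are hypothetical, chosen only for this illustration, and the sketch assumes a mode in which every node holds one vote and at most one witness (disk or file share) holds one more.

```python
def has_quorum(partition_nodes, owns_witness, node_count, has_witness):
    """Return True if a partition holds more than half of all votes.

    partition_nodes: number of nodes in this partition
    owns_witness:    True if this partition owns the witness
    node_count:      total number of nodes in the cluster
    has_witness:     True if the cluster has a disk or file share witness
    """
    if owns_witness and not has_witness:
        raise ValueError("a partition can't own a witness the cluster lacks")

    total_votes = node_count + (1 if has_witness else 0)
    partition_votes = partition_nodes + (1 if owns_witness else 0)

    # Strictly more than half; integer arithmetic avoids rounding issues.
    return 2 * partition_votes > total_votes


if __name__ == "__main__":
    # Four nodes, no witness, split evenly: 2 of 4 votes isn't a majority.
    print(has_quorum(2, False, 4, False))
    # Same split with a witness: the owning partition holds 3 of 5 votes.
    print(has_quorum(2, True, 4, True))
```

Because the comparison is strict, an even split without a witness leaves both partitions short of quorum, which is the situation the witness modes are designed to avoid.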

Node Majority. The Node Majority mode assigns votes only to the cluster nodes. This means that if the cluster becomes partitioned, the partition must have more than half the number of nodes in order to offer services. For example, if a five-node cluster splits, only the partition with three nodes can make quorum and therefore offer services. Node Majority works best with and is recommended for an odd number of nodes. With an even number of nodes, such as four nodes, three nodes must be available to make quorum. In this type of situation, if two nodes are in one location and the other two nodes are in another location, and if the locations become unable to communicate for some reason, neither location has enough nodes to make quorum and the cluster can't offer any services. With an odd number of nodes, such as five nodes, one location would have three nodes and the other location would have two nodes. The location with three nodes could make quorum and continue to offer services. Scenarios with an even number of nodes are better suited for disk or file share witness modes.

Node and Disk Majority. In the Node and Disk Majority mode, each node has a vote, as well as a shared disk called the disk witness. This mode is preferable in situations with an even number of nodes with shared storage available. Suppose you have four nodes that become equally partitioned. Only one of the partitions can own the disk witness, which gives that partition an additional vote. The partition that owns the disk witness can therefore make quorum and offer services.

The disk witness should be a basic disk with a single volume at least 512MB in size and formatted with NTFS. It should be a dedicated LUN and doesn't require a drive letter. You shouldn't perform antivirus scanning on the disk witness or its data.

Node and File Share Majority. The Node and File Share Majority mode works in exactly the same way as Node and Disk Majority except the disk witness is replaced with a file share. This mode is recommended if you have an even number of nodes and you don't have shared storage available.

The file share must be on a Server 2008 or Server 2003 file server that isn't part of the cluster that the file share is acting as file share witness for (it could be hosted on another cluster). In addition, the file share must be hosted on a server that is part of the same Active Directory (AD) forest as the cluster. The share should be dedicated for file share witness duties only. If the cluster is a multi-site cluster, then the file share should ideally be on a separate site from the nodes in the cluster to provide additional resilience from a site failure. Finally, the file share shouldn't be part of a DFS namespace.

No Majority: Disk Only. In the No Majority: Disk Only mode, only a shared disk (i.e., the disk witness) has a vote. The nodes are like Europeans on a green card: No vote at all. The cluster makes quorum as long as the disk witness is available. If the cluster becomes partitioned, the partition that owns the disk witness makes quorum. This mode's obvious weakness is that the disk witness is a single point of failure. If the disk witness is unavailable, the cluster can't make quorum or offer any services. Table 1 summarizes when to use each of the four modes. Note that the Disk Only mode isn't recommended.
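The guidance in the four mode descriptions above can be condensed into a small decision function. This Python sketch is only a restatement of that guidance (odd node counts use Node Majority; even node counts use a disk witness when shared storage exists and a file share witness otherwise). It doesn't reproduce Table 1 or the cluster creation wizard's actual logic, and recommend_quorum_mode is a hypothetical name.

```python
def recommend_quorum_mode(node_count, shared_storage_available):
    """Suggest a quorum mode from the node count and storage situation."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")

    # An odd number of nodes can always produce a majority partition.
    if node_count % 2 == 1:
        return "Node Majority"

    # An even number of nodes needs a tie-breaking witness vote.
    if shared_storage_available:
        return "Node and Disk Majority"
    return "Node and File Share Majority"


if __name__ == "__main__":
    for nodes, storage in [(5, True), (4, True), (4, False)]:
        print(nodes, storage, "->", recommend_quorum_mode(nodes, storage))
```

The function never returns No Majority: Disk Only, in line with the note that the mode isn't recommended.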

Cluster Creation

To create a cluster in Server 2008, start the MMC Failover Cluster Management snap-in, select Create a Cluster, enter a cluster name, and tell the wizard which nodes will be part of the cluster. The wizard then scans your environment and selects the optimal quorum mode and resources to use. A change in the cluster creation process from Server 2003 is that you only have to run the cluster creation process once, and the wizard configures all the nodes in the cluster.

Server 2008 clustering fully supports IPv6 and DHCP. You're prompted for a cluster IP address only if the network adapters on the cluster nodes are configured with static IP addresses; in that case, you must provide an IP address, as Figure 3 shows. The cluster creation process configures all the nodes in the cluster and, based on the shared storage available and configuration of the nodes, automatically selects the best quorum and configuration options. If a disk witness is required, the smallest volume that is larger than 512MB is selected as the disk witness. Any other shared storage is placed in an available storage area for use by cluster resources (which is a change from Server 2003 clustering).
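The disk witness selection rule described above (the smallest shared volume larger than 512MB, with everything else going to available storage) can be sketched in a few lines of Python. The select_disk_witness function, the volume names, and the dictionary layout are assumptions made for illustration; the wizard's real selection logic may consider factors the article doesn't mention.

```python
MIN_WITNESS_MB = 512  # threshold stated in the article


def select_disk_witness(volumes):
    """Pick a disk witness from shared volumes.

    volumes: dict mapping a volume name to its size in MB.
    Returns (witness_name_or_None, sorted list of remaining volume names).
    """
    # Only volumes larger than the threshold are candidates.
    candidates = {name: size for name, size in volumes.items()
                  if size > MIN_WITNESS_MB}
    if not candidates:
        return None, sorted(volumes)

    # Smallest qualifying volume wins; the name breaks ties deterministically.
    witness = min(candidates, key=lambda name: (candidates[name], name))
    available_storage = sorted(name for name in volumes if name != witness)
    return witness, available_storage


if __name__ == "__main__":
    shared = {"LUN1": 102400, "LUN2": 1024, "LUN3": 256}
    print(select_disk_witness(shared))
```

Choosing the smallest qualifying volume keeps the larger LUNs free for the cluster resources that actually need the capacity.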

After the cluster is created, the MMC displays a summary that shows the nodes in the cluster, the quorum mode, and which disk is the disk witness (if used). The wizard also writes a report named CreateCluster.mhtml to the %windir%\Cluster\Reports folder. As in the cluster validation process, you can click the View Report button to see more details about the cluster creation process. Note that you don't have to add all the necessary nodes when you create a cluster; you can add nodes later.

Keep in mind that the cluster creation wizard can only do so much. For example, if you have a configuration with an even number of nodes and no shared storage, the wizard can't automatically select a file share and configure the Node and File Share Majority mode. Instead, the MMC Failover Cluster Management snap-in will generate a warning in the quorum area, telling you that the quorum configuration isn't optimal and that you need to change it. Fortunately, the quorum change process is simple. To configure a file share witness, you just specify a file share over which the administrator has full control; the quorum wizard takes over from there.

Even migrating from Server 2003 to Server 2008 poses problems. You can't perform a rolling upgrade, because you can't have a mixture of Server 2008 and Server 2003 nodes within a cluster. Instead, you must create a new cluster with Server 2008 nodes, then use the Migration wizard in Server 2008's MMC Failover Cluster Management snap-in to migrate resources from Server 2003.

If you don't have the extra hardware to create a new Server 2008 cluster, you'll have to remove servers from your existing Server 2003 cluster, rebuild the servers with Server 2008, create a new cluster, and migrate your resources. You can then rebuild the remaining Server 2003 nodes with Server 2008 and add them to the Server 2008 cluster. The drawback of this approach is that while you're transitioning your resources, each of your clusters will be running with a reduced number of nodes, possibly leading to decreased performance and a higher risk of failure.

Improved Clustering

Failover clustering is easier to plan for, create, and manage in Server 2008 than in Server 2003. Server 2008's failover clustering offers more hardware options for creating clusters, as well as multiple quorum modes for more flexible configuration. Even organizations without dedicated clustering teams can take advantage of Server 2008's high-availability improvements. In addition, Server 2008's multi-site capabilities make clustering an attractive disaster recovery solution. For more information about creating and validating Server 2008 failover clusters, including some how-to videos, go to the SavillTech video website.