Overview of Configuring the Head Node for Failover with Windows HPC Server 2008 R2

Updated: March 17, 2011

Applies To: Windows HPC Server 2008 R2

This guide provides procedures and guidance for deploying Windows HPC Server 2008 R2 in a failover cluster where the servers are running Windows Server 2008 R2. The guide describes how you can configure the head node in a failover cluster. This topic provides an overview of the configuration. For a detailed list of requirements for the configuration, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.

Overview

In an HPC cluster, if you want to provide high availability for the head node, you can configure it in a failover cluster. The failover cluster contains servers that work together, so if one server in the failover cluster fails, another server in the cluster automatically begins providing service (in a process known as failover).

Important

The word “cluster” can refer to a head node with compute nodes running software in Windows HPC Server 2008 R2, or to a set of servers running Windows Server 2008 R2 that are using the failover cluster feature. The word “node” can refer to a head node, compute node, or WCF broker node running software in Windows HPC Server 2008 R2, or to one of the servers in a failover cluster. In this document, servers in the context of a failover cluster are usually referred to as “servers,” to distinguish failover cluster nodes from an HPC cluster head node or compute node. Also, the word “cluster” is placed in an appropriate phrase (such as “failover cluster”) or used in context in a sentence to distinguish which type of cluster is being referred to.

Each of the servers in a failover cluster must have access to the failover cluster storage. Figure 1 shows the failover of head node services that can run on either of two servers in a failover cluster:

Figure 1 Failover of head node services in HPC cluster

To support the head node, you must also configure SQL Server, either as a SQL Server failover cluster (for higher availability) or as a standalone SQL Server. Figure 2 shows a configuration that includes a failover cluster that runs the head node and a failover cluster that runs SQL Server.

Figure 2 Failover clusters supporting the head node and SQL Server

In the preceding figure (Figure 2), the failover cluster storage for the head node includes one disk (LUN) for a clustered file server and one disk as a disk witness. The disk witness is necessary for any failover cluster that has an even number of nodes (the head node failover cluster has two).
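The reason a two-server failover cluster needs a disk witness can be illustrated with simple vote arithmetic. The following Python sketch assumes a simplified majority-vote model: each server has one vote, the disk witness adds one vote, and the cluster keeps running only while more than half of all votes are online. The function names are hypothetical and are not part of any Windows or HPC API.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when strictly more than half of all votes are online."""
    return votes_online > total_votes // 2


def survives_one_server_failure(servers: int, disk_witness: bool) -> bool:
    """Check whether the cluster keeps quorum after losing one server.

    Assumption: one vote per server, plus one vote for the disk witness,
    and the witness disk stays reachable when the server fails.
    """
    total_votes = servers + (1 if disk_witness else 0)
    votes_online = total_votes - 1  # exactly one server is lost
    return has_quorum(votes_online, total_votes)


if __name__ == "__main__":
    # Two servers, no witness: 1 of 2 votes is not a majority.
    print(survives_one_server_failure(servers=2, disk_witness=False))  # False
    # Two servers plus a disk witness: 2 of 3 votes is a majority.
    print(survives_one_server_failure(servers=2, disk_witness=True))   # True
```

Under this model, the witness gives a cluster with an even number of servers an odd number of votes, so the loss of a single server still leaves a majority.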

When both the head node and SQL Server are in failover clusters, multiple failover clusters are required. Figure 3 illustrates that when you configure multiple failover clusters, you must limit the exposure of each storage volume or logical unit number (LUN) to the nodes in one failover cluster:

Figure 3 Two failover clusters, each with its own LUNs
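The rule that each LUN is exposed to the nodes of only one failover cluster can be checked against a planned storage layout. The following Python sketch is illustrative only: the cluster names and LUN names are hypothetical, and the mapping would have to be filled in by hand from your own storage configuration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_shared_luns(cluster_luns: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return every LUN that is exposed to more than one failover cluster.

    cluster_luns maps a failover cluster name to the set of LUNs that are
    exposed to the servers in that cluster.
    """
    owners = defaultdict(list)
    for cluster, luns in cluster_luns.items():
        for lun in luns:
            owners[lun].append(cluster)
    return {lun: sorted(names) for lun, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical layout with two failover clusters, each with its own LUNs.
    layout = {
        "HeadNodeCluster": {"LUN-FileServer", "LUN-HeadNodeWitness"},
        "SqlCluster": {"LUN-SqlData", "LUN-SqlWitness"},
    }
    print(find_shared_luns(layout))  # {}
```

An empty result means that no LUN in the planned layout is visible to more than one failover cluster.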

For the maximum availability of any server, it is important to follow best practices for server management. For example, carefully manage the physical environment of the servers, test software changes before fully implementing them, and carefully keep track of software updates and configuration changes on all servers in a failover cluster.

When the head node is configured in a failover cluster, we recommend either network Topology 2 or Topology 4 (the topology shown in Figures 1 and 2). In these topologies, there is an enterprise network and at least one other network. Using multiple networks in this way helps avoid single points of failure. For more information about network topologies, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.

Services and resources during failover of the head node

This section summarizes some of the differences between running the head node for Windows HPC Server 2008 R2 on a single server and running it in a failover cluster.

Important

In a failover cluster, the head node cannot also be a compute node or WCF broker node. These options are disabled when the head node is configured in a failover cluster.

When you connect to a head node that is configured in a failover cluster, do not use the name of a physical server. After the head node is configured in a failover cluster, it is not tied to a single physical server, and it does not have the name of a physical server. Instead, use the name that appears in Failover Cluster Manager. To see the name, in the appropriate failover cluster in Failover Cluster Manager, expand Services and applications, select the clustered instance of the head node, and in the center pane, view the name under Server Name.

The following table summarizes what happens to each service or resource during failover of the head node:

Service or Resource: HPC SDM Store Service, HPC Job Scheduler Service, HPC Session Service, HPC Diagnostics Service
What Happens in a Failover Cluster: Fail over to the other server in the failover cluster.

Service or Resource: Four file shares that are used by the head node
What Happens in a Failover Cluster: Ownership fails over to the other server in the failover cluster.

Service or Resource: DHCP, HPC Management Service, HPC MPI Service, HPC Node Manager Service, HPC Reporting Service, NAT, WDS
What Happens in a Failover Cluster: Start automatically and run on each individual server. The failover cluster does not monitor these services for failure.

Service or Resource: File sharing for compute nodes
What Happens in a Failover Cluster: Fails over to the other server in the failover cluster if configured through the Failover Cluster Manager snap-in.
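The table above can also be expressed as a small lookup structure, which may be convenient when you want to list the services that the failover cluster does not monitor. The following Python sketch only restates the table. The constant and function names are hypothetical and do not correspond to any Windows HPC Server API.

```python
from typing import List

# Behavior categories, taken from the table above.
FAILS_OVER = "fails over to the other server"
OWNERSHIP_FAILS_OVER = "ownership fails over to the other server"
RUNS_ON_EACH_SERVER = "runs on each server; not monitored by the failover cluster"
FAILS_OVER_IF_CLUSTERED = "fails over if configured through Failover Cluster Manager"

BEHAVIOR = {
    "HPC SDM Store Service": FAILS_OVER,
    "HPC Job Scheduler Service": FAILS_OVER,
    "HPC Session Service": FAILS_OVER,
    "HPC Diagnostics Service": FAILS_OVER,
    "Head node file shares": OWNERSHIP_FAILS_OVER,
    "DHCP": RUNS_ON_EACH_SERVER,
    "HPC Management Service": RUNS_ON_EACH_SERVER,
    "HPC MPI Service": RUNS_ON_EACH_SERVER,
    "HPC Node Manager Service": RUNS_ON_EACH_SERVER,
    "HPC Reporting Service": RUNS_ON_EACH_SERVER,
    "NAT": RUNS_ON_EACH_SERVER,
    "WDS": RUNS_ON_EACH_SERVER,
    "File sharing for compute nodes": FAILS_OVER_IF_CLUSTERED,
}


def behavior_of(name: str) -> str:
    """Look up what happens to a service or resource during failover."""
    return BEHAVIOR[name]


def unmonitored_services() -> List[str]:
    """List the services that the failover cluster does not monitor."""
    return sorted(n for n, b in BEHAVIOR.items() if b == RUNS_ON_EACH_SERVER)


if __name__ == "__main__":
    for service in unmonitored_services():
        print(service)
```

Running the script prints the seven services that start automatically on each individual server and are not monitored by the failover cluster.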

Note

The HPC Basic Profile Web Service and the HPC Storage Management Surrogate service are also installed on a head node (whether that head node is in a failover cluster or not). However, these services are not activated by default. For information about uses and requirements for the HPC Basic Profile Web Service, see HPC Server Basic Profile Web Service Operations Guide (http://go.microsoft.com/fwlink/?LinkId=198311).