The benefits of Windows 2003 clustering with Exchange 2003

You may have previously evaluated clustering and decided it wasn't for you. If so, this may be the time to take another look. Both Exchange and Windows have made dramatic improvements in clustering for the 2003 releases.

The most obvious improvement to Windows clustering is the ability to form a cluster with up to eight Windows 2003 Enterprise Edition servers. In Windows 2000 (except for Datacenter edition), you were limited to two computers per cluster.

Microsoft has also fine-tuned the way Exchange works on a cluster, making it more manageable, more recoverable and simpler to deploy.

What is Windows clustering?

Servers that are members of a cluster share information between themselves about some of the applications installed on the servers. These applications are said to be "under cluster control."

The primary advantage of placing an application under cluster control is that if one computer running an application fails, the application can be "failed over" to another computer in the cluster. The application will run on the second computer the same as it did on the first computer. This is possible because each node in the cluster has access to the cluster database. The cluster database includes registry keys and other configuration information for the application. With this information, any node in the cluster can run the application just as well as any other node.
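The failover idea described above can be sketched as a toy model in Python (all names and structures here are invented for illustration; this is not Windows or cluster API code). The point is that because every node can read the same cluster database, any surviving node can pick up the application's configuration and run it:

```python
# Toy model of cluster failover: every node can read the shared
# cluster database, so any node can take over a failed application.
cluster_db = {
    "ExchangeApp": {"registry_keys": {"DBPath": "E:\\mdbdata"}, "owner": "NODE1"},
}

nodes = ["NODE1", "NODE2", "NODE3"]

def fail_over(app, failed_node):
    """Move ownership of an application to a surviving node."""
    entry = cluster_db[app]
    if entry["owner"] == failed_node:
        survivors = [n for n in nodes if n != failed_node]
        # The new owner reads the same registry keys and configuration
        # from the cluster database, so it runs the app identically.
        entry["owner"] = survivors[0]
    return entry["owner"]

print(fail_over("ExchangeApp", "NODE1"))  # NODE2
```

In the real cluster service this handoff is automatic and also includes transferring ownership of the application's shared disks, as described below.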

Most clusters have several shared disks that are accessible by any computer in the cluster. Only one computer in the cluster will "own" a particular disk at any given time. Typically, each shared disk is dedicated to a particular application. When responsibility for running an application is passed to a different cluster member or node, then control of the disk is passed along too. These disks must be on a shared channel connected to all computers in the cluster, and must present themselves as SCSI or Fibre channel disks to Windows.

NOTE: Some applications may not need to store persistent data, and so will not need a cluster-shared disk. But Exchange always needs shared disks to store databases and transaction log files.

How do I create a Windows cluster?

In Windows NT 4, it was pretty hard--you really had to want what clustering had to offer if you were going to take on the challenge. Cluster installation got much easier in Windows 2000. In Windows 2003, the cluster service is already installed and ready to go on every Windows server.

In Windows 2003, you can easily create a single node or "lone wolf" cluster if you wish to experiment with clustering. If you want to install applications that require cluster-controlled disks, then you must have additional SCSI or Fibre-channel disks attached to the computer. This "lone wolf" cluster will not be able to fail over to a different node, but you can even install and administer Exchange on it.

If you are creating a cluster for more than testing purposes, you should make sure that the hardware is all qualified as a cluster system on the Windows Hardware Compatibility List (HCL). You can view the HCL at:

Each application has different requirements. In most cases, an application will have a Setup program that is "cluster aware" and that properly configures the application. In Windows 2003, it is possible to manually configure an application for cluster control.

In general, the end result of installing and configuring a clustered application is that one or more Resource Groups will be created. You can view Resource Groups in the Cluster Administrator console.

A Resource Group lists the services, disks and other resources needed by a single clustered application. If it becomes necessary to fail the application over to another node in the cluster, all resources in the group fail over simultaneously as a single unit, thus moving the entire application to a different server.

If a Resource Group contains a network name resource and an IP address resource, the group is also called a virtual server. A Resource Group that has its own network name and IP address can appear to external clients as a dedicated server for the application. The network name and IP address of the Resource Group can be registered in DNS, pinged and found on the network. The Resource Group/virtual server behaves to all clients outside the cluster as if it were an actual physical computer whose only job is to run the application.

Most applications configure their Resource Group(s) as virtual servers. By doing so, it doesn't matter which computer in the cluster happens to be "babysitting" the application--the virtual server always looks the same to clients no matter which node of the cluster owns the application at any given time.
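A virtual server can be thought of as a stable name-and-address indirection layered over whichever node currently owns the group. A toy Python sketch (names invented; real clusters do this through network name and IP address resources, not application code):

```python
# Toy model: a Resource Group with its own network name and IP looks
# like a dedicated server to clients, whichever node currently owns it.
class VirtualServer:
    def __init__(self, network_name, ip_address, owner):
        self.network_name = network_name  # what clients connect to
        self.ip_address = ip_address      # registered in DNS, pingable
        self.owner = owner                # physical node "babysitting" it

    def move_to(self, node):
        # The name and IP travel with the group; clients see no change.
        self.owner = node
        return (self.network_name, self.ip_address)

evs = VirtualServer("EXCHVS1", "10.0.0.25", "NODE1")
before = (evs.network_name, evs.ip_address)
after = evs.move_to("NODE2")
print(before == after)  # True: clients still reach the same name and IP
```

This is why failover is invisible to clients: the identity they connect to belongs to the Resource Group, not to any physical node.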

How do I install Exchange on a cluster?

Setting up Exchange on a cluster is a two-part process, best understood by contrasting it with setting up Exchange on a "standalone" server.

When you set up Exchange on a non-clustered computer, the Setup program installs Exchange program files and then does everything else necessary to leave you with a fully functioning Exchange server by the time Setup is done.

In contrast, when you set up Exchange on a clustered computer, the only thing that is done is installation of Exchange program files. After Setup exits, the cluster node is "Exchange-ready" but is not itself an independent Exchange server. If an Exchange virtual server (Resource Group) already exists in the cluster, this node can now "babysit" that virtual server if necessary.

After installation of Exchange program files on a cluster node, you finish the setup job by creating one or more Exchange virtual servers in Cluster Administrator. This is done by creating a Resource Group and adding an IP address, network name and shared disk resources (in that order) to the group. With those resources available, you are now ready to create an Exchange System Attendant resource (from a pick list). The creation of the System Attendant resource causes Exchange to generate all the other resources and Exchange services needed for the virtual server. You now have a fully functional Exchange virtual server that can store databases, serve clients and be failed over to other nodes in the cluster. In Exchange System Administrator, the new virtual server will look and behave just like any other Exchange server in the organization.
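The creation order described above (IP address, then network name, then shared disks, then System Attendant) reflects resource dependencies, and creating the System Attendant is what triggers generation of the remaining Exchange resources. A toy sketch of that ordering in Python (the resource and service names after the System Attendant are illustrative; the real work happens in Cluster Administrator):

```python
# Toy model of building an Exchange virtual server: resources are
# created in dependency order, and creating the System Attendant
# resource auto-generates the remaining Exchange resources.
creation_order = ["IP Address", "Network Name", "Shared Disk", "System Attendant"]

def build_virtual_server(order):
    group = []
    for resource in order:
        group.append(resource)
        if resource == "System Attendant":
            # Illustrative examples of auto-generated resources.
            group += ["Information Store", "SMTP Virtual Server"]
    return group

evs = build_virtual_server(creation_order)
print(evs)
```

The dependency ordering matters because, for example, a network name cannot come online until the IP address it registers is available.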

One thing to keep in mind about Exchange and clustering is that not all Exchange services are appropriate for installation on a cluster. Front end servers cannot be clustered (although network load balancing can be useful for them). Most connectors (such as the Lotus Notes Connector) cannot be installed on a cluster. Exchange clustering is primarily suited for back end mailbox and public folder servers.

I looked at clustering for Exchange 2000 and passed on it. Why should I reconsider now?

There are two primary reasons:

First, you can now have up to eight nodes in a single cluster. With Exchange 2000, you were generally limited to two nodes per cluster.

Second, both Exchange and Windows have made very significant improvements in "fit and finish" for clustering. Clusters are not only more powerful but also considerably easier to configure, manage and understand than in previous editions.

There are several advantages of running Exchange in a clustered environment:

Rolling upgrades. Scheduling and installation of service packs and updates can be done with no visible downtime. All you have to do is move Exchange to a different node while you update each server.

Flexible server maintenance. Because you can move Exchange at will to a different server in the cluster, you can take nodes offline as needed or in an emergency with little or no impact on uptime.

Automatic failover. If a problem takes down one node, Exchange can be up and running before anyone has even had time to respond to the problem.

Simple disaster recovery. Even if a node in the cluster is completely destroyed while Exchange is running on it, the Exchange configuration is safe in the cluster database. You can evict a lost node and join a new node to the cluster without affecting running Exchange virtual servers. The process of evicting and joining computers to a cluster is similar in nature to, and as easy as, joining a computer to a Windows domain. After installing Exchange program files on the new node, it will be ready to run Exchange virtual servers again without further configuration.

Flexible backup and restore strategies. Because nodes in the cluster can all share and move disks and applications between them, you can offload backup chores to idle servers, easily move files backed up to disk to a different server, and so on.

Consolidation and centralization. Microsoft has been running Exchange 2003 on seven-node Windows 2003 clusters for many months now. Nearly all Exchange mailboxes at Microsoft are hosted on Windows 2003 clusters, with each cluster supporting 16,000 mailboxes of 200 MB each. You can learn more about Microsoft's Exchange 2003 infrastructure here:

NOTE: Clustering does not immunize Exchange against failures on a shared disk. Shared disks are virtual server resources, and will fail over with the rest of the Resource Group. Although Exchange services will continue to run, clients will not be able to access existing email until a shared disk problem has been corrected.

Clustering provides redundancy for Exchange as an application, but not for Exchange data. Therefore, regardless of whether you are clustering Exchange or running Exchange on a standalone server, it is critical to design disk systems for fault tolerance and redundancy.

What does clustering Exchange cost me?

The primary cost is training. Exchange administrators in your organization need to become familiar with Windows clustering if you are going to be successful with it. For a long time, this has been an intimidating subject to many people.

There is a wealth of documentation now available on Windows clustering. Windows 2003 clustering is mature and much easier to configure and learn than in previous editions. And you don't need to configure complicated hardware or buy expensive hardware to set up a cluster to test and learn on. It is now easy to configure a "lone wolf" Exchange cluster on a standard workstation with a SCSI or iSCSI disk.

A secondary cost may be additional hardware.

In a cluster with only two nodes, you can create two Exchange virtual servers. This is called an Active/Active configuration. Normally, you will run a single Exchange virtual server on each node, but you can run both virtual servers on a single node when necessary.

However, you must be careful not to overload either computer during normal operation: after a failure, the surviving node has to handle the normal workload of both computers. As a result, although both computers are utilized for Exchange, neither can be run to even half its carrying capacity; you cannot load each machine as heavily as you would if it were not clustered.

In a cluster with more than two nodes, there must be at least one passive node (Active/Passive configuration). A passive node has Exchange program files installed on it but does not normally run an Exchange virtual server. It is available for failover. The passive node requirement is enforced by allowing installation of one less Exchange virtual server than there are nodes in the cluster.
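The virtual server limits just described amount to a simple rule of thumb, sketched here as a small Python function (a restatement of the article's rules, not an actual product check):

```python
def max_exchange_virtual_servers(nodes):
    """Maximum Exchange virtual servers allowed in an n-node cluster.

    A two-node cluster may run Active/Active with two virtual servers;
    with more than two nodes, Exchange enforces at least one passive
    node by allowing one fewer virtual server than there are nodes.
    """
    if nodes == 2:
        return 2        # Active/Active allowed for backward compatibility
    return nodes - 1    # at least one node stays passive

print(max_exchange_virtual_servers(8))  # 7
```

So an eight-node cluster can host at most seven Exchange virtual servers, guaranteeing an idle node is always available as a failover target.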

In a cluster with three or more nodes, you can run only one Exchange virtual server at a time on a given node. This means that in order to handle multiple virtual server failures simultaneously, you must have multiple passive nodes available. Why does Exchange enforce these limits?

Experience has demonstrated conclusively that an Active/Passive clustering configuration is both more reliable and more scalable than an Active/Active configuration. In general, an Active/Passive configuration can support more users, handle peak loads better and fail over more reliably than an Active/Active configuration.

While it may seem a "waste" to leave some nodes idle in an Active/Passive configuration, an Active/Active configuration leaves every node more than half idle, because you must leave headroom for doubling up server load after a failover. In an Active/Passive configuration you can run each active server closer to its actual carrying capacity, because it will fail over to an idle node, not to a node that is already busy.
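The headroom argument can be put in numbers. Using illustrative figures (normalized capacity of 1.0 per node), the usable load per active node falls out directly from the requirement that a surviving node must absorb a failed partner's workload:

```python
def usable_load_per_active_node(config, capacity=1.0):
    """Maximum safe steady-state load per active node, given that
    after a failover no node may exceed its capacity. Toy figures."""
    if config == "active/active":
        # A surviving node must carry both workloads at once,
        # so each node must normally run below half capacity.
        return capacity / 2
    # Active/Passive: the idle node absorbs the whole failed workload,
    # so active nodes can run near full capacity.
    return capacity

print(usable_load_per_active_node("active/active"))   # 0.5
print(usable_load_per_active_node("active/passive"))  # 1.0
```

In other words, two Active/Active nodes together deliver about one node's worth of usable capacity, which is why the "idle" passive node is not really wasted.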

Microsoft's recommended configuration for Exchange clustering is Active/Passive, even on a two-node cluster. However, Active/Active is allowed on a two-node cluster for backward compatibility with previous versions of Exchange.

Clustering Exchange 2003 on Windows 2003 is a very cost-effective and manageable way of increasing availability. Clustering Exchange will allow you to virtually eliminate downtime for scheduled updates and upgrades, and can protect you against sudden hardware failures. But clustering is not for every environment. You should carefully weigh the costs of clustering in your environment--both for extra hardware and for extra expertise.

Small environments may be willing to sustain periodic outages for maintenance and restoration, and thus may find that clustering is not worth the cost. In larger environments, availability and manageability take precedence. And, if you already have multiple Exchange mailbox servers and have already implemented a storage fabric suitable for clustering, then the additional costs for implementing clustering may be marginal.

Clustering is no longer an exotic technology, but something that every IT professional should understand. If you haven't had a great deal of experience with clustering, we hope this CXP Flash helped to demystify the subject and make you curious about it. If you have examined Windows clustering in the past, but weren't impressed, we encourage you to take another look. We think you'll like what you see.

For more information on Windows 2003 clustering, please see the following link:

Run, don't walk, away from clustered Exchange servers. The clustered Exchange server environment I administered never failed over without an error. Setting it up was a nightmare, and my predecessor got fired because he couldn't make it run. Veritas never ran correctly on the cluster. Finding support technicians who understood a clustered environment was difficult at best. After removing the cluster, my system is working flawlessly.

The article writer mentioned the additional training and hardware costs, but he forgot to mention the additional software costs. You must have multiple copies of Windows 2003, multiple copies of Exchange Enterprise Edition (Standard won't work), multiple copies of antivirus software and multiple copies of backup software. (Veritas asked me if I had two copies of the software every time I called them.)

My clustered file server worked well, but I never trusted it, and DFS works better in a distributed environment. I finally shut off all of my clusters last month and haven't looked back since.

You are right – there are definitely a lot of things that need to be taken into consideration when thinking about clustering: hardware costs, software costs, and training/understanding issues. All of this can definitely be a problem if it is not understood in the first place when going into clustering.

That being said – Exchange clustering can be set up to be very reliable and scalable. We have done it internally at Microsoft; we no longer have a single mailbox server that is not clustered. More information on this process can be found here if you are interested:

Without clustered servers, the server consolidation story would not be as feasible as it is now for us internally. So, while every implementation and solution can have issues (clustered or non-clustered), there is a place for each of them, depending on what you need.

Clustering is definitely an option that must be taken with eyes wide open. The hardware and software requirements are different from those of stand-alone servers, as you correctly pointed out. The training requirements are very different, as the author pointed out, and training must be undertaken before deploying such infrastructure. A "test rig from hell" should exist before any production deployment and should also be used for any change control, no matter how small, and this means more money replicating some of the hardware too. And no more "I'll just quickly apply this hotfix rollup and then bog off home to a quiet weekend" – everything must be checked and tested first.

But when it's all said and done, it can be sweet and resilient. I speak from experience: the first 9 months of active/active Exchange 2000 clustering (on MCS' recommendation, mind you) were absolute hell, although most of the problems were due to the hardware. Once we had that ironed out, and re-deployed our active/active clusters into a supported configuration after SP1 came out with different supported configurations (read: no active/active unless you have gobs of memory and limit the number of concurrent users), I got to the point where each Exchange virtual server could happily run 5 months with no scheduled downtime, let alone unscheduled, and would only be failed over for SP or hotfix patches. I'd say, if done right from day one, it can be very, very good indeed.