The Challenges of Clustering with Exchange Server

The clustering survey included the question, “What has been the most difficult part about implementing MCS?” Exchange Server came up frequently in the responses—one reader simply replied, “getting Exchange to fail over right.”

Microsoft’s Kurt Friedrich, product unit manager for clustering, acknowledged this concern and described the problems associated with clustering Exchange. “Part of software complexity is when you have a cluster running, it’s really not doing any useful work. What application are you going to cluster? If the answer is you’re going to cluster a file server, setting up a file server cluster is a 1-minute operation—and a high percentage of our clusters are file servers and print servers. If you want to go with a SQL Server database cluster, it gets a little more complex because you have database considerations. If you want an Exchange installation, not only does it get much more complicated, but you need to know about Active Directory (AD), which is keeping track of who the mail users are. So anybody can set up a file server cluster in about a minute, but to set up an Exchange environment you have to be trained for mail and AD.”

What advice would Kurt and Ryan Rands, senior product manager for enterprise abilities, give to people having a hard time with Exchange? Ryan said, “Clustering still requires a little different skill set than simply administering Windows servers. If I could give one small piece of advice, it’s that you need to build a different skill set. The day-to-day operations you do, such as managing services and setting up file shares, are just different for a cluster server, so there’s a little bit of a training curve to get up to speed.”

What are some of the special skills you need for Exchange clustering? “Typically, if a service is misbehaving, you go into the service control manager and stop it, or restart it, or whatever you need it to do,” Ryan replied. “If you do that in a clustered environment, the cluster server is monitoring that service and when it sees that service go away, it will fail over because that service failed for some reason. So that’s just a quick example of the operations that are just different from what you’d do on a normal server.”
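To make Ryan's example concrete, here is a toy Python model of the behavior he describes. It is only a sketch and does not call any real cluster API: the names ClusteredService, stop_via_service_manager, take_offline_via_cluster, and monitor_poll are hypothetical, and the model assumes the monitor fails over on the first missed health check, which simplifies whatever restart policy a real cluster applies.

```python
from dataclasses import dataclass


@dataclass
class ClusteredService:
    """Hypothetical stand-in for a service the cluster is monitoring."""
    name: str
    owner_node: str
    running: bool = True
    expected_online: bool = True  # the state the cluster believes it should see


def stop_via_service_manager(svc: ClusteredService) -> None:
    """Stop the service directly, without telling the cluster."""
    svc.running = False


def take_offline_via_cluster(svc: ClusteredService) -> None:
    """Stop the service through the cluster, so the stop is expected."""
    svc.expected_online = False
    svc.running = False


def monitor_poll(svc: ClusteredService, standby_node: str) -> str:
    """Run one health check and report what the monitor did."""
    if svc.expected_online and not svc.running:
        # The monitor cannot tell a deliberate stop from a crash,
        # so it treats the missing service as a failure.
        svc.owner_node = standby_node
        svc.running = True
        return "failover"
    return "no action"
```

In this model, stopping the service directly leaves expected_online set, so the next poll moves the service to the standby node. Taking it offline through the cluster clears that flag first, and the poll does nothing.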

Kurt added, “Another example, and probably the biggest learning curve, is that when you install Exchange on a typical system, you don’t think about the disk drives and the Internet addresses. They just come with the server. But on a cluster, when you fail over an instance of Exchange, you have to take with it all the things it needs to run on the other node. There’s a thing called a resource group. So this learning curve involves understanding the concept of a resource group so you can specify what goes with it when Exchange fails over. For example, which physical disk drives have the Exchange storage (because you don’t fail over all the drives; you only fail the ones Exchange is using), what virtual IP address are the clients coming in at (that has to be failed over), whether there is a dependency on the transaction manager (does that have to be failed over?), etc. So, on the property pages, you need to understand the concept of specifying all the things that go when Exchange goes. That is part of this learning curve.”
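The resource group idea Kurt describes can be sketched as a small data structure. The Python below is an illustrative model, not the cluster software's actual object model: the class names, the resource names (such as "Disk E:" and "Exchange IP"), and the node names are all assumptions made for the example. It shows the two points in his explanation: only the resources placed in the group move when Exchange fails over, and each resource comes online after the resources it depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str                      # e.g. "Physical Disk", "IP Address"
    depends_on: list = field(default_factory=list)


@dataclass
class ResourceGroup:
    """Hypothetical model of a group of resources that fail over together."""
    name: str
    owner_node: str
    resources: dict = field(default_factory=dict)

    def add(self, name, kind, depends_on=()):
        # Requiring dependencies to exist first rules out cycles.
        for dep in depends_on:
            if dep not in self.resources:
                raise ValueError(f"unknown dependency: {dep}")
        self.resources[name] = Resource(name, kind, list(depends_on))

    def online_order(self):
        """Return resource names so that dependencies come first."""
        order, seen = [], set()

        def visit(name):
            if name in seen:
                return
            seen.add(name)
            for dep in self.resources[name].depends_on:
                visit(dep)
            order.append(name)

        for name in self.resources:
            visit(name)
        return order

    def fail_over(self, target_node):
        """Move the whole group and return the order to bring it online."""
        self.owner_node = target_node
        return self.online_order()


def build_exchange_group():
    """Assemble an example group; every name here is illustrative."""
    group = ResourceGroup("Exchange Group", owner_node="NODE1")
    group.add("Disk E:", "Physical Disk")
    group.add("Exchange IP", "IP Address")
    group.add("Exchange Name", "Network Name", depends_on=["Exchange IP"])
    group.add("Exchange Services", "Generic Service",
              depends_on=["Disk E:", "Exchange Name"])
    return group
```

Calling build_exchange_group().fail_over("NODE2") changes the group's owner to the second node and returns the disk and address resources ahead of the Exchange services. A disk that was never added to the group stays where it is, which matches Kurt's point that you fail over only the drives Exchange is using. A dependency on the transaction manager, if one exists, would be one more add call.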