This article gives an overview of Cluster Continuous Replication (CCR), a high availability feature introduced in Exchange 2007. We’ll look at what CCR is, the benefits of implementing it and the requirements.

More and more organizations are realizing that e-mail is a mission critical application that they cannot do without, so their messaging systems need to be available at all times. The high availability features in Exchange 2007, such as CCR, protect against both loss of service and loss of data.

CCR is a two node clustering solution that combines the asynchronous log shipping and log replay features in Exchange 2007 with the failover features of the Windows cluster service. Asynchronous log shipping is the process of copying each log file from the active node to the passive node only once the write operation has completed successfully on the active node. Log replay is the process of committing the changes recorded in those log files to the passive node's copy of the mailbox database. The result is that, barring a slight delay for the copy and replay operations themselves, there is an identical mailbox database on the passive node. In addition to log shipping, CCR uses a witness file share on a third server that stores information that can be used if the active and passive nodes cannot communicate with each other over the private network. This witness file share can be hosted on any server in the network and is not restricted by operating system or hardware requirements.
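To make the log shipping and replay idea more concrete, here is a minimal Python sketch of it. It is a toy model, not how Exchange implements replication: the class names, the `log_capacity` value and the in-memory "database" are all assumptions made for illustration. The point it shows is that only closed log files are copied, so changes still sitting in the open log on the active node have not yet reached the passive node.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveNode:
    """Toy active node: records changes in a log and closes the log when full."""
    log_capacity: int = 2  # hypothetical number of changes per log file
    open_log: list = field(default_factory=list)
    closed_logs: list = field(default_factory=list)
    database: dict = field(default_factory=dict)

    def write(self, key, value):
        self.database[key] = value
        self.open_log.append((key, value))
        if len(self.open_log) == self.log_capacity:
            # The log is complete: close it so it becomes eligible for shipping.
            self.closed_logs.append(tuple(self.open_log))
            self.open_log = []


@dataclass
class PassiveNode:
    """Toy passive node: receives closed logs and replays them into its copy."""
    copied_logs: list = field(default_factory=list)
    replayed: int = 0
    database: dict = field(default_factory=dict)

    def ship_from(self, active):
        # Copy only the closed logs that have not been copied yet.
        self.copied_logs.extend(active.closed_logs[len(self.copied_logs):])

    def replay(self):
        # Commit every copied-but-not-yet-replayed log to the passive database.
        for log in self.copied_logs[self.replayed:]:
            for key, value in log:
                self.database[key] = value
        self.replayed = len(self.copied_logs)


active = ActiveNode()
passive = PassiveNode()
for i in range(5):
    active.write(f"msg{i}", f"body{i}")

passive.ship_from(active)
passive.replay()

print(sorted(passive.database))            # changes from the two closed logs
print("msg4" in passive.database)          # the fifth change is still in the open log
```

In this sketch the passive copy trails the active copy by whatever is in the open log, which is the "slight delay" mentioned above.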

One of the most prominent benefits of using CCR is that there is no single point of failure (as there would be with other clustering options) because two copies of the mailbox database exist - one on the active node and another on the passive node. This makes backup more flexible because you do not need to allocate a time window specifically for backups - you can back up from the passive copy. Another plus point for CCR is the time that it takes to bring the server back online in the event of a failure. Failover from the active to the passive node takes only a few minutes (up to a maximum of 4 minutes even on the largest of servers).

In the rest of this article, we'll look at how CCR works, the installation process, as well as advantages and considerations of Cluster Continuous Replication.

How does CCR work?

In a typical CCR deployment, both nodes have local storage (which can very well be a SAN or iSCSI device under the hood) and are connected by a private network used for "heartbeat" communication and a public network that clients connect to. The witness file share is stored on a separate server (in this example the Hub Transport server) and an Active Directory Global Catalog server provides user object information. All these elements are connected via a switch on the local network.

At a high level, CCR works by continuous asynchronous log replication, which copies the log files from the active node to the passive node. As soon as a log file has been fully written and "closed" on the active node, it is copied over to the passive node and committed (or replayed) to the database. This means that an identical copy of the mailbox database on the active node exists on the passive node.

A quorum model with a file share as a witness on a third server is used to prevent "split brain syndrome", which occurs when the nodes can no longer communicate with each other, stop receiving heartbeat signals, and each assume that the other has failed. Heartbeat signals are how the nodes notify each other that they are still alive - if a number of consecutive heartbeat signals are not received by the passive node, this is taken as an indication that the active node has failed and acts as a trigger for the failover to take place. The heartbeat settings are configurable.

Mail that is lost during an automatic recovery can subsequently be recovered by a feature called the transport dumpster, which keeps a queue on the Hub Transport server with a copy of recent messages that passed through it on their way to the clustered mailbox server in the CCR environment. When a failover occurs that results in loss of data, CCR will automatically request that each Hub Transport server in the CCR environment resubmit the messages from the transport dumpster queue.
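The heartbeat and witness logic described above can be sketched in a few lines of Python. This is a deliberately simplified model and not the actual Windows cluster service algorithm: the class names, the `missed_threshold` values and the "first node to claim the witness wins" rule are assumptions made for illustration. It shows the two ideas from the text: failover is triggered only after several consecutive missed heartbeats, and the witness stops both nodes from becoming active at the same time.

```python
class FileShareWitness:
    """Toy witness: gives its vote to at most one node at a time."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node_name):
        if self.owner in (None, node_name):
            self.owner = node_name
            return True
        return False


class PassiveMonitor:
    """Toy passive node that counts consecutive missed heartbeats."""

    def __init__(self, name, witness, missed_threshold=5):
        self.name = name
        self.witness = witness
        self.missed_threshold = missed_threshold  # hypothetical value
        self.consecutive_missed = 0
        self.is_active = False

    def on_interval(self, heartbeat_received):
        """Call once per heartbeat interval; returns True if failover happens now."""
        if heartbeat_received:
            self.consecutive_missed = 0
            return False
        self.consecutive_missed += 1
        if self.consecutive_missed < self.missed_threshold or self.is_active:
            return False
        # Own vote plus the witness vote is 2 out of 3, which is a majority.
        if self.witness.try_acquire(self.name):
            self.is_active = True
            return True
        return False


# Case 1: the active node really fails, so the witness is free to claim.
witness = FileShareWitness()
monitor = PassiveMonitor("NODE2", witness, missed_threshold=3)
ticks = [True, False, False, True, False, False, False]
results = [monitor.on_interval(tick) for tick in ticks]
print(results)

# Case 2: only the private network fails; NODE1 is alive and holds the witness.
held_witness = FileShareWitness()
held_witness.try_acquire("NODE1")
blocked = PassiveMonitor("NODE2", held_witness, missed_threshold=1)
blocked.on_interval(False)
print(blocked.is_active)
```

In the first case a single received heartbeat resets the counter, so failover happens only on the third consecutive miss. In the second case the passive node misses heartbeats but cannot obtain the witness vote, so it stays passive and split brain is avoided.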

Is the installation process complex?

The installation process itself is pretty straightforward, but there are a number of steps and prerequisites that need to be considered. Below are the steps you need to take in order to install Exchange Server 2007 Cluster Continuous Replication on Windows Server 2008:

Plan and configure your network - be sure to take into account that you'll need two network cards on each node (one to cater for the public network and the other to cater for the private network)

Install and configure Windows Server 2008 failover clustering services. Once you have a solid failover clustering service in place, you'll have built the foundation for CCR.

Run the Exchange Server 2007 setup wizard and at the server role selection page choose the Clustered Mailbox Role (Active or Passive, depending on which server you are configuring).

Perform the above steps on both nodes (once for the Active node and once for the Passive node).

Advantages and Considerations

The advantages of CCR include the following:

No single point of failure

No reliance on shared storage

No need for third-party products to achieve site resilience

No special hardware requirements (hardware does not need to be identical, but each server should be listed in the Windows Server Catalog)

Allows for large mailbox support

Reduced backup time and total amount of data backed up

Asynchronous replication provides data redundancy

Improved ease of installation and management

Consider the following before implementing CCR in your organization:

CCR does not offer the same level of scalability as other clustering options. If you wanted to support three active clustered mailbox servers you would need to have six servers available (3 active / 3 passive).

In times of high activity, the database on the passive node might be a few minutes off from the active node due to the time it takes to copy the log files over and commit them to the passive database.

You can only have one database per storage group in a CCR environment.
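To illustrate the point above about the passive database lagging behind during busy periods, here is a small Python sketch that expresses the lag as two queue lengths: log files waiting to be copied and log files waiting to be replayed. The function and parameter names are hypothetical and the log numbers are made up. Exchange 2007 reports comparable figures through the Get-StorageGroupCopyStatus cmdlet, but this sketch does not call it and is not a substitute for it.

```python
def replication_lag(last_log_generated, last_log_copied, last_log_replayed):
    """Return how many log files are waiting to be copied and replayed.

    All three arguments are log generation numbers (hypothetical inputs):
    the newest log closed on the active node, the newest log copied to the
    passive node, and the newest log replayed into the passive database.
    """
    if not (last_log_replayed <= last_log_copied <= last_log_generated):
        raise ValueError("expected replayed <= copied <= generated")
    return {
        "copy_queue_length": last_log_generated - last_log_copied,
        "replay_queue_length": last_log_copied - last_log_replayed,
    }


# Made-up numbers: 120 logs generated, 117 copied, 112 replayed.
lag = replication_lag(120, 117, 112)
print(lag)
```

The larger these two numbers are at the moment of a failure, the further the passive copy is behind the active copy.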

Conclusion

As we have seen, implementing CCR can help organizations achieve reliability, dependability and protection at a reasonable cost - it helps protect against both server hardware failure and loss of the data itself.

Hopefully this article has given you a better understanding of what Cluster Continuous Replication is and what considerations you should make before deciding whether this high availability feature in Exchange 2007 is right for you.