Datacenter Activation Coordination (DAC) mode - Part 1

Tariq is a senior Microsoft Systems Engineer. He implemented several Microsoft infrastructure projects for various major companies. He is now focusing on Active Directory and Exchange server administration and implementation.

In this article we discuss Datacenter Activation Coordination (DAC) mode and see how it prevents a mailbox database from being mounted while it is already active and mounted on another Database Availability Group (DAG) member. In other words, we see how it prevents split brain syndrome.

Datacenter Activation Coordination (DAC) is used in Exchange 2010 to determine whether or not a mailbox server in a Database Availability Group (DAG) is allowed to mount a mailbox database. This activation is controlled by a bit stored in each DAG member's memory, which is set by the Datacenter Activation Coordination Protocol (DACP).

To make things easier let's use the following scenario:

We have two sites, Primary and Secondary. A DAG (DAG-01) stretches over these two sites, where Primary is active and Secondary is passive. We also have four mailbox servers (MBxP01, MBxP02 in the primary datacenter and MBxS01, MBxS02 in the secondary datacenter) as shown in the following diagram:

In case of a database failure, a database copy will be selected, activated and mounted on another DAG member. As discussed in Active Manager – The Exchange 2010 High Availability Brain, the Best Copy Selection process is applied in order to select the database copy to be activated. If DB1 on MBxP01 failed, then the DB1 copy on MBxP02 is automatically activated.

Figure 2: Mailbox database failure

If the whole server failed, then all of its mailbox databases would fail over to the second mailbox server, MBxP02.

Figure 3: Mailbox server failure

In the previous two cases, once the server MBxP01 or the database DB1 is fixed, the repaired databases are put back into service as database copies.

Now let's see what happens in case of a site failure. This is where DAC mode comes into play.

If the Primary site fails and cannot be recovered within a period that our organization can tolerate without service, we have to switch over to the secondary datacenter. If the organization can wait until service is restored at the primary datacenter, there is of course no need to activate the secondary datacenter. Note that this activation is done manually.

Figure 4: Site failure

The process of datacenter switchover involves the following:

First of all, check the health and readiness of the secondary (disaster recovery) site infrastructure.

If the messaging infrastructure at the primary site is not fully down, i.e. some servers are still running but the databases cannot be activated or mounted, then Active Manager on the DAG members at the primary datacenter must be marked as stopped. This prevents these servers from mounting the databases once they are fixed.

Configure the DAG to use an alternate witness server at the secondary datacenter and mount the databases.

Other configuration changes may be needed, such as changing the MX records that point to the servers in the failed datacenter, and other DNS records for the Client Access Server (CAS), Hub Transport and Unified Messaging (UM) servers.

DAC Mode is disabled by default for a DAG. So let's consider the case when DAC is not in use. What happens if the primary datacenter is fixed and the servers come online before network connectivity between both datacenters is restored?

Since the DAG members at the primary datacenter cannot contact the DAG members at the secondary datacenter, the primary datacenter DAG members will mount the databases because they form a majority. With five voters (four mailbox servers plus the witness server), a majority is (5/2)+1 rounded down, which is 3 members, and the primary datacenter holds exactly three (two mailbox servers and the witness server). If this happened, we would have the databases mounted in both sites. This is what is called "Split Brain Syndrome". To avoid this, we must configure the DAG to run in Datacenter Activation Coordination (DAC) mode.
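The majority arithmetic above can be checked with a short Python sketch. This is only an illustrative model of the vote counting in this scenario, not Exchange or cluster code. The function and variable names are hypothetical, and the witness server is assumed to sit in the primary datacenter, as in the scenario described.

```python
def quorum_majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of total_votes."""
    return total_votes // 2 + 1


# Scenario from the article: four mailbox servers plus one witness server.
TOTAL_VOTES = 4 + 1
primary_votes = 2 + 1   # MBxP01, MBxP02 and the witness server
secondary_votes = 2     # MBxS01, MBxS02


def has_quorum(votes: int, total_votes: int = TOTAL_VOTES) -> bool:
    """True if this group of voters reaches a majority on its own."""
    return votes >= quorum_majority(total_votes)


print(quorum_majority(TOTAL_VOTES))  # 3
print(has_quorum(primary_votes))     # True
print(has_quorum(secondary_votes))   # False
```

Because the primary datacenter reaches a majority on its own, its servers see no reason not to mount their databases, even though the secondary datacenter has already been activated manually.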

When DAC mode is enabled, each time a mailbox server starts up it checks a bit held in memory to determine whether or not it is allowed to mount its databases. This is the Datacenter Activation Coordination Protocol (DACP) bit. When the mailbox server starts up, Active Manager on this server sets the DACP bit to 0. This means: "I'm not allowed to mount my active mailbox databases". Active Manager then tries to communicate with all other DAG members, asking them for their DACP bit.

If no one responds with a 1, then the starting server will not mount its mailbox databases. If a DAG member responds with a DACP bit of 1, the starting server is allowed to mount its mailbox databases. Responding with a 1 means "We are all running without problems, we are not facing a site or network failure, you can join and mount your mailbox databases". On the other hand, a 0 response means "Wait! We cannot contact all our DAG members, we are not forming a majority, and therefore we can't mount our databases".
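The startup check described above can be modelled in a few lines of Python. This is a simplified sketch of the behaviour as this article describes it, not the actual Active Manager implementation. The function names and the use of None for an unreachable member are assumptions made for illustration.

```python
from typing import Iterable, Optional


def dacp_bit_on_startup(peer_responses: Iterable[Optional[int]]) -> int:
    """Return the DACP bit a starting server ends up with.

    peer_responses holds one entry per other DAG member: 1 or 0 for a
    member that answered, None for a member that could not be reached
    (a hypothetical convention for this sketch).
    """
    dacp_bit = 0  # a starting server begins with "not allowed to mount"
    for response in peer_responses:
        if response == 1:
            dacp_bit = 1  # another member vouches that the DAG is healthy
            break
    return dacp_bit


def may_mount(peer_responses: Iterable[Optional[int]]) -> bool:
    """True if the starting server is allowed to mount its databases."""
    return dacp_bit_on_startup(peer_responses) == 1


# MBxP01 restarts while the link to the secondary site is still down:
# MBxP02 has also just restarted (0), MBxS01 and MBxS02 are unreachable.
print(may_mount([0, None, None]))  # False

# Connectivity is back and the secondary site members answer with 1.
print(may_mount([0, 1, 1]))        # True
```

In the first call the restored primary servers stay dismounted even though they can reach the witness server, which is how DAC mode avoids the split brain condition.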

In the case of a DAG with just two mailbox servers and a witness server, the starting mailbox server also uses the witness server's boot time to determine whether it is allowed to mount its active databases. If the mailbox server set its DACP bit to 1 before the witness server booted up, then the mailbox server will not mount its active mailbox databases. It assumes that both servers failed or that they were restarted at the same time. If the mailbox server set its DACP bit to 1 after the witness server booted up, the mailbox server assumes that it was rebooted for some other reason, and it is allowed to mount its active mailbox databases.
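The two-member rule comes down to comparing two timestamps, as the following Python sketch shows. It is an illustrative model only. The function name and the sample times are hypothetical, and treating two identical timestamps as "do not mount" is an assumption, since the article does not cover that case.

```python
from datetime import datetime


def may_mount_two_member_dag(dacp_bit_set_at: datetime,
                             witness_boot_time: datetime) -> bool:
    """Apply the witness boot time rule for a two-member DAG.

    The server may mount only if its DACP bit was set to 1 after the
    witness server last booted. Equal timestamps are treated as
    "do not mount" (an assumption of this sketch).
    """
    return dacp_bit_set_at > witness_boot_time


# Hypothetical timestamps for illustration.
witness_boot = datetime(2010, 6, 1, 9, 0)

# Bit was set before the witness booted: assume everything went down together.
print(may_mount_two_member_dag(datetime(2010, 6, 1, 8, 0), witness_boot))   # False

# Bit was set after the witness booted: only this server was restarted.
print(may_mount_two_member_dag(datetime(2010, 6, 1, 10, 0), witness_boot))  # True
```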

We will talk about the steps required for a datacenter switchover in more detail in the next part of this topic.

Summary

In this article we saw how Datacenter Activation Coordination (DAC) mode works to prevent us from mounting databases that are already active and mounted on another server. We also described the three types of failures that may occur in a messaging infrastructure (mailbox database, server, or a full site failure). Finally we had a look at the steps required to perform a datacenter switchover following a site failure.

User Comments

Suppose I first move all the active databases from the primary datacenter to the standby datacenter and then perform the steps below. Do you think the mailbox servers in the primary datacenter will mount the databases, assuming that DAC is enabled?

(1) Stop the DAG in the primary site

(2) Activate the DAG member servers in the standby site. This step will evict all the nodes in the primary site. It will also stop and disable the cluster service on the mailbox servers in the primary site.

(3) Transfer the file witness server from primary datacenter to the standby datacenter.