Part 1: My databases do not automatically mount after I enabled Datacenter Activation Coordination

Datacenter Activation Coordination (DAC) mode is a property of a Database Availability Group (DAG) that, when enabled, forces starting DAG members to acquire permission in order to mount databases. Administrators can enable DAC mode at any time after the DAG has been created. DAC was designed specifically to handle the following scenario:

You have a DAG extended to two datacenters.

You lose the power to your primary datacenter, which also takes out WAN connectivity between your primary and secondary datacenters.

Because primary datacenter power will be down for a while, you decide to activate your secondary datacenter and you perform a datacenter switchover.

Eventually, power is restored to your primary datacenter, but WAN connectivity between the two datacenters is not yet functional.

The DAG members starting up in the primary datacenter cannot communicate with any of the running DAG members in the secondary datacenter.

In this scenario, the starting DAG members in the primary datacenter have no idea that a datacenter switchover has occurred. They still believe they are responsible for hosting active copies of databases, and without DAC mode, if they have a sufficient number of votes to establish quorum, they would try to mount their active databases. This would result in a bad condition called split brain, which would occur at the database level. In this condition, multiple DAG members that cannot communicate with each other each host an active copy of the same mailbox database. This is a very unfortunate condition that increases the chances of data loss and makes data recovery challenging and lengthy (possible, but definitely not a situation we would want any customer to be in).

Once DAC mode is enabled, the integrated datacenter switchover tasks (Stop, Restore and Start-DatabaseAvailabilityGroup) are also enabled.

DAC mode works by using a bit stored in memory by Active Manager called the Datacenter Activation Coordination Protocol (DACP). DACP is simply a bit in memory set to either a 1 or a 0. A value of 1 means Active Manager can issue mount requests, and a value of 0 means it cannot.

The starting bit is always 0, and because the bit is held in memory, any time the Microsoft Exchange Replication service (MSExchangeRepl.exe) is stopped and restarted, the bit reverts to 0. In order to change its DACP bit to 1 and be able to mount databases, a starting DAG member needs to either:

Be able to communicate with any other DAG member that has a DACP bit set to 1; or

Be able to communicate with all DAG members that are listed on the StartedMailboxServers list.

If either condition is true, Active Manager on a starting DAG member will issue mount requests for the active database copies it hosts. If neither condition is true, Active Manager will not issue any mount requests.

In order for the DACP bit to be set to 1 (mount database allowed) the starting DAG member must also be a member of the DAG’s cluster, and the cluster must have quorum.
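The rules above can be sketched as a small decision function. This is only an illustrative model in Python, not how Active Manager is implemented: the Member class, its field names, and can_set_dacp_bit are hypothetical, and the sketch assumes a member can always "communicate" with itself when checking the StartedMailboxServers list.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical model of a DAG member at startup (not a real Exchange type)."""
    name: str
    dacp_bit: int = 0                 # always 0 when the replication service starts
    in_cluster: bool = False          # member has joined the DAG's cluster
    cluster_has_quorum: bool = False  # that cluster currently has quorum


def can_set_dacp_bit(member, reachable, started_mailbox_servers):
    """Return True if `member` may change its DACP bit from 0 to 1.

    reachable: Member objects this member can currently communicate with.
    started_mailbox_servers: names on the StartedMailboxServers list.
    """
    # Precondition: cluster membership and quorum.
    if not (member.in_cluster and member.cluster_has_quorum):
        return False
    # Condition 1: any reachable member already has a DACP bit of 1.
    if any(peer.dacp_bit == 1 for peer in reachable):
        return True
    # Condition 2: every server on the list is reachable.
    # Assumption: the member itself counts as reachable.
    reachable_names = {peer.name for peer in reachable} | {member.name}
    return set(started_mailbox_servers) <= reachable_names
```

With four servers on the list and only one peer reachable, the function returns False until that peer's bit is 1, which is the situation the walkthrough below runs into.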

For a variety of reasons, an administrator may need to shut down all members of a DAG. When starting up a DAG in DAC mode after a complete shutdown, databases may not mount automatically as they would if DAC mode were not enabled. This behavior may sound confusing, but it is actually by design. Let me explain why.

To illustrate the scenario I will shut down all DAG members without manually dismounting or moving any databases. I will leave the witness server online and accessible.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status

Name   : DAG-1-DB0\DAG-1
Status : ServiceDown

Name   : DAG-DB0\DAG-1
Status : ServiceDown

Name   : DAG-DB1\DAG-1
Status : ServiceDown

Name   : DAG-2-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB1\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

I’ll start by powering on DAG-1. DAG-1 and the witness server together hold only 2 votes, and 3 votes are necessary for quorum, so DAG-1 won’t be able to mount any databases.
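The vote arithmetic can be checked with a short sketch. It assumes a simple majority model in which each of the four DAG members and the witness server contributes one vote; the function names are made up for illustration.

```python
def votes_needed(total_votes):
    """Majority quorum: more than half of all votes."""
    return total_votes // 2 + 1


def has_quorum(online_members, witness_online, total_members=4):
    # Assumption: the witness contributes one vote, so a four-member DAG
    # has five votes in total.
    total_votes = total_members + 1
    online_votes = online_members + (1 if witness_online else 0)
    return online_votes >= votes_needed(total_votes)


print(votes_needed(5))      # 3
print(has_quorum(1, True))  # False: DAG-1 plus the witness is only 2 votes
print(has_quorum(2, True))  # True: DAG-1, DAG-2 and the witness make 3 votes
```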

Attempts to get the status of the DAG members using Get-DatabaseAvailabilityGroup -Status fail with an error because the cluster service is not initialized on the node.

Get-MailboxDatabaseCopyStatus * reports all databases on DAG-1 as dismounted. All other nodes report ServiceDown.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status

Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB1\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

Next, I’ll boot DAG-2. The addition of a second DAG member server allows quorum to be achieved. However, Active Manager on DAG-2 is unable to contact another DAG member that has a DACP bit of 1, and it can’t contact all of the DAG members on the StartedMailboxServers list. If DAC mode were not enabled for this DAG, databases would have mounted automatically. But because DAC mode is enabled, the databases do not mount automatically.

Using the failover cluster PowerShell integration (Windows 2008 R2) we can see that two nodes of the cluster show up (indicating quorum was successfully achieved and the nodes successfully formed a cluster).

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state

Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Down

Name  : dag-4
State : Down

Get-DatabaseAvailabilityGroup -Status returns the same error as before.

Using Get-MailboxDatabaseCopyStatus * we can confirm that databases remain dismounted on DAG-1 and DAG-2, and that the passive copies on DAG-2 are failed.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status

Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : Dismounted

Name   : DAG-DB1\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

If the administrator attempts to mount a database an error will be displayed that the nodes either do not have quorum or automount consensus has not been reached.

Next, I’ll boot DAG-3. As with DAG-2, although quorum is achieved, databases will not be automatically mounted. DAG-3 is unable to contact another server with a DACP bit of 1 or all of the servers on the StartedMailboxServers list.

Using the failover cluster PowerShell integration (Windows 2008 R2) we can see that three nodes of the cluster now show up (indicating quorum was successfully achieved and the nodes successfully formed a cluster).

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state

Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Up

Name  : dag-4
State : Down

Get-DatabaseAvailabilityGroup -Status returns the same error as before.

Using Get-MailboxDatabaseCopyStatus * we can confirm that databases remain dismounted on DAG-1 and DAG-2, and that the passive copies on DAG-2 and DAG-3 are failed.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status

Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : Dismounted

Name   : DAG-DB1\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-3
Status : Failed

Name   : DAG-DB1\DAG-3
Status : Failed

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

If the administrator attempts to mount a database an error will be displayed that the nodes either do not have quorum or automount consensus has not been reached.

Finally, I’ll boot DAG-4. At this point, all nodes are members of a cluster that has quorum, and DAG-4 can contact all servers on the StartedMailboxServers list. Therefore, the DACP bit on DAG-4 is set to 1.

DAG-1, DAG-2, and DAG-3 can now contact a server with a DACP bit set to 1, and therefore they set their own DACP bits to 1.
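The whole startup sequence can be replayed with a small simulation. This is a simplified sketch, not Exchange's actual logic: it assumes every running member can reach every other running member, that the witness holds one vote, and that StartedMailboxServers contains all four lab servers. The simulate function is hypothetical.

```python
SERVERS = ["DAG-1", "DAG-2", "DAG-3", "DAG-4"]  # StartedMailboxServers in this lab


def simulate(boot_order, started=SERVERS, witness_online=True):
    """Boot members one at a time and record every running member's DACP bit."""
    total_votes = len(started) + 1   # assumption: the witness holds one vote
    needed = total_votes // 2 + 1    # simple majority
    bits = {}                        # only running members appear here
    history = []
    for name in boot_order:
        bits[name] = 0               # the bit always starts at 0
        votes = len(bits) + (1 if witness_online else 0)
        if votes >= needed:          # cluster has quorum
            # All servers on the list are running and reachable.
            if all(server in bits for server in started):
                bits[name] = 1
            # Any member that can see a bit of 1 sets its own bit to 1.
            if any(bit == 1 for bit in bits.values()):
                bits = {member: 1 for member in bits}
        history.append(dict(bits))
    return history


for step in simulate(SERVERS):
    print(step)
```

In this model the bits stay at 0 through the first three boots, even though quorum is reached at the second boot, and all four flip to 1 only when DAG-4 joins.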

Using the failover cluster PowerShell integration (Windows 2008 R2) we can see that all four nodes of the cluster now show up.

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state

Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Up

Name  : dag-4
State : Up

Using Get-DatabaseAvailabilityGroup -Status we can see that the DAG has successfully initialized, all nodes are operational, and a Primary Active Manager has been initialized.

Using Get-MailboxDatabaseCopyStatus * we can observe that databases have now automatically mounted and copies are healthy.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status

Name   : DAG-1-DB0\DAG-1
Status : Mounted

Name   : DAG-DB0\DAG-1
Status : Mounted

Name   : DAG-DB1\DAG-1
Status : Mounted

Name   : DAG-2-DB0\DAG-2
Status : Mounted

Name   : DAG-DB1\DAG-2
Status : Healthy

Name   : DAG-DB0\DAG-2
Status : Healthy

Name   : DAG-DB0\DAG-3
Status : Healthy

Name   : DAG-DB1\DAG-3
Status : Healthy

Name   : DAG-DB0\DAG-4
Status : Healthy

Name   : DAG-DB1\DAG-4
Status : Healthy

As I’ve described above, when a DAG in DAC mode is started after a complete shutdown, databases will not be mountable until all DAG members are up, running, and in communication with each other.

*Special thanks to Scott Schnoll for reviewing and editing content.

========================================================

Datacenter Activation Coordination Series:

Part 1: My databases do not mount automatically after I enabled Datacenter Activation Coordination (http://aka.ms/F6k65e)

Part 2: Datacenter Activation Coordination and the File Share Witness (http://aka.ms/Wsesft)

Part 3: Datacenter Activation Coordination and the Single Node Cluster (http://aka.ms/N3ktdy)

Part 4: Datacenter Activation Coordination and the Prevention of Split Brain (http://aka.ms/C13ptq)

Part 5: Datacenter Activation Coordination: How do I Force Automount Concensus? (http://aka.ms/T5sgqa)

Part 6: Datacenter Activation Coordination: Who has a say? (http://aka.ms/W51h6n)

Part 7: Datacenter Activation Coordination: When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover. (http://aka.ms/Oieqqp)

Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG... (http://aka.ms/Uzogbq)

Part 9: Datacenter Activation Coordination: An error cause a change in the current set of domain controllers (http://aka.ms/Qlt035)

Yesterday my primary (active) database server had some issues, so I moved all the active databases to the passive server. Once I moved them, all the clients got disconnected, but when I looked at the RPC Client Access server setting, it pointed to the correct CAS server. Also, when I looked at the PAM, it showed the problematic server, so I changed it to the correct server and restarted the CAS server, and the issue got resolved.

So my question is: was this issue resolved by the PAM change or by the CAS server restart?

As per my understanding, the issue was resolved by the PAM change, not by the CAS server reboot. This issue may occur because the cluster status change was not updated in the cluster database. For more details we would need to dig into the cluster logs and compare the time stamps.

SLO

17 Jul 2013 11:33 AM

What if you do not lose power at the primary datacenter, but only the internet line?

This means you have no WAN connection to the secondary datacenter, but you still want to do a datacenter switchover in order to service users at other locations?

I was not able to activate databases when this happened at our primary datacenter. I only have 1 node in each datacenter, and the primary holds all active databases. When trying to activate databases from the node in the secondary datacenter, I got errors saying the DAG must have quorum and that the cluster service is not running. All databases on this node showed "disconnected and healthy" in copy status.

I am considering enabling DAC, but I am not sure it makes sense, considering I have 1 node in each datacenter and communication between the nodes is needed. So if the WAN is gone in my case, what should I do, and how can I activate the secondary datacenter node?

I think you're using an Active/Passive scenario. If so, when the WAN connection drops at the primary datacenter, all the servers are still up and running successfully; only external email communication drops. So why do you need to activate the passive database copy in the secondary datacenter because of a WAN failure?

We need to activate the passive database copy in the secondary datacenter when the primary datacenter is unavailable.

So you can NAT the public IP address of the secondary datacenter to the primary Edge server (if you have an Edge server; if not, any gateway server or Hub Transport server), which will route all external email communication through the secondary datacenter ISP instead.

SLO

18 Jul 2013 2:25 PM

Hm I was sure I replied yesterday but here goes again.

I need to activate the passive copies since I have a lot of users globally using the databases. Locally, the Exchange users can work at primary datacenter 1 (without external email communication), but I need all other locations to be able to connect. Since the line is down at the primary datacenter, they can't.

I don't understand why I could not activate the passive copies. Also, would I need DAC when I only have 1 node in each datacenter?

Most likely one DAG then is not the answer to your solution. Essentially what you are saying here is that I want to cause a condition where split brain is forced. In general I would discourage doing that. If you have the witness in one server in site A, and another node in site B with a database on it - the site B database is going to site A if a WAN failure occurs.

Two DAGs would give you the ability to have each side run independently.

To your question about DAC - DAC should always be used when nodes are geographically dispersed where two sides may find themselves with quorum UNDER NORMAL site level activation scenarios.