Every sysadmin runs into the problem at some time; switching to a newer version of Exchange. Hopefully most of you can migrate to Exchange 2010 within the forest. However sometimes it just makes more sense to setup a new forest with the new version of Exchange, SharePoint, etc. In my case it makes more sense. It takes a lot more work, but in the process I'm able to update all the servers to Windows Server 2008 R2 as well. Having all the servers on the same version of Windows saves me time on management. I'm getting side-tracked! Let's get back to Exchange and the problem at hand: DAG Failover with two Exchange 2010 Servers.

If you haven't read up on the functioning/existence of Database Availability Groups (DAG) and CAS failover of Exchange 2010, you'll probably think that it's a breeze. However that's not the case. The new failover isn't really build on a two server setup. DAG uses Windows' Failover Clustering to provide failover on the Mailbox Database level. This work really well, but comes with one huge disadvantage. Failover Clustering is not compatible with Network Load Balancing (NLB) and NLB is used for failover of the Client Access Server (CAS) role. As an alternative one could use a hardware or software load balancer that load balances TCP/IP traffic, but those don't come cheap, which doesn't really make sense for the smaller shops. But a solution is near!

The solution

After a lot of thinking, discussing and experimenting I came up with a solution. While using the standard Windows Failover Clustering for DAG I can use the Client Access Server Array (

Get-ClientAccessArray

) without NLB for failover of the CAS role. However instead of having NLB switching the active server I'll have to script which server is active. My current default answer for scripting and automation applies here: "Let's PowerShell it!".

First I tried to change the RpcClientAccessServer directly, but that didn't have the right effect. A colleague suggested to use the CAS array and just activate the CAS array IP on the active server. This made a lot of sense as NLB does something similar. So let's go through the steps!

Create the A record in DNS for your newly created CAS array and have it point to an available IP that can be used by both Exchange servers.

Add the CAS array IP on one of the available network adapters of the active server by using the command
[Powershell]netsh in ip add address "Adapter Name" 192.168.0.xxx 255.255.255.0[/Powershell]

You're not done yet! You can connect an Outlook client to a mailbox. AutoDiscover should now use the CAS array DNS as the connection point. You can check the connection point by right-clicking the Outlook taskbar icon while holding the CTRL button and selecting Connection Status.

If you don't see the right hostname in the server name field, you should check the results of the AutoDiscover. You can use the option Test E-mail AutoConfiguration in the CTRL + right-click menu of Outlook or you can use the website testexchangeconnectivity.com to test you AutoDiscover results. You can use this site to test almost any aspect of your Exchange connectivity. You should get something like the following as a result.

The magic script

Time to implement the PowerShell script. The script will take care of the CAS failover by activating the IP on one of the Exchange 2010 servers. The script uses ping to make sure that the other host is still reachable. I've scripted it to check if it can ping the gateway before doing a CAS failover just to make sure that it's not a network-wide issue. It will also check the database copy status (Get-DatabaseCopyStatus) to make sure that the mailbox database has also done its failover to the host running the script.

There are a couple of variables that you need to set in the script before you can run it properly. There should be enough comments in the script to figure it out, otherwise just leave a comment and I'll be sure to answer! You might also want to change the text of the e-mail that is being send in case of failover.

There are a couple of settings that you have to edit in the script to customize it for your environment and preferences. I won't go into this any further as it's quite well explained in the script itself. If you do have questions please comment on this post and I'll get back to you as soon as I can.

Creating the script account

Create a new domain user and add it to the View-Only Organization Management role using the Exchange Control Panel (ECP). You can access the ECP by going to https://<servername>/ecp/. This however doesn't provide it with permissions to allow remote PowerShell. You can grant the permissions by running

Set-User FailMon -RemotePowerShellEnabled $True on the Exchange Management Shell (EMS).
<img alt="" src="http://www.sysadminsblog.com/wp-content/uploads/2010/07/070710_1725_Fullfailove1.png" />
Also add the domain user to the local administrators group to give it the appropriate permissions to run the task on the right level.
<h3>Scheduling the script</h3>
To have the task start, you can use the task scheduler. The script can be fired as much as you like, as it will only spawn one instance.
powershell.exe -command "&amp;{C:\Scripts\FailoverMon.ps1}"

Make sure that, while creating the scheduled task, you select an account with the appropriate permissions on your Exchange organization.

I've set my task to trigger (as shown) on startup with a repeat of every 10 minutes.

If you want to know more on how to schedule a PowerShell script, please check here.

Testing

When the tasks are scheduled and running on both the servers you can try a failover by shutting down the server which currently has the CAS array IP address. You should see it failover with the DAG and you should also see that the IP address is added to the other server. The client will get a notification on Outlook that the administrator made changes to the configuration and that Outlook need to restart to work properly.

Failback

Failback is still a manual process if you want to failback to the previous server. To do so, you'll have to make sure that the previous server is configured with the CAS array IP address and that the DAG indicates that it's healthy. Then it's just a matter of manually removing the CAS array IP address on the server that doesn't need it anymore. Then it'll automatically detect that it's switched servers and the above dialog is again presented to the client. You can use the following command to remove the IP address from the server.