SharePoint 2013 on SQL Server AlwaysOn, 2014 Edition

SQL Server 2014 is out bringing with it enhanced AlwaysOn clustering, and is indeed already supported by SharePoint 2013 with at least the SharePoint 2013 May 2014 cumulative update installed. Some new benefits in SQL Server 2014 are the reliability improvements for when connectivity is lost between nodes and also the ability to add a SQL Server instance hosted in Azure into the replica set for the ultimate off-site backup of your farms’ databases. Both of these upgrades are excellent news if you have to keep SharePoint safe; SQL Server is now more resilient than ever against failure and you can leverage that for SharePoint too.

Either way this new version of SQL Server is a good excuse to streamline my previous article on the subject as well as introducing the new features. Please bear in mind, this guide is not for multi-subnet SQL clusters as SharePoint has issues using them by default, even though it is possible.

Hosting Applications on AlwaysOn or Entire Farm?

It’s important to point out that there’s two ways of using SQL Server AlwaysOn; entire farm, or just the applications for applications shared between X farms via the replication that this technology uses, in a similar way to how SQL Server log-shipping works. Both high-availability strategies have pros & cons but for this guide we’ll do the entire single farm with all databases on the AlwaysOn cluster as that gives the most reliability.

SharePoint 2013 on AlwaysOn Setup Setups

This has been covered in greater detail before but just for convenience, here’s a quick & compact checklist for setting up an entire farm on AlwaysOn. As before, the process is basically the same:

Change recovery mode to “full” for databases that have simple recovery only.

All search databases.

All user profile databases.

Usage database.

o Backup all SharePoint databases.

Prepare SQL cluster

Create new AlwaysOn availability group (after enabling SQL Server for AlwaysOn on each node).

Add replicating partners, including instances in Azure if needed.

Create listener.

Use created AlwaysOn cluster with SharePoint

Update SQL alias on each server to use listener.

Test SQL failover with SharePoint.

Setup Steps with Screenshots

Once you’ve created all the databases that’ll be in the farm, the first thing we’ll need to do is convert any databases in “simple” recovery mode to “full” recovery as AlwaysOn doesn’t work.

All the search & the user-profile databases will need converting. All databases will need a full back-up too.

Next up; create the availability group. The wizard does a decent job of walking you through this.

Any problems with any databases will be shown here; it’s probably worth fixing anything before continuing. Now to pick end-points for the replication sets:

For the ultimate offsite backup, you can store databases in Azure (SQL Server instances hosted in Azure, not SQL Azure which is something else) just like any other instance.

Assuming you’re using x2+ standalone instances (as opposed to 2+ failover-cluster instances), you’ll be able to do a full initial sync with the wizard. In the guide before that wasn’t possible because each instance was itself a failover cluster, which makes this stage somewhat more hassle than normal even if it is a more solid design as it guaranteed each SQL instance uptime.

Click next & start validation…

You’re now ready to start the initial sync!

Once done, if it all went OK you should see this rather happy sight:

Remember, by default SQL Server uses port 5022 for the replication needed (configurable earlier in the wizard for each endpoint) – check firewalls allow this extra port. Also for consideration are the permissions – if SQL Server uses default accounts (not recommended) then each AD machine accounts will need to be added to the security of the other server, which isn’t easy considering the GUI doesn’t allow you to select AD objects of type “machine”. If the SQL Server service is set to use an AD user login and each node is using the same one then you should be set already.

Setup Listener and Reconfigure SharePoint

All we’ve done so far is setup an advanced mirroring system for the databases, basically. To fully use the power of SQL Server AlwaysOn we need an AlwaysOn “listener” too – this is basically an endpoint or address to connect to that transparently redirects the communication to the active SQL Server node, the point being you don’t have to care about what node is active.

In SharePoint land, this is relatively easy thanks to our SQL alias we’ve got setup that already does client-side aliasing for us; we just need to update this to point to the AlwaysOn listener instead of the SQL Server instance name (sql-n1) and we’re good to go.

Run “cliconfg.exe” again & once IIS & SPTimerV4 (SharePoint Timer service) has been stopped, update the alias, save and restart the services again. You should be good to go; the data’s exactly the same and SharePoint won’t even realise it’s using a different “endpoint”, but this is critical to make sure SharePoint can handle a cluster failover automatically.

Testing a Failover

As ever with high-availability setups, it’s difficult to be 100% sure of what’ll happen on a failover until it happens, such is the nature of these complex systems. That’s why manually failing over on a fairly regular basis is a good idea to do; to prove you still can should the time come that you really need to failover.

If you’ve followed this guide so far and done nothing else then when you failover to the other SQL instance, although the databases will be fine, SharePoint will die horribly as all SharePoint’s connections are denied to this new SQL Server, which makes sense given we’re only synchronising databases and not logins too.

Lot’s of errors about logins failing from the timer service and IIS.

Create the logins on the other SQL instances too; just on the server itself; no need to grant the users access to the databases as the permissions are already added by virtue of the database being replicated, but the server itself needs to know about the SharePoint service-accounts before SharePoint will work with this 2nd instance.

These users need to be created on every SQL Server instance that SharePoint may need to use, if you want SharePoint to survive a failover that is.

Failover Health Checks

If SQL Server has had a failover, there’s always a period when the client application (SharePoint) will lose its connectivity to SQL as everything’s automatically switched to a new SQL instance. This is pretty much inevitable to some extent at least but it’s good to know how to make sure everything’s failed-over correctly, especially on multi-subnet clusters where there may be multiple DNS servers. Failovers are complicated thing and some basic checks are always recommended just to make sure everything worked ok.

DNS update – the name of the listener will have a new IP address after a failover.

Check for new A-record with “nslookup [listener_name]” replacing “[listener_name]” with the name of the AlwaysOn listener. You should see the new active nodes’ IP address as the value of the “A” record

You may need to flush DNS cache with “ipconfig /flushdns” so the SharePoint machine picks up the new A-record.

Restarting IIS/SPTimerV4 can help SharePoint pick-up the new address.

Database health:

Make sure the right primary node thinks it’s the primary – refresh the availability group to check.

Also make sure each database says “Synchronized” on the new active node too (it should say the same on the passive node(s) too).

SQL Server instance health – get the (now passive) node back in the group & re-synchronised again – verify that happens.

If for some reason the passive node can’t join, it may be necessary to remove the server from the availability group and re-add it.

Performing a planned failover is highly recommend on a regular basis, during a maintenance window (as it always causes disruption). It’s the only way to be 100% sure that the failover will be successful, which is the entire point of doing this. The smallest misconfiguration can cause the cluster to fail and it’s only by proactively testing it for real can we be sure it’ll work when it counts – when a fatal error occurs and the whole system needs to cope.

Wrapping Up

Never in the history of databases has it been so easy to setup a cluster that’s so reliable before, which is one brilliant thing about SQL Server. The new 2014 version improves this reliability even more-so; let’s use it to make SharePoint more reliable too!

The TTL should be controlled in your Windows DNS Server and defaults to your zone TTL. In the MMC window, click View – Advanced, then navigate to your availability group DNS record. Open properties and try decreasing the TTL to your desired setting (or 0).

The preferred approach would probably be to set the MultiSubnetEnabled property to "True" for all of your databases except SharePoint_Config. Then, find the connection string in the registry on your WFE boxes for SharePoint_Config and add the MultiSubnetEnabled=true parameter. Now your servers will try to connect to all IP addresses simultaneously, and the TTL will not matter.

As Sam pointed out, SharePoint Support doesn't technically support modifying the connection string in the registry, but it is a really minor modification and a workaround for not having the option in the product configuration wizard.

Thanks for the input Phillip and you're right. Not supported just means we've not extensively tested that scenario & rubber-stamped it, but it's quite likely it would work. If you need an absolute guarantee your setup will work (or otherwise described as "supported") then don't modify the registry & go with DNS TTL changes instead.

No worries Sam, I didn't fully appreciate that this article refers to Azure for off-site backup. I did check and it definitely is not supported. So in this setup can your whole backup and restore solution, farm to item level recovery, be taken care of by an SQL Instance in Azure? Or would you still need an on-premise solution as well?

I may be a bit late with this (having only just discovered this myself).

When setting up AO you can simply use the following commands per database:
Add-DatabaseToAvailabilityGroup -AGName -DatabaseName
This does multiple steps:
1. Configures the AoAG with the new database and sets up the secondaries automatically (including the backup and restores).
2. Repoints the references for that database to the new database name (no need for SPDatabase.ChangeDatabaseInstance()) (It even adds the MultiSubnetFailover=true part to the connection string)
3. Informs SharePoint that this database is part of an AoAG.
Bonus step 4. If you perform this on the configuration database it also updates the local system registry keys to point to the new location, no need for SQL aliases. It even warns you that doing this on the configuration database will require rebooting every server to pick up the new connection strings.
For reference: https://technet.microsoft.com/en-us/library/cc512725.aspx

So, with my testing, I managed to set up a SharePoint farm on one SQL instance which hosts an AoAG using an alias to point to the primary server directly (not via the listener). Using the above command, added the databases one by one and all references are updated correctly (rebooting after updating the config database as well).

What’s more, you get the Get-AvailabilityGroupStatus command to give you the details and you can even use the object returned from that command to call a method to initiate a failover from SharePoint itself.

It’s really a pity that this particular configuration is not more widely known. Even SPDatabase.ChangeDatabaseInstance() will successfully remap the confiig database’s connection string (albeit with no warning the farm dies until you reboot unlike the Add-DatabaseToAvailabilityGroup command.

When using AoAG, there’s really no need to use SQL Aliases for anything. You can even move servers without having to modify the connection strings just by virtue of being in an AoAG (add the new one, failover, remove the old one).