How to: Configure SQL Server 2012 AlwaysOn – Part 6 of 7

How to: Configure SQL Server 2012 AlwaysOn

How to: Fail-over

In this part of the blog series we will go over the various methods of manually failing over an Availability Group. Naturally if you are using automatic fail-over then there are no steps required for this to happen in an emergency but you might find a reason to conduct a planned manual fail-over when there is no actual problem (for example, fail-over testing).

What Happens During a Fail-over

Database Component:

A fail-over occurs by sending what amounts to a “make primary” command to a secondary replica. This happens because, of course, the primary database might not be available to receive commands.

Once the command to make primary comes through, the database will become transactional (in a read/write mode) and begin accepting connections of all types. If you were using synchronous commit then there will be no data loss and this new primary replica will begin stock piling transactions in the transaction log until the previous primary comes back online to receive the transactions or data transfer is manually severed. When the primary replica comes online there are no user actions required to resume the data transfer but a fail-back would require user interaction if it was desired.

If you are using asynchronous commit (like we are in our architecture) then the secondary replica will come back online but the data transfer will be paused. At this time you can use the database as a read-only replica to identify what data loss occurred or you can resume transfer. By resuming the transfer you lose the ability to identify what data might have been lost during the fail-over.

Cluster resource / AG listener

At the same time that the database component is failing over the cluster resource is also working on its fail-over.

Which ever secondary replica is becoming the new primary now needs to run the cluster resource for the Availability Group. The most important part of this fail-over is the AG listener. When crossing within the same sub-net the move causes a momentary outage of the IP address associated to the listener (no longer than the outage incurred by the database fail-over). If the resource is moving to a node where the listener must change sub-nets it will take the former sub-net IP offline and bring online the new sub-net’s IP address.

Availability Group Fail-over Wizard

The availability group fail-over wizard can be run from either the primary or secondary replica but you will have to be able to connect to the secondary replica in order to be successful.

There are two ways to access the fail-over wizard:

Availability group dashboard

The Start Failover Wizardlink is available from both the primary and secondary replica dashboards.

SSMS Object explorer

To access the fail-over wizard from the Object Explorer; right click on the availability group and select Failover…

Wizard Walk-through

Upon opening the availability group fail-over wizard for the first time you will meet the Introductionpage. If you’d like, check the Do not show this page againthen click Next.

On the Select New Primary Replicapage you will see a list of all available replicas with some status information. I find the most important column of this page to be the Failover Readinesscolumn which will let you know whether you will have data loss by choosing this replica for fail-over.

NOTE: The Failover Readiness column is indicating data loss based on configuration not based on an actual representation of what has been synchronized. If you created a database inserted one record, verified that the secondary has the record, and then ran this wizard you would see that data loss is listed for asynchronous connections. It does not matter that there will actually not be any data loss in this situation.

Once you have selected your replica to fail-over to; click Next.

The Summarypage will provide a list of the actions and affected components. This is where you can script out the process or go ahead and click Finish.

Finally, you reach the Resultspage and you are complete.

T-SQL Fail-over

Microsoft did a really great job at keeping the fail-over scripts simple for moving an Availability Group.

These commands must be run on the secondary replica that you wish to become the primary replica.

When no data will be lost by the fail-over

ALTER AVAILABILITY GROUP [AG_SQLTest] FAILOVER;

When data might be lost by the fail-over

ALTER AVAILABILITY GROUP [AG_SQLTest] FORCE_FAILOVER_ALLOW_DATA_LOSS;

Wrap-up:

It is easy to see that both the Availability Group Fail-over Wizard and the T-SQL methods are both very straight forward and easy to use. That is one characteristic of AlwaysOn that is very appealing, especially to junior DBAs.

I always prefer x64. The reason is fairly simple, it supports more memory. In addition to having more of your database on solid state rather than reading from disk, you also see improvements in concurrent users. No-a-days there are always x64 capable servers around and they are no more expensive than x86. Below is a link to a quick, to the point, blog post on why to go with x64.

Also note, AlwaysOn does not support non-WOW64 x86 systems. Meaning if your Windows is running x64 then you have to run SQL Server x64 for AlwaysOn to work. Only a x86 version of Windows can support SQL Server x86 with AlwaysOn.

I have a question about setup. I have 3 nodes. The one node is in azure only for DR purposes. For that node do I need to set up any routing? I have it set to manual fail over and no for read only and no routing. But wondered if failed over to the azure node. when the others come back up how would they sync up.

If you have everything functioning right now, the direction of the data sync will reverse but not much else will change. Your on-premises replicas will sync back up with the Azure replica and then you can fail back to on-premises whenever you feel like it. If you have not yet configured the cluster, be aware that you will need an Azure Internal Load Balancer to work with the listener and health probes will need to be setup for that. I personally have built AGs within Azure but not a hybrid approach so I am not sure if there will be additional network complexities or not. To wrap up by directly answering your question, you should not need any additional routing to handle the data sync.

The onsite location will be down for a day and the DR site will be primary, per day transaction log on average is 1.5 gb. My Process are below…Please advice if it makes sense as I dont have enough exoperience with AAG

1) Change the Availability Mode to Syn Comit on the DR then make sure its synchronised and then failover

2) Change the other two secondaries to asynchronous commit and the failover mode to manual in order to avoid application lag and automatic failover

.3) Do the maintenance on the secondaries

4) Change the Availability Mode to Synchrous Commit on the secondaries , make sure its synchornized and then failover

5) Upon failover change the DR to Asynchronous commit and failover mode to Manual.