Implementing Geographic Redundancy

Geographic redundancy is a fancy word for distributing your data across multiple data centers globally. One way to accomplish this is to spread your data across many SQL Azure data centers using Data Sync Service for SQL Azure to synchronize a hub database to many global member databases. SQL Azure has six data centers worldwide

In the future I will blog about a more complicated technique that provides real time writes and geographic redundancy with Data Sync Services.

Considerations

The technique that I will discuss in this blog post is best for databases with low writes and high reads where the write data doesn’t need to be synchronized in real time. One scenario that fits well with this technique is a magazine publishing web site that I discussed in this blog post.

The current CTP of Data Sync Service for SQL Azure allows the most frequent synchronization to be an hour. This limitation reduces the usefulness of this specific technique to a limited number of applications. The good news is that you will have an option for more frequent synchronization in upcoming releases of Data Sync Services.

Setting up Data Sync Service for SQL Azure

In order to get geographic redundancy, you will need to have multiple SQL Azure service accounts with SQL Azure servers that are in different data centers. Currently, you are only allowed to have one server per account, which means to have resided in multiple data centers, you need multiple account. The location of the data centers and the count of the member databases depend on the needs of your application. Having two databases, in two different data centers is the minimum needed for geographic redundancy.

The next step is to set up Data Sync Service for SQL Azure. With this technique, the hub database is the database that you can read and write from, and the member databases are the ones you only read (SELECT) from. Having the member databases read-only simplifies the issues of data integrity and synchronization conflicts.

Modifying the Data Layer

The final step is to add some intelligence to your data layer code that will pick a local datacenter relative to the Windows Azure web role first. If the local database is offline, and the code fails over to a remote SQL Azure datacenter. This code expands on the connection handling code that was discussed in this blog post.

Connection Strings

Using the setup above, you now have multiple read-only member databases distributed globally. This means that you have multiple connection strings that your application can use. As a design exercise, let’s assume these applications are Windows Azure web roles and that they could exist in the same data centers as the SQL Azure database.

In this technique I am going to dynamically construct the connection string every time a read query is executed against the database. I want my code to have these goals:

Return a connection string where the primary server is in the same datacenter as the Windows Azure web role is running. This will increase performance and reduce data transfer charges.

Return a connection string where the failover partner is in a different data center than the primary server. This will give me geographic redundancy.

Build on earlier code examples to try again at the local datacenter for transient errors, instead of calling the remote data center.

Have one server per datacenter that has matching data.

All the databases have the same name; this keeps the code simpler for this example, since we don’t have to maintain a list of databases name for each server.

Only try one remote data center. If the local data center and the random failover partner fail, abort the operation.

ConnectionStringManager

ConnectionStringManager class, shown below, knows about the data centers and servers that hold the data. It also knows how to read the user id and password from the configuration file. From this information, it can construct a connection string to a local SQL Azure database, or return a random failover partner to connect to remotely.

The ConnectionStringManager class code can be found in the download at the end of the blog post. The ConnectionStringManager class code looks like this:

The calling code builds on the connection handling code that was discussed in this blog post. Currently there is no way in Windows Azure to detect what data center you are in from your C# code. So we have to hard code the local data center in the code. This will need to be changed as you deploy the same code to other Windows Azure data centers. You could also move it to the Windows Azure configuration file and have you code read it from there.