The Sheep-Pen of the Shaun

News

Shaun, the author of this blog is a semi-geek, clumsy developer, passionate speaker and incapable architect with about 10 years’ experience in .NET and JavaScript. He hopes to prove that software development is art rather than manufacturing. He's into cloud computing platform and technologies (Windows Azure, Amazon and Aliyun) and right now, Shaun is being attracted by JavaScript (Angular.js and Node.js) and he likes it.

Shaun is working at Worktile Inc. as the chief architect for overall design and develop worktile, a web-based collaboration and task management tool, and lesschat, a real-time communication aggregation tool.

Post Categories

.NET

The SQL Azure Federation had been publically launched several weeks ago and this is one of the most existing features I’m looking forward. This might be the first post of SQL Azure Federation, and hopefully not the last one.

Some Backgrounds

SQL Azure Federation was mentioned in about 2009. The Microsoft told that there will be a feature in SQL Azure allow users to split one database into many based on some sort of rules But from the client side perspective, user can interact their data as if in one database. This feature was called SQL Azure Federation. At the TechED China 2010, when I talked with Zach, a technical evangelist at Microsoft focus on Windows Azure platform when the SQL Azure Federation would be available, or even though CTP for MVPs or partners. But at that moment Zach said he still don’t have a chance to take a look at it. I remembered it was Dec, 2010.

But the good news came on June 2011, Microsoft had opened the nomination of the PE Program of SQL Azure Federation. And I was very lucky being approved to use this feature and provide some early feedback to Microsoft. During the PE program I had attended several online meetings and have the chance to play with it in some of my projects. Cihan Biyikoglu, the program manager in SQL Azure team and his group gave me a lot of information and suggestion on how the SQL Azure Federation works and how to use it in the best approach. During the PE program, the Microsoft said that SQL Azure Federation will be available at the end of 2011.

In 12th, Dec 2011 the SQL Azure Federation was launched with the SQL Azure Q4 2011 Service Release, with some other cool features, which you can have a reference here.

What’s (Data) Federation

Federation is not a new concept or technology in computer science. But it could be meaning differently in different context, such as WS-Federation, Active Directory Federation Services, etc. But in this, and the following blog posts about SQL Azure Federation, the word “Federation” I mentioned would be focus on the Data Federation.

Federation, as known as the data shard or partitioning, is a database design principle whereby rows of a database table are held separately. It also be known as horizontal partitioning. Assuming that we have a database and a table named Product which contains the product records, and now there’s 10,000 records there. After a while the number of the records raised to 10,000,000. If they are all in the same database it might cause performance problem. In this case, there are two solutions we can choose. One is to increase the hardware of the database server. For example, more CPU cores, more memory and higher network bandwidth. This approach we called Scale Up. But there will be a limitation in this way, we can not add cores, memory and much as we want.

Another one is to split the database across to multiple databases and servers. Let say we divide the database and Product table in 10 databases (servers), so in each database there will be only 1,000,000 records in Product table. We split the data volume across the multiple servers, as well as split the network load, CPU and memory usage. Furthermore, since we can have as many servers as we need, there will be no limitation to extend our system in this approach. This is called Scale Out.

SQL Azure Federation implemented this approach, which helps us to split one database into many that we called federation members, to increase the performance.

Horizontal Partitioning and Vertical Partitioning

Let’s deep into the tables in the databases to be federated. If a table was too big that introduced some performance issues, like the Product table I mentioned previously which has 10,000,000 records, we need to split them across to the databases. There are also two approaches we have, the horizontal partitioning and vertical partitioning.

Horizontal partitioning, likes you use a knife to cut the table horizontally, which means split the table by rows. The tables after the partitioning would be:

Have the exactly same schema.

One database command on a table would be the same on any other tables.

SQL Azure Federation utilize the horizontal partitioning to split the tables in multiple databases. But it’s not that simple as I mentioned above. When using horizontal partitioning, we need to firstly define the rule on how to divide the tables. In the picture above, it indicates that the table will be divided by ID, all records that ID less than 4 would into one database, and others (larger than 3) will be in another.

But if we have some tables related, for example UserOrder table which have UserID as well, we need to split that table by the same rule, to make sure that all records in the tables that referred to the same UserID must be in the same partition. This will make the JOIN query quick and easy.

There are also some tables that doesn’t related with the ID in this example, for instance the countries, cities, etc.. When we partitioning the database, these tables should not be split and need to be copied to each databases.

The last thing is that, there might be some tables that represent the global information, like the system settings, metadata and schema data. They should not be split and should not be copied into the databases. So they will be remained in the original database we can call it root database.

Now we have a fully implementation on the horizontal partitioning. We have the rule on how the data should be split. We ensure that all related records will be stored in the same database node and the lookup tables will copied across them. We also have the root database with tables that have the global information stored. I can tell you that this is what SQL Azure Federation does for us, automatically and safely.

Federation is the rule on how to partition our data. There can be more than one federations in one system. But on a particular table there has to be only one federation apply. This means, for example we have a table with columns UserID and ProductID. We can apply a federation that split the table by UserID, or by ProductID, but we cannot apply both of them on the same time.

A federation includes:

Federation Name: The name of the federation which can be used when alter or connect.

Federation Distribution Name: The identity name that to split the tables in this federation. For example if we want to split the tables based on the UserID then we can name the federation distribution name as “userId”, “uid” or whatever.

Federation Distribution Data Type: The data type that the federation distribution name is. Currently the SQL Azure Federation only support int, bigint, uniqueidentifier and varbinary(n).

Distribution Type: How SQL Azure Federation will split the data. There are many ways to split the data such as mod, consistent hashing but currently SQL Azure Federation only support “range”.

After we split database into many, based on the federation we specified, the small databases called Federation Member. The original database, may contains some metadata tables would be called Root Database or Federation Root. The tables that is being split into the federation members are Federated Table.

The tables that represent the lookup data can be copied to each federation members automatically by SQL Azure Federation, which is called Reference Table. The remaining tables in the federation root would be Center Tables.

As we discussed below, when horizontal partitioning the tables that related with the same split key (here is the federation distribution name) should be put into the same databases (federation members). For example if we move the record in Product table into federation member 1 if UserID = 3, then all records that UserID = 3 in the table ProductDetails should be moved in federation member 1 as well. In SQL Azure Federation, the group of the records that related to the same federation distribution value called Atomic Unit. This is very important and useful when using SQL Azure Federation which I will explain in the coming few posts.

Summary

In this post I covered some basic information about the data federation. I talked about the approaches that we can use to partitioning our data. I also described the different between horizontal partitioning and the the vertical partitioning, and the goal of horizontal partitioning. Finally I talked about the concept of SQL Azure Federation.

In the next post I will demonstrate how to create a database and use SQL Azure Federation, to split my original database into members.

Comments

#re: SQL Azure Federation – IntroductionPosted by luxury resort spas
on 1/7/2012 3:28 PM
Thanks for taking the time to discuss this, I feel strongly about it and love reading more on this topic. If possible, as you gain knowledge, would you mind updating your blog with extra information? It is extremely helpful for me.

#re: SQL Azure Federation – IntroductionPosted by luxury resort spas
on 1/7/2012 3:29 PM
Excellent tips.Really useful stuff .Never had an idea about this, will look for more of such informative posts from your side.. good job...Keep it up

#re: SQL Azure Federation – IntroductionPosted by Tim
on 4/11/2012 11:01 AM
Thanks for this post! This was exactly the visual example I needed to confirm what I thought sql azure federation was.

#re: SQL Azure Federation – IntroductionPosted by Lakhwant
on 4/28/2012 12:33 PM
Thanks, For this post, I have question that if we federate tables of a existing database, will it effect the data in the existing database? and How can we continue without any data loss and without any stoppage in the process.

#re: SQL Azure Federation – IntroductionPosted by mukesh
on 9/7/2012 7:44 PM
Thanks a lot Shaun for the post. very informative keep up the good work..

#re: SQL Azure Federation – IntroductionPosted by Rajkutti
on 9/27/2012 3:09 PM
Thanks for this post. It was very useful to understand the sql azure federation fundamentals.

#re: SQL Azure Federation – IntroductionPosted by Conor Cunningham [MSFT]
on 4/10/2014 3:52 AM
Current guidance on sharding from Microsoft can be found here:http://msdn.microsoft.com/library/azure/dn495641.aspx