Editor’s Note: Today’s post, written by Full Scale 180 Principal Trent Swanson, describes how the company uses Windows Azure and database partitioning to build scalable solutions for its customers.

Full Scale 180 is a Redmond, Washington based consulting firm specializing in cloud computing solutions, providing professional services from architecture advisory to solution delivery. The Full Scale 180 team has a reputation for delivering innovative cloud solutions on Windows Azure, for seemingly impossible problems. Full Scale 180 works with customers across a broad set of industries, and although every project is very unique, there are often a lot of common concerns and requirements spanning these solutions.

Through the course of various different projects with the customers, designing, implementing, and deploying some very cool solutions on Windows Azure, the company has been presented with some very interesting challenges. A challenge we often encounter is database scaling.

As far as working with a data store is concerned, at a very high level, you need to concentrate your work in two main areas:

The “place” where data is stored

Getting the data in and out of that place in the most optimum way

The game of complexity and higher abstraction layers is an interesting one in the software development realm. You start with a means (this word is representing many different concepts here such as API, library, programming paradigm, class library, framework) for getting something done, eventually getting to an equilibrium state, either coming up with your higher level abstraction constructs, or using one from somebody else, often referred to as build/acquire decision. Like anything else, data stores follow the same pattern. When dealing with relational stores, such as SQL Azure, you need to play with the rules set by the system.

The Place Where Data is Stored

When working with SQL Azure, the physical components of the data store are no longer your concern, so you do not worry about things like data files, file groups, and disk-full conditions. You do need to consider resource limitations imposed by the service itself. Currently SQL Azure offers individual databases up to 150GB.

It’s common to expect an application’s database consumption to grow over time. Unlike on-premises database, the only dimension you can control is the procurement of additional database space from Windows Azure. There are two approaches for this: either plan for expansion and procure new space ahead of time (which may defeat the purpose on running in the cloud) or expand automatically as the need arises, based on policies. If choosing the latter, we then need to find a way to partition data across databases.

Optimum Data Transfer and Sharding

Aside from space management, we need to make sure the data coming in to and out of the data store is fast. With on-premises systems, both network and disk speed may be optimized, but on cloud platforms, this is typically not an available optimization, so a different approach is needed. This usually translates into parallelizing data access.

Data storage needs will grow, yet we need to play within the rules set by the platform for maximum database size. Likewise, we must learn to design solutions with these size and throughput limitations in mind. Whether it’s connectivity to the data store, the physical storage throughput within the data store, or size limits on a data store, there is often a need to design solutions to scale beyond these single unit limitations. If we can employ a mechanism where we utilize a collection of smaller databases to store our data, where we can potentially access these smaller databases in parallel, we can optimize our data store solution for both size and speed. The mechanism here should take care of automatic data partitioning and database procurement. One common approach to solve this is sharding. With sharding, there are changes to data management and data access, irrespective of the method used. SQL Azure Federations provide an out-of-the-box sharding implementation for SQL Azure.

During some of our customer engagements, we uncovered situations where SQL Azure Federations would be a solution. In addition to simply scaling out beyond the 150GB size limitation of a single database, we have found federations useful in multi-tenant cloud solutions.

SQL Azure Federations in Multi-Tenant Solutions

Multi-tenancy is a requirement that is often part of the cloud solutions we typically work on. These projects include building new solutions, adding this feature to existing single-tenant solutions, and even re-designing existing multi-tenant solutions to achieve increased scale and reduced operating costs. We often find SQL Azure Federations to be an extremely useful feature in meeting these requirements. A tenant becomes a natural boundary to partition data on, and with a large number of tenants, cost management becomes critical.

Let’s consider a solution that stores, at most, 100KB of tenant data, with each tenant’s data in its own database. The smallest database we can provision in SQL Azure today is 100MB, which equates to a monthly storage cost of $5/tenant. If we onboard 10,000 tenants, the baseline cost is now $50,000! Now, instead of separate databases, we could combine all tenants into a single database. Even if every tenant were to store their full 100KB of data, we could actually fit all 10,000 tenants in a 2GB database with room to spare, costing us only $13.99 monthly. That’s a big difference!

Now let’s consider the situation where we add new features to our service, requiring more database storage, while we continue to onboard tenants. We can certainly increase the database size to accommodate the increased demand, but at some point we hit a limit. This limit could either be a cap on the database size, or the number of transactions a single database is capable of processing in an acceptable time. This is where sharding becomes extremely useful, and with SQL Azure Federations it’s nice to know that at some point we can simply issue split our database while the service is still online, and scale our database out to meet growing demand.

We recently developed a number of samples demonstrating multi-tenant solutions on Windows Azure. One of these samples includes a multi-tenant sample application utilizing SQL Azure Federations and can be found at shard.codplex.com. Let’s look at an example based on the Shard project.

Adding Multi-tenancy to an Existing Solution

Moving a single-tenant database approach to a shared database design is often a very costly endeavor. A common approach is to add a tenant identifier to each table containing tenant-specific data, and then rework the application in all the layers to include tenancy. Additionally, to support scaling beyond the resource limitations of a single database, tenants must be distributed across multiple databases. In return, the solution’s operating cost is lower, thus increasing profits or a letting a software vendor price their solution more competitively. In the past, we would essentially end up with a custom sharding solution to reduce costs and support scale. These custom solutions had complex designs providing tenant-level isolation in a single DB, handling of connection routing, and moving tenants across databases to meet growing demand.

Filtered Connections

The SQL Azure Federations filtered connections feature is extremely powerful in moving existing solutions to a shared database design. Filtered connections can be used to minimize the changes necessary in the business logic or data access layer, which are commonly required to make tenant ID part of all operations. Once our database connection is initialized with the tenant context we can use the connection with the pre-existing business logic and data access layer. Although this feature has been used to minimize the amount of work necessary in the application, changes in the schema are still necessary, small changes in the data layer are also required and sometimes changes to the application may be necessary due to the use of unsupported features in federations. Details of the SQL Azure Federations Guidelines and Limitations can be found on MSDN.

Even though we would add a [TenantId] column to the schema in order to store data for multiple tenants in the same table, we don’t necessarily have to change our code or our model to handle this. For example, let’s say we have a table containing tasks, and some feature in the application that inserts and displays the list of tasks in that table for a tenant. After adding the TenantId column to the table, without filtered connections, any code containing SQL statements like the following

SELECT * FROM [Tasks]

Would need to be changed to something like:

SELECT * FROM [Tasks] WHERE TenantId = @TenantId

In fact, pretty much all code containing SQL statements like this would require changes. With filtered connections, application code using a statement like “SELECT * FROM [Tasks]” will not need to be changed.

Schema Changes

After a quick review to identify the use of unsupported features in the schema and the various workarounds, we start by identifying all the federated tables. Any table containing tenant-specific data will require a tenant id column on the table, which is used to partition our data on. In addition to that, any table that contains a foreign key constraint, which references a federated table, will also need TenantId added and also become a federated table. For example, imagine if we had an Orders table, which we decided to make a federated table. This table would have OrderId, and quite often an OrderDetails table, which would contain a foreign key constraint to the OrderId on the Orders table. OrderDetails would also need TenantId column added and the foreign key constraint would also need to include TenantId.

For each of these federated tables we would also default the tenant id to the value used in establishing the filtered connection context, so that when inserting new records the business logic or the data access layer isn’t required to pass the tenant id as part of the operation.

A [TenantId] column is added to all tables containing tenant-specific data. This column is defaulted to the federation filter value which is the value passed in the USE FEDERATION statement and part of the connection state on filtered connections. This feature allows us to INSERT data into the federated tables without having to include the [TenantId] as part of the statement. Now data access code that currently does not deal with tenancy would not need to be changed to support this new column when inserting new records. All unique and clustered indexes on a federated table must include the federated column, so we have also made this part of the primary key. The “FEDERATED ON” clause is added to make it a federated table and in this we associate the [TenantId] table column with the federation distribution key of [Tid].

Connection Context

Now that our schema has been updated, we need to address getting filtered connections in the application. The challenge here is that our database connections need to be initialized with the tenant context, which requires calling a specific federation statement (“USE FEDERATION…”), after the connection is opened and before the application uses this connection. This can be accomplished by implementing a method that takes a tenant identifier and either returns an open connection for the application to use, or a connection object with some event handler to handle this logic when the connection is opened.

Bringing it All Together

Let’s bring this all together and walk through the complete process for a simple web request on a typical multi-tenant solution. In this example we will consider how we write a new task to the federated task table and return the list of tasks for the tenant.

1) We receive a web request with our task information, the data is validated, and the tenant context for the request is established. How we resolve the tenant identifier is for another discussion, as this is something that can be handled in number of different ways. We pass the tenant identifier in to a method to retrieve a database connection initialized with the tenant context.

This can either return an open connection that has had the “USE FEDERATION TenantFederation(Tid=137) WITH RESET, FILTERING=ON” statement executed.

We can attach an event handler to the connection object to execute this statement when the connection state is changed to open

There are a number of approaches available if utilizing entity framework; such as wrapping the existing SQL provider, attaching and event handler to the connection object, or simply returning a context with the connection open and initialized.

2) The “USE FEDERATION” statement redirects the connection to the correct federation member containing data for tenant id 137. The application can then use this filtered connection exactly how it did when the database contained only one tenant’s dataINSERTINTO [Task] ([Name], …) VALUES (‘My Task’, …)

Note that there is no need to include TenantId value

3) Retrieve tasks to return with view – SELECT * FROM [Tasks]

Note that there is no need to include WHERE clause with TenantId

As our system grows and we onboard more tenants, we now have an architecture that allows us to dynamically scale the database out. We SPLIT the federation, and while the application is still online we have scaled our solution out across another database.

SQL Azure Federations in Place of Custom Sharding

Some of our customers have already implemented a custom sharding solution. If it’s working, it may seem like we shouldn’t bother changing the solution to utilize SQL Azure Federations. We still discuss SQL Azure Federations with them, as there are benefits gained through Federations:

Tenant migration. It’s sometimes difficult to predict which tenants are going to be small, medium, or large during the on-boarding process, making it difficult to deal with the changing resource needs of these tenants. Tenants may need to be moved to its own database, or an existing database may need to be split to handle the increased demand on the system. SQL Azure Federations support online database splitting.

Tenant-agnostic queries. With custom shards, the data access layer likely includes the tenant ID in its queries. With SQL Azure Federations, a connection filter provides tenant-level filtering, allowing queries to be written without the need to include tenant ID.

Database lookup. Typically, in a multi-tenant application, a master database provides a lookup index for all tenant databases (mapping tenants to either a shared database or individual databases, depending on the application’s database topology). With SQL Azure Federations, the tenant-level connection string automatically connects to the appropriate database, obviating the need for managing a master lookup database with connection strings.

Connection Pool Fragmentation. A custom sharding implementation will utilize multiple databases, hence multiple connections and connection strings to those databases. Each of those connections will result in a connection pool in the application server, often leading to issues with pool fragmentation in the application. Depending on the number of the databases required to support the solution this can lead to performance issues and sometimes the only option is a complex redesign or needing to disable connection pooling. This is not the case with SQL Azure Federations as the connection to federations are handled much differently, resulting in a single connection pool.

Summary

SQL Azure Federations should be considered and evaluated with any solution with the requirement to dynamically scale out a relational database in Windows Azure. It should definitely be considered with any multi-tenant solution, new or existing. For more information on SQL Azure Federations I would recommend starting with some of the following resources.