December 28, 2008

How to Organize a Database Table’s Keys for Scalability

I recently received a great comment, in the form of a question, on the Scalability Strategies Primer: Database Sharding post. I thought the answer to Yuriy’s comment deserved its own post so that I could get as detailed as possible in discussing what is, in actuality, a question with multiple valid answers. I also believe Yuriy’s question is one that is fairly commonly asked and not always clearly explained, or at least overlooked in favor of some of scalability’s more obvious issues.

The question asked was:

“… Let's say I have shards that store proverbial invoices. Since each shard is a stand-alone db instance, it will have "invoices" table with auto_incremental PK. Each shard auto increments the PK value on its own and so values are not unique among shards. In such a case I cannot expose an invoice by its id. www.example.com/invoices/123 could return multiple invoices from multiple shards. Am I missing anything here? ...”

The key (no pun intended) to understanding how to organize your dataset’s data is to think of each shard not as an individual database, but as one large singular database. Just as in a normal single server database setup where you have a unique key for each row within a table, each row key within each individual shard must be unique to the whole dataset partitioned across all shards.

There are a few different ways we can accomplish uniqueness of row keys across a shard cluster. Each has its pros and cons, and the one chosen should be specific to the problems you’re trying to solve.

Multiple Column Keys

A simple method of guaranteeing uniqueness between shards is to include a shard identifier within the key, via a normal multiple column key. To effectively look up keys, you will need to use a Master Index Lookup, as previously discussed here. Using the invoice scenario as an example, and assuming we’re using MySQL for our RDBMS, we might specify the Invoice table as follows:
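The original CREATE TABLE statement was not preserved here; the following is a minimal sketch of what it might have looked like. Column names beyond shardId and invoiceId (customerId, amount) are illustrative assumptions, not from the original post:

```sql
-- Hypothetical reconstruction: composite primary key of shardId + invoiceId.
-- The secondary KEY on invoiceId lets the AUTO_INCREMENT column lead an
-- index, which InnoDB requires.
CREATE TABLE invoices (
    shardId    INT NOT NULL,
    invoiceId  INT NOT NULL AUTO_INCREMENT,
    customerId INT NOT NULL,            -- illustrative column
    amount     DECIMAL(10,2) NOT NULL,  -- illustrative column
    PRIMARY KEY (shardId, invoiceId),
    KEY (invoiceId)
);
```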

Pay special attention to the “PRIMARY KEY (‘shardId’,’invoiceId’),” line in the above CREATE TABLE statement. The shardId and invoiceId columns combine to create the table’s key. This method of setting our key enables us to continue to use auto-incrementing unique columns the way we’re used to using them in a single server instance: the only change necessary to prepare the table for sharding is the addition of the shardId column to the table’s key. This method’s primary advantage is that it is intuitively simple.

Referring to an invoice requires both a shardId and an invoiceId in this scenario, because it is possible, even likely, that there will be invoiceIds with the same number on different shards, each of which reflects a different invoice. Including the shardId in a lookup is necessary to discern between invoices with matching invoiceIds.

So as not to stray too far from the original question, here is an example of what a REST or MVC request might look like for viewing an invoice:

www.example.com/invoices/1/123 or www.example.com/invoices/[shardId]/[invoiceId]

Essentially, an invoice is now referenced by two numbers instead of one. We could even combine them with a hyphen to create a simpler, more intuitive form of the identifier. For example:

www.example.com/invoices/1-123 or www.example.com/invoices/[shardId]-[invoiceId]

Note, however, that an invoice’s key is transitory in that if you move an invoice to another shard as the result of a hotspot or necessary reorganization of data, the invoice key will change. If this is a problem, it may be wise to keep a history of moved data or even use one of the alternate keying methods below.

Moving data also requires a re-identification process for each row that is moved off its origin shard. Because each invoice key includes a shardId, when an invoice is moved, the shardId value must change. The invoiceId must change as well, because each invoiceId is unique only to the shard it currently resides on, as a result of the auto-incrementing invoiceId values. Note that changing these key values requires updating them on any other data that references a moved invoice by its key. Automating this process with a script or small application is highly recommended.
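As a rough sketch of that re-identification process, the statements below move an invoice and patch up a referencing table. Everything here beyond the invoices table (the line_items table, the specific ids and values) is hypothetical, and in practice copying the row between shard instances happens at the application layer:

```sql
-- Hypothetical sketch: move invoice (shardId = 1, invoiceId = 123) to shard 2.
-- Run on shard 2; the row's non-key data is assumed to have been read off shard 1.
INSERT INTO invoices (shardId, customerId, amount)
VALUES (2, 42, 99.95);                 -- invoiceId auto-increments on shard 2
SET @newInvoiceId = LAST_INSERT_ID();

-- Any rows referencing the moved invoice by its old key must be updated too
-- (line_items is a hypothetical referencing table):
UPDATE line_items
SET shardId = 2, invoiceId = @newInvoiceId
WHERE shardId = 1 AND invoiceId = 123;
```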

Globally Unique Identifier Keys

Another fairly simple method of guaranteeing uniqueness between shards is to use a GUID, or Globally Unique Identifier, as your table row’s key. This is often accomplished by generating a GUID within the application layer and applying it to a row, thereby guaranteeing uniqueness. To effectively look up keys, you will need to use a Master Index Lookup, as previously discussed here.

Using the invoice scenario as an example, and assuming we’re using MySQL for our RDBMS, we might specify the Invoice table as follows:
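Again, the original CREATE TABLE statement was not preserved; a minimal sketch of what it might have looked like follows, with the non-key columns being illustrative assumptions:

```sql
-- Hypothetical reconstruction: a 36-character GUID as the primary key.
CREATE TABLE invoices (
    invoiceId  VARCHAR(36) NOT NULL,    -- e.g. '03c7408e-0936-4e19-8dc7-323613e8618c'
    customerId INT NOT NULL,            -- illustrative column
    amount     DECIMAL(10,2) NOT NULL,  -- illustrative column
    PRIMARY KEY (invoiceId)
);
```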

In the CREATE TABLE statement above, we’ve set our primary key, invoiceId, to be a varchar(36), in which we can store a 36-character GUID. (A quick note about optimizing the storage of GUIDs: you can remove the hyphens and pack a GUID into a binary(16) column without much difficulty. The only side effect is that it requires you to build encoding and decoding routines into your application so that you can easily work with the compressed GUIDs – I might write more about this and provide a code sample at some later time.)
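The packing step alluded to above can even be done with MySQL’s built-in string functions, as a quick sketch:

```sql
-- Pack a 36-character GUID into 16 bytes, and unpack it again.
SET @guid   = '03c7408e-0936-4e19-8dc7-323613e8618c';
SET @packed = UNHEX(REPLACE(@guid, '-', ''));  -- 16 bytes, fits a binary(16) column

SELECT LENGTH(@packed);      -- 16
SELECT LOWER(HEX(@packed));  -- '03c7408e09364e198dc7323613e8618c'; the application
                             -- re-inserts hyphens when displaying the GUID
```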

This method changes the identifier of an invoice into a less intuitive format. GUIDs aren’t pretty, and they are hard for people to remember and refer to, given their length and randomness. That being said, we’ve just solved many of the issues involved in scaling a dataset.

In relation to the original question, here is an example of what a REST or MVC request might look like for viewing an invoice:

www.example.com/invoices/03c7408e-0936-4e19-8dc7-323613e8618c or www.example.com/invoices/[invoiceId]

In this case, we are granted the ability to refer to an invoice using a single identifier, which should in theory never change, no matter what shard it is located on or how many times it has been moved. An added bonus of using a GUID as a unique key is that if our dataset needs to scale beyond our original intentions, the key is already globally unique. This means that we can move data out of our cluster and onto another cluster, or split our original shard cluster into many geographically diverse clusters, and never have to change our invoice identifiers.

Moving data is especially simple with this method of guaranteeing uniqueness. We don’t have to re-identify the data when moving it to another shard, because the shard identifier isn’t part of the invoice table, nor is the invoiceId unique only to the shard it currently resides on.

Auto-Increment Offset and Increment

I’ve left this method of guaranteeing uniqueness across shards for last on purpose. This method is RDBMS vendor specific, and not all RDBMSs support the functionality out of the box. For those that don’t, my recommendation is to use one of the alternative methods discussed above.

Essentially, this method grants us the ability to use the normally intuitive auto-increment property on our invoice key column without having to add any further shard-specific data to our rows. This is accomplished by using an auto-increment offset and an auto-increment increment. To effectively look up keys, this method requires that you use an Algorithmic Lookup (computing the owning shard from the key using each shard’s auto-increment offset and increment values), as previously discussed here.

An auto-increment offset value determines the starting point for an auto-incrementing row value. For example, if you set an auto-increment offset value to 3, an auto-incrementing column would assign row values starting at 3 (i.e. 3, 4, 5…) instead of the default 1 (i.e. 1, 2, 3…).

An auto-increment increment controls the interval between successive auto-incrementing row values. For example, if you set an auto-increment increment value to 3, an auto-incrementing column would assign row values in intervals of 3 (i.e. 1, 4, 7…) instead of the default 1 (i.e. 1, 2, 3…).

By combining the usage of an auto-increment offset and increment together, we can create uniqueness across our shard cluster by assigning varied values for each. I’ll provide examples for this method using MySQL 5.0+ as it is an RDBMS that supports this functionality.
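The original CREATE TABLE statement was not preserved; a minimal sketch of what it might have looked like follows, with the non-key columns being illustrative assumptions:

```sql
-- Hypothetical reconstruction: a plain auto-incrementing integer key, exactly
-- as in a single-server setup. Uniqueness comes from the per-shard
-- auto-increment offset and increment settings, not from the schema.
CREATE TABLE invoices (
    invoiceId  INT(11) NOT NULL AUTO_INCREMENT,
    customerId INT NOT NULL,            -- illustrative column
    amount     DECIMAL(10,2) NOT NULL,  -- illustrative column
    PRIMARY KEY (invoiceId)
);
```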

In the CREATE TABLE statement above, we’ve set our primary key, invoiceId, to be a normal int(11) AUTO_INCREMENT column. You’ll notice that this method preserves the simplicity of a single integer invoice identifier, commonly used throughout invoicing systems for its easy to remember and work with format.

In order for us to make use of this method, we need to set MySQL’s AUTO_INCREMENT_INCREMENT and AUTO_INCREMENT_OFFSET values in the my.cnf file. Let’s assume that we have four shards in our shard cluster. We would set the following values on the following shards:
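For shard 1, the relevant my.cnf fragment might look like the following (MySQL’s actual variable names are lowercase; they can also be changed at runtime with SET GLOBAL):

```
[mysqld]
auto_increment_offset    = 1
auto_increment_increment = 4
```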

Shard 1: AUTO_INCREMENT_OFFSET = 1, AUTO_INCREMENT_INCREMENT = 4

Shard 2: AUTO_INCREMENT_OFFSET = 2, AUTO_INCREMENT_INCREMENT = 4

Shard 3: AUTO_INCREMENT_OFFSET = 3, AUTO_INCREMENT_INCREMENT = 4

Shard 4: AUTO_INCREMENT_OFFSET = 4, AUTO_INCREMENT_INCREMENT = 4

The following is what would transpire in the event of multiple rows being inserted into each shard:

Shard 1: invoiceId = 1, invoiceId = 5

Shard 2: invoiceId = 2, invoiceId = 6

Shard 3: invoiceId = 3, invoiceId = 7

Shard 4: invoiceId = 4, invoiceId = 8

As you can see, no shard has an invoiceId value that matches any other shard’s. Setting the AUTO_INCREMENT_OFFSET and AUTO_INCREMENT_INCREMENT values appropriately guarantees non-collision of invoiceId values and therefore uniqueness of key data.
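The Algorithmic Lookup mentioned earlier falls directly out of this arithmetic: with a cluster-wide increment of 4 and offsets 1 through 4, the owning shard can be computed from an invoiceId alone. As a sketch, the same modulo expression works in SQL or in any application language:

```sql
-- Which shard holds invoiceId 12345, given an increment of 4 and offsets 1..4?
-- 12345 mod 4 = 1, so it was generated with offset 1, i.e. on shard 1.
SELECT ((12345 - 1) % 4) + 1 AS shardId;  -- returns 1
```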

In relation to the original question, here is an example of what a REST or MVC request might look like for viewing an invoice:

www.example.com/invoices/12345 or www.example.com/invoices/[invoiceId]

In this case, we are granted the ability to refer to an invoice using one number that is as elegant and short as possible. Note, that as in the Multiple Column Keys method, an invoice’s key is transitory in that if you move an invoice to another shard as the result of a hotspot or necessary reorganization of data, the invoice key will change. If this is a problem, it may be wise to keep a history of moved data or use the Globally Unique Identifier Keys method instead.

Moving data also requires a re-identification process for each row that is moved off its origin shard. Because each invoice key is an auto-incrementing invoiceId generated according to the shard’s AUTO_INCREMENT_OFFSET and AUTO_INCREMENT_INCREMENT values, when an invoice is moved, those values will change, and so must the invoiceId. As before, changing these key values requires updating them on any other data that references a moved invoice by its key. Automating this process with a script or small application is the way to go here too.

Wrap-Up

This post covered some of the more common techniques used to guarantee uniqueness of a dataset dimension’s keys across a shard cluster, and in some cases globally. Each technique’s usefulness is entirely dependent on the specific characteristics of your dataset, and it’s ultimately up to you to choose the most appropriate solution. I hope I’ve successfully outlined the basics of these techniques in a way that makes that choice fairly straightforward.

I tend to use the Globally Unique Identifier Keys method over the others when I can, as it offers the greatest amount of flexibility.

You're correct about the Auto-Increment Offset and Increment method. The single biggest issue is that it is fairly inflexible and requires a re-balancing of the sharded data or the sharded data's keys. One way to work around this is to create a large number of sub-shards on each actual shard.

For example, we might predefine 100 databases, each working as if it were its own shard, regardless of the actual machine it is currently on. If we have, say, 10 servers and 100 shards, we would put 10 shards on each server. As necessary, we would add another couple of servers and redivide the shards, splitting the 100 shards among 12 servers. The idea here is to predefine as many shards as you think you might ever need. You can predefine 1,000 of them if you think that is what you need, or even 10,000.

Just to be clear, the point of predefining a large number of shards is that it is much easier to rebalance servers by moving entire database instances (shards) than to rebalance the sharded data itself. I'm not sure that was made clear in my last comment.

Remember, if you're going to use GUIDs as unique identifiers and index on them, you should use COMB GUIDs. They will prevent the index fragmentation that comes with naturally generated (random) GUIDs.

Excellent blog. I found you through the (also excellent) High Scalability blog, and have added you to my reader.

In my experience, strategy #1 is accomplished by using the "primary" primary key that determines the shard, and not the shard ID directly.

For example, a user-centric service that shards by users would use the user Id. Since you always have to query the index table (user_lookup, in your example) anyway, this adds no overhead other than a larger primary key in the sharded tables.

It also has the benefit that you can make your URLs a bit more friendly, as in

www.example.com/invoices/mindelicato/123

This way, moving data between shards is completely invisible and cost effective.

I've also seen the distributed "hi-lo" strategy employed, where the index node also contains a table that stores the next available primary key. The application nodes reserve IDs in batches, cache them, and hand them out as new records are created.

The simplest answer is to include whatever is the basis for the logical segmentation of your shards as part of your lookups. For example, you can choose to shard by customers, with a million customers on each shard, each with the invoices belonging to those customers. To look up an invoice all you need is a customer ID and an invoice ID, the combination of which is unique across all your shards. You can have a master lookup table for determining which customers are on which shards, or use a simpler, yet ultimately less flexible technique based on a hash of the customer ID or whatever.