A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away. ~Antoine de Saint-Exupery -- Note, the opinions stated here are mine alone and are not those of any past, present, or future employer. --

Sunday, August 24, 2008

Shard Lessons

No, not SHARED lessons, I mean SHARD lessons. I have to admit that until about a year ago I didn't really know the term "shard" in relation to databases. Now don't confuse that with not understanding how databases can be horizontally scaled. I was introduced to that concept and helped define the various ways it can be done, but we just called them splits. Regardless of what you call it, the pattern introduces some interesting challenges. The well-known challenges of consistency are discussed ad nauseam, even by me, so I'm not going there with this article. But beyond those, there are some other lessons to learn when applying the pattern to your data.

Lesson 1: Right Size Your Shards

Sounds like a fast food commercial when I put it in those terms, but the idea is actually the same. Determining the initial number of shards can be tricky. You don't have infinite resources, and no matter how good your tools are, more shards are more problematic to manage than fewer. Yet you also want to select a number that is going to last you for a while. I would recommend picking a number that will give you 18-24 months of growth with a margin for safety.

Simple enough, right? Well, not quite. I also wouldn't start with a single-digit number of shards, even if that would last you for two years. Why start with 4 when the incremental overhead of 12 is manageable? You allow a longer growth path, and by hosting multiple databases on the same physical machine, you limit your hardware expenses. Of course, then why not 24 or more? Well, at some point you add unnecessary overhead and get diminishing returns.

Lesson 2: Use Math on Shard Counts

If you noticed the numbers I selected in the previous lesson, they are multiples of 12. Why that and not multiples of 10 or 7, or whatever number happens to be your lucky number? Well, let's say you are going to run multiple shards per physical machine. You're a small organization. You pick 10 as your number of shards and spread them across two machines. Five each; things are perfect. Now you find your growth means you need more capacity. It would be nice to add one machine, but that leaves you with a 3, 3, 4 split, and one machine has 33% more load than the other two. In fact, your next even split requires 5 machines, which may not be terrible if you are running your databases on commodity hardware, as you should!

But had you started with 12, you would find your growth options have fewer cost step functions. You start with 2 boxes, 6 shards each; move to 3 boxes with 4 shards each, then 4 with 3 shards, and finally 6 with 2 shards. If you had projected that 12 shards would allow you to scale for 2 years, you can see how you are able to grow your hardware footprint in a smoother fashion than with 10 shards.
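The divisor math above is easy to check for yourself. This is a small sketch (not from the original post; the `spread` helper is my own) that compares how evenly 10 vs. 12 logical shards distribute across a growing number of physical machines:

```python
def spread(shards, machines):
    """Return shards-per-machine counts for a round-robin assignment."""
    base, extra = divmod(shards, machines)
    return [base + 1] * extra + [base] * (machines - extra)

for shards in (10, 12):
    for machines in (2, 3, 4, 6):
        counts = spread(shards, machines)
        even = "even" if len(set(counts)) == 1 else "uneven"
        print(f"{shards} shards on {machines} machines: {counts} ({even})")
```

Running this shows 12 shards dividing evenly at 2, 3, 4, and 6 machines, while 10 shards only divide evenly at 2, leaving lopsided splits like [4, 3, 3] at each other step.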

Lesson 3: Carefully Consider the Spread

The way you spread the data across the shards needs to be determined by what you are hoping to optimize. Certainly the simplest strategy is a uniform distribution, which can be achieved using simple modulo math on the primary access key. This approach requires very little effort to implement and works well when the primary goal is to evenly distribute the load and you have very few access paths.
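As a minimal sketch of that uniform strategy (the shard count and function name are my own, not from the post):

```python
NUM_SHARDS = 12

def shard_for(key: int) -> int:
    """Uniformly map a numeric primary key to one of NUM_SHARDS shards."""
    return key % NUM_SHARDS
```

For non-numeric keys you would typically hash the key to an integer first; either way, every lookup by primary key lands on exactly one shard with no directory to consult.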

There may be times, however, when you want different locality policies. For example, you may want to keep all members of a group on the same shard. This can improve performance for applications that tend to operate upon a group. It also introduces challenges in balancing load, as some groups may be larger or more active than others. The benefits of clustering by a group must be weighed against the challenges of balancing workloads.
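One way to sketch group locality, assuming a small directory that the application maintains (the `group_to_shard` mapping and fallback rule here are illustrative, not from the post):

```python
NUM_SHARDS = 12

# Hypothetical directory, e.g. loaded from a small lookup database.
# An explicit entry pins a group to a shard; hot groups can be re-pinned
# to a less loaded shard later, which plain modulo placement cannot do.
group_to_shard = {}

def shard_for_group(group_id: int) -> int:
    """Route every row of a group to one shard, honoring directory overrides."""
    return group_to_shard.get(group_id, group_id % NUM_SHARDS)
```

The directory is what buys you rebalancing flexibility, at the cost of an extra lookup and another piece of state to keep consistent.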

Lesson 4: Plan for Exceeding Your Shards

No matter what number you pick, the possibility of reaching the capacity of your databases is very real. You selected a shard pattern to allow for a scale-out strategy, so as you reach the limits of your current hardware, you really don't want to shift to scaling up on those shards. That's why it is important to bake your scale-out plan into your shard strategy.

From Lesson 3, if you are clustering by group, adding more hosts to support new groups is probably straightforward. This helps when your scaling challenge is a growth in the number of groups and not just a few groups growing larger. Shards that cluster based on a group will probably need to support a scheme to migrate groups to allow capacity to be more readily managed.

Uniform spreads can be expanded through another trick of math. Each shard can be treated as a root node in a shard tree. When it is time to grow the capacity, another layer of shards can be added to grow capacity geometrically. For simplicity, let's say you originally started with 4 shards. You calculate the shard number with the following formula:

shard = key % 4

Now you need to scale. Let's say we will expand from 4 to 16 shards. The technique for computing the root shard and final shard is:

root = key % 4
leaf = (key / 4) % 4

You now have 4 * 4, or 16, possible shard combinations using root.leaf as the shard identifier. Furthermore, migration can be done online simply by following the scheme of checking root.leaf first and, if you do not find your record, checking root. Appropriate locking mechanisms will obviously be required by migration scripts to ensure that no data is lost or corrupted. And this technique does require a temporary increase in the number of database instances to (old shards + new shards) hosts (in this example, 20 total).
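The root/leaf computation and the migration fallback can be sketched as follows (the `fetch` accessor is a hypothetical stand-in for a per-shard database query; locking is omitted):

```python
OLD_SHARDS = 4  # original shard count
FANOUT = 4      # new leaf shards under each root

def shard_id(key: int) -> str:
    """Compute the root.leaf identifier for a key in the expanded tree."""
    root = key % OLD_SHARDS
    leaf = (key // OLD_SHARDS) % FANOUT
    return f"{root}.{leaf}"

def lookup(key: int, fetch):
    """Check the new leaf shard first; fall back to the old root shard.

    During online migration a record may not yet have moved, so a miss
    on root.leaf means the row may still live on the original shard.
    """
    row = fetch(shard_id(key), key)
    if row is None:
        row = fetch(str(key % OLD_SHARDS), key)
    return row
```

Once migration completes, the fallback branch can be dropped and the old root-only instances retired, returning you to 16 hosts.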

Lesson 5: Shard Early and Often

No matter how disciplined you think or hope your team is, if you give them a monolithic schema and tight deadlines, they will eventually create a mission-critical query that assumes two rows are on the same physical host. And, in my experience, without fail, those two rows can't possibly live together under any sane shard spread strategy. So if you have any hint at all that you will need to scale out a schema, do it early. The longer you wait, the more your application will depend upon a single schema instance, and the harder it will be to migrate to a sharded schema.

Care to share lessons you have learned from shard patterns? I am sure there are more that are worth knowing!

Comments

* Lesson 2 : Distinguish logical shards and physical servers as separate constructs. Design for the future with logical sharding; add physical servers when the need arises. In other words, this separation allows you to design your code from the get-go but delay acquisition and deployment of physical servers until needed, without having to make changes to your code. The mapping from logical shards to physical servers is a configuration artifact. This concept applies not just to sharding but to all database connection management. Build multi-tenancy for logical shards on a physical server. This will allow you not just to expand when necessary, but also to compress when Moore's law leverage kicks in over time.

* Lesson 5 : I would shard early and shard RIGHT. Prevent any need to re-do it in the future. Data migration in large-scale systems that have to be online 24 x 365 is a planning, execution, and coordination nightmare. That said :-), "unanticipated" growth can always push you down the re-sharding route. Think of it when you are writing your architecture specification and account for it, in terms of TTL (time-to-live) and methodologies.

* Sharding sometimes comes with the need to create secondary data structures for alternate access paths, just like tables have indexes for lookups based on alternate keys or range scans based on non-key columns. There are a few patterns in this area. If it is a rare backend use case, you can consider a "shard scan", else, you will need to create tables that contain secondary key to "shard key" mapping.

* If the next step in Moore's law for your database hardware is right around the corner and your hardware replacement cycle matches it in terms of timelines, consider tuning (db, query, I/O...) and optimization (query count, caching...) rather than re-sharding.

* Mathematical sharding algorithms allow for scale and good balance across shards, but not for great flexibility in re-sharding. Lookup-based sharding allows for scale and flexibility in re-sharding, but one has to pay close attention to balancing the shards.