How to Choose a Shard Key: The Card Game

Choosing the right shard key for a MongoDB cluster is critical: a bad choice will make you and your application miserable. Shard Carks is a cooperative strategy game to help you choose a shard key. You can try out a few pre-baked strategies I set up online (I recommend reading this post first, though). Also, this won’t make much sense unless you understand the basics of sharding.

Mapping from MongoDB to Shard Carks

This game maps the pieces of a MongoDB cluster to the game “pieces:”

Shard – a player.

Some data – a playing card. In this example, one card is ~12 hours worth of data.

Application server – the dealer: passes out cards (data) to the players (shards).

Chunk – 0–4 cards, defined by the range of card values it can contain and “owned” by a single player. Each player can have multiple chunks and pass chunks to other players.

Instructions

Before play begins, the dealer orders the deck to mimic the application being modeled. For this example, we’ll pretend we’re programming a news application, where users are mostly concerned with the latest few weeks of information. Since the data is “ascending” through time, it can be modeled by sorting the cards in ascending order: two through ace for spades, then two through ace of hearts, then diamonds, then clubs for the first deck. Multiple decks can be used to model longer periods of time.

Once the decks are prepared, the players decide on a shard key: the criteria used for chunk ranges. The shard key can be any deterministic strategy that an independent observer could calculate. Some examples: order dealt, suit, or deck number.

Gameplay

The game begins with Player 1 having a single chunk (chunk1). chunk1 has 0 cards and the shard key range [-∞, ∞).

Each turn, the dealer flips over a card, computes the value for the shard key, figures out which player has a chunk range containing that key, and hands the card to that player. Because the first card’s shard key value will obviously fall in the range [-∞, ∞), it will go to Player 1, who will add it to chunk1. The second and the third cards go to chunk1, too. When the fourth card goes to chunk1, the chunk is full (chunks can only hold up to four cards) so the player splits it into two chunks: one with a range of [-∞, midchunk1), the other with a range of [midchunk1, ∞), where midchunk1 is the midpoint shard key value for the cards in chunk1, such that two cards will end up in one chunk and two cards will end up in the other.

The dealer flips over the next card and computes the shard key’s value. If it’s in the [-∞, midchunk1) range, it will be added to that chunk. If it’s in the [midchunk1, ∞) range, it will be added to that chunk.

Splitting

Whenever a chunk gets four cards, the player splits the chunk into two 2-card chunks. If a chunk has the range [x, z), it can be split into two chunks with ranges [x, y), [y, z), where x < y < z.
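The splitting rule can be sketched as a small helper (my own illustration, not from the post): a full chunk [x, z) splits at the median key y into [x, y) and [y, z), with `None` standing in for ±∞.

```python
# Hypothetical sketch of the splitting rule: a full chunk [x, z)
# splits at its median shard-key value y into [x, y) and [y, z).
# None stands for -inf / inf.
def split(lo, hi, cards):
    mid = sorted(cards)[len(cards) // 2]        # median shard-key value
    left = (lo, mid, [c for c in cards if c < mid])
    right = (mid, hi, [c for c in cards if c >= mid])
    return left, right

# A full [-inf, inf) chunk holding four cards:
left, right = split(None, None, [7, 2, 9, 5])
print(left)    # (None, 7, [2, 5])
print(right)   # (7, None, [7, 9])
```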

Balancing

All of the players should have roughly the same number of chunks at all times. If, after splitting, Player A ends up with more chunks than Player B, Player A should pass one of their chunks to Player B.

Strategy

The goals of the game are for no player to be overwhelmed and for the gameplay to remain easy indefinitely, even if more players and dealers are added. For this to be possible, the players have to choose a good shard key. There are a few different strategies people usually try:

Sample Strategy 1: Let George Do It

The players huddle together and come up with a plan: they’ll choose “order dealt” as the shard key.

The dealer starts dealing out cards: 2 of spades, 3 of spades, 4 of spades, etc. This works for a bit, but all of the cards are going to one player (he has the [x, ∞) chunk, and each card’s shard key value is closer to ∞ than the preceding card’s). He’s filling up chunks and passing them to his friends like mad, but all of the incoming cards are added to this single chunk. Add a few more dealers and the situation becomes completely unmaintainable.

Sample Strategy 2: Totally Random

When George falls over dead from exhaustion, the players regroup and realize they have to come up with a different strategy. “What if we go the other direction?” suggests one player. “Let’s have the shard key be the MD5 hash of the order dealt, so it’ll be totally random.” The players agree to give it a try.

The dealer begins calculating MD5 hashes with his calculator watch. This works great at first, at least compared to the last method. The cards are dealt at a pretty much even rate to all of the players. Unfortunately, once each player has a few dozen chunks in front of them, things start to get difficult. The dealer is handing out cards at a swift pace and the players are scrambling to find the right chunk every time the dealer hands them a card. The players realize that this strategy is just going to get more unmanageable as the number of chunks grows.

Shard keys equivalent to this strategy: MD5 hashes, UUIDs. If you shard on a random key, you lose data locality benefits.

Sample Strategy 3: Combo Plate

What the players really want is something that lets them take advantage of the order (like the first strategy) and distribute load across all of the players (like the second strategy). They figure out a trick: couple a coarsely-grained order with a random element. “Let’s say everything in a given deck is ‘equal,’” one player suggests. “If all of the cards in a deck are equivalent, we’ll need a way of splitting chunks, so we’ll also use the MD5 hash as a secondary criterion.”

The dealer passes the first four cards to Player 1. Player 1 splits his chunk and passes the new chunk to Player 2. Now the cards are being evenly distributed between Player 1 and Player 2. When one of them gets a full chunk again, they split it and hand a chunk to Player 3. After a few more cards, the dealer will be evenly distributing cards among all of the players because within a given deck, the order the players are getting the cards is random. Because the cards are being split in roughly ascending order, once a deck has finished, the players can put aside those cards and know that they’ll never have to pick them up again.

This strategy manages to both distribute load evenly and take advantage of the natural order of the data.
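The three strategies can be compared with a toy simulation (my own sketch, not from the post): chunks hold four cards and split at the median, and each newly split-off chunk is passed to the player with the fewest chunks. The `received` counts show which players the dealer hands cards to under each shard key.

```python
# Toy simulation of the card game under the three shard-key strategies.
# Hypothetical code: four-card chunks, median splits, and a simple
# "pass the new chunk to the least-loaded player" balancer.
import hashlib

CHUNK_SIZE = 4

def md5_int(n):
    # stand-in for "MD5 hash of the order dealt"
    return int(hashlib.md5(str(n).encode()).hexdigest(), 16)

class Chunk:
    def __init__(self, lo, hi, owner):
        self.lo, self.hi, self.owner = lo, hi, owner
        self.cards = []

def find_chunk(chunks, key):
    # route a card to the chunk whose [lo, hi) range contains its key
    for c in chunks:
        if (c.lo is None or c.lo <= key) and (c.hi is None or key < c.hi):
            return c

def deal(keys, players=3):
    chunks = [Chunk(None, None, 0)]        # one chunk, range [-inf, inf)
    received = [0] * players               # cards handed to each player
    for key in keys:
        c = find_chunk(chunks, key)
        c.cards.append(key)
        received[c.owner] += 1
        if len(c.cards) == CHUNK_SIZE:     # full: split at the median
            mid = sorted(c.cards)[CHUNK_SIZE // 2]
            lower = Chunk(c.lo, mid, c.owner)
            lower.cards = [k for k in c.cards if k < mid]
            c.lo = mid
            c.cards = [k for k in c.cards if k >= mid]
            chunks.append(lower)
            # balance: pass the new chunk to the least-loaded player
            counts = [sum(ch.owner == p for ch in chunks) for p in range(players)]
            lower.owner = counts.index(min(counts))
    return received

n = 4 * 52                                               # four decks
ascending = deal(range(n))                               # strategy 1: order dealt
randomized = deal(md5_int(i) for i in range(n))          # strategy 2: MD5 hash
combo = deal((i // 52, md5_int(i)) for i in range(n))    # strategy 3: (deck, hash)
print(ascending, randomized, combo)
```

With the ascending key, every card goes to the player holding the [x, ∞) chunk, while the random and combo keys spread the dealing across all of the players.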

Applying Strategy 3 to MongoDB

For many applications where the data is roughly chronological, a good shard key is:

{<coarse timestamp> : 1, <search criteria> : 1}

The coarseness of the timestamp depends on your data: you want a bunch of chunks (a chunk is 200MB) to fit into one “unit” of timestamp. So, if 1 month’s worth of writes is 30GB, 1 month is a good granularity and your shard key could start with {"month" : 1}. If one month’s worth of data is 1 GB you might want to use the year as your timestamp. If you’re getting 500GB/month, a day would work better. If you’re inserting 5000GB/sec, sub-second timestamps would qualify as “coarse.”
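The arithmetic behind this rule of thumb can be sketched as follows (200MB chunks as in the post; 1GB = 1000MB is a simplifying assumption of mine):

```python
# Back-of-the-envelope granularity check: how many 200MB chunks does
# one time-unit's worth of writes fill? You want "a bunch."
CHUNK_MB = 200

def chunks_per_unit(write_mb):
    return write_mb / CHUNK_MB

print(chunks_per_unit(30 * 1000))   # 30GB/month -> 150.0 chunks: month works
print(chunks_per_unit(1 * 1000))    # 1GB/month -> 5.0 chunks: too coarse, use year
```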

If you only use a coarse granularity, you’ll end up with giant unsplittable chunks. For example, say you chose the shard key {"year" : 1}. You’d get one chunk per year, because MongoDB wouldn’t be able to split chunks based on any other criteria. So you need another field to target queries and prevent chunks from getting too big. This field shouldn’t really be random, as in Strategy 3, though. It’s good to group data by the criteria you’ll be looking for it by, so a good choice might be username, log level, or email, depending on your application.

Warning: this pattern is not the only way to choose a shard key and it won’t work well for all applications. Spend some time monitoring your application and figuring out what its write and read patterns are. Setting up a distributed system is hard and should not be taken lightly.

How to Use Shard Carks

If you’re going to be sharding and you’re not sure what shard key to choose, try running through a few Shard Carks scenarios with coworkers. If a certain player starts getting grouchy because they’re having to do twice the work, or everyone is flailing around trying to find the right cards, take note and rethink your strategy. Your servers will be just as grumpy and flailing, only at 3am.

If you don’t have anyone easily persuadable around, I made a little web application for people to try out the strategies mentioned above. The source is written in PHP and available on Github, so feel free to modify. (Contribute back if you come up with something cool!)

BTW, I think you’re messing up the splitting description. I think it should be (x, y-1), (y, z). Also (-inf, midchunk1-1), etc. See the MongoDB manual section on min and max Query Specifiers: http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
“The min() value is included in the range and the max() value is excluded.”

You’re right, I made the min key exclusive and the max key inclusive. I’ll fix that.

I’m using “[” and “]” for inclusive bounds and “(” and “)” for exclusive bounds, so (x,y] is everything greater than x and less than or equal to y (but it should be [x, y), as you noticed). Also, [x,x] is a valid range. y-1 doesn’t always work (y could be a floating point number). I think that covers it.

Also, is it true to say that the key too should not be a simply incrementing field? Seems so to me. IOW, don’t use an auto increment primary key for that portion of the shard key, even if it’s a common search criteria.

Using an incrementing value for the search criteria key would either bring you back to the strategy 1 case (if it’s a fine-grained timestamp) or not give any variation in value to split chunks on (if it’s too coarse). The first draft of this post actually called the second field the “non-ascending field,” but I decided “search criteria” was a more useful description.

I also read your “Scaling MongoDB” book and I don’t understand why the 3rd strategy is a good strategy.

For example, when I play through to the fifth deck, I see that all the writes go to Player 1, and the latest data inserted is on that player. The old data is centralized on one or two shards. So, if you query for recent data, the query hits only the first player… so, this strategy distributes the load but is not efficient for writing or reading, I think.

The third strategy doesn’t concentrate writes on a single player, because chunks are not “ordered” on shards.

For example, say we had a roughly ascending shard key ({month:1, random:1}) and we labeled the chunks in order: 1, 2, 3, …, 20. If we had shards A, B, and C, we might have chunks 1, 4, 8, 10, 13, 18, 20 on shard A, chunks 2, 3, 6, 9, 12, 17 on shard B, and chunks 5, 7, 11, 14, 15, 16, 19 on shard C. Suppose 4 chunks have been created in the last month; then requests should be evenly balanced between chunks 17, 18, 19, and 20, i.e., they hit shards A, B, and C. Does that make sense?
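Restating that layout as a quick check (my own sketch of the example above): the four newest chunks land on three different shards, so recent requests hit all of them.

```python
# The chunk layout from the comment above: chunk numbers scattered
# across shards A, B, and C. The newest chunks (17-20) span all three
# shards, so recent writes and reads are spread out.
layout = {
    "A": [1, 4, 8, 10, 13, 18, 20],
    "B": [2, 3, 6, 9, 12, 17],
    "C": [5, 7, 11, 14, 15, 16, 19],
}
newest = {17, 18, 19, 20}
hit = sorted(shard for shard, chunks in layout.items() if newest & set(chunks))
print(hit)   # ['A', 'B', 'C']
```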

I’m going to be expanding the “Choosing a Shard Key” section of Scaling MongoDB; I’ve heard comments from other users, too, that it should explain more and give more examples.

But… what you say makes sense, yet the card game does not act as you predict. I suppose that deck = month, our roughly ascending key. Deck 1 acts as you say, chunks are distributed among the players, but from deck 2 on, all writes go to Player 1.

This is sort of where the analogy breaks down, but it works when you’re actually sharding. With real sharding, say you have a month as your granularity. For the current month, the range is {month: [this month, ∞), random : [x, y)} for chunks 17, 18, 19, and 20 (using the example in my previous comment). So when you start getting values for the next month, they’ll still use chunks 17, 18, 19, and 20 until those chunks are big enough to split… but they should still be loaded evenly, split evenly, and continue to distribute the load across the servers.

Ok…I trust you 🙂
Last question: our application is a monitoring application and data structure is like
{serverId, timestamp, values : {1….N}}
Finally, I think we will choose {serverid: 1, timestamp: 1} as the shard key. Do you think its behaviour would be the same as the 3rd strategy’s?

I don’t want you to trust me, I want it to make sense 🙂 Maybe I’ll do another blog post on that.

Your suggestion isn’t exactly the same as the third strategy, but tends to be a good shard key for analytics. Check out Eliot’s presentations on sharding if you’d like some more examples: http://www.10gen.com/video

Hi, sorry for jumping in in the middle of the reply thread (a year later).

You say that the range for the current month will be {month: [this month, ∞), random : [x, y)} for multiple chunks. But from experience with the chunk ranges, I’ve seen that only the last chunk has $maxKey as the upper bound on month. All other chunks have ranges with fixed values for the month.

This causes all objects for the next month to be inserted into the last chunk, since that is the only chunk whose range can contain them. In combination with the balancing behavior, which moves the lowest chunks off a shard, this results in all new chunks residing on the shard that had the last chunk.

This is also consistent with how the card game behaves.

Maybe you could shed some light on why I only have one chunk containing the $maxKey upper bound.

Arg, I’m sorry, I totally meant to get back to @google-a329719940787f66dc100e6944e4f3c2:disqus, but I never got around to it. I was wrong, you’ll always start the month with one hot chunk with this type of shard key. Generally what people do is pre-split a bunch of chunks before the month rolls over, so that the beginning of the month won’t cause hotspots. You can make a cron job to do this automatically at the end of every month: split the $maxKey chunk into several chunks for the next month and distribute them to various shards. As the new chunks are empty until the month changes, this is a pretty lightweight operation.

@kristina1:disqus I want to know how using sharding and replication affects the PHP client side. At present I am developing an e-commerce project with the PHP framework Yii. Could you give me some advice? Thanks.

Sharding shouldn’t affect the client side at all, you can treat a cluster the same way you’d treat a single server… mostly. Check out the sharding documentation (http://www.mongodb.org/display/DOCS/Sharding) for more info.

In method 3, what do you mean by data locality benefits? Let’s say I have a collection consisting of a category_id and created_at, and I only ever want to query by created_at. Is there any benefit to having a compound shard key?

If you only have an index on created_at, all writes will probably go to one shard (see method 1). Sharding by (category_id, created_at) will split up writes across shards but still organize data by created_at (which means more relevant stuff will probably be in memory vs. on disk).

I have very limited experience with mongo, and from the description above I don’t understand why method #2 (random key) is worse than #3. The disadvantages listed are:

– “the players are scrambling to find the right chunk every time the dealer hands them a card.” In reality, the search for the correct chunk is probably a B-tree search and should be fast. With any shard key, mongo must find the right chunk, so it has to do that B-tree search anyway. Yes, the more chunks there are, the slower the search, but it’s log(n). With an increasing key value, one can optimize the search to start not at the top of the tree but near the last insertion point. Is this the idea?

– “The players realize that this strategy is just going to get more unmanageable as the number of chunks grows.” The number of chunks is determined by the amount of data. As soon as a chunk grows above 64MB (the default size), it will be split (if it still has a range to split), so the number of chunks with the “semi-random” strategy 3 can also grow big.

It seems to me that the best strategy is to start with a key value in the middle of the key range; then chunks will split more evenly. I understand it’s not always possible, but random seems to be closer to this than semi-ordered keys.