Clustered Elasticsearch - Indexing, Shard and Replica Best Practices

By: Steve Croce

Posted on: November 27, 2017

Some of the most common sources of support tickets we see on the ObjectRocket for Elasticsearch platform are related to indexing, shard count, and replication decisions. Elasticsearch is awesome at spreading data across your cluster with the default settings, but once your cluster begins to grow, the defaults can get you in trouble. Let’s go over some of the basics of sharding and provide some best practices for indexing and shard count.

An Intro to Elasticsearch Sharding

There are tons of docs out there about how shards in Elasticsearch work, but the basic concept of sharding is splitting up your data into a number of chunks so that searches can operate on multiple parts in parallel. In order to facilitate clustering and parallelization of index functions, each index in your Elasticsearch instance is sliced up into some number of slices; these slices are called shards. Let's look at some of the key behaviors for shards:

Each shard is replicated based on the number_of_replicas setting for the index. For example, with a number_of_replicas setting of one, there will be two copies of each shard: one primary shard + one replica shard. The primary shard is the main shard and is used for indexing/write and search/read operations, while the replicas are used only for search/read operations and for recovery if a primary fails.

Replica shards must reside on a different host than their primary.

By default shards are automatically spread across the number of hosts in the cluster, but multiple primary shards can be placed on the same physical host. There are a number of Elasticsearch settings to modify this behavior (e.g. rebalancing, where shards are allocated, etc.), but they're beyond the scope of this blog.

Shards cannot be further divided. Each individual shard must reside on only one host.

The number of shards for an index can be set during index creation, or a global default can be used. Once the index is created, the number of shards cannot be changed without reindexing.

The number of replicas for an index can also be set during index creation, or a global default can be used. Unlike the shard count, this CAN be changed after the index is created.
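The last two behaviors map onto two different request bodies. The following is a minimal sketch in Python that only builds the JSON; the helper names are made up for illustration, and actually sending the bodies (to `PUT /<index>` at creation and `PUT /<index>/_settings` afterwards) is left to whatever client you use.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Body for PUT /<index>: the shard count is fixed from this point on."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def update_replicas_body(number_of_replicas):
    """Body for PUT /<index>/_settings: the replica count can be changed on a live index."""
    return {"index": {"number_of_replicas": number_of_replicas}}


# Create with three shards and one replica, then later raise the replica count to two.
print(json.dumps(create_index_body(3, 1), indent=2))
print(json.dumps(update_replicas_body(2), indent=2))
```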

Now that we've set some ground rules, let's look at a small example. I've got an index created with a shard count of three and a replica setting of one. As you can see in the diagram above, Elasticsearch will create six shards for you: three primary shards (Ap, Bp, and Cp above) and three replica shards (Ar, Br, and Cr).
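The arithmetic behind that count is simple enough to write down. This is just an illustrative helper (the function names are not part of any Elasticsearch API):

```python
def replica_shard_count(number_of_shards, number_of_replicas):
    """Each primary shard gets number_of_replicas replica copies."""
    return number_of_shards * number_of_replicas


def total_shard_count(number_of_shards, number_of_replicas):
    """Primaries plus all of their replicas."""
    return number_of_shards + replica_shard_count(number_of_shards, number_of_replicas)


# Shard count of three, replica setting of one: three primaries + three replicas.
print(replica_shard_count(3, 1), total_shard_count(3, 1))
```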

Elasticsearch will ensure that the replicas and primaries will be placed on physically different hosts, but multiple primary shards can and will be allocated to the same host. Now that we're talking hosts, let's dive into how shards are allocated to your hosts.

Shard Allocation and Clustered Elasticsearch

As mentioned above, by default, Elasticsearch will attempt to allocate shards across all available hosts. At ObjectRocket, each cluster is made up of master nodes, client nodes, and data nodes. It's the data nodes in our architecture that form the "buckets" that the shards can be assigned to.

Using our example above, let's take those six shards and assign them to an ObjectRocket for Elasticsearch cluster with 2 data nodes (the minimum). In the diagram below, you can see that for each shard, the primary will land on one data node, while the replica is guaranteed to be on the other node. Keep in mind that the examples here show just one possible allocation; the only thing that's definite is that a replica will always be placed on a different data node than its primary.

Now, let's extend this example and add a third data node. What you see is that two shards will be moved to the new data node. You're now left with 2 shards on each node.

Finally, let's add a new index to this cluster with a shard count of two and the number of replicas set to two. What you're left with is two new primaries (Xp and Yp) and four replicas (Xr0, Xr1, Yr0, Yr1) that could be spread across the cluster as seen in the picture.
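To make the one hard rule concrete, here is a toy allocator in Python. It is NOT Elasticsearch's actual allocation algorithm; it is a sketch that assumes a simple "least-loaded node" choice and only enforces the constraint described above, that no two copies of the same shard share a data node. The node names and the `r0`/`r1` replica labels are illustrative.

```python
def allocate(shard_names, number_of_replicas, placement):
    """Place a primary and its replicas for each shard onto data nodes.

    placement maps node name -> list of shard copies already on that node.
    Returns (new_placement, unassigned). Toy logic, not Elasticsearch's own.
    """
    new_placement = {node: list(copies) for node, copies in placement.items()}
    unassigned = []
    for shard in shard_names:
        copies = [shard + "p"] + [f"{shard}r{i}" for i in range(number_of_replicas)]
        used_nodes = set()
        for copy in copies:
            # A node may hold at most one copy of any given shard.
            candidates = [n for n in new_placement if n not in used_nodes]
            if not candidates:
                unassigned.append(copy)  # more copies than data nodes
                continue
            target = min(candidates, key=lambda n: len(new_placement[n]))
            new_placement[target].append(copy)
            used_nodes.add(target)
    return new_placement, unassigned


# Three shards with one replica on two data nodes: three copies per node.
two_nodes, _ = allocate(["A", "B", "C"], 1, {"node1": [], "node2": []})
print(two_nodes)
```

If a replica has no eligible node (for example, two replicas on a two-node cluster), this sketch reports it as unassigned rather than co-locating it with another copy of the same shard.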

That's it. Elasticsearch does all of the hard work for you, but there are some pitfalls to look out for.

Pitfall #1 - Massive Indexes and Massive Shards

The most common and easiest to mitigate issue in Elasticsearch is a massive index with massive shards. We very often see a user start out with a very manageable single index, and as their application grows, so does their index. This then leads to huge shards, because shard size is directly related to the amount of data in the cluster.

The first issue this causes is poor efficiency in cluster utilization. As the shards get larger and larger, they get harder to place on a data node, since you'll need a large block of free space on a data node to place a shard there. This leads to nodes with a lot of unused space. For example, if I have 8GB data nodes, but each shard is 6GB, I'll be stranding 2GB on each of my data nodes.
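The stranded-space arithmetic can be sketched in a few lines of Python. This simplification assumes every shard is the same size and ignores any headroom the node needs for other purposes; the function name is made up for illustration.

```python
def stranded_gb(node_gb, shard_gb):
    """Space left on a data node after packing as many equal-size shards as fit."""
    shards_that_fit = node_gb // shard_gb
    return node_gb - shards_that_fit * shard_gb


# 8GB data nodes with 6GB shards: one shard fits, and 2GB is stranded.
print(stranded_gb(8, 6))
# The same node with 2GB shards packs cleanly.
print(stranded_gb(8, 2))
```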

The second issue is "hot spotting". If your data is consolidated into just a few shards, then complex queries will not have the opportunity to be split across a larger number of nodes and executed in parallel.

Don't be stingy with indexes

The first and easiest solution is to use multiple indexes. Spreading your data across multiple indexes will increase the number of shards in the cluster and help spread the data a little more evenly. Besides making for an easier game of "Tetris" when Elasticsearch places shards, multiple indexes are easier to curate. Also, the alias capabilities in Elasticsearch can still make multiple indexes look like a single index to your app.

Most of the Elastic Stack will create daily indexes by default, which is a good practice. You can then use aliases to limit the scope of searches to specific date ranges, use Curator to remove old indexes as they age, and modify index settings as your data grows without having to reindex the old data.
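As a rough sketch of how daily indexes and aliases fit together, the Python below generates date-stamped index names and an actions body of the shape accepted by `POST /_aliases`. The `logs` prefix, the alias name, and the helper functions are hypothetical; the `prefix-YYYY.MM.dd` naming is an assumption modeled on the common daily-index pattern, so adjust it to whatever your indexes are actually called.

```python
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Name of the daily index for a given date, e.g. logs-2017.11.27."""
    return f"{prefix}-{day:%Y.%m.%d}"


def alias_actions_for_range(prefix, alias, last_day, days):
    """Body for POST /_aliases that points one alias at the last `days` daily indexes."""
    names = [daily_index_name(prefix, last_day - timedelta(days=n)) for n in range(days)]
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in names]}


# An alias covering the three most recent daily indexes.
body = alias_actions_for_range("logs", "logs-last-3-days", date(2017, 11, 27), 3)
for action in body["actions"]:
    print(action)
```

Your app can then search the alias as if it were a single index, while the underlying daily indexes are added and removed behind it.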

Increase shard count as your index size increases

In addition to adding indexes more frequently, you can also increase the shard count as your index sizes start to increase. Once you see shard sizes starting to get a little too large, you can update your index template (or whatever you use to create new indexes) to use more shards for each index. However, this only helps if you're regularly creating new indexes, which is why this recommendation is listed second. Otherwise, you'll have to reindex to modify shard count, which is not impossible, but a little more work than managing multiple indexes.
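One way to pick the shard count for new indexes is to divide the expected index size by a target shard size and round up. The sketch below assumes you can estimate how large each new index will get; the function name and the default target are illustrative, so substitute your own target.

```python
import math


def shards_needed(expected_index_gb, target_shard_gb=25):
    """Smallest shard count that keeps each shard at or under the target size."""
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


# A daily index expected to reach 120GB needs five shards to stay under 25GB each.
print(shards_needed(120))
# A small index still needs at least one shard.
print(shards_needed(10))
```

The result is the value you would put into the number_of_shards setting of your index template, so that it takes effect for the next index that gets created.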

Our rule of thumb here is if a shard is larger than 40% of the size of a data node, that shard is probably too big. In this case, we recommend reindexing to an index with more shards, or moving up to a larger plan size (more capacity per data node).

Pitfall #2 - Too many indexes/shards

The inverse is far too many indexes or shards. After reading the previous section, you may just say "Fine, I'll just put every doc in its own index and create a million shards". The problem there is that indexes and shards have overhead. That overhead manifests itself in storage/memory resources as well as in processing performance.

Since the cluster must maintain the state of all shards and where they're located, a massive number of shards becomes a larger bookkeeping operation which will have an impact on memory usage. Also, since queries will need to be split more ways, there will be a lot more time spent in scatter/gather for queries.

This one is a little harder to give exact guidance on, since it is highly dependent on the size of the cluster, use case, and a few other factors, but in general we can mitigate this with a few recommendations.

Shards should be no larger than 50GB

In general, 25GB is what we target for large shards, but 50GB is where we have the conversation with our customers about reindexing. This has as much to do with the performance of the shard itself as it does with the process of moving that shard when you need to. Whenever the cluster rebalances, shards may need to be moved to a different node. Moving 50GB of data can take a significant amount of time, and you've got that capacity tied up on two nodes during the entire process.

Keep shard size less than 40% of data node size

As mentioned above, the second metric we look at for shard size is what percentage of the data node capacity a shard takes up. On the ObjectRocket service, we offer different plan sizes that are related to the amount of storage on the data nodes. We try to size the cluster and the shards to ensure that none of the largest shards takes up more than 40% of a data node's capacity. In a cluster with a number of indexes at a mix of sizes, this is fairly effective, but in a cluster with a single or very few indexes that are very large, we are even more aggressive and try to keep this below 30%.

The idea here is to make sure that you're not stranding capacity on a data node. If your shards are 45% the size of the data node, for example, you'll need a data node at roughly half utilization to be able to place that shard. That's a lot of spare capacity to leave lying around!
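The two shard-size rules of thumb from this section can be combined into a small check. This is only a sketch of the guidance above, not a tool we ship; the function name and the `few_large_indexes` flag are made up for illustration.

```python
def shard_size_warnings(shard_gb, node_gb, few_large_indexes=False):
    """Return a list of rule-of-thumb violations for one shard on one data node size."""
    warnings = []
    # Rule 1: shards should be no larger than 50GB.
    if shard_gb > 50:
        warnings.append("shard is larger than 50GB")
    # Rule 2: 40% of data node capacity, or 30% with only a few very large indexes.
    limit_percent = 30 if few_large_indexes else 40
    if shard_gb * 100 > limit_percent * node_gb:
        warnings.append(f"shard is more than {limit_percent}% of data node capacity")
    return warnings


# A 45GB shard on a 100GB data node is under 50GB but over the 40% mark.
print(shard_size_warnings(45, 100))
```

An empty list means the shard passes both checks; anything else is a prompt to consider more shards per index or larger data nodes.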

Conclusion

Selecting the right shard and indexing settings can be a moving target, but by planning ahead, making some good decisions up front and tuning as you go, you can keep your cluster healthy and running optimally.