Store throttling is now automatic

Prior to 2.0.0, Elasticsearch throttled ongoing merges at a fixed rate (20 MB/sec by default), but this was often far too low, easily causing merges to fall behind and trigger index throttling.

As of 2.0.0, Elasticsearch uses adaptive merge IO throttling: when merges start to fall behind, the allowed IO rate is increased, and when merges keep up, it is decreased. This means a sudden but rare large merge on a low-throughput indexing use case should not saturate all available IO on the node, minimizing the impact on ongoing searches and indexing.

But for heavy indexing, the allowed merge IO will increase to match the ongoing indexing.

This is good news! It means you shouldn't need to touch any throttling settings: just use the defaults.

Multiple path.data

Using multiple IO devices (by specifying multiple path.data paths) to hold the shards on your node is useful for increasing total storage space and for improving IO performance, if IO is a bottleneck for your Elasticsearch usage.

With 2.0, there is an important change in how Elasticsearch spreads the IO load across multiple paths: previously, each low-level index file was sent to the best (by default, the emptiest) path, but now that choice is made per shard instead. When a shard is allocated to the node, the node picks which path will hold all files for that shard.
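For reference, multiple data paths are configured in elasticsearch.yml; a minimal sketch (the mount points below are illustrative examples, not defaults):

```yaml
# elasticsearch.yml: list one data path per physical IO device
# (the /mnt/... mount points are example paths)
path.data: ["/mnt/disk1/elasticsearch", "/mnt/disk2/elasticsearch"]
```

With this layout, each shard allocated to the node lives entirely under one of the listed paths.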

This improves resiliency to IO device failures: if one of your IO devices fails, you will now lose only the shards that were stored on it, whereas before 2.0 you would lose any shard that had even one file on the affected device (typically, nearly all shards on that node).

Note that an OS-level RAID 0 device is a poor choice here: because files are striped at the block level, you will lose all shards on the node when any one device fails. Multiple path.data paths are the recommended approach instead.

You should still keep at least one replica for each index so that the lost shards can be recovered from other nodes without any data loss.
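If you rely on the defaults you already have one replica per shard, but you can set it explicitly for newly created indices; a minimal sketch of the node-level default in elasticsearch.yml:

```yaml
# elasticsearch.yml: default replica count for newly created indices
index.number_of_replicas: 1
```

You can also change this per index at any time through the index settings API.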

The Auto-ID optimization is removed

Previously, Elasticsearch optimized the auto-id case (when you didn't provide your own id for each indexed document) to use append-only Lucene APIs under the hood, but this proved problematic in error cases and so we removed it.

Finally, as of 1.4.0, Elasticsearch uses Flake IDs to generate its auto-ids, giving better lookup performance than the previous random UUIDs.

All of this means you should feel free to provide your own id values, with no performance penalty now that the auto-id optimization is removed; just remember that your choice of id values does affect indexing performance.

If you are curious about the low-level Lucene actions while indexing, add index.engine.lucene.iw: TRACE to your logging.yml, but be warned: this generates impressively copious output!
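That setting goes under the logger section of logging.yml; a minimal sketch:

```yaml
# logging.yml: enable low-level Lucene IndexWriter tracing
logger:
  index.engine.lucene.iw: TRACE
```

Remember to revert this once you are done investigating, given the volume of output it produces.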

Our nightly indexing performance charts show that we are generally moving in the right direction with our defaults over time: the docs/sec and segment counts for the default configuration (4 GB heap) have caught up to the fast performance line.