Elasticsearch 5.2.0 released

Today we are pleased to announce the release of Elasticsearch 5.2.0, the latest stable release, with numeric and date range fields, the cluster-allocation-explain API, keyword normalizers, and partitionable terms aggregations. It is already available for deployment on Elastic Cloud, our Elasticsearch-as-a-service platform.

Full details of the changes in this release are available in the release notes, but a few important changes are worth singling out:

Numeric and Date Range Fields

While numeric and date fields index a discrete number or point in time, numeric and date range fields allow you to index a range of values, such as Friday 27 January 2017, between 6pm and 8pm. Range queries can be used to search for indexed ranges which overlap the query range, are completely contained within it, completely contain it, or do not overlap it at all. This allows you to answer questions like "What entertainment is available on Thursday evening?", a query which was previously very difficult to construct.
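As a rough illustration of the idea, the sketch below builds the request bodies for a date range field and an overlap query as plain Python dictionaries. The type name `event`, the field name `showtime`, the date format, and the sample document are all made up for this example, and the `relation` parameter (`intersects`, `within`, `contains`) is written from memory of the range field documentation, so check it against the reference for your version before relying on it.

```python
import json

# Relations assumed to be accepted by range queries on range fields.
RELATIONS = ("intersects", "within", "contains")


def date_range_mapping(type_name, field):
    """Build a mapping body with a single date_range field (hypothetical names)."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": "date_range", "format": "yyyy-MM-dd HH:mm"}
                }
            }
        }
    }


def range_query(field, gte, lte, relation="intersects"):
    """Build a range query that compares the query range to indexed ranges."""
    if relation not in RELATIONS:
        raise ValueError("unknown relation: %s" % relation)
    return {
        "query": {
            "range": {field: {"gte": gte, "lte": lte, "relation": relation}}
        }
    }


mapping = date_range_mapping("event", "showtime")

# A document stores the range itself: Friday 27 January 2017, 6pm to 8pm.
event = {
    "title": "Jazz night",
    "showtime": {"gte": "2017-01-27 18:00", "lte": "2017-01-27 20:00"},
}

# "What entertainment is available on Thursday evening?"
evening_query = range_query("showtime", "2017-01-26 17:00", "2017-01-26 23:00")

print(json.dumps(evening_query, indent=2))
```

With `intersects`, any event whose showtime overlaps Thursday evening would match. Switching to `within` would presumably restrict the results to events that start and finish inside that window.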

Cluster Allocation Explain API

PagerDuty wakes you up at 3AM with a red cluster. Which API do you reach for to figure out why shards aren’t being allocated? Previously, you had to consult several APIs to put together the complete picture. Now, the answer is simple: the cluster allocation explain API. This API can tell you whether a shard has failed to allocate or is just waiting its turn, and why allocation failed, whether it be a corrupt shard, full disks, or bad settings. It can also tell you why a shard is assigned to a particular node, perhaps when you have tried to force relocation.
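A minimal sketch of calling the API from Python's standard library follows. It assumes a node listening on `http://localhost:9200` and a hypothetical index called `my_index`; the request body asks about a specific shard by index name, shard number, and whether it is the primary. The request is only built here, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard, primary):
    """Build a request asking why one specific shard is (or is not) allocated."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain_allocation(request):
    """Send the request and return the parsed JSON (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed address and index name, for illustration only.
request = build_explain_request("http://localhost:9200", "my_index", 0, True)
print(request.get_method(), request.full_url)
```

Calling `explain_allocation(request)` against a live cluster returns the explanation as a dictionary, which you can pretty-print with `json.dumps(result, indent=2)`.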

Normalize your Keywords

In 5.0.0, we separated the string field type into text (for analyzed full text) and keyword (for not-analyzed string identifiers). Fields of type text can be analyzed into individual tokens for full text search, while keywords support doc_values, used for search, aggregations, sorting, and in scripts. However, there are times when you need some of the power of the analysis chain to normalize keywords, such as lower-casing email addresses or zip codes.

This release brings normalizers to keyword fields. Normalizers are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters. Only the filters that work on a per-character basis are allowed. For instance, a lowercasing filter would be allowed, but not a stemming filter, which needs to look at the keyword as a whole.
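To make this concrete, here is a sketch of index settings that define a custom normalizer and attach it to a keyword field, again written as a Python dictionary. The normalizer name `lowercase_normalizer`, the type name `doc`, and the field name `email` are invented for the example, and the exact set of permitted filters should be confirmed in the reference documentation.

```python
import json


def index_with_normalizer(type_name, field, normalizer_name, filters):
    """Build index settings with a custom normalizer applied to a keyword field."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    normalizer_name: {
                        "type": "custom",
                        # No tokenizer: a normalizer emits exactly one token.
                        "char_filter": [],
                        "filter": list(filters),
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    field: {"type": "keyword", "normalizer": normalizer_name}
                }
            }
        },
    }


# Hypothetical names; "lowercase" is a per-character filter, so it qualifies.
index_body = index_with_normalizer(
    "doc", "email", "lowercase_normalizer", ["lowercase"]
)
print(json.dumps(index_body, indent=2))
```

With a mapping along these lines, `Jane.Doe@Example.com` and `jane.doe@example.com` would be treated as the same keyword value.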

Partitioning of Terms Aggregations

The terms aggregation returns the top 10 terms by default. We are often asked "But how do I return ALL terms?". The answer up until now was "You don’t". It was simply too memory intensive to collect all the terms of a high cardinality field from all the nodes in the cluster and to reduce them to a single result set. It also defeated the purpose of the terms agg, which was designed to return the top-N counts from huge datasets at speed.

However, as we’ve seen many times over, users surprise us with the problems that they solve with Elasticsearch. A frequent request was to be able to return all terms, even if the response is not instantaneous. You can now break your terms down into partitions: the more unique terms you have, the more partitions you will need. For instance, you could choose to use 20 partitions and run 20 search requests, each one requesting a single partition.
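The 20-partition example can be sketched as a small generator of search bodies, one per partition. The field name `account_id`, the aggregation name `all_terms`, and the `size` of 10000 are assumptions chosen for illustration. In practice `size` should be large enough to hold every term that falls into one partition.

```python
def partitioned_terms_requests(field, num_partitions, size):
    """Build one search body per partition of a terms aggregation."""
    return [
        {
            "size": 0,  # only the aggregation results are wanted, not hits
            "aggs": {
                "all_terms": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }
        for partition in range(num_partitions)
    ]


# Hypothetical field name and sizes.
requests_to_run = partitioned_terms_requests("account_id", 20, 10000)
print(len(requests_to_run), "search bodies, partitions 0 to 19")
```

Each body would be sent as a separate search request, and the per-partition buckets combined on the client side to obtain the full list of terms.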