Elasticsearch 5.3.1 released

Today we are pleased to announce the bugfix release of Elasticsearch 5.3.1, based on Lucene 6.4.2. It is already available for deployment on Elastic Cloud, our Elasticsearch-as-a-service platform. All users of 5.3.0 should upgrade.

Multi Data Path Bug in Elasticsearch 5.3.0

Elasticsearch 5.3.0 contained a bug that was triggered by using default.path.data (configured by default in the RPM and Debian packages) along with an array setting for path.data. The bug resulted in the configured paths being added to the default path instead of replacing it. This release contains a fix, as well as a check for data still sitting in the default path. See the post "Multi data path bug in Elasticsearch 5.3.0" for more information about how to recover from this bug.
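
To make the difference between "added to" and "replacing" concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not Elasticsearch's actual code: the function name resolve_data_paths, the buggy flag, the example paths, and the ordering of the returned list are all assumptions made for illustration.

```python
def resolve_data_paths(default_path, configured_paths, buggy=False):
    """Model which data paths a node ends up using.

    default_path     -- stands in for default.path.data (hypothetical value)
    configured_paths -- stands in for an array-valued path.data setting
    buggy            -- True models the 5.3.0 behaviour described in the post
    """
    if not configured_paths:
        # Nothing configured explicitly: fall back to the default path.
        return [default_path]
    if buggy:
        # 5.3.0: the configured paths were added to the default path.
        # (Placing the default first is an assumption of this sketch.)
        return [default_path] + list(configured_paths)
    # Intended behaviour: the configured paths replace the default path.
    return list(configured_paths)


if __name__ == "__main__":
    default = "/var/lib/elasticsearch"      # hypothetical package default
    configured = ["/mnt/disk1", "/mnt/disk2"]  # hypothetical path.data array
    print("5.3.0:", resolve_data_paths(default, configured, buggy=True))
    print("5.3.1:", resolve_data_paths(default, configured, buggy=False))
```

In this model the 5.3.0 node writes to three locations rather than two, which is why data may be left behind in the default path after upgrading.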

Misconfigured Shingle/CJK Filters can Cause OOM

There has been much work recently on improving Lucene’s handling of graph token streams, where analysis of text (either from a document during indexing or from a query during searching) produces multiple overlapping paths or interpretations for the tokens. Most token filters (e.g. synonyms, shingles, CJK, word-delimiter) now use graph analysis to calculate the correct order of tokens for accurate phrase queries.

However, a misconfigured token filter can generate too many paths, which can consume your entire heap space. For example, a shingle token filter with max_shingle_size and min_shingle_size set to different values or with output_unigrams set to true can easily result in an explosion of paths.
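
The growth in the number of paths can be illustrated with a rough Python sketch. It is not Lucene's algorithm: it assumes a toy model in which a shingle of size k starting at position i is an edge from i to i + k, and a path is any chain of such edges that covers the whole token sequence. The function name count_graph_paths is hypothetical.

```python
def count_graph_paths(num_tokens, min_shingle_size, max_shingle_size,
                      output_unigrams=False):
    """Count start-to-end paths through a toy shingle token graph.

    Each allowed shingle size k contributes an edge from position i to
    position i + k. This is a simplified model for illustration only.
    """
    sizes = set(range(min_shingle_size, max_shingle_size + 1))
    if output_unigrams:
        sizes.add(1)  # single tokens are emitted alongside the shingles

    # paths[i] holds the number of distinct ways to reach position i.
    paths = [0] * (num_tokens + 1)
    paths[0] = 1
    for pos in range(1, num_tokens + 1):
        paths[pos] = sum(paths[pos - size] for size in sizes if size <= pos)
    return paths[num_tokens]


if __name__ == "__main__":
    for n in (10, 20, 40):
        fixed = count_graph_paths(n, 2, 2)
        mixed = count_graph_paths(n, 2, 3)
        with_unigrams = count_graph_paths(n, 2, 2, output_unigrams=True)
        print(n, fixed, mixed, with_unigrams)
```

Under this model, a 10-token input with a single shingle size of 2 has exactly one path, while the same input with output_unigrams enabled has 89, and the count keeps growing exponentially with the length of the text.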

This release contains a protection mechanism that turns off graph analysis in the shingle and CJK token filters when they are configured to use different shingle lengths.