This Week in Elasticsearch and Apache Lucene - 2017-01-16

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Elasticsearch Core

Multi-word synonyms and synonym graphs

There has been much recent work on improving Lucene's handling of graph token streams, where analysis of text, whether from a document during indexing or from a query during searching, produces multiple overlapping paths or interpretations for the tokens. Multi-word synonyms produce such graphs and have long been buggy when used with proximity queries. Thanks to the recent addition of SynonymGraphFilter, as well as improvements to Lucene's query parsers that translate the token graph into separate queries, such analysis chains are finally handled correctly at search time. WordDelimiterFilter is also being fixed to produce correct graphs. These changes have already been exposed in Elasticsearch, and then subsequently in Lucene, thanks to Matt Weber. Graph token streams still present challenges, though, such as the need to use FlattenGraphFilter during indexing, but not searching, since a Lucene index cannot represent a graph. There are also a number of token filters that should produce a graph but do not yet, such as ShingleTokenFilter, EdgeNGramTokenFilter and decompounders.
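To make the idea of a token graph concrete, here is a toy Python model. It is only a sketch and not Lucene's implementation: the Token type, the enumerate_paths and naive_adjacent helpers, and the example synonym ("dns" for "domain name system") are all assumptions made for illustration. Each token records its start position and a position length, so a single-word synonym can span the three positions of the phrase it stands for. Walking the graph yields one token sequence per interpretation, which is roughly what a query parser needs in order to build separate queries.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token model: a term is an edge in the token graph."""
    term: str
    position: int         # node the token starts at
    position_length: int  # how many positions the token spans


def enumerate_paths(tokens):
    """Return every start-to-end path through the token graph as a list of terms."""
    edges = defaultdict(list)
    for tok in tokens:
        edges[tok.position].append(tok)
    end = max(t.position + t.position_length for t in tokens)

    def walk(node):
        if node == end:
            yield []
            return
        for tok in edges[node]:
            for rest in walk(tok.position + tok.position_length):
                yield [tok.term] + rest

    return list(walk(0))


def naive_adjacent(tokens, first, second):
    """Adjacency check that ignores position_length, as a flat index would."""
    positions = defaultdict(set)
    for tok in tokens:
        positions[tok.term].add(tok.position)
    return any(p + 1 in positions[second] for p in positions[first])


# "domain name system is down", with "dns" injected as a synonym that
# spans the three positions covered by "domain name system".
tokens = [
    Token("dns", 0, 3),
    Token("domain", 0, 1),
    Token("name", 1, 1),
    Token("system", 2, 1),
    Token("is", 3, 1),
    Token("down", 4, 1),
]

paths = enumerate_paths(tokens)
for path in paths:
    print(" ".join(path))
```

In this model the graph has exactly two paths, "dns is down" and "domain name system is down". Once position length is dropped, however, naive_adjacent(tokens, "dns", "name") reports the two terms as adjacent even though no path contains "dns name", which illustrates the kind of proximity mismatch that can arise when a graph is squeezed into a flat sequence of positions.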