Blog dedicated to Elasticsearch Server Books series

ElasticSearch 0.90 – Stored fields and term vectors compression

The newest release of ElasticSearch, 0.90.0 RC1, leverages the new features that come with Apache Lucene 4.2. The two features we will take a short look at today are automatic compression of all stored fields in the index and term vectors compression. What does it mean for ElasticSearch? Basically, the compression settings that were present in previous ElasticSearch versions are now gone, because compression is turned on by default. Let’s see what that means for us.

Before 0.90

With the release of ElasticSearch 0.19, users were given the possibility to compress the _source field and this way save the space needed to store the index. ElasticSearch was responsible for doing it in the most efficient way, which meant decompressing the field only when needed. We were also given a few configuration options to control how the compression of the _source field should be handled. And now, if you are using ElasticSearch 0.90, it’s all gone.

Here comes ElasticSearch 0.90

In ElasticSearch 0.90 all the options for _source field compression are gone. That means that the following configuration options are no longer used by ElasticSearch:

enabling and disabling _source field compression,

choosing compression method,

choosing compression threshold.
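For reference, a mapping that used these options before 0.90 looked roughly like the sketch below. The `compress` and `compress_threshold` keys are the `_source` mapping options as we remember them from the pre-0.90 documentation. The type name `doc` and the `10kb` threshold are placeholders made up for this example. The setting that selected the compression method is left out, because its exact name is not given here. The snippet only builds and prints the JSON, it does not talk to a cluster.

```python
import json


def legacy_source_mapping(type_name="doc", threshold="10kb"):
    # Hypothetical pre-0.90 mapping; the type name and the threshold value
    # are placeholders chosen for this example.
    return {
        type_name: {
            "_source": {
                "compress": True,                 # turn _source compression on
                "compress_threshold": threshold,  # compress only sources larger than this
            }
        }
    }


if __name__ == "__main__":
    # Print the mapping as the JSON body that would have been sent to ElasticSearch.
    print(json.dumps(legacy_source_mapping(), indent=2))
```

If such a mapping is sent to ElasticSearch 0.90, these compression keys should simply have no effect, since the stored fields are compressed anyway.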

How to enable compression?

There is nothing special you need to do to turn on stored fields compression or term vectors compression, because both are enabled by default. If you are worried about the performance cost of compressing and decompressing stored fields, you shouldn’t be. Look at what we wrote about stored fields compression in Lucene 4.1 (Solr 4.1: Stored fields compression). What you get is a smaller index with a minimal performance impact, which is negligible when you take the smaller index size into consideration.
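To get a feeling for why this shrinks the index, here is a small sketch of the general idea. As far as we know, Lucene 4.1 and newer compress stored fields in chunks that hold several documents and use LZ4 for that. The Python standard library has no LZ4, so `zlib` stands in for it here. The documents, the chunk size of 16 documents and all function names are made up for the illustration. This is not how Lucene is implemented.

```python
import json
import zlib


def make_docs(count=200):
    # Made-up documents that share field names, like typical _source JSON.
    return [
        json.dumps({"id": i, "title": "Document number %d" % i,
                    "tags": ["elasticsearch", "lucene"],
                    "published": True}).encode("utf-8")
        for i in range(count)
    ]


def compress_chunks(docs, docs_per_chunk=16):
    # Compress several documents together, so the structure they share
    # is paid for once per chunk instead of once per document.
    chunks = []
    for start in range(0, len(docs), docs_per_chunk):
        block = b"\n".join(docs[start:start + docs_per_chunk])
        chunks.append(zlib.compress(block))
    return chunks


def read_doc(chunks, doc_id, docs_per_chunk=16):
    # Only the chunk that holds the requested document is decompressed.
    block = zlib.decompress(chunks[doc_id // docs_per_chunk])
    return block.split(b"\n")[doc_id % docs_per_chunk]


if __name__ == "__main__":
    docs = make_docs()
    chunks = compress_chunks(docs)
    raw = sum(len(d) for d in docs)
    packed = sum(len(c) for c in chunks)
    print("raw bytes: %d, compressed bytes: %d" % (raw, packed))
```

Reading a single document costs the decompression of one small chunk only, which is the reason the impact on fetching stored fields stays small.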

How to disable it?

This is more complicated, because the default codec used by Apache Lucene 4.2 uses compression and can’t be configured not to use it. You would have to choose a different codec, or develop a custom one that matches your needs, and add it to the configuration. You can read about how to use custom codecs (and more) in one of the previous entries (ElasticSearch 0.90 – Using codecs).

That depends on what you are using ES for. If you only want to store as much data as you can and rarely query it, this is true. But for constant querying and analyzing it is not so true anymore. Elasticsearch does compress a portion of the data, namely the stored fields, so _source will be compressed, but there is also the indexed data and the metadata, which this compression does not cover.