Apr 28, 2016

I wrote previously about using a custom scoring loop to retrieve Lucene document values. It turns out there is another simple optimization that allows much faster field value retrieval than loading stored documents: the DocValues field type. The idea is that in addition to the LongField/DoubleField types, which can be indexed (and therefore filtered on when running a Lucene query), there is another kind of numeric field, such as NumericDocValuesField, whose values are kept per document for fast lookup.

The difference is that DocValues fields cannot be indexed. So the pattern is to split each Lucene document's fields into two subsets. The fields in the first subset are indexed but not stored; they are used to match documents. The second subset consists of DocValues fields only. For a matching document, the values of its DocValues fields can be retrieved from the index reader without fetching the entire document. DocValues fields are implemented using memory-efficient techniques, including reasonable compression.
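To make the two-subset pattern concrete, here is a toy, in-memory model in plain Java. It is not Lucene's implementation and uses no Lucene classes; the class name DocValuesSketch, the "category" and "price" fields, and the use of a HashMap as a stand-in for the inverted index are all hypothetical. The indexed subset is only consulted to find matching doc ids, and the DocValues subset is a column addressed directly by doc id, so no whole document is ever loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Lucene code) of the pattern:
 * an indexed-but-not-stored field for matching, plus a
 * column-per-field "DocValues" store read by doc id.
 */
public class DocValuesSketch {

    // Indexed subset: term -> posting list of doc ids. Used only to match documents.
    private final Map<Long, List<Integer>> categoryIndex = new HashMap<>();

    // DocValues subset: one column for the "price" field, position == doc id.
    private final List<Long> priceColumn = new ArrayList<>();

    /** Adds a document and returns its doc id, assigned sequentially as in a single segment. */
    public int addDocument(long category, long price) {
        int docId = priceColumn.size();
        categoryIndex.computeIfAbsent(category, k -> new ArrayList<>()).add(docId);
        priceColumn.add(price);
        return docId;
    }

    /** Matches docs via the indexed field, then reads the value column for each hit. */
    public long[] pricesForCategory(long category) {
        List<Integer> hits = categoryIndex.getOrDefault(category, Collections.emptyList());
        long[] prices = new long[hits.size()];
        for (int i = 0; i < hits.size(); i++) {
            // Direct column lookup by doc id: nothing else about the document is read.
            prices[i] = priceColumn.get(hits.get(i));
        }
        return prices;
    }

    public static void main(String[] args) {
        DocValuesSketch index = new DocValuesSketch();
        index.addDocument(1L, 100L); // doc 0
        index.addDocument(2L, 250L); // doc 1
        index.addDocument(1L, 75L);  // doc 2
        System.out.println(Arrays.toString(index.pricesForCategory(1L))); // prints [100, 75]
    }
}
```

In Lucene 5.x the rough counterparts would be an indexed LongField created with Field.Store.NO for matching and a NumericDocValuesField for retrieval, with values read per segment through NumericDocValues.get(docID). Later Lucene versions (7.0 onward) replaced that random-access call with an iterator-style API, so the exact reading code should be checked against the version in use.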