Today’s entry is dedicated to one of the caches in Solr, the filterCache. I will try to explain what it does, how to configure it and how to use it efficiently.

What is it?

Let’s start from the inside. The filterCache stores unordered sets of document identifiers: each entry maps a filter to the set of documents that match it. These are not the IDs defined as the unique key in the schema.xml file. Solr stores the internal document IDs used by Lucene and Solr, which is worth remembering.
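A small Python model may help illustrate the distinction. It is not Solr code: the list `docs`, its fields and the function `docset_for` are hypothetical, and a document's position in the list simply stands in for its internal Lucene ID.

```python
# Hypothetical mini-index: a document's position in this list plays the role of
# its internal Lucene document ID; "id" is the unique key from schema.xml.
docs = [
    {"id": "book-17", "category": "ksiazka"},
    {"id": "book-03", "category": "film"},
    {"id": "book-42", "category": "ksiazka"},
]


def docset_for(field, value):
    """Return the unordered set of internal IDs of documents matching field:value."""
    return frozenset(i for i, d in enumerate(docs) if d.get(field) == value)


# Model of a filterCache entry: the key is the filter, the value is a set of
# internal IDs (0, 1, 2, ...), not unique keys such as "book-17".
filter_cache = {"category:ksiazka": docset_for("category", "ksiazka")}

print(filter_cache["category:ksiazka"])  # frozenset({0, 2})
```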

What is it used for?

The main task of the filterCache is to store the results of filters. That is not its only use, though. The cache can also support the faceting mechanism (when the TermEnum method, facet.method=enum, is used) and sorting, when the <useFilterForSortedQuery/> option is set to true in the solrconfig.xml file.
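The faceting case can be pictured as set arithmetic: with the enum method, the count for each term is the size of the intersection between that term's cached document set and the documents matching the main query. The Python sketch below is a simplification under that assumption. The cache contents, the base set and `enum_facet_counts` are hypothetical, and, unlike Solr, which walks the field's terms in the index, the sketch takes the terms from the cache keys.

```python
# Hypothetical per-term document sets, as the filterCache might hold them for
# facet.method=enum: one entry per term of the faceted field "category".
filter_cache = {
    "category:ksiazka": frozenset({0, 2, 5}),
    "category:film": frozenset({1, 3}),
    "category:muzyka": frozenset({4}),
}


def enum_facet_counts(field, base_docset, cache):
    """Count each term as the size of (cached docset for the term) & base_docset."""
    prefix = field + ":"
    counts = {}
    for key, docset in cache.items():
        if key.startswith(prefix):
            counts[key[len(prefix):]] = len(docset & base_docset)
    return counts


# Internal IDs of the documents matching the main query (hypothetical).
base = frozenset({0, 1, 2, 4})
print(enum_facet_counts("category", base, filter_cache))
# {'ksiazka': 2, 'film': 1, 'muzyka': 1}
```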

Definition

class - the class that implements the cache. For the filterCache, solr.FastLRUCache is recommended, because it performs better than solr.LRUCache when there are many more GET operations than PUT operations.

size - the maximum number of entries that can be stored in the cache.

initialSize - initial size of the cache.

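autowarmCount - the number of entries carried over from the old cache to the new one during warm-up. For the filterCache, the corresponding filters are re-executed against the new searcher rather than copied as they are.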

minSize - the number of entries to which Solr will try to reduce the cache once it becomes full.

acceptableSize - if Solr is unable to reduce the number of entries to the value given by minSize, acceptableSize is the value it will try to reach instead.

cleanupThread - the default value is false. If set to true, a separate thread will be used to clean up the cache.

In most cases, the size, initialSize and autowarmCount parameters are quite sufficient.
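The interplay of size, minSize and autowarmCount can be sketched with a toy LRU model in Python. This is an assumption-laden simplification, not Solr's FastLRUCache: it uses exact LRU ordering, always trims straight down to minSize (so acceptableSize never comes into play), and `SketchLRUCache`, `autowarm` and the `regenerate` callback are hypothetical names. The callback stands in for the re-execution of filters against the new searcher; the sketch warms the most recently used keys.

```python
from collections import OrderedDict


class SketchLRUCache:
    """Simplified LRU model bounded by `size` and trimmed down to `min_size`.

    Not Solr's code: acceptableSize, cleanupThread and initialSize are not modelled.
    """

    def __init__(self, size, min_size):
        self.size = size
        self.min_size = min_size
        self._entries = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            # Overflow: evict least recently used entries until min_size remain.
            while len(self._entries) > self.min_size:
                self._entries.popitem(last=False)

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._entries)

    def __len__(self):
        return len(self._entries)


def autowarm(old, new, autowarm_count, regenerate):
    """Recompute values for the old cache's most recently used keys into `new`."""
    if autowarm_count <= 0:
        return
    for key in old.keys()[-autowarm_count:]:  # oldest of the selected keys first
        new.put(key, regenerate(key))


old = SketchLRUCache(size=4, min_size=2)
for fq in ["a", "b", "c", "d"]:
    old.put(fq, fq.upper())
old.get("a")        # "a" becomes the most recently used entry
old.put("e", "E")   # 5 entries > size 4, so the cache is trimmed to 2
print(old.keys())   # ['a', 'e']

new = SketchLRUCache(size=4, min_size=2)
autowarm(old, new, autowarm_count=1, regenerate=str.upper)
print(new.keys())   # ['e']
```

Trimming well below the limit on overflow, instead of evicting one entry per insert, is the trade-off minSize expresses: fewer, larger cleanups.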

How to configure it?

The size of the cache should be determined on the basis of the queries that are sent to Solr. The maximum size of the filterCache should be at least as large as the number of distinct filters (with their values) that we use. This means that if your application uses, for example, 2000 distinct filters (fq parameters with values) in a given period of time, the size parameter should be set to at least 2000.
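One way to estimate this number is to count the distinct fq values in a sample of real traffic. The Python sketch below assumes you already have the request query strings (for example, extracted from an access log); `distinct_filters` and the sample log are hypothetical. Counting raw strings is only an approximation, since textually different filters that parse to the same query may share a single cache entry.

```python
from urllib.parse import parse_qs


def distinct_filters(query_strings):
    """Collect the distinct fq values (filter together with its value)."""
    seen = set()
    for qs in query_strings:
        seen.update(parse_qs(qs).get("fq", []))
    return seen


# Hypothetical sample of query strings taken from an access log.
log = [
    "q=name:solr&fq=category:ksiazka&fq=section:ksiazki",
    "q=name:lucene&fq=category:ksiazka&fq=section:ksiazki",
    "q=name:tika&fq=category:film",
]
filters = distinct_filters(log)
print(len(filters))  # 3, so size should be at least 3 for this sample
```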

Efficient use

However, configuring the cache is not enough: queries also have to be written in a way that lets Solr use it. Take the following query as an example:

q=name:solr+AND+category:ksiazka+AND+section:ksiazki

At first glance, the query looks correct. However, there is a problem: it does not use the filterCache. The entire request will be handled by the queryResultCache and will create a single entry in it. Let’s modify it a bit and send the following query:

q=name:solr&fq=category:ksiazka&fq=section:ksiazki
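If queries are assembled in application code, this split can be made explicit. The helper below, `build_params`, is hypothetical: it puts the main query into q and each reusable restriction into its own fq parameter, using only the standard library's `urlencode`.

```python
from urllib.parse import urlencode


def build_params(main_query, filters):
    """Hypothetical helper: q holds the main query, each restriction gets its own fq."""
    return urlencode([("q", main_query)] + [("fq", f) for f in filters])


print(build_params("name:solr", ["category:ksiazka", "section:ksiazki"]))
# q=name%3Asolr&fq=category%3Aksiazka&fq=section%3Aksiazki
```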

What happens now? As in the previous case, an entry will be created in the queryResultCache. Additionally, two entries will be created in the filterCache. Now let’s look at the next query:

q=name:lucene&fq=category:ksiazka&fq=section:ksiazki

This query would create another entry in the queryResultCache and reuse the two existing entries in the filterCache. As a result, the query would execute faster and put less load on I/O.
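Why the reuse pays off can be sketched with sets: the documents returned are those matching q that also appear in every filter's document set, and when those sets are already cached, only the main query has to be evaluated against the index. The Python model below uses hypothetical internal IDs and a hypothetical `apply_filters` function; it illustrates the set logic, not Solr's actual execution path.

```python
# Hypothetical cached filter results (internal document IDs).
filter_cache = {
    "category:ksiazka": frozenset({0, 2, 5, 7}),
    "section:ksiazki": frozenset({0, 5, 7, 9}),
}


def apply_filters(main_docset, fqs, cache):
    """Keep only the main query's matches that appear in every cached filter set."""
    result = set(main_docset)
    for fq in fqs:
        result &= cache[fq]  # cache hit: no index access needed for this filter
    return result


lucene_docs = frozenset({2, 5, 7, 8})  # hypothetical matches for q=name:lucene
print(sorted(apply_filters(lucene_docs, ["category:ksiazka", "section:ksiazki"], filter_cache)))
# [5, 7]
```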

However, let’s look at the query in the following form:

q=name:lucene+AND+category:ksiazka+AND+section:ksiazki

Solr would not be able to use any information from the cache and would have to compute the results from the Lucene index.
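The four requests above can be replayed in a toy model that only tracks cache keys. This is a deliberate simplification: the queryResultCache key here is just the q string plus the set of filters (Solr's real key also involves, for example, sorting), nothing is ever evicted, and `simulate` is a hypothetical function. Each report entry is (new filterCache entries, filterCache hits).

```python
from urllib.parse import parse_qs


def simulate(query_strings):
    """Toy model: per request, count new filterCache entries and filterCache hits."""
    query_result_cache, filter_cache = set(), set()
    report = []
    for qs in query_strings:
        params = parse_qs(qs)
        q = params["q"][0]
        fqs = params.get("fq", [])
        query_result_cache.add((q, frozenset(fqs)))
        hits = sum(1 for fq in fqs if fq in filter_cache)
        filter_cache.update(fqs)
        report.append((len(fqs) - hits, hits))
    return report, len(query_result_cache), len(filter_cache)


requests = [
    "q=name:solr+AND+category:ksiazka+AND+section:ksiazki",
    "q=name:solr&fq=category:ksiazka&fq=section:ksiazki",
    "q=name:lucene&fq=category:ksiazka&fq=section:ksiazki",
    "q=name:lucene+AND+category:ksiazka+AND+section:ksiazki",
]
report, qrc_entries, fc_entries = simulate(requests)
print(report)  # [(0, 0), (2, 0), (0, 2), (0, 0)]
```

Only the fq-based requests touch the filterCache, and only the third one benefits from entries created earlier.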

Last few words

As you can see, configuring the cache correctly does not by itself guarantee that Solr will be able to use it. The efficiency of the final deployment depends on how queries are sent to Solr, which is worth keeping in mind when planning it.
