Customisation

Lunr ships with sensible defaults that will produce good results for most use cases. Lunr also provides the ability to customise the index to provide extra features and allow more control over how documents are indexed and scored.

Plugins

Any customisation or extension can be packaged as a plugin. This makes it easier to share your customisations across indexes and with other people, and provides a single, supported way of customising Lunr.

A plugin is just a function that Lunr executes in the context of an index builder. For example, a plugin that adds some default fields to the index would look like this:

```javascript
var articleIndex = function () {
  this.field('text')
}
```

This plugin can then be used when defining an index:

```javascript
var idx = lunr(function () {
  this.use(articleIndex)
})
```

Plugin functions have their context set to the index builder, and the builder is also passed as the first argument to the plugin. Additional parameters can be passed to the plugin when it is used in an index, for example to tell the above plugin which fields to add.
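A minimal sketch of such a parameterised plugin (the name `parameterisedPlugin` and the field names are illustrative, not part of Lunr's API):

```javascript
// The builder is always passed as the first argument; any extra
// arguments given to `use` follow it.
var parameterisedPlugin = function (builder, fields) {
  fields.forEach(function (field) {
    builder.field(field)
  })
}

// In an index definition it would be used like this:
// var idx = lunr(function () {
//   this.use(parameterisedPlugin, ['title', 'body'])
// })
```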

Pipeline Functions

The most commonly customised part of Lunr is the text processing pipeline. For example, if you wanted to support searching on either British or American spelling, you could add a pipeline function to normalise certain words. Let’s say we want to normalise the term “grey” so users can search by either British spelling “grey” or American spelling “gray”. To do this we can add a pipeline function to do the normalisation:

```javascript
var normaliseSpelling = function (builder) {

  // Define a pipeline function that converts 'gray' to 'grey'
  var pipelineFunction = function (token) {
    if (token.toString() == "gray") {
      return token.update(function () { return "grey" })
    } else {
      return token
    }
  }

  // Register the pipeline function so the index can be serialised
  lunr.Pipeline.registerFunction(pipelineFunction, 'normaliseSpelling')

  // Add the pipeline function to both the indexing pipeline and the
  // searching pipeline
  builder.pipeline.before(lunr.stemmer, pipelineFunction)
  builder.searchPipeline.before(lunr.stemmer, pipelineFunction)
}
```

As before, this plugin can then be used in an index:

```javascript
var idx = lunr(function () {
  this.use(normaliseSpelling)
})
```

A pipeline is run on all fields in a document during indexing. Each token passed to the pipeline functions includes meta-data indicating which field the token came from; this can be used to control which fields are processed by a particular pipeline function. The example below skips stemming for tokens from the “name” field of a document.

```javascript
// Define a function that will skip a pipeline function for a specified field
var skipField = function (fieldName, fn) {
  return function (token, i, tokens) {
    if (token.metadata["fields"].indexOf(fieldName) >= 0) {
      return token
    }
    return fn(token, i, tokens)
  }
}

// Create a stemmer that ignores tokens from the field "name"
var selectiveStemmer = skipField('name', lunr.stemmer)
```

Token Meta-data

Pipeline functions in Lunr are able to attach metadata to a token. An example of this is the token’s position data, i.e. the location of the token in the indexed document. By default, no metadata is stored in the index; this is to reduce the size of the index. It is possible to whitelist certain token metadata. Whitelisted meta-data will be returned with search results and it can also be used by other pipeline functions.

A lunr.Token has support for adding meta-data. For example, the following plugin will attach the length of a token as meta-data with key tokenLength. For it to be available in search results, this meta-data key is also added to the meta-data whitelist:

```javascript
var tokenLengthMetadata = function (builder) {

  // Define a pipeline function that stores the token length as metadata
  var pipelineFunction = function (token) {
    token.metadata['tokenLength'] = token.toString().length
    return token
  }

  // Register the pipeline function so the index can be serialised
  lunr.Pipeline.registerFunction(pipelineFunction, 'tokenLengthMetadata')

  // Add the pipeline function to the indexing pipeline
  builder.pipeline.before(lunr.stemmer, pipelineFunction)

  // Whitelist the tokenLength metadata key
  builder.metadataWhitelist.push('tokenLength')
}
```

As with all plugins, using it in an index is simple:

```javascript
var idx = lunr(function () {
  this.use(tokenLengthMetadata)
})
```

Similarity Tuning

The algorithm used by Lunr to calculate similarity between a query and a document can be tuned using two parameters. Lunr ships with sensible defaults, and these can be adjusted to provide the best results for a given collection of documents.

b

This parameter controls the importance given to the length of a document and its fields. Its value must be between 0 and 1; by default it is 0.75. Reducing it lessens the effect that differing document lengths have on a term’s importance to a document.

k1

This controls how quickly the boost given by a common word reaches saturation. Increasing it slows the rate of saturation, while lowering it results in quicker saturation. The default value is 1.2. If the collection of documents being indexed has a high occurrence of words that are not covered by a stop word filter, these words can quickly dominate any similarity calculation. In such cases, this value can be reduced to get more balanced results.
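Lunr’s scoring is based on the BM25 ranking function. The textbook BM25 term-frequency component sketched below (illustrative only, not Lunr’s exact implementation) shows how the two parameters interact:

```javascript
// Textbook BM25 term-frequency saturation (a sketch; Lunr uses a BM25
// variant internally, so treat this as illustrative).
// tf: raw term frequency in the field
// fieldLength / averageFieldLength: length-normalisation inputs scaled by b
var bm25Tf = function (tf, k1, b, fieldLength, averageFieldLength) {
  var norm = 1 - b + b * (fieldLength / averageFieldLength)
  return (tf * (k1 + 1)) / (tf + k1 * norm)
}

// With b = 0, document length has no effect on the score; with a small
// k1, repeated occurrences of a term saturate quickly (going from 1 to
// 10 occurrences barely changes the score).
```

In a Lunr index the parameters are set on the builder inside the index definition, e.g. `this.b(0)` and `this.k1(1.3)` within the `lunr(function () { ... })` block.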