Most built-in and custom similarities have options that can be set
through the index settings. These options can be provided either when
creating an index or when updating the settings of an existing index.
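
As a minimal sketch of such a configuration, the request body below could be sent when creating an index (for example with PUT /my-index). It defines a similarity called my_bm25 and assigns it to a title field. The index, similarity, and field names are hypothetical, and the k1 and b values are simply the usual BM25 defaults, chosen here for illustration only.

```json
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.2,
          "b": 0.75
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "my_bm25"
      }
    }
  }
}
```

Defining the similarity under a name of its own keeps the tuning in one place: any number of fields can refer to it from their mapping.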

Information based model. The algorithm is based on the concept that the information content in any symbolic distribution
sequence is primarily determined by the repetitive usage of its basic elements.
For written texts this challenge would correspond to comparing the writing styles of different authors.
This similarity has the following options:

LM Jelinek Mercer similarity. The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:

lambda

The optimal value depends on both the collection and the query: it is around 0.1
for title queries and 0.7 for long queries. Defaults to 0.1. As the value approaches 0, documents that match more query terms are ranked higher than those that match fewer terms.
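
To illustrate the lambda option, here is a sketch of index settings that register an LM Jelinek Mercer similarity tuned for long queries. The similarity name my_lm_jm and the body field are hypothetical, and 0.7 is taken from the guidance above rather than from any measurement.

```json
{
  "settings": {
    "index": {
      "similarity": {
        "my_lm_jm": {
          "type": "LMJelinekMercer",
          "lambda": 0.7
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "similarity": "my_lm_jm"
      }
    }
  }
}
```

For an index that mostly serves short, title-like queries, leaving lambda at its default of 0.1 is probably the better starting point.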

You might have noticed that a significant part of the script depends on
statistics that are the same for every document. It is possible to make the
above slightly more efficient by providing a weight_script, which computes
the document-independent part of the score and makes it available
under the weight variable. When no weight_script is provided, weight
is equal to 1. The weight_script has access to the same variables as
the script except doc, since it is supposed to compute a
document-independent contribution to the score.

The configuration below gives the same tf-idf scores but is slightly
more efficient:
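
The following is a sketch of what such a configuration might look like, assuming a tf-idf score of the form boost * idf * tf * norm. The similarity name scripted_tfidf, the field name, and the exact formula are assumptions made for illustration. The idf factor depends only on field and term statistics, so it is computed in weight_script, and the per-document script multiplies the resulting weight by the term-frequency and length-normalization factors.

```json
{
  "settings": {
    "number_of_shards": 1,
    "similarity": {
      "scripted_tfidf": {
        "type": "scripted",
        "weight_script": {
          "source": "double idf = Math.log((field.docCount + 1.0) / (term.docFreq + 1.0)) + 1.0; return query.boost * idf;"
        },
        "script": {
          "source": "double tf = Math.sqrt(doc.freq); double norm = 1 / Math.sqrt(doc.length); return weight * tf * norm;"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field": {
        "type": "text",
        "similarity": "scripted_tfidf"
      }
    }
  }
}
```

Because the weight does not vary from one document to the next, moving the logarithm into weight_script avoids recomputing it for every matching document.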