In Core Concepts, we mentioned the main roles you take on when building a learning to rank system. In “How does the plugin fit in?”, we discussed at a high level what this plugin does to help you use Elasticsearch as a learning to rank system.

This section covers the functionality the Elasticsearch LTR plugin provides for building and uploading features.

Elasticsearch LTR features correspond to Elasticsearch queries. The scores of these queries, when run with the user’s search terms (and other parameters), are the values you use in your training set.

Obvious features might include traditional search queries, like a simple “match” query on title:

{"query":{"match":{"title":"{{keywords}}"}}}
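As a minimal sketch, here is how that match query might be expressed as a named feature definition in Python. The field names (name, params, template_language, template) follow the shape we understand the plugin’s feature set API to use, where the template holds the query clause itself without the outer "query" key; the helper match_feature and the feature name title_query are hypothetical.

```python
import json

def match_feature(name: str, field: str, param: str = "keywords") -> dict:
    """Wrap a match query on `field` as a mustache-templated feature definition."""
    return {
        "name": name,                       # should be unique within a feature set
        "params": [param],                  # template variables supplied at search time
        "template_language": "mustache",
        # The template is the query clause itself, without the outer "query" key
        "template": {"match": {field: "{{" + param + "}}"}},
    }

title_query = match_feature("title_query", "title")
print(json.dumps(title_query))
```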

Properties of documents, such as popularity, can also be features. Function score queries can expose these values; for example, a function_score query can return a movie’s average user rating as its score.
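A sketch of such a feature, assuming the rating lives in a numeric field called vote_average (an assumption borrowed from a typical movie index; substitute your own field). With a match_all inner query scoring 1.0 and function_score’s default multiply boost mode, the feature’s score is simply the field value; "missing": 0 gives unrated movies a score of 0. The helper rating_feature is hypothetical.

```python
def rating_feature(field: str = "vote_average") -> dict:
    """Feature whose score is a document's numeric field value (hypothetical field name)."""
    return {
        "name": "user_rating",
        "params": [],  # no search-time parameters: the score depends only on the document
        "template_language": "mustache",
        "template": {
            "function_score": {
                "query": {"match_all": {}},  # every document matches with score 1.0
                # field_value_factor multiplies that score by the field's value
                "field_value_factor": {"field": field, "missing": 0},
            }
        },
    }

user_rating = rating_feature()
```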

Just as you might combine queries like these by hand to improve search relevance, the ranking function f you’re training combines them mathematically to arrive at a relevance score.
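To make “combines them mathematically” concrete, here is the simplest possible f, a linear model f(x) = 0.8·x_title + 0.2·x_rating. The weights and feature values are made up for illustration; trained models are often more complex (gradient boosted trees, for instance), but the idea of mapping feature values to one score is the same.

```python
from typing import Dict

def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Relevance score as a weighted sum of feature values (hypothetical linear model)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one (query, document) pair, and hypothetical weights
doc_features = {"title_query": 10.0, "user_rating": 7.5}
weights = {"title_query": 0.8, "user_rating": 0.2}
score = linear_score(doc_features, weights)  # 0.8 * 10.0 + 0.2 * 7.5
print(score)
```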

The {{keywords}} placeholder above uses mustache templating, the same syntax used for search templates elsewhere in Elasticsearch. Templating lets you inject query- or user-specific variables into a feature: information about the user for personalization, perhaps, or the location of the searcher’s phone, passed in as variables such as {{users_lat}} and {{users_lon}}.
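To show what the template substitution does, here is a deliberately simplified stand-in for mustache rendering: it replaces {{name}} placeholders inside string values of a query with the supplied parameters. This is not the mustache engine Elasticsearch actually uses (no sections, escaping, or partials); the render function is hypothetical and exists only to illustrate how parameters flow into the query.

```python
import re
from typing import Any, Dict

# Matches {{name}}, allowing optional whitespace inside the braces
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template: Any, params: Dict[str, Any]) -> Any:
    """Recursively substitute {{name}} placeholders in string values (simplified, not full mustache)."""
    if isinstance(template, dict):
        return {key: render(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, params) for item in template]
    if isinstance(template, str):
        # Raises KeyError if a placeholder has no matching parameter
        return _PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    return template  # numbers, booleans, None pass through unchanged

rendered = render({"query": {"match": {"title": "{{keywords}}"}}}, {"keywords": "rambo"})
print(rendered)
```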

Elasticsearch LTR gives you an interface for creating and manipulating features. Once features are created, you can log a set of them. Logged features, combined with your judgment list, can be trained into a model. Finally, that model can be uploaded to Elasticsearch LTR and executed as part of a search.
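The “combined with your judgment list” step is essentially a join on (query, document). A minimal sketch, assuming judgments and logged feature values have already been collected into dictionaries keyed by (query id, document id); the data shapes and the helper build_training_rows are hypothetical, not part of the plugin.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (query id, document id)

def build_training_rows(
    judgments: Dict[Key, int], logged: Dict[Key, List[float]]
) -> List[Tuple[int, int, str, List[float]]]:
    """Join relevance grades with logged feature values; skip judged docs that were never logged."""
    rows = []
    for (qid, doc_id), grade in sorted(judgments.items()):
        values = logged.get((qid, doc_id))
        if values is None:
            continue  # no feature values for this document, so it cannot be trained on
        rows.append((grade, qid, doc_id, values))
    return rows

# Hypothetical judgments (grade 0-4) and logged values, in feature-set order
judgments = {(1, "7555"): 4, (1, "1370"): 3, (1, "999"): 0}
logged = {(1, "7555"): [12.3, 7.5], (1, "1370"): [8.1, 6.2]}
rows = build_training_rows(judgments, logged)
```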

A feature store corresponds to an Elasticsearch index used to store metadata about features and models. Typically, one feature store corresponds to one major search site or implementation, for example Wikipedia vs. Wikitravel.

For most use cases, you can get by with the single, default feature store and never think about feature stores again. The default store must be initialized the first time you use Elasticsearch Learning to Rank, with a PUT request to the _ltr endpoint:

PUT _ltr
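Initializing the default store is a PUT request to the _ltr endpoint. A minimal Python sketch using only the standard library, assuming a cluster at http://localhost:9200 (an assumption; adjust for your deployment). The helper builds the request without sending it, so it can be inspected first.

```python
import urllib.request

def init_default_store_request(host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the request that initializes the default feature store."""
    return urllib.request.Request(f"{host}/_ltr", method="PUT")

req = init_default_store_request()
print(req.get_method(), req.full_url)
# To send it against a running cluster with the plugin installed:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```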

Feature sets are where the action really happens in Elasticsearch LTR.

A feature set is a set of features grouped together for logging and model evaluation. You’ll refer to feature sets when you want to log multiple feature values for offline training. You’ll also create a model from a feature set, which copies the feature set into the model.
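A sketch of creating a feature set, assuming the plugin’s POST _ltr/_featureset/{name} endpoint with a body of the form {"featureset": {"name": ..., "features": [...]}}. The host, the set name movie_features, and the helper names are hypothetical; the request is built but not sent.

```python
import json
import urllib.request

def feature(name: str, params: list, template: dict) -> dict:
    """One mustache-templated feature definition."""
    return {"name": name, "params": params, "template_language": "mustache", "template": template}

def create_featureset_request(host: str, set_name: str, features: list) -> urllib.request.Request:
    """Build (but do not send) a request that creates a feature set in the default store."""
    body = {"featureset": {"name": set_name, "features": features}}
    return urllib.request.Request(
        f"{host}/_ltr/_featureset/{set_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

features = [feature("title_query", ["keywords"], {"match": {"title": "{{keywords}}"}})]
req = create_featureset_request("http://localhost:9200", "movie_features", features)
```

Because the features are a list, their order in this body is what gives each feature its ordinal.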

When adding features, we recommend sanity checking that they work as expected. Adding a “validation” block to your feature creation request lets Elasticsearch LTR run the query before adding it. Without this validation, you may find out only much later that a query, while valid JSON, was a malformed Elasticsearch query. Batching dozens of features to log, only to have one of them fail in production, can be quite annoying!

To run validation, specify test parameters and a test index to run the query against:

"validation":{"params":{"keywords":"rambo"},"index":"tmdb"},

Place this block alongside the featureset in the request body. If a feature contains a malformed query, such as a match query with an unsupported option, the request fails with a validation error, an indicator that you should take a closer look at the query.
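A sketch of a request body that places the validation block next to the featureset and carries a deliberately broken match query. The option not_a_real_option is invented precisely because Elasticsearch should reject an unknown match parameter, so validation against the tmdb index is expected to fail; the helper with_validation is hypothetical.

```python
import json

def with_validation(featureset: dict, params: dict, index: str) -> dict:
    """Build a feature set creation body with a validation block alongside the featureset."""
    return {"validation": {"params": params, "index": index}, "featureset": featureset}

# Deliberately malformed feature: "not_a_real_option" is not a match query parameter
broken_featureset = {
    "name": "movie_features",
    "features": [
        {
            "name": "title_query",
            "params": ["keywords"],
            "template_language": "mustache",
            "template": {
                "match": {"title": {"query": "{{keywords}}", "not_a_real_option": 1}}
            },
        }
    ],
}

body = with_validation(broken_featureset, {"keywords": "rambo"}, "tmdb")
print(json.dumps(body, indent=2))
```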

Of course, you may not know up front which features will be useful, and you may wish to append a new feature later for logging and model evaluation. For example, we could add the user_rating feature using the feature set append API.
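A sketch of appending a feature, assuming the plugin’s POST _ltr/_featureset/{name}/_addfeatures endpoint with a body of the form {"features": [...]}. The host, set name, and rating field vote_average are assumptions; the request is built but not sent.

```python
import json
import urllib.request

def add_features_request(host: str, set_name: str, features: list) -> urllib.request.Request:
    """Build (but do not send) a request appending features to an existing feature set."""
    return urllib.request.Request(
        f"{host}/_ltr/_featureset/{set_name}/_addfeatures",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

user_rating = {
    "name": "user_rating",
    "params": [],
    "template_language": "mustache",
    "template": {
        "function_score": {
            "query": {"match_all": {}},
            "field_value_factor": {"field": "vote_average", "missing": 0},  # hypothetical field
        }
    },
}
req = add_features_request("http://localhost:9200", "movie_features", [user_rating])
```

Appended features go to the end of the list, so existing features keep their ordinals.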

Because some model training libraries refer to features by name, Elasticsearch LTR enforces a unique name for each feature in a feature set. Once user_rating has been added, for example, adding another feature named user_rating would produce an error.
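The plugin enforces this rule on its side; if you generate features programmatically, a local pre-check can catch duplicates before any request is sent. A minimal sketch with a hypothetical helper append_names, which returns the combined ordered list of names or raises on a repeat.

```python
from typing import List

def append_names(existing: List[str], new: List[str]) -> List[str]:
    """Return existing + new, raising ValueError if any feature name repeats (local pre-check only)."""
    seen = set(existing)
    for name in new:
        if name in seen:
            raise ValueError(f"duplicate feature name: {name}")
        seen.add(name)  # also catches duplicates within `new` itself
    return existing + new

names = append_names(["title_query"], ["user_rating"])
```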

You’ll notice we appended to the feature set. Feature sets might more accurately be called “lists”: each feature has an ordinal (its place in the list) in addition to a name. Some LTR training applications, such as RankLib, refer to a feature by ordinal (the “1st” feature, the “2nd” feature), while others more conveniently refer to it by name, so you may need one or both. When features are logged, they are returned as a list, which preserves each feature’s ordinal.
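Ordinals matter when writing training files for ordinal-based tools. A sketch that formats one example in the RankLib (LETOR-style) text format, `<grade> qid:<qid> <ordinal>:<value> ... # <comment>`, assuming the feature values arrive in feature-set order, which is what the logged list preserves. RankLib numbers features from 1; the helper to_ranklib_line is hypothetical.

```python
from typing import Sequence

def to_ranklib_line(grade: int, qid: int, values: Sequence[float], comment: str = "") -> str:
    """Format one training example, numbering features 1, 2, ... by list position."""
    parts = [str(grade), f"qid:{qid}"]
    parts += [f"{ordinal}:{value}" for ordinal, value in enumerate(values, start=1)]
    line = " ".join(parts)
    return f"{line} # {comment}" if comment else line

print(to_ranklib_line(4, 1, [12.3, 7.5], "7555"))
```

Reordering the feature set would silently change what “feature 1” means, which is one reason appending, rather than inserting, is the safer way to grow a set.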

Next up, in Feature Engineering, we’ll cover some unique features the Elasticsearch LTR plugin enables through a few extra custom queries.