Scripted Sorting In Elasticsearch

Elasticsearch ships with a simple but powerful scripting language called Painless. Even if the name has nothing to do with the pain of hacking C++ code in vim, it is really fun to work with. Scripting can be used to customize Elasticsearch in many places, and sorting is one of them.

Default sorting – implicit or explicit

Every time you query an index, Elasticsearch returns a sorted list of matching results. If no explicit sorting is specified, the results are ordered by their score, which expresses how well they match your query. That holds even if you don’t use any query clause and just perform a GET request on the _search endpoint of your index. In this case every document gets the score 1, which means that the result is effectively unordered. The same is true if your query contains only filter elements (binary conditions that decide whether a document matches or not): all hits receive the same score, so the score imposes no order.

You can override that default behavior by adding one or more sort conditions to your query. That works exactly like sorting in a database or a spreadsheet: you specify an attribute to sort by and the direction, ascending or descending. Consider the following three example documents. You can copy and paste the sample code directly into the Kibana Dev Tools console.
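The original sample documents are not reproduced here, so the three below are illustrative stand-ins. The index name superheroes, the field names name and alignment, and the heroes chosen are assumptions made for this sketch.

```console
# Three hypothetical example documents (stand-ins, not the original data)
PUT superheroes/_doc/1
{
  "name": "Superman",
  "alignment": "good"
}

PUT superheroes/_doc/2
{
  "name": "Joker",
  "alignment": "bad"
}

PUT superheroes/_doc/3
{
  "name": "Deadpool",
  "alignment": "neutral"
}
```

No explicit mapping is created here, so the sketch relies on the default dynamic mapping, which indexes each string as a text field with a keyword sub-field.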

Simple sorting by an attribute is as easy as you would expect. If the attribute is a string, lexicographical order is used, which means that the string abc is ordered before bcd. It’s the same order that you know from a dictionary or a telephone book.
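As a minimal sketch, sorting by a string attribute could look like the request below. It assumes an index named superheroes (a hypothetical name) whose name field was indexed through the default dynamic mapping, so that a name.keyword sub-field exists to sort on.

```console
# Sort all documents by name in ascending (lexicographical) order
GET superheroes/_search
{
  "sort": [
    { "name.keyword": { "order": "asc" } }
  ]
}
```

Switching order to desc reverses the result. The keyword sub-field is used because sorting directly on an analyzed text field is rejected by default.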

Customize sorting with a script

The default sorting options may be sufficient in many cases. Besides the examples above, Elasticsearch ships with a couple of additional sorting options, e.g. the possibility to sort by geo distance. But sometimes it would be helpful to have more control over the sorting behavior. That’s where scripting comes into play.

Imagine that you want a custom sort order for the alignment of the superheroes. Since your customer is from Switzerland, all neutral heroes are to be sorted first, followed by good and then bad. You could of course index a support attribute alignment_order that contains the numeric representation of the alignment (neutral=1, good=2, bad=3) and sort by that. That would work and be the fastest and most obvious solution.

But it is not flexible. What if the rules for the sorting change? That would require reindexing your data. That is not a problem if you have only a few thousand superheroes in your index. But what if you have billions of financial transactions and reindexing would consume a lot of time and resources? When working with large amounts of data, it is always good to know alternatives that work without altering the index. Scripting is such an alternative.
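A scripted sort for this ordering could be sketched as follows. The index name superheroes is an assumption; the script and the params follow the fragments discussed in this section, and the request is a reconstruction rather than the original listing.

```console
# Sort by a number that a Painless script computes for each document
GET superheroes/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "params.mapping[doc['alignment.keyword'].value]",
        "params": {
          "mapping": {
            "neutral": 1,
            "good": 2,
            "bad": 3
          }
        }
      },
      "order": "asc"
    }
  }
}
```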

Let’s go through the script-based sort step by step. The type field of the sort declares that the script returns a numeric value; the other option would be a string. The lang field of the script indicates that the Painless language is used. The whole script is given in the source field:

params.mapping[doc['alignment.keyword'].value]

The script takes the alignment from the document, looks it up in the mapping and returns the retrieved value. The mapping is defined in the params section just below the script.

"params": {
  "mapping": {
    "neutral": 1,
    "good": 2,
    "bad": 3
  }
}

The params section lets you provide additional data that is visible in the context of the script execution. That separates the script from its data and reduces its complexity. It also lets you reuse the same script for different use cases by changing just the params. It is also possible to store the script in Elasticsearch and reference it.
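As a sketch of the stored-script variant, the script is saved once under an id and then referenced from the sort. The id alignment_order and the index name superheroes are hypothetical names chosen for this example.

```console
# Store the script under a hypothetical id
PUT _scripts/alignment_order
{
  "script": {
    "lang": "painless",
    "source": "params.mapping[doc['alignment.keyword'].value]"
  }
}

# Reference the stored script by id and pass only the params
GET superheroes/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "id": "alignment_order",
        "params": {
          "mapping": { "neutral": 1, "good": 2, "bad": 3 }
        }
      },
      "order": "asc"
    }
  }
}
```

Because the mapping lives in the params, a changed sort order only requires a different params object in the request, not a new script.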

Fine tuning

So, that works perfectly for the use case. At least until we have to care about outlying data. It is known that there are only three different types of alignment for a superhero or supervillain. But from time to time it is not clear how one should be classified. In these cases, “unknown” is used.
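Such an outlier could be indexed like this. The index name, the document id and the hero are hypothetical; only the alignment value unknown comes from the text.

```console
# A document whose alignment is not one of the three mapped values
PUT superheroes/_doc/4
{
  "name": "Mystery Hero",
  "alignment": "unknown"
}
```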

But when you execute the query with the scripted sorting again, it results in an error. Depending on the number of shards, you will receive either only an error, or an error together with some results. The latter happens when only some of the shards fail.

What happens? The script tries to look up unknown in the mapping. That key does not exist, so the map returns null, and that causes an exception since a number is expected. We can fix that by adding a default value.

There are two changes. The script is extended by ?: params.default. That funny operator is called Elvis. It evaluates the expression on its left: if the result is not null, it is returned; otherwise, the right-hand side is returned. Here the right-hand side refers to a new default entry in the params. We could also use an inline value, but a parameter gives more flexibility. Whether the default also covers documents that lack the sorting attribute altogether depends on the Elasticsearch version: recent versions throw an error when .value is read from a field that has no value, so there the script should first check doc['alignment.keyword'].size() == 0.
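Put together, the adjusted sort could look like the sketch below. The index name superheroes and the default value 4 are assumptions made for illustration; the text does not state which default was used.

```console
# Scripted sort with a fallback value for alignments missing from the mapping
GET superheroes/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "params.mapping[doc['alignment.keyword'].value] ?: params.default",
        "params": {
          "mapping": {
            "neutral": 1,
            "good": 2,
            "bad": 3
          },
          "default": 4
        }
      },
      "order": "asc"
    }
  }
}
```

With a default of 4 and ascending order, documents with an unmapped alignment end up after the three known values. A default of 0 would move them to the front instead.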

Summary

Scripting is a powerful feature that is worth considering for your use cases. It lets you move logic into Elasticsearch and add custom behavior to your queries. It can also save you from reindexing your data or from expensive pre- or post-processing.

About Author

Wolfgang

Senior software engineer who has spent the last ten years working on search technology and software development. From full-text search to ontologies, from information extraction to machine learning. Always interested in discovering new topics.
