Working with Spring Data Solr repositories

Spring Data is the go-to framework for accessing a database from within a Spring application. In addition to relational databases, it supports a wide variety of NoSQL databases, including document-based databases like Apache Solr. In this tutorial, I’ll explore the various possibilities of using Spring Data Solr.

Getting started

To use Spring Data Solr in a Spring Boot project, you need to add the spring-boot-starter-data-solr dependency (group ID org.springframework.boot).

If you’re using Spring Initializr, this is equivalent to selecting the Solr dependency. Additionally, you’ll probably have to configure the spring.data.solr.host property, for example:

spring.data.solr.host=http://localhost:8983/solr

Writing custom queries

To write custom queries, you first have to make sure that your model is set up. In my case, I’m going to look for indexed files that contain a file.id identifier, a last_modified field and a content field. This is based on my earlier tutorials about indexing documents using Spring Batch.

Similar to before, we’re now looking for all documents where the content contains the given search term.
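As a rough sketch, a model and a repository with a custom query could look like the following. It assumes Spring Data Solr 3.x or 4.x (where @SolrDocument has a collection attribute); the collection name documents, the Java field types and the method name findByCustomQuery are hypothetical, and @Query("content:?0") is just one way to search the content field.

```java
import java.util.Date;
import java.util.List;

import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.SolrDocument;
import org.springframework.data.solr.repository.Query;
import org.springframework.data.solr.repository.SolrCrudRepository;

// Hypothetical collection (core) name: use the one your files were indexed into.
@SolrDocument(collection = "documents")
class MarkdownDocument {
    // Assumed mapping of the Solr "file.id" field onto the entity's ID.
    @Id
    @Field("file.id")
    private String id;

    @Field("last_modified")
    private Date lastModified;

    @Field("content")
    private String content;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Date getLastModified() { return lastModified; }
    public void setLastModified(Date lastModified) { this.lastModified = lastModified; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}

interface MarkdownDocumentRepository extends SolrCrudRepository<MarkdownDocument, String> {
    // ?0 is replaced by the first method argument, so this searches the content field.
    @Query("content:?0")
    List<MarkdownDocument> findByCustomQuery(String searchTerm);
}
```

Because the query is declared explicitly with @Query, the method name itself isn’t parsed into a query.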

Boosting documents

By default, Solr already scores some documents higher than others. Let’s say we have the following repository method:

List<MarkdownDocument> findByIdOrContent(String id, String content);

In this case, documents that match both the given id and the given content will score higher than documents matching only one of these fields. You can easily see this by adding a score field to your model and annotating it with @Score:

@Score
private float score;

If you run your application now, you’ll see that documents matching both id and content score about twice as high as the other documents. The number of occurrences of the search term also affects the score.

However, you can also boost documents explicitly by giving matches on certain fields a higher weight.
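One possible way to do this with a derived query is Spring Data Solr’s @Boost annotation on a method parameter. The sketch below uses a minimal stand-in for the MarkdownDocument model, and the collection name is hypothetical.

```java
import java.util.List;

import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.SolrDocument;
import org.springframework.data.solr.repository.Boost;
import org.springframework.data.solr.repository.SolrCrudRepository;

// Minimal stand-in for the MarkdownDocument model; the collection name is hypothetical.
@SolrDocument(collection = "documents")
class MarkdownDocument {
    @Id
    @Field("file.id")
    private String id;

    @Field("content")
    private String content;
}

interface MarkdownDocumentRepository extends SolrCrudRepository<MarkdownDocument, String> {
    // Matches on the id parameter get a boost factor of 2 compared to content matches.
    List<MarkdownDocument> findByIdOrContent(@Boost(2) String id, String content);
}
```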

In this case, documents that match the given id will score higher than documents that only match the given content. Since results are sorted by score by default, this changes the order of the results and lets you push the more important matches to the top.

Pagination

Pagination works the same across all Spring Data modules. You simply add a Pageable parameter to your repository method and return a Page<MarkdownDocument> rather than a List<MarkdownDocument>.

This allows you to work with pages and page sizes rather than offsets and limits; Spring Data translates the Pageable into Solr’s start and rows parameters for you.
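A minimal sketch of a paginated repository method, assuming Spring Data 2.x or later (for PageRequest.of()). The collection name and the PagingExample helper are hypothetical.

```java
import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.solr.core.mapping.SolrDocument;
import org.springframework.data.solr.repository.SolrCrudRepository;

// Minimal stand-in for the MarkdownDocument model; the collection name is hypothetical.
@SolrDocument(collection = "documents")
class MarkdownDocument {
    @Id
    @Field("file.id")
    private String id;

    @Field("content")
    private String content;
}

interface MarkdownDocumentRepository extends SolrCrudRepository<MarkdownDocument, String> {
    // Returns one page of matches plus paging metadata such as getTotalElements().
    Page<MarkdownDocument> findByContent(String content, Pageable pageable);
}

class PagingExample {
    // Page numbers are zero-based: this requests the second page of ten documents.
    static Pageable secondPage() {
        return PageRequest.of(1, 10);
    }
}
```

Calling repository.findByContent("solr", PagingExample.secondPage()) would return documents 11 to 20, because a page number of 1 with a page size of 10 corresponds to an offset of 10.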

Highlighting results

If you look at search engines like Google, you’ll notice that they highlight the matching parts of their results. Solr can do this as well, and Spring Data Solr offers a simple annotation called @Highlight to enable it.

Make sure the repository method returns a HighlightPage; otherwise, the highlighting data (which you can retrieve with getHighlighted()) won’t be available.
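A sketch of a highlighted repository method, assuming the prefix and postfix attributes of @Highlight; wrapping matches in <strong> tags is an arbitrary choice. HighlightPrinter is a hypothetical helper that walks getHighlighted() and prints the snippets of every highlighted field.

```java
import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Pageable;
import org.springframework.data.solr.core.mapping.SolrDocument;
import org.springframework.data.solr.core.query.result.HighlightEntry;
import org.springframework.data.solr.core.query.result.HighlightPage;
import org.springframework.data.solr.repository.Highlight;
import org.springframework.data.solr.repository.SolrCrudRepository;

// Minimal stand-in for the MarkdownDocument model; the collection name is hypothetical.
@SolrDocument(collection = "documents")
class MarkdownDocument {
    @Id
    @Field("file.id")
    private String id;

    @Field("content")
    private String content;
}

interface MarkdownDocumentRepository extends SolrCrudRepository<MarkdownDocument, String> {
    // Wraps every matching term in the returned snippets with <strong> tags.
    @Highlight(prefix = "<strong>", postfix = "</strong>")
    HighlightPage<MarkdownDocument> findByContent(String content, Pageable pageable);
}

class HighlightPrinter {
    // Hypothetical helper: prints the highlighted snippets of every field, per document.
    static void print(HighlightPage<MarkdownDocument> page) {
        for (HighlightEntry<MarkdownDocument> entry : page.getHighlighted()) {
            for (HighlightEntry.Highlight highlight : entry.getHighlights()) {
                System.out.println(highlight.getField().getName() + ": " + highlight.getSnipplets());
            }
        }
    }
}
```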

Fuzzy search

Solr also supports searching by edit distance: with an edit distance of 1, a search for “goat” will also return results containing “boat”. This type of search is called a fuzzy search. To use it, append a tilde (~) to your search term, followed by the edit distance. Keep in mind that Solr only supports edit distances up to two, since larger distances would make fuzzy queries too expensive.

repository.findByIdOrContent("title", "goat~1");
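To make the idea of edit distance concrete, here’s a plain-Java sketch of the classic Levenshtein distance, which counts insertions, deletions and substitutions. This is only an illustration, not Solr’s implementation: Lucene evaluates fuzzy queries with Levenshtein automata against the indexed terms, and its FuzzyQuery by default also counts swapping two adjacent characters as a single edit. The EditDistance class and its methods are hypothetical.

```java
class EditDistance {
    // Classic dynamic-programming Levenshtein distance, keeping only two rows in memory.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning an empty prefix of a into b's prefix takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning a's prefix into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1,  // insertion
                                 previous[j] + 1),    // deletion
                        previous[j - 1] + cost);      // substitution (or match)
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // True if the term is within maxEdits of the query, like "query~maxEdits" (ignoring transpositions).
    static boolean fuzzyMatches(String query, String term, int maxEdits) {
        return levenshtein(query, term) <= maxEdits;
    }
}
```

For example, levenshtein("goat", "boat") is 1, so “goat~1” matches “boat”, while “bolt” has a distance of 2 and would only match “goat~2”.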

Working with criteria

When working with repositories, you sometimes want more control over the queries you execute by defining them programmatically. To do this, you can use the Criteria API. Before you can start, though, you have to define a SolrTemplate bean.
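A minimal configuration sketch, assuming Spring Boot has already auto-configured a SolrClient from the spring.data.solr.host property. The class name SolrConfiguration is hypothetical.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.solr.core.SolrTemplate;

@Configuration
class SolrConfiguration {
    // Wraps the auto-configured SolrClient so it can be injected wherever queries are built.
    @Bean
    SolrTemplate solrTemplate(SolrClient client) {
        return new SolrTemplate(client);
    }
}
```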

First, create a criterion that checks whether the file.id field matches the given search term and, if it does, boosts the score by a factor of two.

Then, create another criterion that checks whether the content field fuzzy-matches the given search term.
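Both criteria could be built along these lines; DocumentCriteria is a hypothetical helper class. Calling fuzzy() without an explicit distance should fall back to Solr’s default edit distance.

```java
import org.springframework.data.solr.core.query.Criteria;

class DocumentCriteria {
    // Matches documents whose file.id equals the search term, with a boost factor of 2.
    static Criteria idMatches(String searchTerm) {
        return new Criteria("file.id").is(searchTerm).boost(2);
    }

    // Fuzzy-matches the search term against the content field.
    static Criteria contentFuzzyMatches(String searchTerm) {
        return new Criteria("content").fuzzy(searchTerm);
    }
}
```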

After creating these criteria, you can join them with either Criteria.and() or Criteria.or() and create a query from the result. There are various query implementations, such as SimpleQuery, SimpleHighlightQuery and so on. Depending on the type of result or page you want to retrieve, you’ll have to pick a different query implementation.

In this case, I’m using SimpleHighlightQuery and providing the prefix and postfix, as seen before, by passing a HighlightOptions object. You can then access the highlighted parts by calling page.getHighlights(solrDocument).
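Putting it together, here’s a sketch that joins both criteria with or(), wraps them in a SimpleHighlightQuery and prints the highlighted parts through page.getHighlights(). It assumes Spring Data Solr 3.x or 4.x, where queryForHighlightPage() takes a collection name. HighlightSearch, the content highlight field and the <strong> prefix and postfix are illustrative choices.

```java
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.core.query.Criteria;
import org.springframework.data.solr.core.query.HighlightOptions;
import org.springframework.data.solr.core.query.SimpleHighlightQuery;
import org.springframework.data.solr.core.query.result.HighlightEntry;
import org.springframework.data.solr.core.query.result.HighlightPage;

class HighlightSearch {
    // Boosted id match OR fuzzy content match, with highlighting on the content field.
    static SimpleHighlightQuery buildQuery(String searchTerm) {
        Criteria criteria = new Criteria("file.id").is(searchTerm).boost(2)
                .or(new Criteria("content").fuzzy(searchTerm));
        SimpleHighlightQuery query = new SimpleHighlightQuery(criteria);

        HighlightOptions options = new HighlightOptions();
        options.addField("content");
        options.setSimplePrefix("<strong>");
        options.setSimplePostfix("</strong>");
        query.setHighlightOptions(options);
        return query;
    }

    // Runs the query against the given collection and prints each document's highlighted snippets.
    static <T> void printHighlights(SolrTemplate template, String collection, Class<T> type, String searchTerm) {
        HighlightPage<T> page = template.queryForHighlightPage(collection, buildQuery(searchTerm), type);
        for (T document : page.getContent()) {
            for (HighlightEntry.Highlight highlight : page.getHighlights(document)) {
                System.out.println(highlight.getField().getName() + ": " + highlight.getSnipplets());
            }
        }
    }
}
```

Building the query in a separate method keeps it testable without a running Solr instance; only printHighlights() actually talks to Solr.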