
A search-based suggester for Elasticsearch with security filters

Both Solr and Elasticsearch include suggester components, which can be used to provide search engine users with suggested completions of queries as they type.

Query autocomplete has become an expected part of the search experience. Its benefits to the user include less typing, speed, spelling correction, and cognitive assistance.

A challenge we have encountered with a few customers is autocomplete for search applications which include user-based access control (i.e. certain documents or classes of document are hidden from certain users or classes of user). In general, it is desirable not to suggest query completions to users which only match documents they do not have access to. For one thing, if the system suggests a query which then returns no results, it confounds the user’s expectation and makes it look like the system is in error. For another, suggestions may “leak” information from the system that the administrators would rather remain hidden (e.g. an intranet user could type “dev” into a search box and get “developer redundancies” as a suggestion.)

Access control logic is often implemented as a Boolean filter query. Although both the Solr and Elasticsearch suggesters have simple “context” filtering, they do not allow arbitrary Boolean filters. This is because the suggesters are not implemented as search components, for reasons of performance.

To be useful, suggesters must meet three requirements. They must be fast. They must provide suggestions which make intuitive sense to the user and which, if followed, lead to search results. And they must be reasonably comprehensive (they should take account of all the content which the user potentially has access to). For these reasons, it is impractical in most cases to obtain suggestions directly from the main index using a search-based method.

However, an alternative is to create an auxiliary index consisting of suggestion phrases, and retrieve suggestions using normal queries. The source of the suggestion index can be anything you like: hand-curated suggestions and logged user queries are two possibilities.
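As a rough illustration of retrieving suggestions with a normal query, the sketch below builds an Elasticsearch search body using the standard `match_phrase_prefix` query. The field name `phrase` and the helper name `build_suggestion_query` are assumptions made for this example, not taken from the proof of concept.

```python
def build_suggestion_query(prefix, size=5):
    """Build a search body for the (hypothetical) suggestion index.

    match_phrase_prefix matches phrases containing the typed words in
    order, treating the last word as a prefix.
    """
    return {
        "size": size,
        "query": {"match_phrase_prefix": {"phrase": prefix}},
        "_source": ["phrase"],  # only the suggestion text is needed
    }


body = build_suggestion_query("secret rep")
print(body)
```

The resulting dictionary could be sent as the body of an ordinary search request against the suggestion index, so any filter that works for normal searches can be added to it.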

To demonstrate this I have written a small proof-of-concept system for a search-based suggester where the suggestions are generated directly from the main documents. Since any access control metadata is also available from the documents, we can use it to exclude suggestions based on the current user. Each document in the suggester index holds a suggestion phrase together with the access control metadata of the documents it came from.

For example, the phrase “secret report” might have been extracted from one or more documents which are visible to the group “directors” (excluding Bob and Lauren) and one or more documents visible to groups “financial” and “IT” (excluding Max). Thus, “secret report” can be suggested only to those people who have access to the source documents (provided that filtering is included in the suggestion query).
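The “secret report” example can be sketched in Python as follows. The field names (`phrase`, `access`, `groups`, `exclude_users`), the helper names, and the rule that membership of any listed group is enough are all assumptions for illustration; the proof of concept may structure things differently. The query builder also assumes `access` is mapped as a `nested` field with keyword subfields, so that each group list stays paired with its own exclusions.

```python
# Hypothetical suggestion document: one access entry per distinct
# combination of permitted groups and excluded users in the sources.
suggestion = {
    "phrase": "secret report",
    "access": [
        {"groups": ["directors"], "exclude_users": ["Bob", "Lauren"]},
        {"groups": ["financial", "IT"], "exclude_users": ["Max"]},
    ],
}


def can_suggest(doc, user, user_groups):
    """Return True if at least one access entry admits this user."""
    for entry in doc["access"]:
        in_group = any(g in entry["groups"] for g in user_groups)
        if in_group and user not in entry["exclude_users"]:
            return True
    return False


def build_filtered_query(prefix, user, user_groups):
    """Express the same rule as an Elasticsearch bool/nested filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase_prefix": {"phrase": prefix}}],
                "filter": [{
                    "nested": {
                        "path": "access",
                        "query": {
                            "bool": {
                                # user belongs to a permitted group...
                                "filter": [{"terms": {"access.groups": user_groups}}],
                                # ...and is not excluded in the same entry
                                "must_not": [{"term": {"access.exclude_users": user}}],
                            }
                        },
                    }
                }],
            }
        }
    }


print(can_suggest(suggestion, "Alice", ["directors"]))
print(can_suggest(suggestion, "Bob", ["directors"]))
```

Under these assumptions a director named Alice would see the suggestion, while Bob would not unless he also belonged to “financial” or “IT”.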

The proof of concept uses Elasticsearch, and includes Python code to create the main and the suggestion indexes, and a script to demonstrate filtered suggesting. The repository is here.

If you would like Flax to help build suggesters for your search application, do get in touch!

Unbelievable! A good search engine can EASILY make the suggester fault-tolerant, search-based, fast and filtered, using a tool that was designed for this! We have done that for 10 years now on data sets of 100 million records.

I’m not sure of your point? It’s true that commercial search engines provide suggesters like this; we were simply showing how you can do the same thing with freely available open source software. Elasticsearch also scales to very large collections as you describe.

Apache Lucene, Apache Solr, Apache Kafka, Apache Hadoop and their respective logos are trademarks of the
Apache Software Foundation. Elasticsearch is a trademark of Elasticsearch BV,
registered in the U.S. and in other countries.