Getting Fancy with ElasticSearch

When an app requires full-text search, developers usually have two major contenders to choose from: Solr and ElasticSearch. Each addresses different use cases, but ElasticSearch generally performs noticeably better when an app reindexes frequently, as is often the case. Gems like Tire make setting up ElasticSearch a breeze, but configuring more advanced indexes and interfacing with ActiveRecord can sometimes be a pain. Read on to see how to make your life easier with ElasticSearch and Tire.

Say an app needs an “omnibox” – a single search input that searches over multiple fields (for example, a user’s name, email address, and/or company). An initial attempt at setting this up in ElasticSearch with Tire would look like this:

With that in place, we can search for users with `User.search('Highgroove', load: true)` and get the expected response.

But what if we want to allow partial-string searches? This requires a custom analyzer, in this case one built on n-grams, which indexes every substring of a value whose length falls between a configured minimum and maximum.
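To make the idea concrete, here is a rough plain-Ruby imitation of what an n-gram analyzer emits for a single word. The `ngrams` helper is hypothetical and exists only for illustration; ElasticSearch does this work itself at index time, and a real analyzer also tokenizes the input first.

```ruby
# Hypothetical helper, not part of Tire or ElasticSearch: lists every
# substring of `text` whose length is between min_gram and max_gram.
def ngrams(text, min_gram, max_gram)
  chars = text.downcase.chars
  (min_gram..max_gram).flat_map do |size|
    # each_cons yields nothing when size exceeds the string length
    chars.each_cons(size).map(&:join)
  end
end

puts ngrams('groove', 3, 4).inspect
```

With a minimum of 2 and a maximum of 6, 'Highgroove' produces 'groove' as one of its indexed terms, which is why a search for that substring can match the record.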

This works, but nested settings hashes inside the model quickly turn into a mess, and we can do much better. Personally, I prefer to keep this setup in a YAML file and parse it once in an initializer.
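A minimal sketch of that approach follows. The filter name `substring`, the analyzer name `partial_match`, the gram sizes, and the constant names are all assumptions made for illustration. The YAML is inlined as a string so the example runs on its own; in a Rails app it would more likely live in a file such as `config/elasticsearch.yml` (a hypothetical path). The `nGram` type is the spelling used by ElasticSearch releases of Tire's era; newer releases call it `ngram`.

```ruby
require 'yaml'

# Hypothetical analysis settings; names and gram sizes are illustrative.
ELASTICSEARCH_YAML = <<~YAML
  analysis:
    filter:
      substring:
        type: nGram
        min_gram: 2
        max_gram: 8
    analyzer:
      partial_match:
        type: custom
        tokenizer: standard
        filter: [lowercase, substring]
YAML

# In an initializer this would probably read the file from disk instead,
# e.g. with YAML.load_file. Keys come back as strings.
ELASTICSEARCH_SETTINGS = YAML.safe_load(ELASTICSEARCH_YAML).freeze

puts ELASTICSEARCH_SETTINGS['analysis']['analyzer'].keys.inspect
```

The parsed hash could then be handed to Tire's `settings` call in the model in place of the inline hash, with each field's mapping naming the analyzer it should use.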

We’re almost done now; unfortunately, though, adding custom analyzers interferes with ElasticSearch’s ability to search across all indexed fields in a single #search call. Instead, searches have to name each field explicitly, in the form `User.search("name:#{query} OR email:#{query} OR company:#{query}")`. We also have to tokenize queries to account for whitespace, so that each word is matched against every field. When all is said and done, the finished full-text search is wrapped in a single class method on the model.
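A minimal sketch of the query-building step, in plain Ruby. The `fulltext_query` helper and the `SEARCH_FIELDS` constant are hypothetical names, and joining the per-word clauses with AND is an assumption; OR would be the more permissive choice.

```ruby
# Fields searched by the omnibox (taken from the example above).
SEARCH_FIELDS = %w[name email company].freeze

# Hypothetical helper: splits the input on whitespace and builds one
# "field:token OR ..." clause per word, requiring every word to match
# at least one field.
def fulltext_query(input, fields = SEARCH_FIELDS)
  input.to_s.split.map do |token|
    clause = fields.map { |field| "#{field}:#{token}" }.join(' OR ')
    "(#{clause})"
  end.join(' AND ')
end

puts fulltext_query('high groove')
```

Inside the model, a class method such as `fulltext_search` could pass this string to `search` along with `load: true`. Note that the sketch does not escape characters that are special in query-string syntax, such as `:` or parentheses, so real user input would need sanitizing first.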

With such a method on the model, we finally have our omni-search: `User.fulltext_search('groove')`.

Some final tips and tricks that make life with ElasticSearch that much nicer:

When setting up ElasticSearch on a development machine, it’s easy to mess up the index (for example, by running tests that touch ElasticSearch and leave entries in the index for records that don’t exist in the database). Cleaning this up locally is as easy as sending a DELETE request to the ElasticSearch server, which usually looks like `curl -XDELETE 'http://localhost:9200/users/'`, followed by a `rake db:setup` to re-seed the database and re-index (or `User.index.import` in the Rails console just to re-index).

n-grams can waste memory if you’re not careful, because every extra gram length adds more terms to the index. The min_gram and max_gram analyzer settings should be just large enough to narrow searches down to one record, and no larger (a max_gram of 15 over a name is probably wasteful, since very few names share a substring that long).
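As a back-of-the-envelope check, the number of grams generated for one token can be counted directly: a token of length n has n - k + 1 substrings of length k. The `ngram_count` helper below is hypothetical, and it counts grams per token rather than measuring actual index size.

```ruby
# Hypothetical helper: number of n-grams emitted for a single token of
# the given length, for gram sizes from min_gram up to max_gram.
def ngram_count(length, min_gram, max_gram)
  largest = [max_gram, length].min
  (min_gram..largest).sum { |size| length - size + 1 }
end

puts ngram_count(20, 2, 15)
puts ngram_count(20, 2, 6)
```

For a 20-character token, lowering max_gram from 15 to 6 roughly halves the number of indexed terms in this count, and the grams that disappear are the long ones that searches are least likely to need.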