Deploying Ultrasphinx to Production

Recently I rolled out a Rails app that used the Sphinx full-text indexing service in conjunction with the Ruby Ultrasphinx gem. I am very impressed with some aspects of this project, and I wanted to share my experiences for anyone looking for a better search experience with SQL databases.

Why Sphinx? Sphinx is an open-source, stable full-text indexing service with good support in the Rails landscape. Why full-text indexing? In a nutshell, people can spot a crappy search implementation really quickly. Google is at the top of its game because it searches the way people think. Just try implementing the following with SQL alone:

Searching across multiple tables with results being in either, but not both

100,000 rank-based results in .02 milliseconds

Cached data, with delta scans for minimal performance impact

Yes, you could do all these things – but why? The folks at Sphinx do nothing but this, and have packaged it up for you to use at your whim. There are other niceties you can include, like sorting, pagination, restricting searches to certain columns, and, best of all, spell checking via the raspell gem.

To begin, you will need a MySQL or PostgreSQL backend – something I just happened to luck out on with this particular application. You should install Sphinx and poke around for a few minutes to understand what Ultrasphinx provides you.

A note for Windows users – add the Sphinx bin/ folder to your path so you can call its commands à la Unix. Additionally, I had issues running my Rails project in a directory containing spaces. YMMV.

Ultrasphinx provides a Rails-centric way of using Sphinx. Sphinx provides the search service, while Ultrasphinx builds the configuration file and manages the Sphinx process via rake tasks. Inside the models that will be Sphinx-ified, you will need to indicate which fields are indexable and which are sortable. A useful feature of Sphinx/Ultrasphinx is the ability to supply associated SQL that joins multiple tables into the full-text search. See http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html for more information.
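To make the model side concrete, here is a minimal sketch of what marking fields as indexable and sortable might look like. The Article model, its columns, and the category association are all hypothetical, and the option names (:fields, :sortable, :include, :association_name, :as) are written from my reading of the Ultrasphinx README, so check them against the version you install.

```ruby
# Hypothetical index options for an Article model that has title, body and
# created_at columns plus a belongs_to :category association.
ARTICLE_INDEX_OPTIONS = {
  :fields => [
    "created_at",
    { :field => "title", :sortable => true },  # indexed and usable for sorting
    "body"
  ],
  # Pull the category name in from the associated table so that it is
  # searchable alongside the article's own columns.
  :include => [
    { :association_name => "category", :field => "name", :as => "category_name" }
  ]
}

# Only define the model inside a Rails app, where ActiveRecord (and the
# is_indexed macro that Ultrasphinx adds to it) is loaded.
if defined?(ActiveRecord::Base)
  class Article < ActiveRecord::Base
    belongs_to :category
    is_indexed ARTICLE_INDEX_OPTIONS
  end
end
```

Under the same assumptions, a query would then be built with something like Ultrasphinx::Search.new(:query => "deploy"), followed by a call to run and then results on the search object.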

Once Ultrasphinx is configured and has created a configuration file, you can start the indexing process, then start your Sphinx service. Notes on doing this in production via Capistrano follow:

This Capistrano deploy.rb fragment has four tasks – start, stop, status, and reindex. The anonymous before and after calls ensure that the service is stopped before reindexing occurs. Note that this is a full reindex, and not a delta scan. My application didn’t have a reliable datetime column to determine new entries with, so I opted to do the full index every three hours instead. The database is a small one, with less than 100 MB of data, so I can get away with it here.
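For reference, a fragment along those lines might look like the sketch below. It assumes Capistrano 2 and the rake tasks I believe Ultrasphinx ships (ultrasphinx:daemon:start, ultrasphinx:daemon:stop, ultrasphinx:daemon:status and ultrasphinx:index); confirm them with rake -T ultrasphinx. The sphinx namespace and the ultrasphinx_rake helper are hypothetical names, the callbacks use task names rather than anonymous blocks, and restarting the daemon after the reindex is my assumption about what the after call is for.

```ruby
# Hypothetical helper: builds the shell command for one Ultrasphinx rake task.
def ultrasphinx_rake(task, app_path, rails_env = "production")
  "cd #{app_path} && rake ultrasphinx:#{task} RAILS_ENV=#{rails_env}"
end

# `namespace` is only available when Capistrano loads this file, so the guard
# lets the helper above be tried out in plain Ruby as well.
if respond_to?(:namespace)
  namespace :sphinx do
    desc "Start the Sphinx search daemon"
    task :start, :roles => :app do
      run ultrasphinx_rake("daemon:start", current_path)
    end

    desc "Stop the Sphinx search daemon"
    task :stop, :roles => :app do
      run ultrasphinx_rake("daemon:stop", current_path)
    end

    desc "Report whether the Sphinx search daemon is running"
    task :status, :roles => :app do
      run ultrasphinx_rake("daemon:status", current_path)
    end

    desc "Rebuild the full index (not a delta scan)"
    task :reindex, :roles => :app do
      run ultrasphinx_rake("index", current_path)
    end
  end

  # Stop the daemon before a reindex and bring it back up afterwards.
  before "sphinx:reindex", "sphinx:stop"
  after  "sphinx:reindex", "sphinx:start"
end
```

With something like this in place, cap sphinx:reindex would stop the daemon, rebuild the index, and start the daemon again in one step.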

Additionally, you will want to set up a recurring cron task in your production server environment:
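As a rough illustration of the three-hour schedule, the small Ruby script below prints a crontab entry for it. The application path and the chained rake tasks are assumptions for the sake of the example (they mirror the stop, index, start order described above), so substitute whatever command your deployment actually needs.

```ruby
# Builds a crontab line that runs `command` at minute 0 of every Nth hour.
def cron_line_every_hours(hours, command)
  unless (1..23).include?(hours)
    raise ArgumentError, "hours must be between 1 and 23"
  end
  "0 */#{hours} * * * #{command}"
end

# Hypothetical application path; rake runs the listed tasks in order, so the
# daemon is stopped, the full index is rebuilt, and the daemon is started again.
app_path = "/var/www/myapp/current"
reindex  = "cd #{app_path} && rake ultrasphinx:daemon:stop " \
           "ultrasphinx:index ultrasphinx:daemon:start RAILS_ENV=production"

# Paste the printed line into the deploy user's crontab (crontab -e).
puts cron_line_every_hours(3, reindex)
```

Keep in mind that cron runs with a minimal environment, so rake may need to be given by its full path, or PATH may need to be set at the top of the crontab.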