@Thomas: I've used Sphinx, Solr and ElasticSearch in production and my favorite is ElasticSearch for a few reasons:

Sphinx and Solr both index documents periodically (with a large delay or manual re-index by default) so it's more difficult to index documents near real time. ElasticSearch has a default delay of one second.

ElasticSearch has built-in cluster support for High Availability solutions. This is possible in Sphinx/Solr as well but again more difficult to set up.

ElasticSearch and Solr both have built-in "More Like This" queries, which is the main reason I had to stop using Sphinx. Sphinx doesn't have a built-in solution for providing "similar" documents.

ElasticSearch is the easiest to set up and get running for development AND production. Sphinx comes close because you only have to edit one configuration file. Solr is the hardest because you need to optimize it for production (and not use the built in Solr package provided with Sunspot, for example). There are lots of configuration files with Solr and usually you would want to use Tomcat to serve the search engine in production, which requires a lot of configuration by itself. It's also extremely simple to set up multiple indexes (or version your indexes) in ElasticSearch.

The ElasticSearch API is JSON-based, so you can integrate the search engine easily with any application. You don't NEED a wrapper library to get up and running fast.
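To illustrate the point about not needing a wrapper library, here is a minimal sketch that builds a search request using only Ruby's standard library. The index name `articles`, the search term, and the `build_search_request` helper are assumptions made up for this example; `localhost:9200` is ElasticSearch's default HTTP address.

```ruby
require "json"
require "uri"

# Hypothetical index name; adjust to your own setup.
INDEX = "articles"

# Build the URI and JSON body for a simple query_string search.
# Nothing is sent over the network here.
def build_search_request(index, term, host: "http://localhost:9200")
  uri = URI("#{host}/#{index}/_search")
  body = { query: { query_string: { query: term } } }
  [uri, JSON.generate(body)]
end

uri, body = build_search_request(INDEX, "elasticsearch")
puts uri
puts body
```

The resulting URI and body can be sent with any HTTP client (Net::HTTP, curl, and so on), which is why a dedicated gem is optional.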

After working with different search engines for a while now, I've found that most of them require a lot of time tweaking and configuring to fit your needs. The biggest advantage of ElasticSearch is that it has built in the functionality that usually requires lots of configuration. Less configuring means fewer opportunities for things to break and more time to spend on more important things like building your website! Also check out ElasticSearch's percolate queries, another cool feature that you may find useful.

Thanks for summarizing for us @Daniel, I was going to ask the same question. I've been struggling with which engine to use for a production app. I watched this screencast with a bit of an attitude - oh crap another search engine. But after reading your reasoning I think I might give ES a go now, rather than the others. You've managed to turn me around.

No problem @Dom. Ryan gives a nice overview in the screencast but there are some awesome features that aren't covered like date histogram facets and percolate queries that are worth looking into. The date histogram facet can group a field's total by month, week, day etc. For example, if you have a website with items and they belong to users, you can group the user's items by month with the date histogram facet. And, from ES's website, percolate queries...

"Think of it as the reverse operation of indexing and then searching. Instead of sending docs, indexing them, and then running queries. One sends queries, registers them, and then sends docs and finds out which queries match that doc.

As an example, a user can register an interest (a query) on all tweets that contain the word “elasticsearch”. For every tweet, one can percolate the tweet against all registered user queries, and find out which ones matched."
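The "reverse search" idea in that quote can be sketched in a few lines of plain Ruby. This is only a conceptual model, not the ElasticSearch percolator API: the `TinyPercolator` class and the query names are invented for illustration, and a "query" here is just a single word that the document must contain.

```ruby
# Conceptual sketch only: plain Ruby, not the ElasticSearch percolator API.
class TinyPercolator
  def initialize
    @queries = {} # query name => required term
  end

  # Register a query: here simply "the document must contain this word".
  def register(name, term)
    @queries[name] = term.downcase
  end

  # Return the names of all registered queries that match the document text.
  def percolate(text)
    words = text.downcase.scan(/\w+/)
    @queries.select { |_name, term| words.include?(term) }.keys
  end
end

percolator = TinyPercolator.new
percolator.register("es_fans", "elasticsearch")
percolator.register("sphinx_fans", "sphinx")

# Queries are stored first; the document arrives afterwards.
matches = percolator.percolate("Trying out ElasticSearch for the first time")
p matches # only the "es_fans" query matches this tweet
```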

Also, the documentation for the tire gem is somewhat lacking/confusing in my opinion, so I had to do some extra research to find out how to add date boosting for queries and use different stemmers like KStem (KStem is less aggressive than Snowball and the other stemmers, if you need stemming). It's really easy to customize your index settings to optimize for faster queries or faster indexing, set up custom analyzers and change your index schema.
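For the date histogram facet described in this comment, the request body could look roughly like the sketch below. The field names `created_at` and `user_id`, the facet name `items_per_month`, and the helper method are assumptions for the example. It uses the facet syntax from the ElasticSearch versions current at the time of this thread; later versions replaced facets with aggregations.

```ruby
require "json"

# Field names ("created_at", "user_id") and the facet name are assumptions.
def items_per_month_body(user_id)
  {
    # Restrict the search to one user's items.
    query: { term: { user_id: user_id } },
    # Group the matching items by calendar month.
    facets: {
      items_per_month: {
        date_histogram: { field: "created_at", interval: "month" }
      }
    },
    size: 0 # only the facet counts are needed, not the hits themselves
  }
end

body = items_per_month_body(42)
puts JSON.pretty_generate(body)
```

Changing `interval` to `"week"` or `"day"` gives the other groupings mentioned above.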

"Sphinx and Solr both index documents periodically (with a large delay or manual re-index by default) so it's more difficult to index documents near real time. ElasticSearch has a default delay of one second."

Sphinx has a feature called "delta indexing" which provides real-time updates to the index. So Sphinx doesn't have to rely on periodic updates.

In Sphinx, delta indexing acts like a secondary index where you can index a smaller number of documents (such as new documents added today), and your site will search the main + delta index. You need to merge the delta index with the main index frequently (once a day or once a week at least) by using a cron job or other periodically running task. Also, the delta index doesn't imply real-time indexing: you still have to periodically update the delta index as well, so it's not real time. When I last used Sphinx about a year ago, they were experimenting with an update API where you can just update the document in the index, which would be real time.
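To make the main + delta idea concrete, here is a toy model in plain Ruby. It is not Sphinx configuration or the Sphinx API; the `MainPlusDelta` class and its method names are made up for illustration, and "indexes" are just hashes of id to text.

```ruby
# Conceptual sketch only: models the main + delta idea, not Sphinx itself.
class MainPlusDelta
  def initialize(docs)
    @main = docs.dup # large index, rebuilt rarely
    @delta = {}      # small index, rebuilt often
  end

  # Re-index only the recently changed documents into the delta.
  # This still has to be triggered periodically, so it is not real time.
  def reindex_delta(recent_docs)
    @delta = recent_docs.dup
  end

  # Search both indexes; a delta entry overrides a main entry with the same id.
  def search(word)
    combined = @main.merge(@delta)
    combined.select { |_id, text| text.downcase.include?(word.downcase) }.keys
  end

  # Periodic merge (for example from a cron job): fold the delta into main.
  def merge!
    @main = @main.merge(@delta)
    @delta = {}
  end
end

index = MainPlusDelta.new({ 1 => "Solr notes", 2 => "Sphinx setup" })
before = index.search("elasticsearch")      # new document not indexed yet

index.reindex_delta({ 3 => "ElasticSearch intro" })
after_delta = index.search("elasticsearch") # found via the delta index

index.merge!
after_merge = index.search("elasticsearch") # still found, now from main
```

Until `reindex_delta` runs, a new document is invisible to searches, which is the delay being discussed here.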

I have a question about tire/elasticsearch working with ActiveRecord. Your episode talked about using a filter to put constraints on the records. I was wondering if it is possible to combine tire/elasticsearch with scopes. E.g. in the show notes, instead of using a filter to filter out articles not yet published, would it be possible to use an ActiveRecord scope on Article?

I'm new to Rails, so sorry if this is a dumb question. I'm working on an application similar to this, except that every article will have a text document attached to it (PDF, DOC, etc.). Does a search engine like ElasticSearch search the actual text of the files? If not, does anyone know of a tool that would help me?

I've been struggling for a few days, first with installing ES as a service (the 0.19.6.deb wouldn't work on either LMDE or Ubuntu Server 12.04), and then with displaying results with pagination (I found the ES website quite outdated).

I followed your instructions and after I ran 'rake db:setup' MY WHOLE WEBSITE BROKE APART!!! WHAT THE HELL HAPPENED??? I CAN'T EVEN SIGN IN OR UP TO MY WEBSITE AND I GET "TEMPLATE IS MISSING" AND WHEN I LOOK IN PGADMIN ALL COLUMNS ARE GONE. YOU HAVE TO HELP ME OR I WILL GO INSANE I SWEAR

I am going to respond to your comment assuming that you are a true beginner. If I am saying something you already know, or if you find the tone of my response offensive, then I am sorry; I am only trying to help.

Before I continue, I ask you to please not type messages in all-capitals. It appears as if you are shouting and comes across as quite rude. Also, saying things like "you must help me or I will go insane" doesn't inspire others to help you out; it's more likely that you'll get ignored. No matter how desperate you are, if you want someone's help the best thing to do is to be polite, explain your problem clearly and add as much relevant information as possible. You will find that this applies to communication on most tech resources (mailing lists, forums, etc.).

Now on to your problems:

I'm afraid you made a beginner's mistake. Running rake db:setup sets up a fresh database (it's a standard rake task in Rails), so that's what happened. If you ran that command in your production environment, the only thing you can do is restore your latest database backup. If you don't have a backup, there's really nothing that can be done.

As for the template missing error, it's hard to say without knowing your codebase. If you've been editing code on your production system (which you really shouldn't be doing; development should be done in its own place, for example on your own machine), you will have to debug the error and fix it.

I've been using ElasticSearch and it's going fine, but I'm wondering whether it's possible to perform 'operations' on the results. For example, how can I get the average or the sum of two values? I'm not sure whether I can do this using Tire.

This Railscast really should include a warning **NOT** to use rake db:setup if you already have a database full of rows that you don't want to lose! It's obvious to experienced devs, but I can see how a beginner might misinterpret the instructions and cause themselves big headaches.

The Tire gem has been retired in favor of the official elasticsearch-rails gem. Is there any chance we can get an updated cast with that? The new gem has decent documentation, but very few examples of real-world usage.