
Can that title have the word ‘mongo’ in it any more times? Well, fear not, I’m about to use it even more…

So, I had to fool around a bit to get the so-called “Mongosphinx” gem working with my app architecture. Thought it might be helpful to others to demonstrate how I did it. I’ll boil it down to a generic sort of implementation. Hit the jump to see the whole bloody mess…

def self.search(query)
  # method returns a sphinx resultset object with its own each() method
  # iterate over that and pull each element out as an old-fashioned array of Documents
  by_fulltext_index(query).each { |p| p }
end

If you’ve been using MongoMapper, most of the key stuff should be obvious already. The sphinx_tags field is a special trick I cooked up to make this work well with my acts_as_mongo_taggable plugin. Basically, whenever a Document is saved or updated, I jam all of its tags into the sphinx_tags field in Mongo as a single string. This lets sphinx index those tags easily.
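A rough sketch of the sphinx_tags idea, in plain Ruby so it runs without MongoMapper. The names SphinxTagString and TaggedDocument are hypothetical stand-ins, and the assumption that tags arrive as an array of strings is mine; the real acts_as_mongo_taggable API may look different.

```ruby
# Hypothetical helper: flattens a list of tags into one space-separated
# string suitable for a single full-text field such as sphinx_tags.
module SphinxTagString
  def self.build(tags)
    Array(tags)
      .map { |tag| tag.to_s.strip }
      .reject(&:empty?)
      .uniq
      .join(" ")
  end
end

# Stand-in for a MongoMapper document; only the callback shape matters here.
class TaggedDocument
  attr_accessor :tags, :sphinx_tags

  def initialize(tags)
    @tags = tags
  end

  # In a real model this would be wired up as a before-save callback.
  def save
    self.sphinx_tags = SphinxTagString.build(tags)
    true
  end
end
```

Note that joining on spaces means a multi-word tag gets indexed as separate words, which is usually fine for full-text search but worth knowing.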

The search just takes advantage of Mongosphinx’s normal by_fulltext_index method.

The reindex thing, while probably a hacky solution, works well enough for now that I don’t have a need to do anything fancier. This lets us have the app be reasonably quick to include newly-created documents in the index to be available for search. And I take advantage of GridFS (built into Mongo) to store my last_run value so I don’t even need a cron job for this. If my app starts getting significant document-creation traffic, I might want to do something more sophisticated like delta indexing or whatnot, but for now this is fine for me.
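The throttling idea behind the reindex approach can be sketched like this. It is a minimal illustration, not the actual model code: ReindexThrottle is a hypothetical name, and a plain Hash stands in for wherever last_run is persisted (GridFS in my case).

```ruby
# Hypothetical throttle: runs the indexer at most once per interval.
# `store` is any object responding to [] and []=; a Hash stands in here
# for the persistent place where last_run lives.
class ReindexThrottle
  def initialize(store, interval_seconds, &indexer)
    @store = store
    @interval = interval_seconds
    @indexer = indexer
  end

  # Returns true if a reindex was triggered, false if it was skipped.
  def reindex_if_stale(now = Time.now)
    last_run = @store[:last_run]
    return false if last_run && (now - last_run) < @interval

    # Record the time first so overlapping requests are less likely to pile up.
    @store[:last_run] = now
    @indexer.call
    true
  end
end
```

In an app, the block passed to ReindexThrottle.new would shell out to the indexer, and reindex_if_stale would be called after a document is created.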

lib/tasks/sphinx.rake

Okay, moving on to the aforementioned rake tasks… Here are the contents of my lib/tasks/sphinx.rake file:

namespace :sphinx do
  desc "generate xml that is sphinx-friendly"
  task :genxml => :environment do
    # this will just puts() to stdout; useful for debugging
    Document.xml_for_sphinx_pipe
  end

  # a fail-fast, hopefully helpful version of system
  def system!(cmd)
    unless system(cmd)
      raise <<-SYSTEM_CALL_FAILED
The following command failed: #{cmd}
      SYSTEM_CALL_FAILED
    end
  end
end

So this should be pretty self-explanatory, especially if you’ve already used acts_as_sphinx with a standard ActiveRecord-backed app. After you’ve gotten everything (sphinx and the mongosphinx gem, specifically) installed, you should be able to use these rake tasks to start and stop the searchd daemon, as well as run the indexer.
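For reference, here is one way the start, stop, and index tasks could be wired up. This is a sketch under assumptions: sphinx_command is a hypothetical helper and the config path is a guess. The --config, --stop, --all, and --rotate flags are standard searchd/indexer options.

```ruby
# Builds the shell command for each task. The default config path is an
# assumption; adjust it to your app.
def sphinx_command(action, conf = "config/sphinx.conf")
  case action
  when :start then "searchd --config #{conf}"
  when :stop  then "searchd --config #{conf} --stop"
  when :index then "indexer --config #{conf} --all --rotate"
  else raise ArgumentError, "unknown sphinx action: #{action.inspect}"
  end
end

# Fail-fast wrapper around Kernel#system: raises when the command
# exits non-zero or cannot be run at all.
def system!(cmd)
  raise "The following command failed: #{cmd}" unless system(cmd)
end

# Inside lib/tasks/sphinx.rake the tasks could then look like:
#
#   namespace :sphinx do
#     task(:start) { system!(sphinx_command(:start)) }
#     task(:stop)  { system!(sphinx_command(:stop)) }
#     task(:index) { system!(sphinx_command(:index)) }
#   end
```

The --rotate flag tells a running searchd to pick up the fresh index; if searchd isn't running yet, drop it for the first indexing run.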

(update 1/22/10: Note that I’m suggesting you get dacort’s fork of mongosphinx. He’s done a nice job of adding excerpting, pagination, and better compatibility with the latest mongomapper. He also pulled in my fix that makes it play nice with ruby 1.9.)

One nicety here is the sphinx:genxml task. Running this is helpful when you’re trying to get everything set up, to prove that you’ve done things right. It should output a big XML document containing everything it would index. If it doesn’t, or you get something weird, then ur doin’ it wrong.
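If you’re not sure what “right” looks like, the output should be in Sphinx’s xmlpipe2 format: a docset with a schema, followed by one element per document. The generator below is a hand-rolled sketch of that shape, not what mongosphinx actually emits; xmlpipe2_docset and xml_escape are hypothetical helpers, and the integer ids are an assumption (Sphinx wants numeric document ids).

```ruby
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escapes the characters that would otherwise break the XML.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# docs:   array of hashes with an integer :id plus one key per field
# fields: names of the full-text fields to declare and emit
def xmlpipe2_docset(docs, fields)
  lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
  fields.each { |f| lines << %(<sphinx:field name="#{xml_escape(f)}"/>) }
  lines << "</sphinx:schema>"
  docs.each do |doc|
    lines << %(<sphinx:document id="#{Integer(doc[:id])}">)
    fields.each { |f| lines << "<#{f}>#{xml_escape(doc[f.to_sym])}</#{f}>" }
    lines << "</sphinx:document>"
  end
  lines << "</sphinx:docset>"
  lines.join("\n")
end
```

Calling xmlpipe2_docset([{ id: 1, title: "Hello" }], [:title]) gives a small docset you can compare against the real genxml output.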

config/sphinx.conf

Finally, to help you get up and running, here’s my config/sphinx.conf file. Pretty standard:

Again, all of this may fall down spectacularly once you get up to some serious data being pushed from the app into the indexer. At that point, do something else, something more awesome. But consider this a basic start on jamming data from your mongo collection(s) right up sphinx’s pipe.

Just a guess (I’ve moved on to Sunspot/Solr for use with Mongo now, instead of Sphinx), but I’m betting this is due to you using a newer version of MongoMapper. In the more recent versions, Nunemaker has switched MM’s treatment of mongo object IDs from being a straight string to being an actual ObjectID object that can be coerced to a string. It sounds like runner.rb is getting called at some point and told to run match() on the id, assuming that it’s a String (which has a match method), but it’s an ObjectID (which does not).

Again, just a guess based on the error you are seeing. Likely you’ll have to hack around in MongoSphinx, find wherever it’s trying to use that object id, and add .to_s there.
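To illustrate the kind of fix I mean, here’s a tiny sketch. FakeObjectId is a stand-in I made up for the example, not the real driver class, and id_matches? is a hypothetical helper rather than anything in MongoSphinx.

```ruby
# Stand-in for an ObjectID-like value: it can be coerced to a String
# but has no match method of its own.
class FakeObjectId
  def initialize(hex)
    @hex = hex
  end

  def to_s
    @hex
  end
end

# Hypothetical helper: coerce to String before calling String-only methods,
# so plain strings and ObjectID-like values are handled the same way.
def id_matches?(id, pattern)
  !id.to_s.match(pattern).nil?
end
```

Calling match directly on the FakeObjectId would raise NoMethodError, while id_matches? works for both it and a plain String.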

I solved this issue, in case anyone else has the same problem, although I’m still not entirely sure of the cause. I changed my genxml task to actually write a tmp/accounts.xml libxml2 file, and then in my sphinx config I changed my xmlpipe_command to cat /tmp/accounts.xml. So the problem appears to have been extra characters somehow getting added to the puts output from xml_for_sphinx_pipe.
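A minimal sketch of that workaround, assuming you already have the XML as a string. write_sphinx_xml is a hypothetical helper name, and how you capture the XML depends on your setup.

```ruby
require "fileutils"

# Hypothetical helper: write the generated XML to a file so that
# xmlpipe_command can simply cat it, instead of reading rake's stdout
# (where anything else that gets printed would corrupt the XML).
def write_sphinx_xml(xml, path)
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "w:UTF-8") { |file| file.write(xml) }
  path
end
```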

I also had a separate problem starting sphinx. It kept telling me that port 9132 was in use, which it wasn’t. I removed port 9132 from the sphinx.conf file, and now it starts fine, but still on port 9132.
