I recently created Tera-WURFL Explorer to let people browse the WURFL, search for devices and upload images to the WURFL images collection. I originally used MySQL’s FULLTEXT index for the device search, but quickly realized that it did not suit my needs. The main problem is that it does not index words shorter than the minimum length set in my.cnf (ft_min_word_len), and that setting can only be changed server-wide. This was not a good option for a large virtual host setup, since it would affect every FULLTEXT index on the server; also, if you do change it, you need to rebuild every FULLTEXT index in every database to prevent data corruption.
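
To make the ft_min_word_len problem concrete, here is a rough sketch of which words in a device search would be skipped. It assumes MySQL’s default minimum length of 4, and it splits words on whitespace only, which is simpler than MySQL’s own parser. The short_words function is a made-up helper for illustration, not part of MySQL or Sphinx.

```shell
# Illustration only: print the words of a search phrase that are shorter
# than the minimum word length and would therefore not be indexed.
# Splitting is on whitespace, which is simpler than MySQL's real parser.
short_words() {
    min_len="$1"
    shift
    for word in $*; do
        if [ "${#word}" -lt "$min_len" ]; then
            printf '%s\n' "$word"
        fi
    done
}

short_words 4 "HTC Touch Pro2"   # prints: HTC
```

Device names are full of short tokens like this, so a minimum length of 4 hides many of the terms people actually search for.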

I did some research on search engines and eventually settled on Sphinx, mainly because it has a cool name, but also because of big-name success stories from companies like Craigslist, which switched to it and never looked back.

Here’s how I installed it on Ubuntu 9.10:

First, install the dependencies, download Sphinx, extract the archive and build it. Once Sphinx and its searchd init script are installed, you control the service like this:

# Start the Sphinx service
service searchd start
# Stop Sphinx
service searchd stop
# Check if Sphinx is running
service searchd status
# Reindex every Sphinx index (works while started or stopped)
service searchd reindex

Now we’ll add Sphinx to the system startup and use the init script’s config option to set Sphinx up to run as the sphinx user:

update-rc.d searchd defaults
service searchd config

Note: on RedHat-like systems you can use “chkconfig --add searchd”

Lastly, you need to configure Sphinx. I recommend copying the default config file and editing the copy:

cp /usr/local/sphinx/sphinx.conf.dist /usr/local/sphinx/sphinx.conf

You can follow along with the comments in the file, or jump on the documentation site and figure out what all the settings do.

Now everything is set up and should work properly!

If you followed my directions and put the tarball in /tmp, the Sphinx PHP and Python APIs and some examples are in /tmp/sphinx-0.9.9/api/. You should put a copy of the PHP or Python API somewhere else on the system so you can use it from your applications.
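
Copying the API out of the source tree could look something like the sketch below. The copy_sphinx_api function is a hypothetical helper, and the destination in the usage comment is only an example path; pick whatever location your applications can read.

```shell
# Hypothetical helper: copy one Sphinx API client file out of the
# extracted source tree into a directory of your choice.
#   $1 = api directory of the source tree
#   $2 = destination directory (created if missing)
#   $3 = API file to copy, e.g. sphinxapi.php or sphinxapi.py
copy_sphinx_api() {
    src_dir="$1"
    dest_dir="$2"
    api_file="$3"
    mkdir -p "$dest_dir" && cp "$src_dir/$api_file" "$dest_dir/"
}

# Example call (the destination is an assumption, not a Sphinx convention):
# copy_sphinx_api /tmp/sphinx-0.9.9/api /usr/local/lib/sphinxapi sphinxapi.php
```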

Something else is already listening on port 9312, which is why you got “FATAL: listen() failed: Address already in use”. Use “netstat -apn | grep 9312” to figure out what it is, then “kill PID” (replace PID with the PID from the netstat output) to kill the conflicting process.
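
A plain grep for 9312 can also match unrelated lines, such as a port like 19312 or an outgoing connection to another server’s 9312. As a minimal sketch, assuming the usual Linux netstat column layout (local address in column 4, state in column 6, PID/program in column 7), a small filter can narrow the output to listening sockets on exactly that port. The listeners_on_port function is a made-up helper, not part of netstat.

```shell
# Hypothetical helper: read `netstat -apn` style output on stdin and print
# the PID/program column for sockets in LISTEN state on the given port.
# Assumes the standard Linux net-tools column layout.
listeners_on_port() {
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Usage (run as root so netstat can show PIDs owned by other users):
# netstat -apn | listeners_on_port 9312
```

The output has the form PID/program, so the number before the slash is what you pass to kill.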