Building a Search Web App with Dancer and Sphinx

In this article, we’ll develop a basic search application using Dancer and Sphinx. Sphinx is an open source search engine that’s fairly easy to use, but powerful enough to be deployed on high-traffic sites, such as Craigslist and Dailymotion.

In keeping with this year’s Dancer Advent Calendar trend, the example app will be built on Dancer 2, but it should work just as well with Dancer 1.

Alright, let’s get to work.

The Data

Our web application will be used to search through documents stored in a MySQL database. We’ll use a simple table with the following structure:

Each document has a unique ID, a title, and contents, stored as both plain text and HTML. We need the two formats for different purposes: HTML will be used to display the document in the browser, while plain text will be fed to the indexing mechanism of the search engine (because we do not want to index the HTML tags).
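A table along these lines would fit that description. This is only a sketch: contents_html is the one column name mentioned in this article, so the other column names, the column types, and the database name in the comment are assumptions.

```perl
use strict;
use warnings;

# Hypothetical schema matching the description above. Only contents_html
# is named in the article; every other name and type is an assumption.
my $create_documents = <<'SQL';
CREATE TABLE documents (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title         VARCHAR(255) NOT NULL,
    contents_text MEDIUMTEXT   NOT NULL,  -- plain text, fed to the indexer
    contents_html MEDIUMTEXT   NOT NULL,  -- HTML, shown in the browser
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8
SQL

print $create_documents;

# With DBI and DBD::mysql installed, the statement could be run like this
# (database name and credentials are placeholders):
#   my $dbh = DBI->connect('dbi:mysql:database=docs', $user, $pass,
#                          { RaiseError => 1 });
#   $dbh->do($create_documents);
```

Keeping the ID as the first, unsigned integer column is convenient because Sphinx expects the first column of its source query to be a unique unsigned integer document ID.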

We can populate the database with any kind of document data. For my test version, I used a simple script to fill the database with POD documentation extracted from the Dancer distribution. The script is included at the end of this article, in case you’d like to use it yourself.

Once Sphinx is installed, it needs to be configured before we can play with it. Its main configuration file is usually located at /etc/sphinx/sphinx.conf. For our purposes, a very basic setup will do, so we’ll put the following in the sphinx.conf file:

This defines one source, which is what Sphinx uses to gather data, and one index, which will be created by processing the collected data and will then be queried when we perform the searches. In our case, the source is the documents database that we just created. The sql_query directive defines the SELECT query that Sphinx will use to pull the data, and it includes all the fields from the documents table except contents_html. As noted above, HTML is not supposed to be indexed.

That’s all that we need to start using Sphinx. After we make sure the searchd daemon is running, we can proceed with indexing the data. We call indexer with the name of the index:

$ indexer test

It should spit out some information about the indexing operation, and when it’s done, we can do our first search:

It’s the documentation for Dancer::Plugin, and it makes total sense that this is the first result for the word plugin. Sphinx setup is thus ready and we can get to the web application part of our little project.

The Basic Application

We’ll start with a simple web application (let’s call it DancerSearch) that just shows a search form, and then we’ll extend it with more features. It will be using Dancer 2.0, and the Dancer::Plugin::Database plugin (we’ll use it to access the documents database). The code below is the initial lib/DancerSearch.pm file:

Last but not least, we need a configuration file to tell our app which layout we want to use, and how to connect to our documents database using the Dancer::Plugin::Database plugin. This goes into config.yml:

We can now launch the application, and it will greet us with a search form. Which, unsurprisingly, doesn’t work yet. Let’s wire it up to Sphinx.

The Sphinx::Search CPAN Module

There is a CPAN module called Sphinx::Search that provides a Perl interface to Sphinx, and we’re going to use it in our app. We put use Sphinx::Search in DancerSearch.pm, and add the following piece of code before the get '/' route handler:

This creates a new instance of Sphinx::Search (which will be used to talk to the Sphinx daemon and do the searches), and sets up a few basic options, such as how many results should be returned and in what order. Now comes the most interesting part: actually performing a search in our application. We insert this chunk of code at the beginning of the get '/' route handler:

Let’s go through what is happening here. First, we check if there was actually a search phrase in the query string (params('query')->{'phrase'}). If there was one, we pass it to the $sph->Query() method, which queries Sphinx and returns the search results (the returned data structure is briefly explained in the description of the Query method in the Sphinx::Search documentation).
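As a rough sketch of that first step (not necessarily the exact code used in the app), the helper below takes a Sphinx::Search object and a phrase, and returns the IDs of the matching documents. The helper name matching_ids is made up for this example. It relies on the total_found count and on the matches list, whose entries carry the document ID under the doc key, as described in the Sphinx::Search documentation for Query.

```perl
use strict;
use warnings;

# Return the IDs of the matching documents, in the order Sphinx ranked them.
# $sph is expected to be a configured Sphinx::Search object; 'test' is the
# index name we passed to the indexer earlier.
sub matching_ids {
    my ($sph, $phrase) = @_;

    # No phrase submitted: nothing to search for.
    return () unless defined $phrase && length $phrase;

    # Query() returns undef on failure, so check before dereferencing.
    my $results = $sph->Query($phrase, 'test');
    return () unless $results && $results->{total_found};

    # Each match is a hash; the document ID is stored under 'doc'.
    return map { $_->{doc} } @{ $results->{matches} };
}

# In the route handler this might be called as:
#   my @ids = matching_ids($sph, params('query')->{'phrase'});
```

Returning an empty list for both "no phrase" and "no hits" keeps the caller simple, though a real handler would probably want to tell the two cases apart, and to look at $sph->GetLastError() when Query() fails.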

We then check the number of results ($results->{'total_found'}). If it’s greater than zero, we found something and need to retrieve the document data from the database. Sphinx only returns the IDs of the matching documents (as shown earlier in the test search that we did on the command line), so we query the database for the actual data, such as the document titles that we want to display in the results. Note that we’re using the ORDER BY FIELD construct in the SELECT query to maintain the same order as the list returned by Sphinx.
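To make the ORDER BY FIELD trick concrete, here is a minimal sketch of how such a query could be assembled from a list of IDs. The function name is hypothetical, and the table and column names (documents, id, title) are assumptions based on the description of the table.

```perl
use strict;
use warnings;

# Build a SELECT that returns the given documents in the same order as @ids.
# Table and column names are assumptions; adjust them to the real schema.
sub ordered_documents_query {
    my @ids = @_;
    die "need at least one ID\n" unless @ids;

    # One placeholder per ID, e.g. "?, ?, ?"
    my $placeholders = join ', ', ('?') x @ids;

    my $sql = "SELECT id, title FROM documents "
            . "WHERE id IN ($placeholders) "
            . "ORDER BY FIELD(id, $placeholders)";

    # Each ID is bound twice: once for IN (...), once for FIELD(...).
    return ($sql, @ids, @ids);
}

my ($sql, @bind) = ordered_documents_query(12, 3, 48);
print "$sql\n";
print "bind: @bind\n";
```

FIELD(id, 12, 3, 48) evaluates to the position of id in the argument list, so sorting by it reproduces the relevance order from Sphinx. Using placeholders rather than interpolating the IDs keeps the query safe even if the ID list ever comes from a less trusted source.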

When we have the document data ready, we pass it along with other information (such as the total number of results) to be displayed in our index template. But hold on a second: the template is not yet ready to display the results, it only shows the search form. Let’s fix that now. Below the search form, we add the following code:

This displays the phrase that was submitted, the number of hits, and a list of results (or a “no hits” message if there weren’t any).

And you know what? We’re now ready to actually do a search in the browser:

Neat, we have a working search application! We’re just missing one important thing, and that is being able to access a document that was found. The results link to /document/:document_id, but that route isn’t recognized by our app. No worries, we can fix that easily:

This route handler is pretty straightforward: we grab the ID from the URL, use it in a SELECT query to the documents table, and return the HTML contents of the matching document (or a 404 page, if there’s no document with that ID).

Conclusion

What we’ve built is still a very basic application, lacking many features. The most obvious omission is pagination: there is no way to access results further down the list, only the first ten. However, the code can be easily extended, thanks to the flexibility and ease of use of both Dancer and Sphinx. With a bit of effort, it can be made into a useful search app for a knowledge base site or a wiki.
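As a hint of what the pagination extension might involve, the sketch below turns a page number into the offset and limit that Sphinx::Search’s SetLimits() method takes. The function name and the page query parameter are hypothetical; the page size of ten simply mirrors the limit mentioned above.

```perl
use strict;
use warnings;

# Translate a 1-based page number into an (offset, limit) pair suitable
# for $sph->SetLimits($offset, $limit). Anything that is not a positive
# integer falls back to the first page.
sub page_limits {
    my ($page, $per_page) = @_;
    $per_page ||= 10;
    $page = 1 unless defined $page && $page =~ /\A[1-9][0-9]*\z/;
    return (($page - 1) * $per_page, $per_page);
}

# In the route handler this might be used as (the 'page' parameter is
# an assumption, it does not exist in the app as described):
#   $sph->SetLimits(page_limits(params('query')->{'page'}));
my ($offset, $limit) = page_limits(3);
print "offset=$offset limit=$limit\n";
```

The template would also need previous and next links, which can be derived from the total number of results that the search already returns.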

I think this application is a good example of how Dancer benefits from being part of the Perl ecosystem, giving web developers the ability to make use of the thousands of modules on CPAN (like we just did with Sphinx::Search). This makes it possible to build working prototypes of web applications and implement complex features in a very short time.

The POD Extraction Script

As promised, this is the script that I used to extract the POD from the Dancer distribution and store it in the MySQL database:

You can run it with one argument, which is the location of the directory that will be scanned (recursively) for .pm/.pod files, or with no arguments, in which case the script will work with the current directory.
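For readers who want to experiment, here is a minimal sketch of the scanning and conversion steps, assuming Pod::Simple::Text for the plain text version and Pod::Simple::XHTML for the HTML version (whether the script above uses these particular formatter classes is an assumption). The function names are made up, and the database insert is left out.

```perl
use strict;
use warnings;
use File::Find ();
use Pod::Simple::Text;
use Pod::Simple::XHTML;

# Collect .pm and .pod files below $dir (recursively).
sub pod_files {
    my ($dir) = @_;
    my @files;
    File::Find::find(
        {   no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                push @files, $File::Find::name if -f && /\.(?:pm|pod)\z/;
            },
        },
        $dir,
    );
    return sort @files;
}

# Render one file's POD with the given Pod::Simple formatter class.
sub render_pod {
    my ($class, $file) = @_;
    my $parser = $class->new;
    $parser->output_string(\my $out);
    $parser->parse_file($file);
    return $out;
}

# Return one hash per file that actually contains POD.
sub scan {
    my ($dir) = @_;
    my @docs;
    for my $file (pod_files($dir)) {
        my $text = render_pod('Pod::Simple::Text', $file);
        next unless defined $text && $text =~ /\S/;    # no POD in this file
        my $html = render_pod('Pod::Simple::XHTML', $file);
        push @docs, { file => $file, text => $text, html => $html };
    }
    return @docs;
}

# Typical use, mirroring the argument handling described above:
#   my @docs = scan(@ARGV ? $ARGV[0] : '.');
#   # ... then insert each document's text and html into the database
```

Parsing each file twice is wasteful but keeps the sketch short; for a few hundred files the cost is negligible.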

(Note: The script makes use of Pod::Simple, which I’m not very familiar with, so it’s possible that I’m doing something stupid with it. If that’s the case, please let me know.)

This entry was posted on Friday, December 14th, 2012 at 1:37 pm and is filed under Dancer, Perl.