Add blog search to Ghost with PHP

By Michael ArgentiniManaging Partner, Technology and Design

We recently replaced our blog publishing technology with the new Ghost blog platform. They've recently acquired Roon, so it's going to be very interesting in the months ahead. But one of the drawbacks of using an edge technology is that it's usually lacking a feature you need. In our case, it was lacking a few, but the biggest was search.

As we searched around to see if anyone had addressed this, we discovered one tool on Github (also mentioned on the Ghost forums) named GhostHunter. The author, Jamal Neufeld, took a creative approach to solving the search gap by using Javascript to hit the Ghost RSS feeds. Very clever. But as we implemented it on the development server, we saw the limitations of this approach.

RSS feeds in Ghost are finite; you have to modify Ghost to show more entries than the default set.

The user experience was poor; the search had to be modified to maintain state across pages, and browser history was mostly unusable.

Styling the results was a bit of a chore.

The results were not based on, or prioritized by, relevance. It was a simple text match (keyword) search.

For a super simple implementation (and for non-developers) it may be a good choice. But it wasn't for us. So for our purposes, we decided to write a search.

Our primary site is built with PHP. Ghost is running on its own subdomain, and uses a SQLite database. Two sites on the same server. It turns out that the version of SQLite Ghost uses also has the FTS4 extension compiled into the binary.

Score.

So the answer was clear: build a search page in PHP which taps into the SQLite database and uses FTS4.

We use a Windows Server with IIS 7, but the process should be similar in a LAMP configuration.

Step 1: Configure PHP

I had to modify the php.ini file to make sure that the SQLite library was installed. To do this, I had to make sure that the following was uncommented.

[PHP_SQLITE3]
extension=php_sqlite3.dll

Then I had to make sure that the binary existed in the PHP root path, under the "ext" folder. We run PHP 5.5, so the path to our binary is:

[PHP root]/v5.5/ext/php_sqlite3.dll

When I echo phpinfo(); the sqlite3 library version we're using is 3.8.3.1. FTS4 has been available since version 3.5.0, so I was good to go.

Step 2: Make Sure the Site Can Access the Database File

Your PHP code will have to be able to access the database file. We run our site and blog in distinct directories under a root, like this:

/web root/www/
/web root/blog/

In PHP, $_SERVER['DOCUMENT_ROOT'] returns the local drive path to the site root. So by performing a simple string replace I was able to get the local drive root of the blog like this:

str_replace("www", "blog", $_SERVER['DOCUMENT_ROOT'])

So as long as our site is running under a user context that has access to the blog folder, we can get the path to the database file itself like this:

Step 3: Building the Search Assets

SQLite with FTS4 requires a virtual table for searching. And with a CMS, that means rebulding it whenever there's a change to the post data. When you create a new post, the search should know this and update its search data. So the first step is to create a strategy to handle this process.

My strategy was to add a new table to the Ghost database to store a single "last update date" value. This table is created if it doesn't exist. But essentially this is checked on each page request. If the last update date is older than the last update date of any post, we know that we need to update the virtual search table. No scheduled task is needed to perform this check, and it only executes when it is necessary. I like simple. This approach should work through version updates to Ghost, provided the key columns and tables don't change significantly in structure.

I began by getting a search query from the URL, sanitizing it, and creating a query string that can be used for FTS4 search.

The code below looks for a URL parameter "s" and processes it for full-text search. The search query is converted into a string array, special characters trigger wrapping in quotation marks, and then the array is imploded and assigned to a SQL query clause.

With that completed, I needed to query the last update date to see if I need to rebuild the search data (the virtual table). I also check for the existence of the search data table and create it if necessary. When rebuilding the search data, it's done with a simple INSERT statement. Cake.

Step 4: Perform the Search

At this stage, I need to perform the actual search. Two things remain:

I need to create a PHP function that will determine the relevance based on a score (a weight), and tie this into SQLite. This will give me a score column in the results, so I can order by relevance.

I need to perform the actual search query and output the results.

There are all kinds of relevance algorithms out there (like the Okapi BM25 ranking algorithm) that you can build into a custom SQLite binary (egads!), and to write one from scratch is a full-time job (just ask the people at Google). I was mostly interested in a frequency weighting, so I found this Github Gist that takes the count of the various keywords and phrases into account when it ranks the results. It's fast, and makes the search results really helpful, as opposed to a simple keyword match and ordering based on date or something else irrelevant. I'm sure we'll look into more advanced relevance algorithms over time, but this is a great start.

Below is the PHP function to place in the page, which I will reference later when I assign it to the SQLite instance.

The actual search is really simple. I first added the ranking function to SQLite, then I ran a query and looped through the results. I constrain the search results to 99 to keep things manageable, since our search does not use pagination.

Wrapping it Up

The last remaining piece was to create a search box and button in the Ghost theme. It's wrapped in a form tag that performs a GET to the search page I created. And likewise, I copied the search form and placed it on the search page itself as well. Here's an example.