During a recent Rails project at IBM we had to deal with a large table consisting of customers. The table is made up of legacy enterprise data, and contains close to a million records.

Among many other fields, the table Customers includes a column name defined as VARCHAR. name is used to store company names.

One of the requirements for the project was to implement a Google Suggest-like feature. That is, when the user enters a few characters for the customer’s name, they should be prompted, via AJAX, with a list of possible suggestions containing that string (or at least beginning with that string.)

The naive approach would be to use a simple LIKE predicate, to generate queries such as:

SELECTnameFROMCustomersWHEREnameLIKE'%micro%'ORDERBYname

One of the main problems with this approach is performance. Using LIKE this way doesn’t allow us to take advantage of indexes that are defined on the column name.

Unsurprisingly, on a modest server, such a query takes about 25 seconds. A UI that isn’t able to respond for 25 seconds will feel like an eternity for the end user, of course, so a simple query like this obviously cannot be seen as a viable solution.

Enter DB2 Text Search. You may be familiar with other full text search engines such as Ferret, Lucene, or Sphinx. DB2 V9.7.4 ships with an extremely powerful full text search engine that has plenty of useful features, including excellent integration with native XML columns.

Using the free edition of DB2 (DB2 Express-C) which includes Text Search, we were able to implement the autocomplete functionality we were after in a heartbeat, thanks to a query like the following:

SELECTnameFROMCustomersWHERECONTAINS(name,'micro')=1ORDERBYname

This query was executed in mere fractions of a second for most searches, and behaved exactly as needed. In fact, not only did we match strings that began with the searched token, but also ones that contained it elsewhere (e.g., micro would match both Microsoft and Sun Microsystems.) The results where ordered alphabetically, but could have easily been ordered by relevance via the SCORE function, also available via DB2 Text Search.

As usual, it’s a matter of using the right tool for the right job, and DB2 Text Search was created exactly for these kind of scenarios.

Let’s briefly look at how you can also go about setting up and playing with it on Linux.

Installing DB2 Text Search

To start with, you’ll need to have a 64bit Linux distro, and then follow these simple steps.

Download DB2 Express-C. Make sure you grab the db2exc_974_LNX_x86_64.tar.gz file and not the Light edition.

Next, install DB2 by following these steps (the Installing DB2 section in particular.) Make sure you select a custom installation, and that you select everything including DB2 Text Search (which is not checked by default).

After you’ve started DB2 with db2start via the instance user (e.g., db2inst1,) launch the text search server by running:

3 Responses to “Speeding up queries by a factor of 100 or more with DB2 Text Search”

Queries using like “name%” actually should use index on name. It’s called prefix search. And normally you would also have a limit clause there, since you really don’t need to retrieve all matches when someone queries for something general like “peter%”.

What you probably wanted to write is a query with “%name%”. It has the same semantics as CONTAINS.

Leave a Reply

I sincerely welcome and appreciate your comments, whether in agreement or dissenting with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.