Full Text Indexing with PostgreSQL

From PostgreSQL wiki

This article discusses full text indexing with the contrib/fulltextindex module only. More advanced indexing is available via the contrib/tsearch module, which is not covered here.

The contrib/fulltextindex module is a quick and easy way of achieving basic text indexing under PostgreSQL. The module is available in 7.1 and below with the ability to index one column per table, and in 7.2 is able to index an arbitrary number of columns per table.

This form of full text indexing works by recording all possible suffixes of the words it indexes. For instance, the word 'sydney' would have strings of Sydney, 'ydney', 'dney', 'ney' and 'ey' stored. This means that no matter what part of the word Sydney your user types, it will be found in the index.

Anyway - that's technical stuff! Just follow the steps below and you don't even need to know how it works!

Contents

Installation

Once you have configured and installed PostgreSQL, you need to change to the contrib/fulltextindex directory in the source tarball. Typing 'gmake all' will compile the module, and 'gmake install' will install it. The installation will setup at least an fti.so shared library and an fti.pl index generation convenience script. I recommend looking at the README.fti file.

To enable full text indexing in one of your databases, issue the following SQL:

3. Create the trigger that will be fired whenever changes are made in the indexed table, to update the full text index table.

eg.

CREATE TRIGGER "articles_fti_trigger" AFTER UPDATE OR INSERT OR DELETE
ON articles FOR EACH ROW
EXECUTE PROCEDURE fti(articles_fti, title, body, keywords);

Where in this case 'articles_fti' is the table where the index for 'articles' is kept, and 'title', 'body' and 'keywords' are the fields in the 'articles' table that we are indexing.

Note: In PostgreSQL 7.1 and below, you can only index ONE column, not multiple columns.

4. Now just insert, update and delete the 'articles' table however you like and the full text index will be kept updated.

Indexing An Existing Table

There are a couple of situations where you might want to index an existing table. You might have an already existing table that you want to index and then use a trigger to keep up-to-date, or instead of using a trigger you can schedule a cron job to re-index the table once a day (or something like that).

Anyway, you use fti.pl.

Let's suppose that the 'articles' table mentioned above already has data in it. Follow these steps to index it.

1. You need to add a table alias to the 'articles_fti' table for every keyword you use.

2. The '~' operator indicates a regular expression. The '^' means "match beginning of word". The tricky use of these operators ensures that PostgreSQL's btree indexes will work.

3. You must lowercase all the keywords, as the full text index will store only lowercase versions.

4. By default, words of two letters or more will be indexed.

Optimising Your Queries

If you have indexed as above, then the query should already be optimal. You can check this, however, by putting and 'EXPLAIN' in front of the SELECT query above.

There are three operations that need to be optimised: JOINing the full text and indexed tables (based on id and oid); deleting from the full text table based on id; and simply finding matching text in the string column.

If you don't like the idea of a trigger that keeps the index up-to-date, you can forgoe it altogether and just manually regenerate the index whenever you like. Of course, if your data never changes - you don't need a trigger at all.