Both LIKE and regular expression operators can cause performance problems as these kinds of queries may not be able use indexes so it means a full table scan every time. This is possible for small fields on a limited number of records but it is unacceptable if the table is huge and the text field is large. Furthermore, it is hard to match approaching words which is something often desired when looking for wide text fields.

Of course, it is possible to install more dictionaries if needed. Let’s see how it works. When the text is cast as text search vecrtors (tsvector type), it is stripped from stop words and turned into searchable lexemes:

Searches in tsvectors can be made using the @@ operator and a ts_query type. If the above text is searched for the words enhanced colors, even though it does not contain the exact words, it still matches the criteria:

This works because the query planner sees the query and the index both use the same immutable function with the same arguments and it brings back full text search queries in the realm of the milliseconds. But it has a major flaw: if a new column has to be used in the query, all SQL queries have to be rewritten to use the new index. This is acceptable for simple applications with limited ways of searching content but it can be a real problem with more complex search systems where content must be weighted by example. In this case, a column can be added to the table containing the pre-calculated tsvectors so queries will always be column @@ to_tsquery($*). This requires a trigger to catch write operations on the table to calculate the tsvector column every time. The good side of this method is to have the to_tsvector function call only in one place. In Postgres 9.4, it is possible to use the functions tsvector_update_trigger and tsvector_update_trigger_column as triggers (see the documentation).

text search vectors and Pomm model manager

Adding an extra column containing the vectors is not a big deal since it is easy not to hydrate entities with it:

The plainto_tsquery function does the simple job of stemming the input and adding a & between the remaining words. The function to_tsquery can be used to issue more complex queries. (see the documentation about it).

Of course, it is possible to tune the search, to weight some of the columns or modify results ranking to cross full text search with other criteria. There will be another article to explain how to index LIKE statements using the pg_trgm extension and make reactive autocomplete input fields.

Copyright Grégoire HUBERT 2014 - 2015. All contents on this site are under the CC by-sa licence.CSS nicely adapted by Mikael Paris.Free tools for a free network. This site has been made and is powered using open source and free softwares only.