Hi.
I'm currently trying to figure out why the tsearch performance seems to
vary a lot between different queryplans. I have created a sample dataset
that sort of resembles the data I have to work on.
The script that builds the dataset is at:
http://krogh.cc/~jesper/build-test.pl
and http://krogh.cc/~jesper/words.txt is needed for it to run.
Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
The dataset consists of words randomized, but .. all records contains
"commonterm", around 80% contains commonterm80 and so on..
my $rand = rand();
push @doc,"commonterm" if $commonpos == $j;
push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
Results are run multiple times after each other so they should be
reproducible:
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest (cost=0.00..10750.00 rows=40188 width=4) (actual
time=0.102..1792.215 rows=40082 loops=1)
Filter: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 1809.437 ms
(3 rows)
ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest (cost=115389.14..125991.96 rows=40188
width=4) (actual time=17.445..197.356 rows=40082 loops=1)
Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
-> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..115379.09
rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 204.201 ms
(5 rows)
Given that the seq-scan have to visit 50K row to create the result and
the bitmap heap scan only have to visit 40K (but search the index) we
would expect the seq-scan to be at most 25% more expensive than the
bitmap-heap scan.. e.g. less than 300ms.
Jesper
--
Jesper