[dokuwiki] Latest indexer changes

From: "TNHarris" <telliamed@xxxxxxxxxxx>

To: dokuwiki@xxxxxxxxxxxxx

Date: Sat, 18 Nov 2006 14:40:52 -0500

Some comments on the last indexer update.

Indexing was improved by dumping the piecemeal reading of i*.idx and
just loading the file all at once. This made a huge difference. Times
dropped by as much as 60% compared to before.
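
To make the difference concrete, here is a minimal sketch of the two
reading strategies. It is not the DokuWiki code (which is PHP); the
function names are hypothetical, and it only assumes an index file
with one entry per line.

```python
def read_entries_piecemeal(path, line_numbers):
    """Reopen and rescan the file once for every requested line."""
    entries = {}
    for wanted in line_numbers:
        with open(path, encoding="utf-8") as handle:
            for number, line in enumerate(handle):
                if number == wanted:
                    entries[wanted] = line.rstrip("\n")
                    break
    return entries


def read_entries_at_once(path, line_numbers):
    """Load the whole file into memory once, then index into the list."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle.readlines()]
    # Requests past the end of the file are skipped, as in the
    # piecemeal version.
    return {n: lines[n] for n in line_numbers if 0 <= n < len(lines)}
```

Both functions return the same mapping of line number to entry. The
second touches the disk once no matter how many lines are requested,
at the cost of holding the whole file in memory.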

I also got a better sense of how the functions perform. A lot of the
noise in my first tests was caused by having to hit the disk so often.
Without that, I can clearly see that idx_getPageWords doesn't scale that
well. It's not very severe, but that's where any improvements should be
focused.

Searching is also improved a bit, but not as much as I expected.
Statistically insignificant, really. I tried Chris's suggestion to group
same-sized words when reading the index. I didn't see much of an
effect from that. If there is an improvement, it may be offset by the
extra work that has to be done. The simpler algorithm is probably helped
by the disk cache, so there's not as much penalty for re-opening files.
But I don't entirely trust my home machine to give reliable results for
this. So there are two alternative functions for retrieving search words
from the index. You can choose one or the other from the configuration
manager ($conf['test_indexer']). Set it to 0 for classic searches, or 1
to try the new sorted search. This is a bitfield, in case I encounter
something else to test.
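
As a rough illustration of the two lookup strategies and the bitfield
switch, here is a sketch in Python. It is not the actual indexer code.
It assumes the word lists are stored per word length (my reading of
"same-sized words"), fakes the index with a dict, and counts simulated
file opens. All names, including SORTED_SEARCH, are hypothetical.

```python
from collections import defaultdict

# Hypothetical flag bit for a setting like test_indexer.
SORTED_SEARCH = 1  # bit 0: use the grouped ("sorted") lookup


def load_word_list(index, length):
    """Stand-in for opening the word list file for one word length."""
    index["loads"] += 1  # count simulated file opens
    return index["words"].get(length, [])


def lookup_classic(index, words):
    """Reload the matching word list for every search word."""
    found = {}
    for word in words:
        wordlist = load_word_list(index, len(word))
        if word in wordlist:
            found[word] = wordlist.index(word)
    return found


def lookup_grouped(index, words):
    """Group words by length first so each list is loaded only once."""
    by_length = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)
    found = {}
    for length, group in sorted(by_length.items()):
        wordlist = load_word_list(index, length)
        for word in group:
            if word in wordlist:
                found[word] = wordlist.index(word)
    return found


def lookup(index, words, test_indexer=0):
    """Pick a strategy by testing one bit of the setting."""
    if test_indexer & SORTED_SEARCH:
        return lookup_grouped(index, words)
    return lookup_classic(index, words)
```

Testing a single bit with & leaves the remaining bits free for other
experiments later, which is the point of making the setting a bitfield.
The grouped version pays for the extra grouping and sorting pass, which
may be the "extra work" that offsets its savings.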
-- tom
telliamed@xxxxxxxxxxx