This replaces the basic searching functions in Oddmuse. This means that results will be sorted by relevance, and that Perl regular expressions will no longer work. To get the old behaviour back, add the old=1 parameter to the URL.

This extension requires the Perl module Search::FreeText, which in turn depends on Lingua::Stem. You can get both from CPAN. Since stemming is language-specific, and Oddmuse works for multilingual sites, Oddmuse doesn’t actually use the stemming code – but it is required anyway for Search::FreeText to load correctly.

If you cannot get your administrator to install the libraries, you can just copy the two directories into the cgi-bin directory where the script runs. See Script::Search and Script::Lingua.

New Search

This new search returns pages sorted by relevance. Perl regular expressions no longer work, of course. Searching for multiple terms finds pages containing any of the terms, and multiple matches increase the relevance. You cannot use the words "and" and "or" for boolean searches.
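The following Python sketch illustrates the "any term matches, more matches rank higher" behaviour described above. It is not how Search::FreeText computes relevance (its scoring is not described on this page); plain occurrence counts are used as a stand-in, and the function name `rank` and the sample pages are made up for illustration.

```python
import re
from collections import Counter

def rank(pages, query):
    """Return names of pages matching ANY query term, best match first.

    Assumption: relevance is approximated by the total number of
    occurrences of the query terms. The real module scores differently.
    """
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        words = Counter(re.findall(r"\w+", text.lower()))
        score = sum(words[t] for t in terms)
        if score > 0:  # pages without any matching term are dropped
            scores[name] = score
    # Highest score first; ties are broken alphabetically by page name.
    return sorted(scores, key=lambda n: (-scores[n], n))

# Hypothetical sample pages.
pages = {
    "HomePage": "foo bar foo",
    "About": "bar only",
    "Other": "nothing relevant here",
}
print(rank(pages, "foo bar"))
```

In this sketch the words "and" and "or" would simply be treated as two more search terms, which matches the note that they have no boolean meaning.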

When using the prefix tag:, the term or phrase will be searched in the separate tag index. You can also negate this particular form (and only this form!) by using the prefix -tag:. This works even for things like Journal Pages. Here’s an example that produces a journal excluding all date pages tagged “RPG”:

<journal search -tag:RPG>

Use double quotes to make search terms and tags mandatory. Example: "foo" "bar" searches for pages containing both the word foo and the word bar. Without the quotes you would get all pages containing foo or bar or both, sorted by relevance. Similarly, to get pages carrying both tag foo and tag bar, and not just at least one of them, search for tag:"foo" tag:"bar".

Unfortunately, there is a considerable drawback: negative searches only work once the index file has been rebuilt. Until then, they will have no effect.
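To summarise the query syntax in this section, here is a small Python sketch that sorts the parts of a query into optional terms, mandatory (quoted) terms, tags, mandatory tags and negated tags. It is only an illustration of the rules as described above, not the extension's actual parser; the function name `parse`, the dictionary keys and the regular expression are assumptions, and straight double quotes are assumed.

```python
import re

# An optional "tag:" or "-tag:" prefix followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?tag:)?(?:"([^"]*)"|(\S+))')

def parse(query):
    """Split a query into term and tag lists (illustrative only)."""
    parsed = {"any": [], "all": [], "any_tags": [], "all_tags": [], "no_tags": []}
    for prefix, quoted, bare in TOKEN.findall(query):
        term = quoted or bare
        if not term:  # skip empty quotes
            continue
        mandatory = bool(quoted)  # double quotes make a term mandatory
        if prefix == "-tag:":
            parsed["no_tags"].append(term)  # negation exists for tags only
        elif prefix == "tag:":
            parsed["all_tags" if mandatory else "any_tags"].append(term)
        else:
            parsed["all" if mandatory else "any"].append(term)
    return parsed

print(parse('foo bar'))
print(parse('"foo" "bar"'))
print(parse('tag:"foo" tag:"bar" -tag:RPG'))
```

How optional and mandatory parts interact when mixed in one query is not specified here, so the sketch stops at classification and does not attempt to filter pages.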

Disk Usage

This module has been tested on the Emacs Wiki in December 2004. At the time the Emacs Wiki had 2733 pages. Building the index on a 3GHz processor with 1G RAM took 44 seconds. Using traditional search (wiki?search=foo) took 6 seconds. Using the new indexed search (wiki?action=search;term=foo) took less than 1 second. The page directory space used: 34M. The word database size: 7.6M.

Problems

It doesn’t work for languages that don’t use whitespace between words, e.g. Chinese or Japanese. Splitting Chinese text into words is an ongoing research problem. There is currently no solution for this.
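A short Python illustration of why whitespace-based word splitting fails here. It assumes, for the sake of the example, that words are found by splitting on whitespace; the sample sentences are made up.

```python
# Assumption: the indexer finds words by splitting on whitespace.
english = "searching the wiki"
chinese = "搜索维基页面"  # roughly "search wiki pages", written without spaces

print(english.split())  # three separate words that can be indexed
print(chinese.split())  # a single "word": the whole sentence
```

With only one token per sentence, a search for an individual word inside the Chinese text can never match an index entry.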

Incremental updates are not possible: we cannot update the main index when a page is saved, so it has to be rebuilt from scratch. I suggest you use a cron job to rebuild the index from time to time. Currently, a second set of indexes containing all the pages with recent changes is rebuilt every time you edit a page.

Stemming is essentially unusable on a multilingual wiki. That’s why the code doesn’t do any stemming at all. In addition to that, the cloud action would only list stemmed tags if stemming were enabled.