Hi Bill,
This is just a temporary fix for stemming in wildcard searches. Let me know
if it works.
In function operate in search.c (around line 1048), replace the following lines:
    if (applyStemmingRules)
    {
        /* apply stemming algorithm to the search term */
        Stem(word, MAXWORDLEN); /* CAREFUL! word length is assumed */
    }
with:
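    if (applyStemmingRules)
    {
        /* apply stemming algorithm to the search term, keeping a
           trailing '*' wildcard out of the stemmer */
        size_t len = strlen(word);  /* avoids word[-1] on an empty word */
        int hasStar = 0;

        if (len > 1 && word[len - 1] == '*') {
            word[len - 1] = '\0';   /* strip the star before stemming */
            hasStar = 1;
        }
        Stem(word, MAXWORDLEN); /* CAREFUL! word length is assumed */
        if (hasStar && strlen(word) + 1 < MAXWORDLEN)
            strcat(word, "*");      /* restore the star only if it still fits */
    }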
If you search for "word*", it trims the '*', stems "word", and then restores
the '*'.
So for "running*"
running* --> running --Stem(running)-->run-->run*
for "runs*"
runs* --> runs --Stem(runs)-->run-->run*
and for "runn*"
runn* --> runn --Stem(runn)-->runn-->runn*
This should give good results.
BTW, wildcard words are handled like normal words. The only difference is in
function getfileinfo:
- A normal word (no wildcard) is looked up with a fast hash.
- A wildcard word is searched sequentially (words are sorted in the index
  file). So it collects the data for all matching words in a single pass,
  without combining separate lookups with an "or" operation. That is why the
  performance is better.
One more thing: I think the same fix should also be applied to soundex... right?
Waiting to hear from you.
cu
Jose