I'm trying to get swish to spit out all the files in its index. Swish is
outputting the correct number of results, but some are missing and some are
duplicated.
I'm not sure if there's a better way to do this, but I negate a word that I
assume is not in the index ('adjfso9939lkj'), so that every file matches:
70) %swish -w not adjfso9939lkj > all
The number of results equals the number of files indexed, but counting the
file names that appear more than once shows the problem:
71) %perl -ne '/(\d+\.htm)/ && $x{$1}++ && print "$1\n"' all | wc -l
41
In this case 41 result lines repeat a file name already listed earlier in
the output.
Looking at the 'all' file, I do find identical results back to back, such
as these two:
1000 /docs/000159.htm "Peabody Museum of Archaeo...
1000 /docs/000159.htm "Peabody Museum of Archaeo...
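
To see which files are duplicated and which are missing, something like the rough sketch below might help. It assumes every result line looks like the two above (a numeric rank, then a path with no spaces, then the title), and it assumes a hypothetical file 'expected.txt' holding one indexed path per line, which would have to be built separately from whatever was fed to the indexer. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes result lines of the form
#   1000 /docs/000159.htm "Title..."
# Lines whose first field is not a number are skipped.

# Print "count path" for every path that occurs more than once.
# $1 = file holding the swish output
list_duplicates() {
    awk '$1 ~ /^[0-9]+$/ { print $2 }' "$1" |
        sort | uniq -c |
        awk '$1 > 1 { print $1, $2 }'
}

# Print every expected path that never shows up in the results.
# $1 = file holding the swish output
# $2 = hypothetical list of indexed paths, one per line
list_missing() {
    awk 'FILENAME == ARGV[1] {
             if ($1 ~ /^[0-9]+$/) seen[$2] = 1   # remember each result path
             next
         }
         !($1 in seen)                           # expected path with no result
        ' "$1" "$2"
}

# Usage (hypothetical file names):
#   list_duplicates all
#   list_missing all expected.txt
```

If the number of extra occurrences of duplicated files equals the number of missing files, that would at least suggest the duplicates are being reported in place of the missing entries.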
78) %swish -V
SWISH-E 1.3 (really 1.3.1.1)
Any ideas on what's happening?
Bill Moseley
mailto:moseley@hank.org