On Mon, 24 Feb 2003, Gentile, Jeff wrote:
> Thanks... I figured out what's going on, I was indexing my script... LOL! as
> I hadn't added the -S prog switch... thinking that I could have one config
> file index both a file system and a prog....
I've wanted for a long time to change -i and IndexDir to accept some fake
URLs as in:
file://somedir/
http://some.site.com/
prog://path/to/some/program
Then they could all be in a single config file.
You can more or less use DirTree.pl as a replacement for -S fs, so you can
have a single -S prog that reads the file system, runs the spider, and
indexes a database all in one program. (I index one site that's partly
static pages, which I index with spider.pl, and partly a database, which I
index with the MySQL script.)
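
A rough sketch of that "one program, several sources" idea, written in
Python purely for illustration (the real scripts are Perl). The function
names and the sample database rows are made up for the example; only the
Path-Name / Content-Length header layout comes from the -S prog interface.

```python
import os
import sys
from itertools import chain

def walk_files(root):
    """Yield (path, bytes) for every regular file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                yield path, fh.read()

def fetch_rows():
    """Stand-in for a database query; these rows are hypothetical."""
    rows = [("db/note/1", "first tech note"), ("db/note/2", "second tech note")]
    for path, text in rows:
        yield path, text.encode("utf-8")

def format_document(path, body):
    """Headers, a blank line, then the body; Content-Length is in bytes."""
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(body))
    return header.encode("utf-8") + body

def emit_all(sources, out):
    """Write every document from every source to one binary stream."""
    count = 0
    for path, body in chain.from_iterable(sources):
        out.write(format_document(path, body))
        count += 1
    return count

# Only runs when a root directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    emit_all([walk_files(sys.argv[1]), fetch_rows()], sys.stdout.buffer)
```

Each source is just a generator of (path, bytes) pairs, so adding a
spider or another database is a matter of adding one more generator to
the list.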
> Is "-T indexed_words" an undocumented feature? It's great!
No, it's documented on the SWISH-RUN man page, but it only says it exists
and not what all the options are (-T help shows what's available). It's
really there to help with development and is not expected to be part of
the normal user interface, although I use it to extract the words from the
index for use in a spell checker.
> However, now my content-lengths are off, I think due to some of the odd characters
> in the tech notes... there isn't some undocumented addition of a "EOF" char string
> type feature, is there?
No there isn't. Might be a good addition.
Perl's "length" counts characters, whereas swish-e expects a byte count.
I have not looked at what happens if the string in Perl contains
multi-byte chars; perhaps that's what is happening in your case. The more
common mistakes are simply counting the wrong number of bytes, or a
binmode problem that makes the lengths come out wrong.
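
A minimal sketch of the character/byte difference, in Python rather than
Perl only for illustration. The sample text, the path and the function
name are made up; the point is to encode first, count the encoded bytes
for Content-Length, and write to a binary stream so that nothing
translates newlines behind your back.

```python
import sys

def build_record(path, text):
    """Build one -S prog record with a byte-accurate Content-Length."""
    body = text.encode("utf-8")   # encode first ...
    # ... then count bytes, not characters
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(body))
    return header.encode("utf-8") + body

if __name__ == "__main__":
    # Hypothetical note containing multi-byte characters.
    note = "caf\u00e9 \u2013 r\u00e9sum\u00e9"
    # Character count vs. byte count, shown on stderr for comparison.
    print(len(note), len(note.encode("utf-8")), file=sys.stderr)
    # Binary stream: no newline translation on output.
    sys.stdout.buffer.write(build_record("notes/1", note))
```

If the two numbers printed on stderr differ, a character count would have
produced a Content-Length that is too short for that document.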
Can you post your -S prog code?
>
> Thanks,
>
> Jeff
>
>
--
Bill Moseley moseley@hank.org