Ben Caldwell wrote:
> This may be related to the thread "HTTP spidering - zero results" that
> bounced around on the list in June, but wasn't sure if a resolution was
> ever reached.
>
> When attempting to index via HTTP, I seem to only be getting as many unique
> words as there are files that I attempt to index. Have pasted a sample of
> the results I'm getting below. In this case, I'm only trying to index the
> first page of the site, but if I set the MaxDepth variable higher than 1, I
> only end up with as many unique words as swishspider attempts to index.
I just tried that and everything worked fine (I created an index with
227 words).
One interesting thing to try would be to use another program to get the
source (lynx -source is an easy way), save it to a file, change only the
parts of the config file that control the HTTP code, and see what happens
when you index that saved copy as a file. If the results are identical,
then the misconfiguration is in the indexing engine and not the
retrieval engine.
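
As a further cross-check that does not depend on swish at all, a rough
count of the unique words on the page gives a number to compare the index
against. The following is only a minimal sketch: the helper names
(TextExtractor, unique_words, fetch) are made up for illustration, and the
tokenizing rule (lowercased runs of ASCII letters and digits) is an
assumption that will not match swish's own word rules exactly, so expect
the counts to be close rather than equal.

```python
import re
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth of nested script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def unique_words(html):
    """Return the set of lowercased words in the visible text of html.

    Assumption: a "word" is a run of ASCII letters and digits. This is
    an illustrative rule, not the one the indexer uses.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return set(re.findall(r"[a-z0-9]+", text))


def fetch(url):
    """Fetch url and return the body as text (undecodable bytes replaced)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: count_words.py URL")
    words = unique_words(fetch(sys.argv[1]))
    print(len(words), "unique words")
```

If this reports a couple of hundred words for the first page while the
index holds only one, the page content is reaching your machine and the
words are being lost somewhere between retrieval and indexing.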
moo
------------------------------------------------------------
Ron Samuel Klatchko - Senior Software Jester
Brightmail Inc - rsk@brightmail.com