Maybe you could pre-cache the parsed words, up to min_words_in_file of them.
In that case there is no need to redesign the algorithm.
Simply skip indexing the pre-cached words if the cache is not full and
the current word count for the file is less than min_words_in_file.
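A minimal sketch of the pre-cache idea, assuming a parser that hands words to an indexer one at a time. The class and function names (WordPreCache, process_document, index_word) are hypothetical and are not swish-e's API. The first min_words_in_file words of a document are buffered. Once the buffer fills, it is flushed to the indexer and the remaining words bypass the cache. If the document ends first, the buffered words are dropped.

```python
from typing import Callable, Iterable, List


class WordPreCache:
    """Buffers the first `min_words` words of one document.

    Hypothetical sketch: names are illustrative, not swish-e's API.
    """

    def __init__(self, min_words: int, index_word: Callable[[str], None]) -> None:
        self.min_words = min_words
        self.index_word = index_word  # stand-in for the real indexer
        self.cache: List[str] = []
        # With no minimum there is nothing to wait for: index directly.
        self.flushed = min_words <= 0

    def add(self, word: str) -> None:
        if self.flushed:
            # Threshold already reached: bypass the cache entirely.
            self.index_word(word)
            return
        self.cache.append(word)
        if len(self.cache) >= self.min_words:
            # Cache full: the document qualifies, so flush the buffered words.
            for w in self.cache:
                self.index_word(w)
            self.cache.clear()
            self.flushed = True

    def finish(self) -> bool:
        """Return True if the document was indexed, False if it was skipped."""
        # Not flushed means the document ended with fewer than min_words words;
        # the buffered words are simply dropped, so nothing needs "un-indexing".
        self.cache.clear()
        return self.flushed


def process_document(words: Iterable[str], min_words: int,
                     index_word: Callable[[str], None]) -> bool:
    """Parse one document once, indexing it only if it has >= min_words words."""
    cache = WordPreCache(min_words, index_word)
    for word in words:
        cache.add(word)
    return cache.finish()


if __name__ == "__main__":
    index: List[str] = []
    print(process_document(["hello", "world"], 3, index.append))
    print(process_document("swish e indexes many words".split(), 3, index.append))
    print(index)
```

Each document is parsed only once, and the extra memory is bounded by min_words_in_file words per document. With min_words_in_file set to 1, only documents that produce no words at all are skipped.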
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Tuesday, July 05, 2005 9:05 PM
> To: ??????????? ??????
> Cc: Multiple recipients of list
> Subject: Re: new fuction
>
>
> On Tue, Jul 05, 2005 at 12:47:04AM -0700, ??????????? ?????? wrote:
> > I want to propose adding a new parameter to the config file:
> > min_words_in_file 1
>
> Swish-e doesn't really know how many words are in a file until after
> they have been indexed. So each document would either need to be
> parsed twice, or indexing would need to be redesigned to parse and store
> words before indexing them, or there would need to be a way to "un-index"
> all the words.
>
> There's actually code to do the latter -- it's used to reject a
> document based on its title.
>
> Is file size not a good enough indication of "too small"?
>
> --
> Bill Moseley
> moseley@hank.org
>
> Unsubscribe from or help with the swish-e list:
> http://swish-e.org/Discussion/
>
> Help with Swish-e:
> http://swish-e.org/current/docs
> swish-e@sunsite.berkeley.edu
>