lucene-java-user mailing list archives

On Wed, 2005-02-09 at 06:56 -0500, Erik Hatcher wrote:
> The only caveat to your VerbatimAnalyzer is that it will still split
> strings that are over 255 characters. CharTokenizer does that.
> Granted, though, that keyword fields probably don't make much sense to
> be that long.
>
> As mentioned yesterday - I added the LIA KeywordAnalyzer into the
> contrib area of Subversion. I had built one like you had also, but the
> one I contributed reads the entire input stream into a StringBuffer
> ensuring it does not get split like CharTokenizer would.
That's good to know. When indexing web sites I use the URL as the
identifier and hence store it in a keyword field. While not common, it
is possible for URLs to be longer than 255 characters. That could have
led to some very awkward bugs to track down.
I'll probably switch over to your KeywordAnalyzer.
--
Miles Barr <miles@runtime-collective.com>
Runtime Collective Ltd.
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org