Adrien Grand
added a comment - 26/Apr/13 23:31 David, sorry, I didn't know about your patch and happened to fix this issue as part of LUCENE-4955. Your patch seems to operate very similarly and adds support for whitespace collapsing, is that correct? Don't hesitate to tell me if you think the current implementation needs improvements.

Harald Wellmann
added a comment - 09/Jan/13 08:46 As long as this issue is not fixed, please mention the 1024 character truncation in the Javadoc.
The combination of KeywordTokenizer and NGramTokenFilter does not scale well for large inputs, as KeywordTokenizer reads the entire input stream into a character buffer.
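
To illustrate the scaling concern above, here is a minimal sketch of producing n-grams from a stream with a fixed-size sliding window, so memory use depends on the gram size rather than on the input length. It is not the Lucene implementation and does not use the Lucene API; the class name StreamingNGrams and its methods are hypothetical, it handles a single gram size only, and it works on UTF-16 chars rather than code points.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Lucene.
public class StreamingNGrams {

    /**
     * Reads the input one char at a time and passes every n-gram of length
     * gramSize to the sink. Only gramSize chars are held in memory at once.
     */
    public static void forEachGram(Reader input, int gramSize, Consumer<String> sink) {
        if (gramSize < 1) {
            throw new IllegalArgumentException("gramSize must be >= 1");
        }
        char[] window = new char[gramSize];
        int filled = 0;
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (filled < gramSize) {
                    // Still filling the first window.
                    window[filled++] = (char) c;
                } else {
                    // Slide the window left by one and append the new char.
                    System.arraycopy(window, 1, window, 0, gramSize - 1);
                    window[gramSize - 1] = (char) c;
                }
                if (filled == gramSize) {
                    sink.accept(new String(window));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper that collects the grams of a small string into a list. */
    public static List<String> grams(String text, int gramSize) {
        List<String> out = new ArrayList<>();
        forEachGram(new StringReader(text), gramSize, out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```

Because the window never grows beyond gramSize, an input longer than 1024 characters is neither truncated nor buffered in full. Inputs shorter than gramSize produce no grams.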

Otis Gospodnetic
added a comment - 14/May/08 07:06 Thanks for the test and for addressing this!
Could you add some examples for NO_OPTIMIZE and QUERY_OPTIMIZE? I can't tell from looking at the patch what those are about. Also, note how existing variables use camelCaseLikeThis. It would be good to stick to the same pattern (instead of bufflen, buffpos, etc.), as well as to the existing style (e.g. space between if and open paren, spaces around == and =, etc.)
I'll commit as soon as you make these changes, assuming you can make them. Thank you.