Description

* If <code>stopWords</code> is an instance of {@link CharArraySet} (true if
* <code>makeStopSet()</code> was used to construct the set) it will be
* directly used and <code>ignoreCase</code> will be ignored since
* <code>CharArraySet</code> directly controls case sensitivity.

This is really confusing and trappy... we need to change something here.
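
To make the trap concrete, here is a minimal, self-contained sketch of the behavior the Javadoc above describes. It does not use Lucene: `CaseAwareSet` and `resolve` are hypothetical stand-ins for `CharArraySet` and for the constructor logic, written only to illustrate how the `ignoreCase` argument can be silently dropped.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

public class StopSetTrap {

    /** Hypothetical stand-in for CharArraySet: case handling is fixed when the set is built. */
    static final class CaseAwareSet extends AbstractSet<String> {
        private final Set<String> words = new HashSet<>();
        private final boolean ignoreCase;

        CaseAwareSet(Collection<String> source, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (String w : source) {
                words.add(normalize(w));
            }
        }

        private String normalize(String s) {
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }

        @Override
        public boolean contains(Object o) {
            return o instanceof String && words.contains(normalize((String) o));
        }

        @Override
        public Iterator<String> iterator() {
            return words.iterator();
        }

        @Override
        public int size() {
            return words.size();
        }
    }

    /** Assumed shape of the constructor logic described in the Javadoc (not Lucene's code). */
    static Set<String> resolve(Set<String> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            return stopWords; // the caller's ignoreCase is silently dropped here
        }
        return new CaseAwareSet(stopWords, ignoreCase); // plain sets are copied with ignoreCase
    }

    public static void main(String[] args) {
        Set<String> plain = new HashSet<>(Arrays.asList("the"));
        Set<String> prebuilt = new CaseAwareSet(Arrays.asList("the"), false);

        // Same ignoreCase=true argument, different results depending on the runtime type.
        System.out.println(resolve(plain, true).contains("The"));    // true
        System.out.println(resolve(prebuilt, true).contains("The")); // false
    }
}
```

In this sketch both calls pass `ignoreCase = true`, yet only the plain `HashSet` ends up matching "The". Accepting only the dedicated set type in the signature would remove the ambiguity, because case sensitivity would then be decided in exactly one place.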

Robert Muir
added a comment - 09/Feb/12 14:02
Also, for 4.0 I think we should go a step further and remove all this Set<?>/Set<Object> crap/instanceof/copying.
Instead, StopFilter etc. should just take CharArraySet, and this is what makeStopSet should return.
I'll update the patch. For 3.x we can just deprecate the two confusing ctors that were 'nuked' in the first patch above,
so we can still make some improvement there.

Uwe Schindler
added a comment - 09/Feb/12 14:06
+1 to remove the Set<?> and hardcode the method signatures to CharArraySet (CAS).
Changes to CAS itself should be separate (e.g. make it an interface, so we could have FSTCharArraySet and HashCharArraySet).

Robert Muir
added a comment - 09/Feb/12 16:21
Updated patch for trunk.
I found two traps/bugs and fixed them here as well (these will go in the backport too, along with the StopFilter deprecations):
DutchAnalyzer confusingly only used its default 'stem dictionary' (e.g. kind/kinder, fiets) for the no-arg ctor; for the other ctors it would remain empty. This means stemming would be different if you passed a stopset explicitly, even an empty one.
Standard/ClassicAnalyzer had a ctor that takes File; I think we should deprecate this one in favor of the one that takes Reader.
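
For readers unfamiliar with the DutchAnalyzer trap, the following is a minimal sketch of the pattern, not the actual Lucene source or patch. The class names `BeforeAnalyzer` and `AfterAnalyzer`, the dictionary contents, and the stop word are all illustrative assumptions; the "after" variant shows just one plausible fix, namely chaining every constructor through a single one that applies the defaults.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class StemDictTrap {

    // Illustrative defaults only; the real analyzer's dictionary may differ.
    static final Map<String, String> DEFAULT_STEM_DICT;
    static {
        Map<String, String> m = new HashMap<>();
        m.put("kind", "kinder");
        m.put("fiets", "fiets");
        DEFAULT_STEM_DICT = Collections.unmodifiableMap(m);
    }

    static final Set<String> DEFAULT_STOP_WORDS = Collections.singleton("de");

    /** The trap: only the no-arg constructor installs the default stem dictionary. */
    static final class BeforeAnalyzer {
        final Set<String> stopWords;
        final Map<String, String> stemDict = new HashMap<>();

        BeforeAnalyzer() {
            this.stopWords = DEFAULT_STOP_WORDS;
            this.stemDict.putAll(DEFAULT_STEM_DICT);
        }

        BeforeAnalyzer(Set<String> stopWords) {
            this.stopWords = stopWords; // stemDict stays empty on this path
        }
    }

    /** One possible fix: every constructor funnels through the most specific one. */
    static final class AfterAnalyzer {
        final Set<String> stopWords;
        final Map<String, String> stemDict;

        AfterAnalyzer() {
            this(DEFAULT_STOP_WORDS);
        }

        AfterAnalyzer(Set<String> stopWords) {
            this(stopWords, DEFAULT_STEM_DICT);
        }

        AfterAnalyzer(Set<String> stopWords, Map<String, String> stemDict) {
            this.stopWords = stopWords;
            this.stemDict = new HashMap<>(stemDict);
        }
    }

    public static void main(String[] args) {
        Set<String> noStopWords = Collections.emptySet();
        // Changing only the stop set also changes the stem dictionary in the "before" variant.
        System.out.println(new BeforeAnalyzer(noStopWords).stemDict.isEmpty()); // true
        System.out.println(new AfterAnalyzer(noStopWords).stemDict.isEmpty());  // false
    }
}
```

With constructor chaining, the choice of stop set and the choice of stem dictionary become independent, so passing an empty stop set no longer changes how words are stemmed.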