Robert Muir
added a comment - 09/Jul/10 15:34
Thank you very much for contributing this; it's true there is no factory for this feature.
I updated your code with a few tweaks:
Allow a null dictionary. This allows the use of just the hyphenation grammar (LUCENE-1287).
Allow the encoding to be specified (but default to UTF-8). Some of the grammar distributions from OFFO don't use UTF-8 encoding.
Set the onlyLongestMatch default to 'false'. This is just to be consistent with the TokenFilter itself, which defaults to false.
Added the Apache-licensed Danish grammar to test-files, along with a small dictionary and some test cases.
If no one objects, I'll commit in a bit.
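
For readers without the patch at hand, here is a minimal sketch of how the defaults listed above could be resolved from a factory's argument map. It is not the attached code: the class name, the `Config` holder, and the argument keys (`hyphenator`, `dictionary`, `encoding`, `onlyLongestMatch`) are assumptions chosen for illustration, and it uses only the JDK so it runs without Lucene or Solr on the classpath.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from the attached patch.
final class HyphenationArgsSketch {

    /** Resolved settings; field names are illustrative only. */
    static final class Config {
        final String hyphenator;        // grammar file name (assumed to be required)
        final String dictionary;        // may be null: grammar-only mode
        final Charset encoding;         // UTF-8 unless overridden
        final boolean onlyLongestMatch; // false unless overridden

        Config(String hyphenator, String dictionary, Charset encoding, boolean onlyLongestMatch) {
            this.hyphenator = hyphenator;
            this.dictionary = dictionary;
            this.encoding = encoding;
            this.onlyLongestMatch = onlyLongestMatch;
        }
    }

    /** Applies the defaults described in the comment to a raw argument map. */
    static Config parse(Map<String, String> args) {
        String hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("Missing required parameter: hyphenator");
        }
        // A missing dictionary is allowed and is simply carried through as null.
        String dictionary = args.get("dictionary");
        // Default to UTF-8; some grammar distributions need a different charset.
        String enc = args.get("encoding");
        Charset encoding = (enc == null) ? StandardCharsets.UTF_8 : Charset.forName(enc);
        // Boolean.parseBoolean(null) is false, which matches the stated default.
        boolean onlyLongestMatch = Boolean.parseBoolean(args.get("onlyLongestMatch"));
        return new Config(hyphenator, dictionary, encoding, onlyLongestMatch);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("hyphenator", "da_UTF8.xml"); // hypothetical file name
        Config c = parse(args);
        System.out.println(c.dictionary + " " + c.encoding.name() + " " + c.onlyLongestMatch);
    }
}
```

Passing only a grammar file yields a null dictionary, UTF-8, and onlyLongestMatch set to false; adding an `encoding` entry such as `ISO-8859-1` overrides the charset for grammar files that are not UTF-8.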