remove deprecated custom encoding support in russian/greek analysis

Description

The Russian and Greek analyzers have strange custom encoding support, which has been deprecated in Lucene.

For example, someone using CP1251 in the Russian analyzer is simply storing Ж as 0xC6, so it is being represented as Æ.
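The mismatch can be reproduced with plain JDK charset decoding. This is a minimal sketch, not Lucene code: the class name Cp1251Demo and its decode helper are made up for illustration, and it assumes the JRE ships the windows-1251 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or Solr.
class Cp1251Demo {
    // Decode raw bytes with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xC6 };

        // Decoded as CP1251 (windows-1251), the byte is the Cyrillic letter Ж.
        String cyrillic = decode(raw, Charset.forName("windows-1251"));
        System.out.printf("U+%04X%n", (int) cyrillic.charAt(0)); // U+0416

        // Treated as a Latin-1 / Unicode code unit, the same byte is Æ.
        String latin = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.printf("U+%04X%n", (int) latin.charAt(0)); // U+00C6
    }
}
```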

LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
Analyzers. If you need to index text in these encodings, please use Java's
character set conversion facilities (InputStreamReader, etc) during I/O,
so that Lucene can analyze this text as Unicode instead.
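A rough sketch of the conversion the changelog entry recommends, using only the JDK: wrap the byte stream in an InputStreamReader with the source charset so that everything downstream sees Unicode. The class CharsetAtIo and the method readAll are hypothetical names; in a real indexing setup the Reader would presumably be handed to the analyzer rather than read into a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper; not part of Lucene or Solr.
class CharsetAtIo {
    // Read a byte stream in the given legacy charset and return Unicode text.
    static String readAll(InputStream in, String charsetName) throws IOException {
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // The word "Жить" encoded as CP1251 bytes.
        byte[] cp1251 = { (byte) 0xC6, (byte) 0xE8, (byte) 0xF2, (byte) 0xFC };
        String text = readAll(new ByteArrayInputStream(cp1251), "windows-1251");
        System.out.println(text.equals("\u0416\u0438\u0442\u044C")); // true
    }
}
```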

I noticed that in Solr, the factories for these token streams still allow these configuration options, which are deprecated in Lucene 2.9 and will be removed in 3.0.

Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning, etc.?) and I'd be happy to create a patch.

Robert Muir
added a comment - 02/Oct/09 15:29 OK, I guess this isn't an issue anyway.
If 1.5 goes out with 3.1, RussianLowerCaseFilterFactory can be implemented with LowerCaseFilter, but marked deprecated, to be removed in 1.6.

Shalin Shekhar Mangar
added a comment - 02/Oct/09 06:54 Will there be a Solr release based on Lucene 3.0, or will 1.5 be based on 3.1?
I guess it is too early to say. But Solr releases do take time, so if I had to guess, it is likely that 1.5 will go out with Lucene 3.1.

Robert Muir
added a comment - 01/Oct/09 20:31 Hi, I just removed these deprecations for Lucene 3.0 (which does not affect 1.4)
However, in doing so I noticed that with the custom charset removed, RussianLowerCaseFilter is really exactly the same as LowerCaseFilter.
I've marked this RussianLowerCaseFilter as deprecated to be removed in Lucene 3.1
Will there be a Solr release based on Lucene 3.0, or will 1.5 be based on 3.1?
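To illustrate why a dedicated Russian lowercase filter becomes redundant once the text is Unicode: the JDK's per-character lowercasing already handles Cyrillic. The sketch below is only an illustration of that point, not the Lucene filter implementation, and CyrillicLowerCase is a hypothetical name.

```java
// Hypothetical demo class; not the Lucene LowerCaseFilter.
class CyrillicLowerCase {
    // Lowercase each char with the JDK's Unicode-aware Character.toLowerCase.
    static String lower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ЖИТЬ" lowercases to "жить" with no charset-specific tables.
        System.out.println(lower("\u0416\u0418\u0422\u042C").equals("\u0436\u0438\u0442\u044C")); // true
    }
}
```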

Shalin Shekhar Mangar
added a comment - 04/Sep/09 10:02 I don't think we've ever really had a situation like this ...logging a warning seems like the right course of action for now ...
We actually have done this in DataImportHandler in relation to the syntax for evaluators. Logging a warning is the right way to go.

Hoss Man
added a comment - 03/Sep/09 22:35 I don't think we've ever really had a situation like this ...logging a warning seems like the right course of action for now ... then once the functionality is removed, we can change the factory to fail on init if it sees the option is still set in the schema.xml
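The two-phase policy described above (warn now, fail on init once the functionality is gone) could look roughly like the following. This is a sketch under stated assumptions, not the actual Solr factory code: the class DeprecatedOptionCheck, the optionRemoved flag, and the use of java.util.logging are all illustrative choices, and the option name "charset" is assumed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for a factory's init logic; not Solr code.
class DeprecatedOptionCheck {
    private static final Logger LOG =
            Logger.getLogger(DeprecatedOptionCheck.class.getName());

    // Option name assumed for illustration.
    static final String OPTION = "charset";

    // Returns true if the deprecated option was present (and only warned about).
    // Once optionRemoved is true, a lingering option makes init fail instead.
    static boolean init(Map<String, String> args, boolean optionRemoved) {
        String value = args.get(OPTION);
        if (value == null) {
            return false;
        }
        if (optionRemoved) {
            throw new IllegalArgumentException(
                    "The '" + OPTION + "' option is no longer supported; "
                    + "convert text to Unicode during I/O instead.");
        }
        LOG.warning("The '" + OPTION + "' option (" + value
                + ") is deprecated and will be removed.");
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(OPTION, "CP1251");
        System.out.println(init(config, false)); // true, after logging a warning
    }
}
```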