Details

Description

after upgrading to lucene 3.1+, I see this in my log:

java.lang.AssertionError: TokenStream implementation classes or at least their incrementToken() implementation must be final
at org.apache.lucene.analysis.TokenStream.assertFinal(TokenStream.java:117)
at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:92)

Turns out I derived TokenStream and my class was not declared final.

This silently breaks backward compatibility via reflection, scary...

I think doing this sort of check is fine, but throwing an java.lang.AssertionError in this case is too stringent.

This is a style check against lucene clients, a error log would be fine, but throwing an Error is too much.

This is not true, there is nothing silent about it. It was listed in the backwards compatibility breaks section of 3.1:

Analyzer and TokenStream base classes now have an assertion in their ctor,
that check subclasses to be final or at least have final implementations
of incrementToken(), tokenStream(), and reusableTokenStream().

Robert Muir
added a comment - 07/Sep/11 20:23
This silently breaks backward compatibility via reflection, scary...
This is not true, there is nothing silent about it. It was listed in the backwards compatibility breaks section of 3.1:
Analyzer and TokenStream base classes now have an assertion in their ctor,
that check subclasses to be final or at least have final implementations
of incrementToken(), tokenStream(), and reusableTokenStream().

The problem behind the checks is that they are done by the class TokenStream, so even if you disable assertions for your own class this will still fail as soon as you enable assertions for the lucene package.

If you want to enable assertions for Lucene but disable assertions in your own code, the ctor should check the actual assertion status of the subclass using Class.desiredAssertionStatus() and not throw AssertionFailedError, as this would affect a class for which assertions are not enabled. Patch is easy.

The same applies for Analyzer.

In another issue (sorry, I don't find it, the JIRA search is dumb - oh f*ck it's Lucene! - maybe it was on mailing list), Shai Erea suggested to do the assertion check only for org.apache.lucene and org.apache.solr packages. There is already a patch somewhere - if I could find the issue.

Uwe Schindler
added a comment - 07/Sep/11 21:49 The problem behind the checks is that they are done by the class TokenStream, so even if you disable assertions for your own class this will still fail as soon as you enable assertions for the lucene package.
If you want to enable assertions for Lucene but disable assertions in your own code, the ctor should check the actual assertion status of the subclass using Class.desiredAssertionStatus() and not throw AssertionFailedError, as this would affect a class for which assertions are not enabled. Patch is easy.
The same applies for Analyzer.
In another issue (sorry, I don't find it, the JIRA search is dumb - oh f*ck it's Lucene! - maybe it was on mailing list), Shai Erea suggested to do the assertion check only for org.apache.lucene and org.apache.solr packages. There is already a patch somewhere - if I could find the issue.

John Wang
added a comment - 07/Sep/11 21:58 Maybe it is just my lack of understanding why the assert is needed. My ignorance tells me it is for code styling. I am sure there is something deeper. Can someone enlighten me?

For analyzers enforcing finalness prevents errors in handling the two different tokenStream() methods (for reuse and not reuse). Also the problems in Lucene 2.9 to make the new TokenStream API backwards compatible (the famous "sophisticated backwards layer") lead to enforcing finalness for this API using the decorator pattern. With this assertion we found lots of bugs even in Lucene code that were caused by overriding methods to change behaviour that should never be done in that way for decorator APIs.

And this is NOT a HARD error. No production system is broken by that, as assertions are generally only enabled for testing. So the assertion failure simply tells your test system that you have to fix your code. In production without assertions, your code will still work. So where is the problem?

If you enable assertions on a production system you will see significant performance problems!!!

Uwe Schindler
added a comment - 07/Sep/11 22:08 Most of this is explained in the related issues to LUCENE-2389 .
For analyzers enforcing finalness prevents errors in handling the two different tokenStream() methods (for reuse and not reuse). Also the problems in Lucene 2.9 to make the new TokenStream API backwards compatible (the famous "sophisticated backwards layer") lead to enforcing finalness for this API using the decorator pattern. With this assertion we found lots of bugs even in Lucene code that were caused by overriding methods to change behaviour that should never be done in that way for decorator APIs.
And this is NOT a HARD error. No production system is broken by that, as assertions are generally only enabled for testing. So the assertion failure simply tells your test system that you have to fix your code. In production without assertions, your code will still work. So where is the problem?
If you enable assertions on a production system you will see significant performance problems!!!

Here a patch that the assertion status of the actual class is used. If you disable assertions for your own code, but leave assertions in Lucene enabled, the failure will not trigger. This is the more correct approach. The reflection check should use the subclass' assertion status to enable/disable the check.

Uwe Schindler
added a comment - 07/Sep/11 22:19 Here a patch that the assertion status of the actual class is used. If you disable assertions for your own code, but leave assertions in Lucene enabled, the failure will not trigger. This is the more correct approach. The reflection check should use the subclass' assertion status to enable/disable the check.