Uwe Schindler
added a comment - 08/Apr/10 23:15 New patch replacing the deprecated TermAttribute with CharTermAttribute. It also fixes the reset()/reset(Reader) methods to conform to all other Tokenizers and the documentation. The current one was resetting multiple times. This has no effect on backwards compatibility. It also improves the JFlex classpath detection to work with svn checkouts or future release zips.
I will commit this soon, once all tests have run.
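To make the attribute migration concrete: the real interface is org.apache.lucene.analysis.tokenattributes.CharTermAttribute (obtained via addAttribute()), whose copyBuffer() replaces the deprecated TermAttribute.setTermBuffer(). The sketch below uses a simplified stand-in class so it runs without the Lucene jars; class and method bodies are illustrative only.

```java
// Simplified stand-in for Lucene's CharTermAttribute, for illustration only;
// the real interface lives in org.apache.lucene.analysis.tokenattributes.
final class MiniCharTermAttribute implements CharSequence {
    private char[] buf = new char[16];
    private int len = 0;

    // Counterpart of the deprecated TermAttribute.setTermBuffer(char[], int, int).
    void copyBuffer(char[] src, int offset, int length) {
        if (length > buf.length) buf = new char[length];
        System.arraycopy(src, offset, buf, 0, length);
        len = length;
    }

    MiniCharTermAttribute setEmpty() { len = 0; return this; }

    MiniCharTermAttribute append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (len == buf.length) buf = java.util.Arrays.copyOf(buf, len * 2);
            buf[len++] = s.charAt(i);
        }
        return this;
    }

    public int length() { return len; }
    public char charAt(int i) { return buf[i]; }
    public CharSequence subSequence(int a, int b) { return toString().substring(a, b); }
    public String toString() { return new String(buf, 0, len); }
}

public class CharTermAttributeDemo {
    public static String demo() {
        MiniCharTermAttribute termAtt = new MiniCharTermAttribute();
        char[] scanned = "tokenizer".toCharArray();
        // New style: the lexer copies its matched text into the attribute's buffer.
        termAtt.copyBuffer(scanned, 0, scanned.length);
        return termAtt.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "tokenizer"
    }
}
```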

Mark Miller
added a comment - 08/Apr/10 12:02 Uwe, must this be coupled with that issue? This one has been waiting for a long time (why? for the JFlex 1.5 release?), and protecting against a huge buffer allocation can be a really quick and tiny fix. This one also focuses on getting Unicode 5 to work, which is unrelated to the buffer size. But the buffer size is not so critical an issue that we need to move fast on it ... so it's your call. Just thought they are two unrelated problems.
Agreed. Whether it's fixed as part of this commit or not, it really deserves its own issue anyway, for changes and tracking. It has nothing to do with this issue other than convenience.

Uwe Schindler
added a comment - 08/Apr/10 12:00 Also updated the error message about missing JFlex when calling "ant jflex" to regenerate the lexers. The message now contains instructions for downloading and building JFlex. Also added a CHANGES.txt entry.

Uwe Schindler
added a comment - 08/Apr/10 11:36 Here is a new patch, with the zzBuffer reset to its default implemented in a separate reset(Reader) method. As yyReset is generated as final, I had to change the name.
Before applying, run:
svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.*
svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.*
I will commit this in a day or two!
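A rough sketch of the reset(Reader) idea described above, assuming JFlex-style naming (zzBuffer, ZZ_BUFFERSIZE, a final generated yyreset): the separately named wrapper resets the reader and also shrinks an oversized scanner buffer back to the default, so one huge document does not pin memory forever. This is a hypothetical simplification, not the generated scanner code.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical simplification: because the JFlex-generated yyreset(Reader) is
// final, a separately named reset(Reader) wraps it and additionally restores
// the scanner buffer to its default size.
public class ScannerResetDemo {
    static final int ZZ_BUFFERSIZE = 16384; // JFlex's customary default

    private Reader zzReader;
    char[] zzBuffer = new char[ZZ_BUFFERSIZE];

    void yyreset(Reader reader) {       // stands in for the final generated method
        this.zzReader = reader;
    }

    void reset(Reader reader) {         // the separately named wrapper from the patch
        yyreset(reader);
        if (zzBuffer.length > ZZ_BUFFERSIZE) {
            zzBuffer = new char[ZZ_BUFFERSIZE]; // drop an oversized buffer
        }
    }

    public static void main(String[] args) {
        ScannerResetDemo s = new ScannerResetDemo();
        s.zzBuffer = new char[1 << 20];         // pretend a huge token grew the buffer
        s.reset(new StringReader("next doc"));
        System.out.println(s.zzBuffer.length);  // back to the default size
    }
}
```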

Uwe Schindler
added a comment - 08/Apr/10 11:23 You don't need the JFlex binaries in general, only if you regenerate the source files (using "ant jflex"). And it's easy to build: check out and run "mvn install".

Shai Erera
added a comment - 08/Apr/10 11:19 I plan to commit this soon!
That's great news!
BTW - what are you going to do w/ the JFlex 1.5 binary? Are you going to check it in somewhere? Because it hasn't been released, last I checked. I'm asking for general knowledge, because I know the scripts are downloading it, or rely on it to exist somewhere.
In that case, then yes, let's fix it here.

Uwe Schindler
added a comment - 08/Apr/10 11:13 I plan to commit this soon! So any patch will get outdated; that's why I want to fix this here. And as this patch removes direct access from the Tokenizer to the lexer (it is only accessible through an interface now), we have to change the jflex file to do it "correctly".

Shai Erera
added a comment - 08/Apr/10 11:09 Uwe, must this be coupled with that issue? This one has been waiting for a long time (why? for the JFlex 1.5 release?), and protecting against a huge buffer allocation can be a really quick and tiny fix. This one also focuses on getting Unicode 5 to work, which is unrelated to the buffer size. But the buffer size is not so critical an issue that we need to move fast on it ... so it's your call. Just thought they are two unrelated problems.
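The "huge buffer allocation" guard mentioned above could indeed be a tiny fix: cap the buffer-growth path at a maximum token length and truncate instead of allocating. The sketch below is a hypothetical illustration; the MAX_TOKEN_LENGTH constant and grow() helper are invented for this example, not Lucene code.

```java
// Illustrative sketch (not Lucene's actual code) of guarding a tokenizer's
// internal buffer against unbounded growth: grow geometrically, but refuse to
// allocate beyond a configurable maximum token length.
public class CappedBufferDemo {
    static final int MAX_TOKEN_LENGTH = 255; // assumed cap, for illustration

    static char[] grow(char[] buf, int needed) {
        if (needed > MAX_TOKEN_LENGTH) {
            needed = MAX_TOKEN_LENGTH; // cap: never allocate a huge buffer
        }
        if (needed <= buf.length) return buf;
        // Double the size, but never exceed the cap.
        int newSize = Math.min(Math.max(buf.length * 2, needed), MAX_TOKEN_LENGTH);
        char[] bigger = new char[newSize];
        System.arraycopy(buf, 0, bigger, 0, buf.length);
        return bigger;
    }

    public static void main(String[] args) {
        char[] buf = new char[16];
        buf = grow(buf, 1_000_000);      // pathological request from a degenerate token
        System.out.println(buf.length);  // capped at MAX_TOKEN_LENGTH
    }
}
```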

Steve Rowe
added a comment - 03/Dec/09 21:10 Thanks, Uwe, that makes sense. My bad, I only skimmed the patch, and misunderstood "3.0" in one of the new files to refer to the Lucene version, not the Unicode version.

Uwe Schindler
added a comment - 03/Dec/09 17:52 I'm actually surprised that the DFAs are identical, since I'm almost certain that the set of characters matching [:letter:] changed between Unicode 3.0 and Unicode 4.0 (maybe [:digit:] too). I'll take a look this weekend.
Because of that we have the patch: we now have two flex files, one with %unicode 3.0, which produces the same DFA as the old flex file when processed with Java 1.4 (as it was in Lucene 2.x). This is used for backwards compatibility (via the matchVersion parameter of the ctor).
For later Lucene versions we will have a new jflex file (currently Unicode 4.0), and that produces the same matrix as Java 1.5 with JFlex 1.4 (at the moment).
By that we simply made the parser regeneration invariant to the developer's JVM. This issue is about nothing more than that at the moment.

Steve Rowe
added a comment - 03/Dec/09 17:47 Do you see a problem with just requiring JFlex 1.5 for Lucene trunk at the moment?
I think it's fine to do that.
The new parsers (see patch) are pre-generated in SVN, so somebody compiling Lucene from source does not need to use JFlex. And the parsers for StandardTokenizer are verified to work correctly and are even identical (DFA-wise) for the old Java 1.4 / Unicode 3.0 case.
Most of the StandardTokenizerImpl.jflex grammar is expressed in absolute terms - the only JVM-/Unicode-version-sensitive usages are [:letter:] and [:digit:], which under JFlex <1.5 were expanded using the scanner-generation-time JVM's Character.isLetter() and .isDigit() definitions, but under JFlex 1.5-SNAPSHOT depend on the declared Unicode version definitions (i.e., [:letter:] = \p{Letter}).
I'm actually surprised that the DFAs are identical, since I'm almost certain that the set of characters matching [:letter:] changed between Unicode 3.0 and Unicode 4.0 (maybe [:digit:] too). I'll take a look this weekend.
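To make the JVM dependence concrete: under JFlex <1.5, [:letter:] effectively meant "whatever Character.isLetter() says on the JVM that generated the scanner", so the emitted DFA froze that JVM's Unicode tables. The toy scanner below applies Character.isLetter() at runtime purely as an illustration of that definition; it is not the StandardTokenizer grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the pre-1.5 JFlex expansion of [:letter:]: words are just
// maximal runs of characters for which Character.isLetter() is true, a definition
// that follows the JVM's Unicode tables rather than a declared Unicode version.
public class LetterRunDemo {
    static List<String> letterRuns(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {     // JVM-version-dependent definition
                current.append(c);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) runs.add(current.toString());
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(letterRuns("foo 123 bar-baz")); // [foo, bar, baz]
    }
}
```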

Uwe Schindler
added a comment - 03/Dec/09 14:58 Thanks Steve.
Do you see a problem with just requiring JFlex 1.5 for Lucene trunk at the moment? It would help us not to require a Java 1.4 JRE just to convert the jflex files.
The new parsers (see patch) are pre-generated in SVN, so somebody compiling Lucene from source does not need to use JFlex. And the parsers for StandardTokenizer are verified to work correctly and are even identical (DFA-wise) for the old Java 1.4 / Unicode 3.0 case.

Steve Rowe
added a comment - 03/Dec/09 14:40 Will the old jflex fail on %unicode {x.y} syntax?
I haven't tested it, but JFlex <1.5 will likely fail on this syntax, since nothing is expected after the %unicode directive.
Hopefully JFlex 1.5 comes out before we release 3.1; I would be happy.
I think the JFlex 1.5 release will happen before March of next year, since we're down to just a few blocking issues.

Uwe Schindler
added a comment - 02/Dec/09 22:47 This patch now implements my latest proposal about the filenames. To easily see what changed in the TokenizerImpls, the patch cannot be applied without doing some copies/renames first.
Do the following:
svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.*
svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.*
After that you have two copies of the original Tokenizer Impls. Then apply the patch. The patch clearly shows that even after regeneration with Java 1.5, the original version using Java 1.4 (Unicode 3) is equal to before (esp. the DFA matrix). The 31 version is different (another matrix).
If we later create new versions, we can call them 32 etc.
This patch solves the JFlex 1.4 problem of needing an explicit Java version. It currently requires the trunk version of JFlex, which is no problem for these parsers (as verified, they produce the same DFA & code for 1.4). So, others, speak up; Steven Rowe, what do you think? Only developers need the trunk version at the moment, as the generated files are in the checkout.
Hopefully JFlex 1.5 comes out before we release 3.1; I would be happy. In later issues we can optimize the newly added 31 version with more Unicode features; the Orig version stays as it is. We could also remove the special cases in the latest version, like replaceInvalidAcronym and so on, as these only apply for Version.LUCENE_2x.
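The versioned-classname scheme above can be sketched as a matchVersion-driven factory: the tokenizer picks its lexer implementation by comparing against the version that introduced it, and future versions (32, ...) slot in as further checks. The enum, interface, and class names below are simplified stand-ins mirroring the proposal, not Lucene's actual classes.

```java
// Sketch of selecting a versioned lexer impl by matchVersion. Names mirror the
// proposal in the comment above but are stand-ins, not Lucene's actual classes.
public class VersionSelectDemo {
    enum Version {
        LUCENE_30, LUCENE_31, LUCENE_CURRENT;
        boolean onOrAfter(Version other) { return ordinal() >= other.ordinal(); }
    }

    interface StandardTokenizerInterface { String name(); }

    static class StandardTokenizerImplOrig implements StandardTokenizerInterface {
        public String name() { return "Orig (Unicode 3.0 / Java 1.4 semantics)"; }
    }

    static class StandardTokenizerImpl31 implements StandardTokenizerInterface {
        public String name() { return "31 (Unicode 4.0 semantics)"; }
    }

    static StandardTokenizerInterface forVersion(Version matchVersion) {
        // A hypothetical Impl32 would be added as another onOrAfter check here.
        return matchVersion.onOrAfter(Version.LUCENE_31)
                ? new StandardTokenizerImpl31()
                : new StandardTokenizerImplOrig();
    }

    public static void main(String[] args) {
        System.out.println(forVersion(Version.LUCENE_30).name());
        System.out.println(forVersion(Version.LUCENE_CURRENT).name());
    }
}
```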

Robert Muir
added a comment - 02/Dec/09 17:22 By this, whenever we create a new version/syntax, we can name it exactly after the first version in which it is supported. Will commit a patch later.
commit a patch? or upload a patch? I still think we should wait for an official JFlex release, unless a JFlex developer comments on this issue otherwise

Uwe Schindler
added a comment - 02/Dec/09 17:20 - edited I will try whether it fails. But you cannot even regenerate, because the package name of JFlex changed in 1.5 (because of that I had to change build.xml).
One thing: I will rename the generated classes:
StandardTokenizerImplOrig (the initial one without Version)
StandardTokenizerImpl31 for the new one
... newer ones come after that
By this, whenever we create a new version/syntax, we can name it exactly after the first version in which it is supported. Will upload (not commit, *g*) a patch later.

Uwe Schindler
added a comment - 02/Dec/09 16:56 Attached is a patch.
To regenerate the parsers you can run "ant jflex", but the sysprop jflex.home has to point to the JFlex trunk checkout, where mvn install has been run before (I changed build.xml and common-build.xml to work correctly).
I added a test that tests tokenization in Java 1.4 mode (Version.LUCENE_30) and Java 1.5 mode (Version.LUCENE_CURRENT). There are two JFlex files, one that is Unicode 3.0 (Java 1.4.1) compatible (even when run in JDK 5, it now produces a Java 1.4 compatible parser!) and one with Unicode version 4.0 (Java 5).

Robert Muir
added a comment - 02/Dec/09 15:28 Uwe I agree, I am just answering your questions about "only interesting new feature"...
I don't think we should do anything here (except experiment) until we have an official JFlex release either...

Uwe Schindler
added a comment - 02/Dec/09 15:22 According to the title of this issue, we should regenerate two parsers: one with %unicode 3.1 for the legacy !onOrAfter(LUCENE_31) parser, and one for Java 5, which is %unicode 4.0. Then we can close the issue and work on later improvements as you suggested (which may be language dependent, which this tokenizer should not be).

Uwe Schindler
added a comment - 02/Dec/09 15:17 So I should also be able to regenerate the old grammar with %unicode 3.1 (that was the Unicode version of JDK 1.4?) - that would be fine, because I could simply provide another grammar file to regenerate all parsers!
That would be helpful as a first step for 3.1!

Robert Muir
added a comment - 02/Dec/09 15:11 Uwe, that's not the only interesting new feature. The other is that it supports the properties necessary to actually tokenize text according to Unicode standards, instead of words just being runs of IsLetter=True.
You can see such a grammar here: http://jflex.svn.sourceforge.net/viewvc/jflex/trunk/testsuite/testcases/src/test/cases/unicode-word-break/UnicodeWordBreakRules_5_2.flex?revision=578&view=markup
you should try %unicode 5.2 for now, but we shouldn't wimp out with that for 3.1.

Uwe Schindler
added a comment - 02/Dec/09 15:03 That's obvious. But do we want to do this? I thought the only interesting "new feature" was that it uses an internal Unicode impl that is not JVM specific, so the JVM of the JFlex compilation is not relevant. This issue only waits for this, doesn't it?
So I think we only have to change the grammar to specify a specific Unicode version in the header?

Uwe Schindler
added a comment - 02/Dec/09 12:02 - edited I tried with the JFlex 1.5 svn trunk version from today, built with Maven. The produced Java file is almost identical; the big array is 100% identical. There are only some new private methods, and one switch was reordered. The interface works for both versions, so nothing to change.
I am not sure which Unicode version this JFlex now uses and whether Steven's changes are in it. I cannot test with Java 1.4, as this version requires 1.5.
I attached a patch with the difference between the Java file generated by JFlex 1.4.1 on Java 5 and the snapshot version (also Java 5). You can see that the big DFA array was not changed at all.

Robert Muir
added a comment - 02/Dec/09 08:59 I think we can go forward with this now?
Should we wait for JFlex 1.5? I do not know how close it is. Steven Rowe will know.
But I think the Unicode branch he was working on has been merged to their trunk, so we could check to make sure your interface will work for it, at the least.

Uwe Schindler
added a comment - 02/Dec/09 08:54 I think we can go forward with this now?
I would like to also add a check in the build.xml that tests, before running jflex, that the Java version is as expected.

Uwe Schindler
added a comment - 17/Nov/09 10:06 Do we agree that this patch should wait for 3.1, as the JFlex parser is already OK and backwards compatible in 3.0, so there is no need to do anything? In 3.1, together with the other Unicode changes, we will update StandardTokenizer with Version.LUCENE_31?

Robert Muir
added a comment - 16/Nov/09 22:45 Uwe, we could fix it in 3.1 (but we should commit the warning no matter what, I think!)
If we commit for 3.0, then it's still not really correct for Unicode 4.
In my opinion, it would be better to wait for 3.1 and use this interface you built, along with a new version of JFlex, for much better Unicode support?

Uwe Schindler
added a comment - 16/Nov/09 22:41 Should we fix this for 3.0 or not?
The current JFlex file in trunk/lucene_30 is generated by Java 1.4 (I verified), so it does not break. So we could wait for 3.1 and provide a new StandardTokenizer with Unicode 5 support there.

Uwe Schindler
added a comment - 16/Nov/09 22:17 - edited It uses hardcoded char ranges, so the parser is not JDK-dependent. Let's keep it as it is for now. MediaWiki itself is not Unicode conformant, because it's written in PHP, and PHP only gets Unicode in 6.0 ( lol ) (says a PHP core committer named U.S. from Bremen in Germany...)

Robert Muir
added a comment - 16/Nov/09 22:05 well, the WikipediaTokenizer at least is similar to StandardTokenizer, except it does not use Unicode properties, instead hardcoded ranges.
so the behavior won't change from 2.9.x, but it won't be Unicode 4 either; I don't know if we should worry about this?

Michael McCandless
added a comment - 16/Nov/09 22:02 I feel bad about this whole Version Enum
I think this is simply a sign of 1) Lucene's maturity, and 2) that we take back compat seriously. I actually think we don't yet use it enough...
EG, LUCENE-1255 was one nasty bug that we at first fixed, but then rolled back because of the back-compat break. Then it was rediscovered and opened again, as LUCENE-1542, when we decided it was nasty enough to just fix it and put an entry in CHANGES that you hopefully will read.
But it really is a back-compat break, in that apps could quite easily be relying on the buggy behavior. I think that bug would have been a good reason to add Version to IW.
Fixing invalid acronyms in StandardAnalyzer, but then leaving it broken by default, was the original "inspiration" for Version. We shouldn't ever fix a bug but then be forced to leave the bug in place due to back compat.
Version lets us fix bugs, change defaults for the better, etc., w/o compromising on our back compat policy. It's an important tool...
The problem is, these are the hard backwards compat situations that it was created for - the whole analyzer package was/is bound to have lots of Version stuff.
Right, I think Version will especially find its way into changes that alter what's indexed (analyzers, bugs like LUCENE-1255, etc.).

Uwe Schindler
added a comment - 16/Nov/09 21:59 - edited Uwe, also, just checking, I don't know JavaCC at all, does it use Unicode properties? We have a lot of query parsers out there...
I do not know either
The only query parser using JFlex is the new one. And the new one should normally use no Unicode properties. Can you check the JFlex file? All other query parsers use JavaCC
Edit:
No query parser uses JFlex. But WikipediaTokenizer uses JFlex...

Simon Willnauer
added a comment - 16/Nov/09 21:58 The problem is, these are the hard backwards compat situations that it was created for - the whole analyzer package was/is bound to have lots of Version stuff.
Afaik in the contrib/analysis package this is only used because of StandardAnalyzer and StopFilter. It seems like a kind of overkill. But again, I shouldn't do any issue-hijacking here. The thing is, even if you come up with a better solution, this will most likely stay forever - just like all the IFDEF stuff in Boost

Mark Miller
added a comment - 16/Nov/09 21:48 We should really try hard to find different solutions than spreading Version all over the place.
I know this is a hard problem, but I want to make sure that we do not spread it out into every corner of the code. The Version thing is already annoying enough in contrib/analyzers.
The problem is, these are the hard backwards compat situations that it was created for - the whole analyzer package was/is bound to have lots of Version stuff.

Simon Willnauer
added a comment - 16/Nov/09 21:45 For this one it's not new, it was there before my patch
This is just the latest issue I found related to that stuff. The Version member was there already, but the conditional was introduced. Don't get me wrong, I just wanna make sure we are not using it as a general-purpose conditional! This is going to be a nightmare otherwise. I would only use it if there is NO other way at all.
simon

Robert Muir
added a comment - 16/Nov/09 21:45 I am anti-Version too in a lot of ways. I worry it will spread everywhere and make things a mess, and maybe we can come up with more creative solutions to get rid of it in the future.
But I think Uwe's patch is ok, Version was actually created with StandardTokenizer in mind I believe...

Simon Willnauer
added a comment - 16/Nov/09 21:40 This might be the wrong place to mention it, but I feel bad about this whole Version enum. It became a pest spread all over the code. Lucene code begins to look like the C++ Boost library, where you see more preprocessor statements than template code. We should really try hard to find different solutions than spreading Version all over the place.
I know this is a hard problem, but I want to make sure that we do not spread it out into every corner of the code. The Version thing is already annoying enough in contrib/analyzers.

Uwe Schindler
added a comment - 16/Nov/09 21:32 Here is the patch. It uses an interface containing the needed methods to easily switch between both impls. The old one was deprecated and stays there (slightly modified) without any JFlex (which is no longer needed). If we want to change the syntax, we must generate a new JFlex file and regenerate a new parser from it (using a separate class name)
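The interface-based decoupling described here can be sketched as follows. The method names are modeled on a JFlex-generated scanner's surface (getNextToken(), yytext()), but the whitespace lexer is a trivial stand-in used only so the example is self-contained; the real patch hides the generated StandardTokenizerImpl classes behind such an interface.

```java
import java.io.Reader;
import java.io.StringReader;

// Sketch of the decoupling: the Tokenizer talks to the lexer only through an
// interface, so deprecated and current generated scanners can be swapped behind
// it. This is an illustration, not the actual patch.
public class LexerInterfaceDemo {
    interface StandardTokenizerInterface {
        int getNextToken();  // token type, or -1 at end of input
        String yytext();     // text of the last matched token
    }

    // A trivial whitespace lexer standing in for a JFlex-generated class.
    static class WhitespaceLexer implements StandardTokenizerInterface {
        private final java.util.Scanner in;
        private String current = "";

        WhitespaceLexer(Reader reader) { this.in = new java.util.Scanner(reader); }

        public int getNextToken() {
            if (!in.hasNext()) return -1;
            current = in.next();
            return 0; // a single "word" token type
        }

        public String yytext() { return current; }
    }

    public static void main(String[] args) {
        // The caller never sees the concrete lexer class, only the interface.
        StandardTokenizerInterface scanner =
                new WhitespaceLexer(new StringReader("hello jflex world"));
        while (scanner.getNextToken() != -1) {
            System.out.println(scanner.yytext());
        }
    }
}
```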