Dawid Weiss
added a comment - 23/Feb/12 02:05 A patch with reimplementation of getReplaceBlock and a test case that is failing with AIOOB (apply test changes without modifying PatternReplaceCharFilter to get the error).

Robert Muir
added a comment - 23/Feb/12 02:34 Here's a simple random test showing some existing bugs in the filter...
there are offsets problems as dawid notices...
blockbuffer should always oversize by 1 character, if a block ends on a high surrogate (rare) it should do one additional read() so it doesnt create invalid unicode

to fix dawid's problem we can probably modify this test only for ascii, i suspect the unicode "problems"
are going to be impossible to fix given java's regex library (i think it does not treat "." as codepoint
but code unit). I'll take another stab at that just to tackle the offsets issue he is seeing.

Robert Muir
added a comment - 23/Feb/12 02:41 to fix dawid's problem we can probably modify this test only for ascii, i suspect the unicode "problems"
are going to be impossible to fix given java's regex library (i think it does not treat "." as codepoint
but code unit). I'll take another stab at that just to tackle the offsets issue he is seeing.

Robert Muir
added a comment - 23/Feb/12 03:01 updated patch, this tests only ascii (to avoid stupid problems in outdated regex support).
But there are a lot of offset problems (perhaps this corresponds to the warning in the class's javadocs?), including things like offsets being corrected to negative numbers...

Thanks for looking at this, Robert. I'll go back to this later today, but I can tell you right now that from my paper considerations negative indexes make logical sense in case of "prepended" characters. So:

PATTERN: A
INPUT: ABCDEF
REPLACEMENT: XYZ
OUTPUT:XYZBCDEF

then (in my patch) X and Y would have negative offsets. It's a matter of agreement I guess. Negative indexes are consistent with something like this:

PATTERN: ^
INPUT: ABC
REPLACEMENT: XYZ
OUTPUT:XYZABC

then all three characters (XYZ) have a negative index to indicate they're prepended. Thoughts?

Dawid Weiss
added a comment - 23/Feb/12 07:21 Thanks for looking at this, Robert. I'll go back to this later today, but I can tell you right now that from my paper considerations negative indexes make logical sense in case of "prepended" characters. So:
PATTERN: A
INPUT: ABCDEF
REPLACEMENT: XYZ
OUTPUT:XYZBCDEF
then (in my patch) X and Y would have negative offsets. It's a matter of agreement I guess. Negative indexes are consistent with something like this:
PATTERN: ^
INPUT: ABC
REPLACEMENT: XYZ
OUTPUT:XYZABC
then all three characters (XYZ) have a negative index to indicate they're prepended. Thoughts?

Dawid Weiss
added a comment - 23/Feb/12 09:28 Unrelated to this change: we should be using StringBuilder here.
if you mean the pattern matching bit then Matcher simply doesn't accept a StringBuilder...

I've made a deliberate decision to deprecate and not use block delimiters and block processing. If you think this is a backwards no-no then feel free to correct this patch... I think block processing may be worth dropping given the code clarity without it.

Dawid Weiss
added a comment - 23/Feb/12 09:43 A simplifying patch that includes Robert's random tests (and passes).
I've made a deliberate decision to deprecate and not use block delimiters and block processing. If you think this is a backwards no-no then feel free to correct this patch... I think block processing may be worth dropping given the code clarity without it.

Robert Muir
added a comment - 23/Feb/12 11:22
I'll go back to this later today, but I can tell you right now that from my paper considerations negative indexes make logical sense
We cannot do this... this is the offset (character position in the reader).
Offsets can never be negative.

I don't have an opinion / don't know the use case behind block delimiters, but this is a
small, clean, elegant implementation that doesnt cause offsets problems (from my futzing
around there seemed to be at least 2 bugs, one at the beginning and one at the end?)

Robert Muir
added a comment - 23/Feb/12 12:55 OK, after reviewing in detail...
+1 I like the cleanup here!
I don't have an opinion / don't know the use case behind block delimiters, but this is a
small, clean, elegant implementation that doesnt cause offsets problems (from my futzing
around there seemed to be at least 2 bugs, one at the beginning and one at the end?)

I'm seeing TestPatternReplaceCharFilter.testRandomData sometimes take a REALLY long time (longer than I can wait ).

The test is not hung (CPU is pegged).

I don't have a seed (can we somehow tap into kill -QUIT so that in addition to the JRE printing stack traces for all threads, it also prints our seed?)... but here's the stack trace:

"main" prio=10 tid=0x00007fbc64008000 nid=0x4095 runnable [0x00007fbc69b32000]
java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$Loop.match(Pattern.java:4304)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4311)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4251)
at java.util.regex.Pattern$Start.match(Pattern.java:3055)
at java.util.regex.Matcher.search(Matcher.java:1105)
at java.util.regex.Matcher.find(Matcher.java:535)
at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.processPattern(PatternReplaceCharFilter.java:100)
at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.read(PatternReplaceCharFilter.java:80)
at java.io.Reader.read(Reader.java:104)
at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:138)
at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:105)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:346)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:304)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:259)
at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:707)
at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:606)
at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511)
at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:569)
at org.junit.rules.RunRules.evaluate(RunRules.java:18)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:24)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98)
at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53)
at org.junit.runner.JUnitCore.main(JUnitCore.java:45)

I guess the random regexp (_TestUtil.randomRegexpishString(random)) is sometimes too evil...?

Michael McCandless
added a comment - 27/Feb/12 19:43 I'm seeing TestPatternReplaceCharFilter.testRandomData sometimes take a REALLY long time (longer than I can wait ).
The test is not hung (CPU is pegged).
I don't have a seed (can we somehow tap into kill -QUIT so that in addition to the JRE printing stack traces for all threads, it also prints our seed?)... but here's the stack trace:
"main" prio=10 tid=0x00007fbc64008000 nid=0x4095 runnable [0x00007fbc69b32000]
java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$Loop.match(Pattern.java:4304)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4311)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4251)
at java.util.regex.Pattern$Start.match(Pattern.java:3055)
at java.util.regex.Matcher.search(Matcher.java:1105)
at java.util.regex.Matcher.find(Matcher.java:535)
at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.processPattern(PatternReplaceCharFilter.java:100)
at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.read(PatternReplaceCharFilter.java:80)
at java.io.Reader.read(Reader.java:104)
at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:138)
at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:105)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:346)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:304)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:259)
at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(TestPatternReplaceCharFilter.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:707)
at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:606)
at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511)
at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:569)
at org.junit.rules.RunRules.evaluate(RunRules.java:18)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:24)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98)
at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53)
at org.junit.runner.JUnitCore.main(JUnitCore.java:45)
I guess the random regexp ( _TestUtil.randomRegexpishString(random) ) is sometimes too evil...?

That's Robert's evil piece of random pattern generation... I told him this would stress the pattern engine more than our code and this seems to be the case The above seems like a heavy backtracking/ recursion pattern, something Russ Cox was writing about (http://swtch.com/~rsc/regexp/regexp1.html).

I'd just add @Ignore to this test? Alternatively, I'd make it absolute-time bound and simply terminate after, say, 3 seconds.

Dawid Weiss
added a comment - 27/Feb/12 20:03 That's Robert's evil piece of random pattern generation... I told him this would stress the pattern engine more than our code and this seems to be the case The above seems like a heavy backtracking/ recursion pattern, something Russ Cox was writing about ( http://swtch.com/~rsc/regexp/regexp1.html ).
I'd just add @Ignore to this test? Alternatively, I'd make it absolute-time bound and simply terminate after, say, 3 seconds.

Dawid Weiss
added a comment - 27/Feb/12 20:51 I've added a time limit of 2 seconds. Seems to do the trick for me.
can we somehow tap into kill -QUIT so that in addition to the JRE printing stack traces for all threads, it also prints our seed?)
Yes, put the seed in your test running thread's name.

Michael McCandless
added a comment - 28/Feb/12 00:07 Yes, put the seed in your test running thread's name.
Hmm maybe LTC can somehow do this for us...? EG maybe we make a utility method to launch a new thread... and it sets the name?
Mike, resolve if this fix works for you.
Well... I beasted w/ this change, but after 156 iterations it got a bad regexp again.
Would be ashame to just @Ignore it though... can we somehow make the regexp generation less "evil"?

I've committed that thread-name-contains-seed thing. I've also tried to reproduce the long pattern but it's been running on my machine for a few minutes in a tight loop and all of them end below one second. Can you try to reproduce it again? I'm curious what's causing this. I'll add @Ignore once we know what the problem is.

Dawid Weiss
added a comment - 28/Feb/12 08:14 I've committed that thread-name-contains-seed thing. I've also tried to reproduce the long pattern but it's been running on my machine for a few minutes in a tight loop and all of them end below one second. Can you try to reproduce it again? I'm curious what's causing this. I'll add @Ignore once we know what the problem is.

Yep, nice catch. That test case causes a beautiful exponential time pattern to be generated (I've added it as an @Ignored test . I limited the input size to 40 characters. With such input it should be possible to traverse the entire search space, even if it's exponential. I don't see a way to easily verify if a pattern is exponential or not (without resigning from certain types of patterns).

Dawid Weiss
added a comment - 28/Feb/12 19:27 Yep, nice catch. That test case causes a beautiful exponential time pattern to be generated (I've added it as an @Ignored test . I limited the input size to 40 characters. With such input it should be possible to traverse the entire search space, even if it's exponential. I don't see a way to easily verify if a pattern is exponential or not (without resigning from certain types of patterns).