RandomizedRunner is a JUnit runner, so it is capable of running @Test-annotated test cases. It
respects regular lifecycle hooks such as @Before, @After, @BeforeClass or @AfterClass, but it
also adds the following:

Randomized, but repeatable execution and infrastructure for dealing with randomness:

uses pseudo-randomness (so that a given run can be repeated if given the same starting seed)
for many things called "random" below,

randomly shuffles test methods to ensure they don't depend on each other,

randomly shuffles hooks (within a given class) to ensure they don't depend on each other,

base class RandomizedTest provides a number of methods for generating random numbers, strings
and picking random objects from collections (again, this is fully repeatable given the
initial seed if there are no race conditions),

the runner provides infrastructure to augment stack traces with information about the initial
seeds used for running the test, so that it can be repeated (or it can be determined that
the test is not repeatable – this indicates a problem with the test case itself).
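
As a rough illustration of the "randomized, but repeatable" idea above, the following sketch shuffles a list of test method names with a seeded java.util.Random. The class name ShuffleSketch, the method names and the seed value are made up for this example; it is not the runner's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: repeatable shuffling of test methods from a seed.
final class ShuffleSketch {
    /** Returns a copy of the names, shuffled by a Random built from the given seed. */
    static List<String> shuffled(List<String> methodNames, long seed) {
        List<String> copy = new ArrayList<>(methodNames);
        // Sort first so the result does not depend on the incoming (e.g. reflection) order.
        Collections.sort(copy);
        // The same seed always produces the same permutation.
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> tests = Arrays.asList("testAdd", "testRemove", "testClear", "testIterate");
        long seed = 0x60BDF6E574486C2L; // arbitrary example seed
        // Both lines print the same order because the seed is the same.
        System.out.println(shuffled(tests, seed));
        System.out.println(shuffled(tests, seed));
    }
}
```

Running with a different seed gives a different order, which is what exposes hidden dependencies between test methods.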

Thread control:

any threads created as part of a test case are assigned the same initial random seed
(repeatability),

tracks and attempts to terminate any Threads that are created and not terminated inside
a test case (not cleaning up causes a test failure),

tracks and attempts to terminate test cases that run for too long (default timeout: 60 seconds,
adjustable using global property or annotations),

RandomizedRunner does not "chain" or "suppress" exceptions happening during execution of
a test case (including hooks). All exceptions are reported as soon as they happen and multiple
failure reports can occur. Most environments we know of then display these failures sequentially,
allowing a clearer understanding of what actually happened first.

Activity

Dawid Weiss
added a comment - 14/Feb/12 15:32 I consider this issue done. I've pulled all the super-cool features from Lucene/Solr and put them into a separate, stand-alone and reusable project called randomizedtesting. We have switched our infrastructure at Carrot2 to use it and I'm very happy with it.
http://labs.carrotsearch.com/randomizedtesting.html
I will file another issue concerned with progressively moving from LuceneTestRunner/Case and tests infrastructure to RandomizedRunner (and siblings).

Dawid Weiss
added a comment - 21/Oct/11 20:45 In response to my question whether the idea of randomized testing is new Yuriy Pasichnyk passed me the info about Haskell's QuickCheck project. Indeed, the idea is pretty much the same (with differences concerning implementation details, not the concept itself).
http://en.wikipedia.org/wiki/QuickCheck
There is a Java port of this too, if you check out Wikipedia. The implementation follows a different direction compared to what I implemented, but there are also pieces that are nearly 1:1 identical copies. Good to know – this means I wasn't completely wrong in my goals.

Dawid Weiss
added a comment - 11/Oct/11 14:59 Yeah, I was thinking about that too and I've actually started, replaced the runner successfully but then there is no simple "piece by piece" with all the static method calls tangled together... And RandomizedRunner requires a different seed format etc... I'll give it another shot though.

Robert Muir
added a comment - 11/Oct/11 14:24
I am currently wondering if it's feasible to provide a single patch that will make a drop-in replacement of LTC. It may be the case that adding another skeleton class based on the "new" infrastructure and rewriting tests one by one to use it may be a more sensitive/ sensible way to go.
Could we add this stuff, cut over our runner to it (our runner doesn't actually do that much?) and then migrate our base class functionality piece by piece to the runner code (nuking it from LuceneTestCase?)

Dawid Weiss
added a comment - 11/Oct/11 14:05 A word of warning: this will be a longer comment. I still hope somebody will read it.
I've written a somewhat largish chunk of code that provides an infrastructure to run "randomized", but "repeatable" tests. I'd like to report on my impressions so far.
Robert was right that a custom runner provides more flexibility than a @Rule on top of the default JUnit runner (which changes depending where you run it – ant, maven, Eclipse, etc.). I've spent a lot of time inspecting the current implementation inside JUnit and I came to the conclusion that it really is best to have a full reimplementation of the Runner interface. Full meaning not descending ParentRunner, but implementing the whole runner from scratch. This provides additional, uhm, unexpected benefits in that one can add new functionality that "regular" JUnit runners don't have and still be compatible with hosting environments such as Ant, Maven or Eclipse (because they, thank God, respect @RunWith).
Among the things I have implemented so far that are missing or different in JUnit are:
There is a "context" object which is accessible via thread local, so @BeforeClass and other suite-level hooks can actually access the suite class, inspect it, check conditions, whatever (the runner's random seed is also passed via this context). This is useful, but not crucial.
I've decided to deviate from JUnit's strict policy of having public hook methods. By default this only causes headaches when one shadows or overrides a hook in the parent class and it is no longer invoked. A better (different) idea is to declare hooks as private; no shadowing occurs and they will all get invoked in a contractual predefined order (befores - super to class, afters - class to super).
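
A minimal sketch of the private-hook ordering described above, using plain reflection. The annotations BeforeHook and AfterHook and all class names are hypothetical stand-ins (they are not JUnit's or the runner's types); the point is only that private methods with the same name in a parent and a child class both run, befores from superclass to subclass and afters in reverse.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: invoking private hooks across a class hierarchy.
final class PrivateHooksSketch {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface BeforeHook {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface AfterHook {}

    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        @BeforeHook private void setUp() { LOG.add("Parent.setUp"); }
        @AfterHook private void tearDown() { LOG.add("Parent.tearDown"); }
    }

    static class Child extends Parent {
        // Same names as in Parent, but private methods never override or hide each other.
        @BeforeHook private void setUp() { LOG.add("Child.setUp"); }
        @AfterHook private void tearDown() { LOG.add("Child.tearDown"); }
    }

    /** Walks the hierarchy and invokes every method carrying the given annotation. */
    static void invokeHooks(Object instance, Class<? extends Annotation> ann, boolean superFirst) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> k = instance.getClass(); k != null && k != Object.class; k = k.getSuperclass()) {
            chain.add(k); // subclass first at this point
        }
        if (superFirst) Collections.reverse(chain);
        try {
            for (Class<?> k : chain) {
                for (Method m : k.getDeclaredMethods()) {
                    if (m.isAnnotationPresent(ann)) {
                        m.setAccessible(true); // hooks are private
                        m.invoke(instance);
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs befores, a placeholder test body, then afters; returns the call order. */
    static List<String> run() {
        LOG.clear();
        Child c = new Child();
        invokeHooks(c, BeforeHook.class, true);
        LOG.add("test");
        invokeHooks(c, AfterHook.class, false);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```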
I've added additional suite-level annotations. @Listeners provides listeners automatically hooked to RunListener. @Validators hooks up additional validators for verifying extra restrictions. An example of such a restriction is bailing out the test suite if shadowed or overridden methods exist in the class hierarchy of a suite class. Another (that I have implemented) is a validator checking for non-annotated testXXX methods that are dead JUnit3 test cases. You get the idea. A lot of code then simply vanishes from LTC; I can envision it having this shape:
@Listeners({
StandardErrorInfoRunListener.class})
@Validators({
NoHookMethodShadowing.class,
NoTestMethodOverrides.class,
NoJUnit3TestMethods.class})
public abstract class LuceneTestCase extends RandomizedTest {
...
}
Some of these things are currently verified using a state machine (calling super() in overridden methods), but to me it looks better to move this concern elsewhere rather than implement it inside LTC.
The entire lifecycle of handling test method calls and hooks is controlled in the runner. I made a design decision to not follow JUnit's insane wrap-wrap-wrap-exception style but instead report all exceptions that happen anywhere in the lifecycle. So if you get an exception in the test case, followed by an exception in @After, followed by an exception in @AfterClass, all these exceptions will be reported separately to the RunListener and in effect to all listening objects (in the lifecycle-corresponding order!). Such an implementation does work fine with Ant JUnit reports, Maven reports and in Eclipse (all exceptions are included) so far as I can tell – I didn't check other environments like NetBeans or IntelliJ. Again: in my personal opinion this is a much clearer way of dealing with exceptions in the lifecycle of a JUnit test case compared to wrapping them into artificial exceptions (MultipleException being a supreme example) or suppressing them altogether.
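
To make the reporting policy above concrete, here is a small sketch in which each lifecycle phase runs in its own try/catch and every failure goes straight to a listener, in lifecycle order, without wrapping. The Step and FailureListener interfaces are invented for this example and are not JUnit's RunListener API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report each lifecycle failure separately instead of wrapping.
final class SeparateFailuresSketch {
    interface Step { void run() throws Throwable; }

    interface FailureListener { void failure(String phase, Throwable t); }

    /** Runs the three phases in order; a failure in one phase never hides a later one. */
    static void runLifecycle(Step test, Step after, Step afterClass, FailureListener listener) {
        runStep("test", test, listener);
        runStep("@After", after, listener);
        runStep("@AfterClass", afterClass, listener);
    }

    private static void runStep(String phase, Step step, FailureListener listener) {
        try {
            step.run();
        } catch (Throwable t) {
            // Reported immediately; nothing is chained or suppressed.
            listener.failure(phase, t);
        }
    }

    public static void main(String[] args) {
        List<String> reports = new ArrayList<>();
        runLifecycle(
            () -> { throw new AssertionError("test failed"); },
            () -> { throw new IllegalStateException("cleanup failed"); },
            () -> { throw new RuntimeException("suite cleanup failed"); },
            (phase, t) -> reports.add(phase + ": " + t.getMessage()));
        reports.forEach(System.out::println);
    }
}
```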
I couldn't resist a tiny tweak of making any exceptions thrown from hooks or test methods carry the information about the seed used in their execution (both runner-level and method-level, even though the latter could be derived from the former). There is no easy way to do it because Throwables are designed not to allow changes to their content once constructed, with the exception of stack traces. So I simply inject debugging info into the stack trace as an artificial entry; this is what it looks like, for instance:
java.lang.Error: Blah blah exception message.
at __randomizedtesting.SeedInfo.seed([60BDF6E574486C2:60BDF6E76C930BC]:0)
at […].examples.TestStackAugmentation$Nested.testMethod1(TestStackAugmentation.java:29)
(Note how the seed info is inside the file position of the StackTraceElement object.) This may seem like an overly clever solution, but I've had it many times that sysouts got discarded or lost somehow, whereas an exception object along with the stack trace is always there in front of your eyes. Another way to capture-and-dump reproduction info is to use the @Listeners annotation above; this can be used for much of what LTC does today – -D…, -D…, -D...
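
The stack trace trick can be sketched with the standard Throwable.setStackTrace and StackTraceElement APIs. The helper name augment and the exact seed formatting are assumptions made for this example; only the idea of an artificial first frame whose "file name" carries the seeds comes from the comment above.

```java
import java.util.Locale;

// Hypothetical sketch: prepend an artificial stack frame that carries seed info.
final class SeedInStackTraceSketch {
    /** Inserts a synthetic frame at the top of the throwable's stack trace and returns it. */
    static <T extends Throwable> T augment(T t, long runnerSeed, long methodSeed) {
        String seeds = "["
            + Long.toHexString(runnerSeed).toUpperCase(Locale.ROOT) + ":"
            + Long.toHexString(methodSeed).toUpperCase(Locale.ROOT) + "]";
        StackTraceElement[] original = t.getStackTrace();
        StackTraceElement[] augmented = new StackTraceElement[original.length + 1];
        // The seeds go into the "file name" slot; class and method names are artificial.
        augmented[0] = new StackTraceElement("__randomizedtesting.SeedInfo", "seed", seeds, 0);
        System.arraycopy(original, 0, augmented, 1, original.length);
        // Note: this has no effect on throwables created with a non-writable stack trace.
        t.setStackTrace(augmented);
        return t;
    }

    public static void main(String[] args) {
        Error e = augment(new Error("Blah blah exception message."),
            0x60BDF6E574486C2L, 0x60BDF6E76C930BCL);
        e.printStackTrace();
    }
}
```

Because the information travels inside the exception itself, it survives even when standard output is discarded by the build tool.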
A custom runner can have custom implementation of the contractual "events", such as assumptions or ignore triggers. This takes away a lot of code related to trying to get around JUnit's API limitations (assume without message/cause, method filtering and dynamic ignores based on extra conditions like @Nightly, etc.).
In short: I'm really happy with a custom Runner.
As for the infrastructure for writing randomized test cases:
There is currently one "master" seed that the runner either generates randomly or accepts as a global constant. Everything else: method shuffling, initial random instance for each test case (method repetition)… really everything is based on sequential calls to this generator. This has advantages and disadvantages I guess (read about static initializers below), but it was my personal desire to implement it this way and based on my few days' worth of experience with this code, it works great.
I've written a base class RandomizedTest that extends Assert and has a number of utility methods for picking random numbers or objects from collections. There is no passing of explicit Random instances around like it is done currently in LTC though. The base class accesses the context's Random (which it is assigned by the runner) and then uses this random consistently to generate pseudo-randomness in selection of attributes and iterations. Of course once you go multi-threaded this will all go to dust, but I imagine multi-threaded tests shouldn't use the base class's randomness (a test case based on race conditions won't be repeatable anyway). If anything, generate per-thread Randoms based on current seed and let each thread handle its own sequence of pseudo-random numbers from there. This is even possible at runtime with non-mock objects as I'm going to show in Barcelona, hopefully.
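
One possible reading of "generate per-thread Randoms based on current seed" is sketched below: the per-thread seeds are drawn sequentially on the starting thread, and each worker then uses only its own Random. The class and method names are hypothetical, and the summing loop just stands in for real test work.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: derive one Random per thread from the current seed.
final class PerThreadRandomSketch {
    /** Starts the given number of workers and returns one result per worker. */
    static long[] runWorkers(long seed, int threads) {
        Random master = new Random(seed);
        long[] results = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            // Drawn on the starting thread, so the seed for each slot is fixed.
            final long threadSeed = master.nextLong();
            workers[i] = new Thread(() -> {
                Random local = new Random(threadSeed); // never shared between threads
                long sum = 0;
                for (int k = 0; k < 1000; k++) sum += local.nextInt(100);
                results[slot] = sum;
            });
        }
        for (Thread w : workers) w.start();
        try {
            for (Thread w : workers) w.join(); // join makes the results visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Both runs print the same numbers regardless of thread scheduling.
        System.out.println(Arrays.toString(runWorkers(42L, 4)));
        System.out.println(Arrays.toString(runWorkers(42L, 4)));
    }
}
```

This keeps each thread's own sequence repeatable; anything that depends on the interleaving of threads is still not repeatable, as noted above.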
Now… if you're still with me you're probably interested how this applies to Lucene. The wall I've hit is the sheer amount of code that any change to LTC affects. I realized it'd be large, but it's just gargantuan.
The major issue is with static initializers and static public methods called from them that leave resources behind. I'm sorry, but nobody can convince me this isn't evil. I understand certain things are costly and require a one-time setup, but these should really be moved to @BeforeClass fixture hooks. If one really needs to do things once at JVM lifespan level a @BeforeClass with some logic to perform a single initialization can be a replacement for a static initializer (even if it's unclear to me when exactly such a fixture would be really needed). In short: the problem with static initializers is that they are executed outside the lifecycle control of the runner… I'd say most of the problems and current patchy solutions inside LTC (dealing with resource tracking for example) are somehow related to the fact that static initializers and static method calls are used throughout the codebase.
I am currently wondering if it's feasible to provide a single patch that will make a drop-in replacement of LTC. It may be the case that adding another skeleton class based on the "new" infrastructure and rewriting tests one by one to use it may be a more sensitive/ sensible way to go.
The runner (alone) is currently at github if you care to take a look. I think Barcelona may be a good place to talk about this face to face and decide what to do with it. I'm myself leaning towards the: have parallel base classes and port existing tests in chunks.

Dawid Weiss
added a comment - 07/Oct/11 00:20 Ok. I've published the project on github here: https://github.com/dweiss/randomizedtesting
The repo contains the runner, some tests and examples. Lots of TODOs (in TODO), so consider this a work-in-progress, but if anybody cares to take a look and shout if something is definitely not right – go ahead.
mvn verify on the topmost project compiles everything and runs the tests/ examples. I don't see any functional deviations or differences in execution between Ant, Maven and my Eclipse GUI (mentioned by Robert), which is good.

Dawid Weiss
added a comment - 06/Oct/11 20:02 That's why I mentioned I would like this to become generally useful, not only restricted to Lucene/Solr. If we make it work for two projects (Carrot2 and Lucene) chances are the outcome will be flexible enough to use elsewhere.
I'm not saying you must fix the seeds using annotations – it's an option.

Robert Muir
added a comment - 06/Oct/11 19:48 I agree too. one difficulty with using @seed or something is our seeds quickly become out of date because we are often adding more randomization to our testing framework (e.g. additional craziness to randomindexwriter, searchers, analyzer, whatever)

Dawid Weiss
added a comment - 06/Oct/11 19:44 Sure, absolutely. In our (mostly algorithmic, mind you) experience even small test cases can be randomized and then it is really duplicated effort to re-write them for a particular bug scenario (the tests are often simple, the data changes). But sure: the simpler the test, the better.

Shai Erera
added a comment - 06/Oct/11 19:27 Ok I get the point now.
But I still think we should have specific unit tests that reproduce specific scenarios, rather than using some monstrous tests that happened to stumble on a seed that revealed a bug. If however the scenario cannot be reproduced deterministically, then I agree that this framework is powerful and useful.

Dawid Weiss
added a comment - 06/Oct/11 18:04 - edited Hi Shai. This is definitely not only for debugging. For example we use randomized testing inside CarrotSearch to test algorithmic/ combinatorial code. Once you hit a bug, you simply copy the test case (or a call to a common test case method) and fix the seed to have a regression test for the future (so that you know you're not failing examples that previously failed). So, for example:
@Test @Seed("23095324")
public void runFixedRegression_1() { doSomethingWithRandoms(); }
@Test @Seed("239735923")
public void runFixedRegression_2() { doSomethingWithRandoms(); }
@Test
public void runRandomized() { doSomethingWithRandoms(); }
This is a scenario I really came to like. It's a bit like your tests write themselves for you.
I left system properties for fixing seeds and enforcing repetition number because they are currently in Lucene, although I personally don't like them that much (because they affect everything globally). I do understand they're useful for quick hacking without recompiling stuff or for remote executions, but I'd much rather have something like -Dseed.testClass[.method]=xxxx which would affect only a single class or method rather than everything. The same can be done for filtering which method/ test case to execute. This is debatable of course and a matter of personal taste.
I should publish what I have tonight on github (I'm moving certain things out of our proprietary codebase and there are JUnit corner cases that slow things down).

Shai Erera
added a comment - 06/Oct/11 14:46 This is only for debugging from an IDE right? It does not replace tests.iter and tests.seed?
It looks very cool.
It also adds a risk that someone will accidentally commit tests with these annotations. So perhaps we should add pre-commit hooks, or a test that scans all test files and ensures those annotations do not exist?

Dawid Weiss
added a comment - 06/Oct/11 12:08 I've implemented a runner that follows the basic algorithm given in LUCENE-3489. Basically speaking, seeds for each test run are fixed derivations of a single master seed (used for the runner and all class-level fixtures) and don't rely on the order of invocations or other factors.
There are plenty of ways to tweak and tune by overriding class-level @Seed, method-level @Seed. @Repeat gives you control over how many times a given test is executed and whether a seed is reused (constant for each iteration) or randomized (predictably from the start seed).
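
A rough sketch of what "fixed derivations of a single master seed" and the two repetition modes could look like. The derivation used here (master seed XOR the method name's hash, fed through java.util.Random) is an assumption chosen for illustration, not the algorithm from LUCENE-3489 or the runner's code; the names DerivedSeedsSketch, methodSeed and iterationSeeds are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: order-independent method seeds and per-iteration seeds.
final class DerivedSeedsSketch {
    /** Depends only on the master seed and the method name, not on invocation order. */
    static long methodSeed(long masterSeed, String methodName) {
        // Assumed mixing step; String.hashCode() is specified, so this is stable across runs.
        return new Random(masterSeed ^ methodName.hashCode()).nextLong();
    }

    /** Seeds for repeated executions: either constant, or a predictable sequence. */
    static long[] iterationSeeds(long methodSeed, int iterations, boolean constantSeed) {
        long[] seeds = new long[iterations];
        Random r = new Random(methodSeed);
        for (int i = 0; i < iterations; i++) {
            seeds[i] = constantSeed ? methodSeed : r.nextLong();
        }
        return seeds;
    }

    public static void main(String[] args) {
        long master = 0x60BDF6E574486C2L; // arbitrary example seed
        long seed = methodSeed(master, "runRandomized");
        System.out.println(Arrays.toString(iterationSeeds(seed, 3, true)));
        System.out.println(Arrays.toString(iterationSeeds(seed, 3, false)));
    }
}
```

With this kind of derivation, running a single method in isolation gets the same seed it would get as part of the whole suite, which is what makes re-running one test from an IDE meaningful.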
Most of all, everything fits quite nicely in Eclipse (and I hope other GUIs... didn't check Idea or Netbeans though) because each executed test run is nicely described in the runner (full seed), so that you can either click on it and re-run a single test or write down the seed and fix it at runtime.
Lots of TODOs in the code, will continue in the evening.

Dawid Weiss
added a comment - 06/Oct/11 12:01 Static fixtures couldn't be handled with a rule, so I've decided to rewrite JUnit Runner instead of subclassing it. Lots of frustration so far, but I like the result.