In C and many other programming languages the straightforward way to achieve this is to fork separate processes for such tasks. Unfortunately Java doesn’t support the concept of a fork (i.e. creating a copy of a running process). Instead, all you can do is to start up a completely new process. To create a mirror copy of your current process you’d need to start a new JVM instance with a recreated classpath and make sure that the new process reaches a state where you can get useful results from it. This is quite complicated and typically depends on predefined knowledge of what your classpath looks like. Certainly not something for a simple library to do when deployed somewhere inside a complex application server.

But there’s another way! The latest Tika trunk nowcontains an early version of a fork feature that allows you to start a new JVM for running computations with the classes and data that you have in your current JVM instance. This is achieved by copying a few supporting class files to a temporary directory and starting the “child JVM” with only those classes. Once started, the supporting code in the child JVM establishes a simple communication protocol with the parent JVM using the standard input and output streams. You can then send serialized data and processing agents to the child JVM, where they will be deserialized using a special class loader that uses the communication link to access classes and other resources from the parent JVM.

My code is still far from production-ready, but I believe I’ve already solved all the tricky parts and everything seems to work as expected. Perhaps this code should go into an Apache Commons component, since it seems like it would be useful also to other projects beyond Tika. Initial searching didn’t bring up other implementations of the same idea, but I wouldn’t be surprised if there are some out there. Pointers welcome.

Interested in JCR and looking for a new laptop? My employer has announcedJCR Cup 2008, a JCR design and development competition. Too bad I can’t enter the competition, I’m sure we’ll come up with a whole load of cool ideas next week.

I was positively surprised about the level of attendance and also the interest in Tika during the Search Roundtable BOF later in the evening. Even though the project is still just starting, it’s already generating lots of interest and I really look forward to getting the first releases out.

If you’re interested in concurrency, distributed systems, and ways to best use the manycore processors we’re being promised, then check out the Concurrency and River thread on the development mailing list of the incubating Apache River project. The thread is about concurrency and ways the River project (a continuation of Jini from Sun) and related technologies like JavaSpaces could be used to parallellize many computing tasks. There are also some nice comparisons to Erlang and Scala, and how the actor model used by them is related to the Jini network model.

I just had to follow a stack trace through a complex codebase with multiple layers. The exception chaining mechanism introduced in Java 1.4 made the task easy up to the point where the last exception in the chain was an IOException thrown by code like this:

try {
....
} catch (Exception e) {
throw new IOException("...");
}

What a dead end! The problem is that the IOException constructors that allow exception chaining were only added in Java 6. Here’s a workaround that would have saved me a lot of extra effort:

Digging deeper along these lines I found out that you could actually implement similar functionality also directly in a TestCase subclass, thus avoiding the need to override TestCase.createResult(). The best way to do this is to override the TestCase.runBare() method that gets invoked by TestResult to run the actual test sequence. The customized method can check whether to skip the test and just return without doing anything in such cases.

I implemented this solution as a generic JUnit 3.x ExcludableTestCase class, that you are free to copy and use under the Apache License, version 2.0. The class uses system properties named junit.excludes and junit.includes to determine whether a test should be excluded from the test run. Normally all tests are included, but a test can be excluded by including an identifier of the test in the junit.excludes system property. An exclusion can also be cancelled by including a test identifer in the junit.includes system property. Both system properties can contain multiple whitespace-separated identifiers. See the ExcludableTestCase javadocs for more details.

You can use this class by subclassing your test cases from ExcludableTestCase instead of directly from TestCase:

You can then exclude the test case with -Djunit.excludes=my.test.package, -Djunit.excludes=MyTestCase, or -Djunit.excludes=testSomething or a combination of these identifiers. If you’ve for example excluded all tests in my.test.package, you can selectively enable this test class with -Djunit.includes=MyTestCase.

You can also add a custom identifiers to your test cases. For example, if your test case was written for Bugzilla issue #123, you can identify the test in the constructor like this:

A few days ago I considered different options for including known issue test cases (ones that you expect to fail) in a JUnit test suite in a way that wouldn’t make the full test suite fail. I decided to adopt a solution that uses system properties to selectively enable such known issue test cases. Here’s how I implemented it for Apache Jackrabbit using Maven 1 (we’re currently working on migrating to Maven 2, so I’ll probably post Maven 2 instructions later on).

The first thing to do is to make the known issue tests check for a system property used to enable a test. The example class below illustrates two ways of doing this; either to make the full known issue test code conditional, or to add an early conditional return to skip the known issue. You can either use a single property like “test.known.issues” or different properties to allow fine grained control over which tests are run and which skipped. I like to use the known issue identifier from the issue tracker as the controlling system property, so I can selectively enable the known issue tests for a single reported issue.

Once this instrumentation is in place, the build system needs to be configured to pass the identified system properties to the code when requested. In Maven 1 this happens through the maven.junit.sysproperties setting in project.properties:

This way the known issue tests will be skipped when normally running “maven test“, but can be selectively enabled either on the command line (“maven -DISSUE-foo=true test“) or by modifying project.properties or build.properties.