Sunday, May 8, 2011

Playing with concurrency in JDK7

As an addition to the existing concurrent package, JDK7 comes with a Fork/Join framework. The original article by Doug Lea about this implementation can be found here. Some algorithms solve problems using divide-and-conquer or recursive-data patterns, dividing the original problem into subproblems that can be handled independently. At each step of such an algorithm, new independent tasks are forked to handle a subproblem and then joined once their execution completes. This kind of algorithm can be supported by a parallel framework implementation.
The fork/join process is repeated recursively until a task finds its subproblem small enough to process sequentially. The pattern of pseudo-code found in each task, in charge of deciding whether to process or to fork the problem, looks like:
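This is essentially the pattern given in Doug Lea's paper:

```
Result solve(Problem problem) {
    if (problem is small)
        directly solve problem
    else {
        split problem into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from subresults
    }
}
```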

Doug Lea et al.'s implementation of the framework uses an indirect task/thread mapping: each thread takes charge of a double-ended queue of tasks but can also steal work from other threads' queues when out of work. The article quoted at the beginning presents all the advantages of this approach. Specifically, one should note that this indirect mapping prevents clogging the system with heavy thread creation/destruction, as a limited number of threads can aggressively work on a set of tasks.

The two important objects in the framework are the ForkJoinPool and the ForkJoinTask instances. The ForkJoinPool is an ExecutorService implementation of the concurrent API. It provides the standard mechanisms of thread management and the whole implementation of the work-stealing tactics. The ForkJoinTask is the lightweight object representing the task that will be pushed onto a queue. This is where all the decisions concerning the fork/join strategy are taken.

In a tribute to Doug Lea's original article I decided to compute some values of the Fibonacci function (yep, I like to compute the evolution of pairs of rabbits :) )
In an idealized world (check the previous link) you will find that the evolution of pairs of rabbits can be computed with the following formula:

f(n) = f(n - 1) + f(n - 2)

In order to validate the algorithm against tables of data I implemented the two following tests:
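Something along these lines, sketched here with plain assertions rather than a test framework (names and values are my reconstruction, not the post's exact code):

```java
// Sketch of the two tests: a canary test against known values of the
// sequence, and a timing run for a big number.
public class FibonacciTests {

    // Simple sequential reference implementation of f(n) = f(n - 1) + f(n - 2).
    static long seqFib(int n) {
        return n <= 1 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    public static void main(String[] args) {
        // Canary test: validate the algorithm against a table of known values.
        assert seqFib(0) == 0;
        assert seqFib(1) == 1;
        assert seqFib(10) == 55;
        assert seqFib(20) == 6765;

        // Timing test: trace the execution time for a bigger number.
        long start = System.currentTimeMillis();
        seqFib(35);
        System.out.println("rec: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Run with `java -ea FibonacciTests` so the assertions are enabled.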

The first test is more of a canary test, allowing me to validate the algorithm, and the second test lets me trace the execution time for a big number.
The first class implemented, the one that makes the canary test green, is:
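A sketch of what such a class may look like, assuming it extends ForkJoinTask<Long> directly (names besides exec() and result() are my guesses, not the post's code):

```java
import java.util.concurrent.ForkJoinTask;

// Reconstruction of the Fibonacci task: extending ForkJoinTask directly is why
// getRawResult()/setRawResult() must be implemented and why an extra result()
// accessor is convenient.
public class FibonacciTask extends ForkJoinTask<Long> {

    private static final int THRESHOLD = 21;  // the threshold value the post settles on

    private final int n;
    private long result;

    public FibonacciTask(int n) {
        this.n = n;
    }

    // Additional accessor, kept alongside the getRawResult() plumbing.
    long result() {
        return result;
    }

    @Override
    public Long getRawResult() {
        return result;
    }

    @Override
    protected void setRawResult(Long value) {
        result = value;
    }

    // exec() decides between the sequential and the parallel path.
    @Override
    protected boolean exec() {
        result = (n <= THRESHOLD) ? seqFib(n) : divideAndConquer();
        return true;  // the task is complete when exec() returns
    }

    private long divideAndConquer() {
        FibonacciTask left = new FibonacciTask(n - 1);
        FibonacciTask right = new FibonacciTask(n - 2);
        left.fork();     // push f(n - 1) onto the work queue
        right.invoke();  // compute f(n - 2) in the current thread
        left.join();     // wait for the forked half
        return left.result() + right.result();
    }

    private static long seqFib(int n) {
        return n <= 1 ? n : seqFib(n - 1) + seqFib(n - 2);
    }
}
```

It would be fired with something like `new ForkJoinPool().invoke(new FibonacciTask(45))`.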

Reading the Javadoc of the getRawResult() method worried me a little (I will welcome any advice). So I provided an additional method to grab the result:

long result() {
    return result;
}

Of course the exec() method is where we decide to compute serially or in parallel. You recognize the pattern. Here the size of the problem is simply represented by a number we consider small enough to start a sequential computation. Let's go to the divideAndConquer() method:
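A plausible shape for it, with the minimal task plumbing repeated so the snippet stands on its own (names and the threshold here are assumptions):

```java
import java.util.concurrent.ForkJoinTask;

// The interesting part is divideAndConquer(): one sub-task is forked (queued,
// possibly stolen by another worker) while the other is computed right away,
// and join() waits for the forked half before the two sub-results are summed.
public class FibTask extends ForkJoinTask<Long> {

    private final int n;
    private long result;

    FibTask(int n) { this.n = n; }

    public Long getRawResult() { return result; }
    protected void setRawResult(Long value) { result = value; }

    protected boolean exec() {
        result = (n <= 13) ? seq(n) : divideAndConquer();  // 13 is Doug Lea's threshold
        return true;
    }

    private long divideAndConquer() {
        FibTask left = new FibTask(n - 1);   // sub-problem f(n - 1)
        FibTask right = new FibTask(n - 2);  // sub-problem f(n - 2)
        left.fork();                         // queue f(n - 1) for the pool
        right.invoke();                      // solve f(n - 2) without leaving this thread
        left.join();                         // block until f(n - 1) is done
        return left.result + right.result;   // compose the sub-results
    }

    private static long seq(int k) {
        return k <= 1 ? k : seq(k - 1) + seq(k - 2);
    }
}
```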

And that's all folks!! The two parameters, as specified in Doug Lea's article, are the threshold value, which depends on the nature of the algorithm, and the size of the ForkJoinPool. I chose to invoke the default constructor so the pool would create as many threads as there are processors on my little notebook.

As Amdahl's law expresses it, the serial parts of a program can drastically constrain the execution speed-up, so the fork and merge operations must be reduced to a minimum.

The execution of the tests provided the following results :

rec: 45 081
fj: 23 730

in ms

That is in accordance with Amdahl's law: I do have two cores, but the ratio is not exactly 2 because of the sequential parts. I found the optimum threshold value to be 21, versus 13 in Doug Lea's article, but I did not challenge the program much.

In order to complete my exploration I decided to work on another problem: sorting. I will go faster this time. I decided to work on timestamped objects that I would need to sort.

I imagined a timestamped object that could be sorted using the Comparable interface. So I designed the following tests:

A TemporalSortTask will be executed in the ForkJoin framework, its mission being to sort a big list.
The list of timestamped objects is created using a calendar whose time in milliseconds is incremented for each object:
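A sketch of that fixture, with a hypothetical Timestamped class of my own naming:

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Sketch of the fixture: a Timestamped value object that sorts by its
// timestamp, and a factory that advances a Calendar by one millisecond per
// object, then shuffles the list so there is actually something to sort.
public class TimestampedFixture {

    public static class Timestamped implements Comparable<Timestamped> {
        final Date timestamp;

        Timestamped(Date timestamp) {
            this.timestamp = timestamp;
        }

        public int compareTo(Timestamped other) {
            return timestamp.compareTo(other.timestamp);
        }
    }

    public static List<Timestamped> build(int size) {
        Calendar calendar = Calendar.getInstance();
        List<Timestamped> list = new ArrayList<Timestamped>(size);
        for (int i = 0; i < size; i++) {
            list.add(new Timestamped(calendar.getTime()));
            calendar.add(Calendar.MILLISECOND, 1);  // each object 1 ms later
        }
        Collections.shuffle(list);                  // unsort before the test
        return list;
    }
}
```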

There we again create a ForkJoinPool, letting the underlying implementation fix the number of threads. We fire the execution of a sorting task and return the result that is challenged in the test method.

The implementation of the TemporalSortTask under test is centered on the exec() method:
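A hedged reconstruction matching that description — shared source/result arrays, per-task start/end bounds, a sequential sort below a threshold, a left and a right sub-task above it (the threshold value and helper names are my assumptions):

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinTask;

// Reconstruction of the sorting task: every task shares the same source and
// result arrays (only the references are copied) and owns its [start, end) bounds.
public class TemporalSortTask<T extends Comparable<T>> extends ForkJoinTask<Void> {

    private static final int THRESHOLD = 32;  // my guess, not the post's value

    private final T[] source;
    private final T[] result;
    private final int start;
    private final int end;

    public TemporalSortTask(T[] source, T[] result, int start, int end) {
        this.source = source;
        this.result = result;
        this.start = start;
        this.end = end;
    }

    public Void getRawResult() { return null; }
    protected void setRawResult(Void value) { }

    @Override
    protected boolean exec() {
        if (end - start <= THRESHOLD) {
            // Small enough: sort the range sequentially and publish it as a result.
            Arrays.sort(source, start, end);
            System.arraycopy(source, start, result, start, end - start);
        } else {
            // Divide the range in two and let the framework run both halves.
            int middle = start + (end - start) / 2;
            TemporalSortTask<T> left = new TemporalSortTask<T>(source, result, start, middle);
            TemporalSortTask<T> right = new TemporalSortTask<T>(source, result, middle, end);
            invokeAll(left, right);  // fork one half, run the other, join both
            merge(middle);           // rearrange the two sorted runs into result
        }
        return true;
    }

    // Merge the sorted runs result[start, middle) and result[middle, end).
    private void merge(int middle) {
        Object[] scratch = new Object[end - start];
        int left = start, right = middle, out = 0;
        while (left < middle && right < end) {
            scratch[out++] = result[left].compareTo(result[right]) <= 0
                    ? result[left++] : result[right++];
        }
        while (left < middle) scratch[out++] = result[left++];   // drain the left run
        while (right < end)   scratch[out++] = result[right++];  // drain the right run
        System.arraycopy(scratch, 0, result, start, scratch.length);
    }
}
```

Note that the source array only ends up as a sequence of sorted runs, since the leaves sort it in place; the result array is the one guaranteed fully sorted.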

A left task and a right task are created, each taking charge of a sorting problem half the size. The trick here is that the source of timestamped objects is stored as an array whose reference only is copied from task to task. The start/end bounds of the sort range are created for each new task. The result is stored into a result array of timestamped objects. The merge operation takes the results of the two sub-tasks and then rearranges them into the result array:
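In isolation, such a merge presumably looks like this (the class name and the signature are reconstructed, not the post's code):

```java
// Standalone sketch of the merge step: two adjacent sorted runs of src,
// [start, middle) and [middle, end), are interleaved into the same range of dst.
public class RunMerger {

    static <T extends Comparable<T>> void merge(T[] src, int start, int middle, int end, T[] dst) {
        int left = start, right = middle, out = start;
        while (left < middle && right < end) {
            // Move the smaller head of the two runs; <= keeps the sort stable.
            dst[out++] = src[left].compareTo(src[right]) <= 0 ? src[left++] : src[right++];
        }
        while (left < middle) dst[out++] = src[left++];   // left run leftovers
        while (right < end)   dst[out++] = src[right++];  // right run leftovers
    }
}
```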