Now we can ask ourselves a question: is this solution, in its current form, easy to turn into a parallel solution? The answer is NO. Each step in our loop relies on the result from the previous step. So if I split the sentence into, say, two pieces and start processing the second piece, I don't know whether the previous piece ended with a character or a space. In other words, the order in which I process information matters in the current algorithm. To make it work easily in parallel, we need to refactor it into an equivalent algorithm where the order in which partial results are combined no longer matters. Technically, we require that the combining operation is associative. The figure below shows what we mean.

If we have three things to calculate, it doesn't matter if we do the first two first or the last two first.

Another useful property for our algorithm to have is a left and right identity ("zero") element.

This property allows us to break our data up into pieces and (if needed) substitute a ZERO element at one or the other end, i.e. our algorithm will play nicely at the boundary points. Basically we can turn this:

into this:

Technically, combining associativity with left and right identity gives us a monoid. The details aren't important here but basically algorithms with such properties can be easily made parallel. This is exactly what we want when using techniques such as map/reduce or divide and conquer as part of our solution.
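To make the two properties concrete, here is a minimal sketch using plain list concatenation, which is a simple monoid in its own right. It is not the word-splitting algorithm itself; the variable names (`a`, `b`, `c`, `ZERO`, `pieces`) are chosen purely for illustration.

```groovy
// List concatenation is a simple monoid: "+" is associative and [] is the identity.
def a = ['Here']
def b = ['is', 'a']
def c = ['sentence']

// Associativity: the grouping of the operations does not change the result.
def leftFirst = (a + b) + c
def rightFirst = a + (b + c)
assert leftFirst == rightFirst

// Left and right identity: adding the ZERO element changes nothing.
def ZERO = []
assert ZERO + a == a
assert a + ZERO == a

// Together these let us cut the work at any point, reduce each part
// separately (possibly on different threads) and then combine the partial results.
def pieces = [a, b, c]
def left = pieces[0..0].inject(ZERO) { acc, x -> acc + x }
def right = pieces[1..2].inject(ZERO) { acc, x -> acc + x }
assert left + right == pieces.inject(ZERO) { acc, x -> acc + x }
```

Note that the pieces stay in their original positions; only the grouping of the "+" operations changes.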

So, getting back to word splitting, we need to change the algorithm to remember the boundary conditions. To do that we will create a little domain for remembering "chunks" and "segments" of partial results. Here is one way to code that solution (and credit goes to Guy Steele's article for devising this domain):

We'll have more to say about these classes later but basically we have plus ("+") methods for adding Segments and Chunks, a ZERO identity element and a flatten method which takes our Chunk or Segment and converts it into a list of words.
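As a rough sketch of what such a domain could look like (written from the description above, not taken from the original listing), the following assumes a `Chunk` holds a run of characters with no space seen yet, and a `Segment` holds a left partial word, a list of complete middle words and a right partial word. The helper `Util.maybeWord` and the field names `s`, `l`, `m`, `r` are assumptions made for this example, and only `Chunk.ZERO` is defined here.

```groovy
import groovy.transform.Immutable

// Hypothetical helper: an empty string contributes no word.
class Util { static List maybeWord(String s) { s ? [s] : [] } }

// A run of characters in which no space has been seen yet.
@Immutable class Chunk {
    String s
    public static final Chunk ZERO = new Chunk('')   // identity element for "+"
    def plus(Chunk other) { new Chunk(s + other.s) }
    // Our characters extend the left partial word of the segment.
    def plus(Segment other) { new Segment(s + other.l, other.m, other.r) }
    List flatten() { Util.maybeWord(s) }
}

// Left partial word, complete middle words, right partial word.
@Immutable class Segment {
    String l; List m; String r
    def plus(Chunk other) { new Segment(l, m, r + other.s) }
    // Our right part and the other's left part meet to form (at most) one word.
    def plus(Segment other) { new Segment(l, m + Util.maybeWord(r + other.l) + other.m, other.r) }
    List flatten() { Util.maybeWord(l) + m + Util.maybeWord(r) }
}

// A space can be modelled as a Segment with nothing on either side.
def space = new Segment('', [], '')
def joined = new Chunk('is') + space + new Chunk('a')
assert joined.flatten() == ['is', 'a']
assert Chunk.ZERO + new Chunk('x') == new Chunk('x')
```

Because Groovy maps the `+` operator onto `plus`, partial results read like ordinary arithmetic once the classes are in place.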

Guy Steele's slides show using this domain in more detail. Here is a summary snapshot to illustrate the main idea.

Now we are ready to write a slightly more functional flavored sequential solution:
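A minimal sketch of such a sequential solution might look like the following. The domain classes are included so the block runs on its own, and the names `processChar`, `swords` and `Util.maybeWord` are assumptions for this example rather than confirmed names from the original listing.

```groovy
import groovy.transform.Immutable

// Domain classes, included here so the example runs on its own.
class Util { static List maybeWord(String s) { s ? [s] : [] } }

@Immutable class Chunk {
    String s
    public static final Chunk ZERO = new Chunk('')
    def plus(Chunk other) { new Chunk(s + other.s) }
    def plus(Segment other) { new Segment(s + other.l, other.m, other.r) }
    List flatten() { Util.maybeWord(s) }
}

@Immutable class Segment {
    String l; List m; String r
    def plus(Chunk other) { new Segment(l, m, r + other.s) }
    def plus(Segment other) { new Segment(l, m + Util.maybeWord(r + other.l) + other.m, other.r) }
    List flatten() { Util.maybeWord(l) + m + Util.maybeWord(r) }
}

// A space becomes an empty Segment (a word boundary); anything else a one-character Chunk.
def processChar(String ch) { ch == ' ' ? new Segment('', [], '') : new Chunk(ch) }

// Fold the characters together with "+", starting from the identity element.
def swords(String s) {
    s.toList().inject(Chunk.ZERO) { result, ch -> result + processChar(ch) }
}

assert swords('Here is a sentence').flatten() == ['Here', 'is', 'a', 'sentence']
```

The loop no longer carries any hidden state between steps: everything it needs to remember lives in the partial result being folded.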

More importantly, because of the properties of the classes in our domain, we are ready to write some parallel solutions. Before doing that, let's take a slight diversion to illustrate how we might further check some of our algorithm's properties. We'll use the Java port of quickcheck:

Passing these checks gives me confidence that my algorithm is a monoid. The above showed that Segments have left and right identity elements for "+" (in fact both Chunk.ZERO and Segment.ZERO work). If we wanted to, we could have additional checks for the associativity piece or for properties of Chunk.
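For readers without quickcheck on the classpath, the same kind of check can be approximated with hand-rolled random inputs. The sketch below is a stand-in using `java.util.Random`, not the quickcheck API; it checks the identity laws for `Chunk.ZERO` and associativity across random mixes of Chunks and Segments. The domain classes and the generator closures (`word`, `maybeEmpty`, `randomPart`) are assumptions made for this example.

```groovy
import groovy.transform.Immutable

// Domain classes, included here so the example runs on its own.
class Util { static List maybeWord(String s) { s ? [s] : [] } }

@Immutable class Chunk {
    String s
    public static final Chunk ZERO = new Chunk('')
    def plus(Chunk other) { new Chunk(s + other.s) }
    def plus(Segment other) { new Segment(s + other.l, other.m, other.r) }
    List flatten() { Util.maybeWord(s) }
}

@Immutable class Segment {
    String l; List m; String r
    def plus(Chunk other) { new Segment(l, m, r + other.s) }
    def plus(Segment other) { new Segment(l, m + Util.maybeWord(r + other.l) + other.m, other.r) }
    List flatten() { Util.maybeWord(l) + m + Util.maybeWord(r) }
}

def rnd = new Random(42)                      // fixed seed, so runs are repeatable
def letters = ('a'..'e').toList()
// A random word of one to three letters.
def word = { (0..rnd.nextInt(3)).collect { letters[rnd.nextInt(letters.size())] }.join() }
def maybeEmpty = { rnd.nextBoolean() ? word() : '' }
// Either a Chunk or a Segment with zero to two middle words.
def randomPart = {
    if (rnd.nextBoolean()) return new Chunk(maybeEmpty())
    def middle = (0..<rnd.nextInt(3)).collect { word() }
    new Segment(maybeEmpty(), middle, maybeEmpty())
}

int checked = 0
200.times {
    def a = randomPart()
    def b = randomPart()
    def c = randomPart()
    assert Chunk.ZERO + a == a                // left identity
    assert a + Chunk.ZERO == a                // right identity
    def leftFirst = (a + b) + c
    def rightFirst = a + (b + c)
    assert leftFirst == rightFirst            // associativity
    checked++
}
```

A dedicated property-testing library adds shrinking and better generators, so this is only a quick sanity check.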

What have we achieved? Well, now we are in a position to write some parallel versions of our algorithm. Many share a common strategy: divide the input into sections. Typically the division stops once the sections reach a certain size. As a general rule, if we divide past a certain level of granularity, the overheads of setting up the parallelism outweigh the gains.

Here's one version which uses an old school concurrent hash map. This version divides the input sentence into 4 pieces and solves each piece in a separate thread, storing the results into a concurrent hash map.
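A minimal sketch of this approach, assuming the Chunk/Segment domain described earlier (repeated so the block runs on its own), could look like the following. The names `swords`, `processChar`, `parts` and `results` are chosen for this example, and the even four-way split is just one simple way to partition the input.

```groovy
import groovy.transform.Immutable
import java.util.concurrent.ConcurrentHashMap

// Domain classes, included here so the example runs on its own.
class Util { static List maybeWord(String s) { s ? [s] : [] } }

@Immutable class Chunk {
    String s
    public static final Chunk ZERO = new Chunk('')
    def plus(Chunk other) { new Chunk(s + other.s) }
    def plus(Segment other) { new Segment(s + other.l, other.m, other.r) }
    List flatten() { Util.maybeWord(s) }
}

@Immutable class Segment {
    String l; List m; String r
    def plus(Chunk other) { new Segment(l, m, r + other.s) }
    def plus(Segment other) { new Segment(l, m + Util.maybeWord(r + other.l) + other.m, other.r) }
    List flatten() { Util.maybeWord(l) + m + Util.maybeWord(r) }
}

def processChar(String ch) { ch == ' ' ? new Segment('', [], '') : new Chunk(ch) }
def swords(String s) { s.toList().inject(Chunk.ZERO) { result, ch -> result + processChar(ch) } }

def sentence = 'Here is a sentence to split into words'
int n = sentence.size()
int size = (n + 3).intdiv(4)                  // piece length, rounded up
def parts = (0..<4).collect { i ->
    sentence.substring(Math.min(i * size, n), Math.min((i + 1) * size, n))
}

// Each thread solves one piece and stores its partial result under the piece index.
def results = new ConcurrentHashMap<Integer, Object>()
def threads = (0..<parts.size()).collect { i ->
    Thread.start { results[i] = swords(parts[i]) }
}
threads*.join()                               // wait for all pieces to finish

// Combine the partial results in their original order using "+".
def combined = (0..<parts.size()).inject(Chunk.ZERO) { acc, i -> acc + results[i] }
assert combined.flatten() == sentence.tokenize(' ')
```

The pieces are cut without regard to word boundaries, so a word such as "into" can straddle two pieces; the Segment "+" operation stitches it back together.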

Alternatively, I can use GPars and not have to deal with as much explicit synchronization.

Firstly, we'll look at a map/reduce version. It consists of a partition part which splits the data up, in our case into 4 pieces. Then the map part runs our serial solution for each piece; notice that the mapping is independent and can be run completely in parallel. For the reduce part we will use our "+" monoid. Here is the code:
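To show the partition, map and reduce steps without any extra dependencies, the sketch below swaps in the JDK's parallel streams in place of GPars; it illustrates the same shape but is not the GPars API. The domain classes are repeated so the block runs on its own, and `swords`, `processChar` and `parts` are names assumed for this example.

```groovy
import groovy.transform.Immutable

// Domain classes, included here so the example runs on its own.
class Util { static List maybeWord(String s) { s ? [s] : [] } }

@Immutable class Chunk {
    String s
    public static final Chunk ZERO = new Chunk('')
    def plus(Chunk other) { new Chunk(s + other.s) }
    def plus(Segment other) { new Segment(s + other.l, other.m, other.r) }
    List flatten() { Util.maybeWord(s) }
}

@Immutable class Segment {
    String l; List m; String r
    def plus(Chunk other) { new Segment(l, m, r + other.s) }
    def plus(Segment other) { new Segment(l, m + Util.maybeWord(r + other.l) + other.m, other.r) }
    List flatten() { Util.maybeWord(l) + m + Util.maybeWord(r) }
}

def processChar(String ch) { ch == ' ' ? new Segment('', [], '') : new Chunk(ch) }
def swords(String s) { s.toList().inject(Chunk.ZERO) { result, ch -> result + processChar(ch) } }

def sentence = 'Here is a sentence to split into words'
int n = sentence.size()
int size = (n + 3).intdiv(4)                  // piece length, rounded up

// Partition: cut the sentence into 4 pieces, ignoring word boundaries.
def parts = (0..<4).collect { i ->
    sentence.substring(Math.min(i * size, n), Math.min((i + 1) * size, n))
}

// Map: solve each piece independently with the serial solution.
def mapped = parts.parallelStream().map { swords(it) }

// Reduce: combine partial results with "+", using Chunk.ZERO as the identity.
def result = mapped.reduce(Chunk.ZERO) { a, b -> a + b }

assert result.flatten() == sentence.tokenize(' ')
```

A parallel reduction is only guaranteed to give the same answer as a sequential one when the combining function is associative and the starting value is an identity, which is exactly what the monoid properties provide.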

Alternatively, we can use GPars' dataflow capabilities. With dataflow we write expressions which declaratively express the relationships in our data. In our case the pieces must be summed (using our "+" monoid) once they have been calculated:

Finally, we can create a parallel array version. This version (as currently written) doesn't limit the amount of parallelism to some granularity level; instead, it follows the same approach as our functional sequential version but just replaces inject with collectParallel, which will automatically perform its steps in parallel. Here is the code:

Here, the agent acts as a synchronization layer between our code and the standard LinkedHashMap (the default map for Groovy). This offers no particular advantage over ConcurrentHashMap but illustrates how agents work in general. We supply the agent with code to run and it runs that code after performing the necessary synchronization.