converts an Optional to a Stream, which is very useful when combined with streams. Say you have a list of optionals and you want to filter out the empty ones and keep the non-empty ones: simply convert each to a Stream and flatMap it.
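A minimal sketch of the idea (Optional::stream needs Java 9+; the list contents are placeholders):

import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

List<Optional<String>> optionals =
        Arrays.asList(Optional.of("a"), Optional.empty(), Optional.of("b"));

List<String> present = optionals.stream()
        .flatMap(Optional::stream)        // empty optionals simply disappear here
        .collect(Collectors.toList());    // ["a", "b"]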

It turns out that scanning an ArrayList is way more efficient than scanning a LinkedList (a naive benchmark sketch follows below).

Why?

Locality. Locality does matter. Reading a value from main memory is expensive compared to reading it from the cache, and arrays get the best out of your cache lines by batch-loading relevant data into the cache.

“Memory is the new disk.” (Cliff Click)

Another problem with linked lists is their huge footprint. Every node needs a separate object allocation and maintains at least two references, to the next and previous nodes. This problem just doesn’t exist in ArrayList, where all you allocate is one big array object plus a few variables for bookkeeping.
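A deliberately naive way to see this for yourself (a proper benchmark should use JMH; the sizes and the printing are arbitrary):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ScanBenchmark {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 10_000_000; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        System.out.println("ArrayList:  " + scanNanos(arrayList) + " ns");
        System.out.println("LinkedList: " + scanNanos(linkedList) + " ns");
    }

    static long scanNanos(List<Integer> list) {
        long start = System.nanoTime();
        long sum = 0;
        for (int value : list) sum += value;       // the sequential scan we are timing
        long elapsed = System.nanoTime() - start;
        System.out.println("(sum = " + sum + ")"); // use the result so the loop isn't elided
        return elapsed;
    }
}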

From Odersky’s paper Generics of a Higher Kind: Conceptually, kinds are used to distinguish a type parameter that stands for a proper type, such as List[Int], from a type parameter that abstracts over a type constructor, such as List.

And from the same paper: generalisation to types that abstract over types that abstract over types.

In simple words, higher-kinded types are higher-order types: just as higher-order functions let functions be treated as values (passed as parameters and returned from functions), higher-kinded types let us abstract over type constructors themselves.

What do languages with poor type systems like Java do for expressing higher-kinded types?

They give up! A map or a filter over a Stream in Java returns a Stream and the programmer is asked to explicitly convert the stream back to the desired proper type.

stream.map(c -> c.length()).collect(Collectors.toList());

Libraries like Guava provide an endless list of static helpers with almost the same functionality in order to return the same type back.
Whereas in Scala, a map over a List returns a List and a map over an Option returns an Option.

In a previous post we looked at parallel reduction and gave a conceptual background on using the ForkJoin model to parallelise an embarrassingly parallel operation. Reduction is at the heart of building parallel algorithms. In this post, we will take this knowledge a step further and use it to build a custom Collector that works with a parallel pipeline.

Motivation: Why Would You Build a Collector?

Curiosity.

The collector you need is not provided by the JDK.

You don’t want to use old-style loops, not because loops are bad, but because they are hard to parallelise.

The last thing you want to do is write a ForkJoinTask yourself.

A SumAndCount Collector

The collector that we will be building works on a Stream of Integers, accumulates the sum in a BigDecimal, and counts the elements in the source Stream.
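Here is a minimal sketch of such a collector (the container class and the Collector.of wiring are my own; the post’s actual code may differ):

import java.math.BigDecimal;
import java.util.stream.Collector;

final class SumAndCount {
    BigDecimal sum = BigDecimal.ZERO;
    long count = 0;

    static Collector<Integer, SumAndCount, SumAndCount> collector() {
        return Collector.of(
                SumAndCount::new,                 // supplier: a fresh container per task
                (acc, i) -> {                     // accumulator: fold one element in
                    acc.sum = acc.sum.add(BigDecimal.valueOf(i));
                    acc.count++;
                },
                (left, right) -> {                // combiner: merge two partial results
                    left.sum = left.sum.add(right.sum);
                    left.count += right.count;
                    return left;
                });
    }
}

// Usage, e.g. with a parallel pipeline:
// SumAndCount result = IntStream.rangeClosed(1, 1_000_000).boxed()
//         .parallel()
//         .collect(SumAndCount.collector());

The combiner is what makes the collector safe in a parallel pipeline: each worker folds into its own container, and containers are merged pairwise afterwards.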

In this post we will look at how we would go about building a general-purpose parallel reduce using the ForkJoin model.

By the way, you don’t need to do this if you are on Java 8 or later, because Streams already support reduction and you should be using them rather than rolling your own; this is for educational purposes only.
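For example, summing an int[] with the built-in machinery is a one-liner (array is assumed to be an int[]):

import java.util.stream.IntStream;

int sum = IntStream.of(array).parallel().sum();          // built-in parallel reduction
int sum2 = IntStream.of(array).reduce(0, Integer::sum);  // or spelled out as a reduce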

The idea of reduction is very simple: given a dataset, transform it into a single value. An example of reduce would be: given an array of integers, return the sum of all its elements. Such an operation can be done using a simple loop and an accumulator.

int sum = 0;
for (int element : array) {
    sum += element;
}

The only problem with the code above is that it doesn’t scale to multi-core. Given an infinite number of processors, the algorithm above will still run in O(n) time. By the end of this post, you should be able to convince yourself that such an operation can be done in O(lg n) time as the number of processors tends to infinity.

A Divide and Conquer Sum.

Instead of calculating the sum by scanning the entire array sequentially, we can use a divide-and-conquer algorithm, something very similar to merge sort: divide the array into two parts and solve each subproblem recursively. As in merge sort, we are aiming for a recursion tree of depth O(lg n), although in practice our recursion tree will be shorter than that.

A Naive Implementation.

A naive way would be to have a shared atomic accumulator: split the array into parts recursively and, when the size of a part is small enough to be computed sequentially, calculate the partial sum and add it to the accumulator. At the end, the accumulator will hold the total sum of the elements.
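A minimal sketch of this naive approach (the class name, threshold, and long accumulator are my own choices):

import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicLong;

final class NaiveSum extends RecursiveAction {
    static final AtomicLong TOTAL = new AtomicLong(); // the shared, contended accumulator

    private final int[] array;
    private final int lo, hi; // half-open range [lo, hi)

    NaiveSum(int[] array, int lo, int hi) {
        this.array = array; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= 4096) {                 // small enough: sum sequentially
            long partial = 0;
            for (int i = lo; i < hi; i++) partial += array[i];
            TOTAL.addAndGet(partial);          // every worker meets here: contention
            return;
        }
        int mid = (lo + hi) >>> 1;             // split and recurse
        invokeAll(new NaiveSum(array, lo, mid), new NaiveSum(array, mid, hi));
    }
}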

The algorithm is definitely correct; it derives its correctness from the sequential one anyway. The only problem is that it suffers from contention: all worker threads have to coordinate when they are ready to contribute their partial sums. This introduces a sequential bottleneck into our algorithm, and according to Amdahl’s law, the sequential part of an algorithm is what governs its performance.

One more problem with the algorithm described above is that it is not cache friendly. Atomics are usually implemented using a CAS instruction, and although CAS scales better than a mutex, since it doesn’t suffer from context switching, it still requires cache line invalidation.

A Better Solution.

A better solution would be a shared-nothing one, where our solution tree looks like a balanced binary tree: when we reach a leaf (a partial sum), we combine it with its sibling’s partial solution, building the aggregated result bottom-up.

But this looks like a pattern: we often accumulate, and not just for sum; counting is a similar thing, so is MAX, so is MIN. So, can we build a general-purpose reduce? Of course we can. However, there is a limitation to what can be achieved using this pattern: the operation has to be associative, in simple words, how you group the operands doesn’t matter. Sum, for example, is associative.

A + (B + C) = (A + B) + C

So is MAX

MAX(MAX(1, -77), MAX(13, 4)) = MAX(1, MAX(-77, MAX(13, 4)))

A property that division doesn’t have, since grouping matters: (8 / 4) / 2 = 1, while 8 / (4 / 2) = 4.

Ingredients for our Recipe

Now it’s time to think of the tools needed for that.

We need a source collection, one that we can split efficiently. A linked list would be a horrible choice, because splitting it requires linear time. Arrays to the rescue! Arrays split in constant time (remember merge sort?), and moreover, arrays get the best out of cache lines because of locality.

An operator, a binary operator in particular. Something that takes two parameters, does something, and returns a result. For that, we will be using BinaryOperator from JDK 1.8.

ForkJoin framework.

An identity or seed, a value that we can use as a baseline (“initialiser”) for our computation. For sum it would be 0; for MAX it would be Integer.MIN_VALUE.

Java Implementation

The code for this is incredibly simple. All we need is a simple RecursiveTask and a tiny wrapper around it that hides some details.
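A sketch of what that RecursiveTask could look like, built from the ingredients above (class and field names are mine; the post’s actual code may differ):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;

final class ReduceTask<T> extends RecursiveTask<T> {
    private static final int THRESHOLD = 1 << 13;  // below this, just loop sequentially

    private final T[] array;
    private final int lo, hi;                      // half-open range [lo, hi)
    private final T identity;                      // the seed, e.g. 0 for sum
    private final BinaryOperator<T> op;            // must be associative

    ReduceTask(T[] array, int lo, int hi, T identity, BinaryOperator<T> op) {
        this.array = array; this.lo = lo; this.hi = hi;
        this.identity = identity; this.op = op;
    }

    @Override
    protected T compute() {
        if (hi - lo <= THRESHOLD) {                // leaf: a plain sequential fold
            T acc = identity;
            for (int i = lo; i < hi; i++) acc = op.apply(acc, array[i]);
            return acc;
        }
        int mid = (lo + hi) >>> 1;                 // arrays split in constant time
        ReduceTask<T> left = new ReduceTask<>(array, lo, mid, identity, op);
        ReduceTask<T> right = new ReduceTask<>(array, mid, hi, identity, op);
        left.fork();                               // run the left half asynchronously
        T rightResult = right.compute();           // compute the right half in this thread
        return op.apply(left.join(), rightResult); // combine sibling results bottom-up
    }
}

// Usage, e.g. summing an Integer[]:
// Integer total = ForkJoinPool.commonPool()
//         .invoke(new ReduceTask<>(data, 0, data.length, 0, Integer::sum));

Note there is no shared accumulator anywhere: partial results only meet when a parent combines its two children, which is exactly the shared-nothing tree described above.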

Most usage of String.toLowerCase and String.toUpperCase in our programs has nothing to do with lower- or upper-casing strings; all we want most of the time is case-insensitive comparison.

The problem is that toLowerCase will allocate a new string, which involves an array copy, producing unnecessary garbage for no good reason. The good news is that there is a better and more concise way of achieving this: String.equalsIgnoreCase(String).
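For example (the variable and the literal are placeholders):

String name = "Admin";

// Allocates a brand-new lower-cased copy just to throw it away:
boolean slow = name.toLowerCase().equals("admin");

// No allocation, and it says what it means:
boolean fast = name.equalsIgnoreCase("admin");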

I’ve been thinking about this for a while and I finally managed to compile a list of the 10 worst things in Java. And by the way, I love Java, the language and the platform. So let’s get started:

javac BadThings.java

Weak, Soft, and Phantom references. These are evil: if you like chasing weird bugs in an unpredictable program, then use them. Basically, the program’s behaviour will depend on the GC, which is unpredictable.
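A small illustration; whether get() returns null below depends entirely on what the collector decides to do:

import java.lang.ref.WeakReference;

WeakReference<byte[]> ref = new WeakReference<>(new byte[1024]);
System.gc();                // merely a hint; the GC may or may not run
byte[] data = ref.get();    // may be null, may not: the GC decides, not your code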

Every object is a mutex: if you have a reference to an object, you can lock on it.
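For instance:

Object anything = new Object();   // any object will do
synchronized (anything) {         // every object carries a monitor
    // critical section guarded by anything's monitor
}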

Checked Exceptions. They become even worse with lambdas and method references. Anders Hejlsberg described their ugliness here. Although I believe throws isn’t too bad as an annotation for both the programmer and the machine.
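A typical example of the pain (Files.readString needs Java 11; the file names are placeholders):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

List<Path> paths = List.of(Path.of("a.txt"), Path.of("b.txt"));

// Does not compile: Function.apply declares no checked exceptions,
// so the checked IOException cannot escape the lambda:
// List<String> contents = paths.stream().map(Files::readString).collect(Collectors.toList());

// Instead we are forced into wrap-and-rethrow noise:
List<String> contents = paths.stream()
        .map(p -> {
            try {
                return Files.readString(p);         // throws the checked IOException
            } catch (IOException e) {
                throw new UncheckedIOException(e);  // launder it into an unchecked one
            }
        })
        .collect(Collectors.toList());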

Method overloading resolution in C-style languages is a tricky business; I am referring here to Java and C# in particular. Resolution rules seem obvious to the novice.

void fn(int x);
void fn(int x, int y);

So far, so simple. But what happens if we have the following?

void fn(float x);
void fn(double x);
// call fn with an int
fn(1);

The value 1 will be resolved to a 32-bit integer, and we have two candidates: one takes a float and the other takes a double. Both types can hold an int, so which one will be chosen? The answer is fn(float). Why? Because fn(float) is more specific: float is a 32-bit floating-point type and double is a 64-bit floating-point type, hence we can squeeze any float into a double, and hence float wins 🙂

Now suppose we call fn with a null argument and there are two candidates, one taking a Foo and one taking a Bar, where Bar extends Foo. Every Bar is a Foo but not every Foo is a Bar, and hence it chooses fn(Bar) because it is more specific 🙂
If we add a third candidate that takes a String, neither Bar nor String is more specific than the other, and the call becomes ambiguous: the compiler rejects it outright (see the sketch below).
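A reconstruction of the kind of example being described (Foo and Bar come from the text; the rest is my own):

class Foo {}
class Bar extends Foo {}

class Overloads {
    static void fn(Foo x) { System.out.println("fn(Foo)"); }
    static void fn(Bar x) { System.out.println("fn(Bar)"); }

    // Uncommenting this third overload makes fn(null) ambiguous, because
    // neither Bar nor String is more specific than the other:
    // static void fn(String x) { System.out.println("fn(String)"); }

    public static void main(String[] args) {
        fn(null);   // prints "fn(Bar)": Bar is the most specific applicable overload
    }
}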

That’s a true story.
A method whose signature promises a List<T> doesn’t guarantee that you actually get a List of Ts back; it might, and it might not. Not even a list of a subtype of T: you have to declare that T is covariant explicitly, as in List<? extends T>.