Java Stream API – Part II

In this tutorial, let us continue with the Java Stream API. A couple of weeks ago we looked at an introduction to the Java Stream API. A few topics were left over from that tutorial, so let's have a look at them now. A warning for you: this is mostly theory, and dry at that. It may not interest those who want to dive straight into heavy coding.

Key Properties of Java Stream

A Java stream is not a data store like a collection. It is a sequence of elements, like the flow of water in a small, narrow river. It originates from a source such as a collection, an array, or a generator function, and the data flows through a pipeline of operations.
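
To make the three kinds of source concrete, here is a minimal sketch. The class name StreamSources and its method names are made up for illustration; the stream calls themselves are from the standard java.util and java.util.stream packages.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSources {

    // Source 1: a collection.
    static List<String> fromCollection() {
        List<String> names = Arrays.asList("ant", "bee", "cat");
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Source 2: an array.
    static int fromArray() {
        int[] numbers = {1, 2, 3, 4};
        return Arrays.stream(numbers).sum();
    }

    // Source 3: a generator function. limit() bounds the otherwise endless sequence.
    static List<Integer> fromGenerator() {
        return Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromCollection()); // [ANT, BEE, CAT]
        System.out.println(fromArray());      // 10
        System.out.println(fromGenerator());  // [1, 2, 4, 8, 16]
    }
}
```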

A Java stream is functional. Operations on the stream produce a result and do not modify the source.
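
A small sketch of this property: the stream below produces a new list of doubled values while the source list stays as it was. SourceUnchanged, SOURCE and doubled() are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SourceUnchanged {

    // Hypothetical source data for the example.
    static final List<Integer> SOURCE = Arrays.asList(1, 2, 3);

    // map() builds new values for the result; it does not write back into SOURCE.
    static List<Integer> doubled() {
        return SOURCE.stream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled()); // [2, 4, 6]
        System.out.println(SOURCE);    // [1, 2, 3]
    }
}
```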

Java streams are consumable. A stream can be traversed only once, and in this respect it is similar to an Iterator.
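
The sketch below tries to use the same stream twice. The Stream documentation says an implementation may throw IllegalStateException when it detects reuse, and the standard JDK implementation does so. The class and method names are made up for the example.

```java
import java.util.stream.Stream;

class ConsumedStream {

    // Returns true if the second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.forEach(s -> { });   // first terminal operation consumes the stream
        try {
            stream.count();         // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;            // the stream was already operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());
    }
}
```

To process the same data again, simply ask the source for a fresh stream.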

Intermediate operations are lazy. They are not executed until a terminal operation is invoked.
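
Laziness is easy to observe by logging from inside the intermediate operations. In this sketch (LazyDemo and the log list are illustrative names, and a sequential stream is assumed), nothing is logged by filter() or map() until collect() runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {

    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Building the pipeline does not execute filter() or map().
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { log.add("filter " + n); return n % 2 == 1; })
                .map(n -> { log.add("map " + n); return n * 10; });
        log.add("pipeline built");

        // The terminal operation pulls the elements through the pipeline one by one.
        List<Integer> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints "pipeline built" first, then the filter/map entries, then the result.
        run().forEach(System.out::println);
    }
}
```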

The source of a stream can be infinite, and the processing can be bounded by short-circuiting operations like findFirst() or limit().
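
Here is a minimal sketch of an endless source that still finishes, because a short-circuiting operation stops pulling elements once it has what it needs. InfiniteStreamDemo and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class InfiniteStreamDemo {

    // 1, 2, 3, ... without an end; findFirst() stops at the first match.
    static int firstMultipleOfSevenAbove100() {
        return IntStream.iterate(1, n -> n + 1)
                .filter(n -> n > 100 && n % 7 == 0)
                .findFirst()
                .getAsInt();
    }

    // limit() is also short-circuiting: only three elements are ever produced.
    static int[] firstThreeSquares() {
        return IntStream.iterate(1, n -> n + 1)
                .map(n -> n * n)
                .limit(3)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOfSevenAbove100());       // 105
        System.out.println(Arrays.toString(firstThreeSquares())); // [1, 4, 9]
    }
}
```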

Parallel Streams

One good advantage of streams is that the operations can be done in parallel. This is particularly efficient for stateless operations like map() and filter(). In the case of stateful operations like sorted(), multiple passes over the data might be required to get the result.

A stream is created as either sequential or parallel. Streams are generally created with the method “stream()”, which gives a sequential stream. If the method “parallelStream()” is used instead, the result is a parallel stream. The mode can still be switched with sequential() and parallel() before the terminal operation is invoked.

Stream operations are initiated when the terminal operation is invoked. Then, depending on the mode of the stream, the pipeline is executed in parallel or sequentially. Needless to say, the end result should not vary with the mode, except for operations that are explicitly nondeterministic, such as findAny().
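
As a rough sketch, the same computation can be run in both modes and should give the same answer. ParallelDemo, NUMBERS and the method names are made up for this example, and the isParallel() values in the comments are what the standard JDK collections report.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {

    // Hypothetical source data: the numbers 1 to 1000.
    static final List<Integer> NUMBERS =
            IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());

    static long sumOfSquaresSequential() {
        return NUMBERS.stream()
                .mapToLong(n -> (long) n * n)   // stateless, so easy to parallelize
                .sum();
    }

    static long sumOfSquaresParallel() {
        return NUMBERS.parallelStream()
                .mapToLong(n -> (long) n * n)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(NUMBERS.stream().isParallel());         // false
        System.out.println(NUMBERS.parallelStream().isParallel()); // true
        System.out.println(sumOfSquaresSequential());              // 333833500
        System.out.println(sumOfSquaresParallel());                // 333833500
    }
}
```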

Non-interference in Stream Source

Non-interference is important for consistent Java stream behavior. Imagine we are processing a large stream of data and the source is changed during the processing. The result will be unpredictable. This holds irrespective of the processing mode of the stream, parallel or sequential.

The source can be modified until the terminal operation is invoked. Beyond that point, the source should not be modified until the stream execution completes. So avoiding concurrent modification of the stream source is critical for consistent stream behavior.
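
The sketch below shows both sides of this rule with an ArrayList as the source. The first method follows the example given in the java.util.stream package documentation. The second interferes with the source during traversal; ArrayList happens to detect this on a best-effort basis and throws ConcurrentModificationException, but other sources may simply behave unpredictably. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NonInterferenceDemo {

    // Modifying the source BEFORE the terminal operation is fine:
    // the change is reflected in the result.
    static String modifyBeforeTerminalOperation() {
        List<String> source = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = source.stream();
        source.add("three");
        return stream.collect(Collectors.joining(" "));
    }

    // Modifying the source DURING the terminal operation is interference.
    static boolean modifyDuringTraversalIsDetected() {
        List<String> source = new ArrayList<>(Arrays.asList("one", "two"));
        try {
            source.stream().forEach(s -> source.add(s + "!"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // best-effort detection by ArrayList, not a guarantee
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyBeforeTerminalOperation()); // one two three
        System.out.println(modifyDuringTraversalIsDetected());
    }
}
```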

Stream Reduction (fold) Operations

Taking a sequence of elements and combining them into a single result is a stream reduction operation. Following are some of the stream reduction operations:

reduce()

collect()

sum() (available on the primitive streams IntStream, LongStream and DoubleStream)

min()

max()

count()
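
A quick sketch of these operations on an IntStream follows. ReductionOps and the VALUES array are made-up names and data for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ReductionOps {

    // Hypothetical sample data.
    static final int[] VALUES = {4, 8, 15, 16, 23, 42};

    // General-purpose reduction: identity value plus a combining function.
    static int sumWithReduce() {
        return IntStream.of(VALUES).reduce(0, Integer::sum);
    }

    // Specialized reductions on a primitive stream.
    static int sum()    { return IntStream.of(VALUES).sum(); }
    static int min()    { return IntStream.of(VALUES).min().getAsInt(); }
    static int max()    { return IntStream.of(VALUES).max().getAsInt(); }
    static long count() { return IntStream.of(VALUES).count(); }

    // collect() reduces the elements into a container instead of a single value.
    static List<Integer> collected() {
        return IntStream.of(VALUES).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumWithReduce()); // 108
        System.out.println(sum());           // 108
        System.out.println(min());           // 4
        System.out.println(max());           // 42
        System.out.println(count());         // 6
        System.out.println(collected());     // [4, 8, 15, 16, 23, 42]
    }
}
```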

Without streams, reductions are typically performed using mutable accumulation. Mutable accumulation is nothing but writing a for-loop that iterates over the source and accumulates the result in a mutable variable. When we use a stream reduction instead, it can be run in parallel and so can be more efficient, provided the combining function is associative and stateless.
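
Side by side, the two styles look roughly like this. LoopVersusReduce and its method names are hypothetical; the point is only that the loop depends on one shared mutable variable, while reduce() with an associative function such as Integer::sum can be split across threads.

```java
import java.util.Arrays;
import java.util.List;

class LoopVersusReduce {

    // Mutable accumulation: a loop updating a single mutable variable.
    static int sumWithLoop(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            sum += n;
        }
        return sum;
    }

    // Stream reduction: no shared mutable variable, so it may run in parallel.
    static int sumWithReduce(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumWithLoop(numbers));   // 15
        System.out.println(sumWithReduce(numbers)); // 15
    }
}
```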

collect() is a mutable reduction operation. It accumulates the elements of a stream into a mutable result container such as a collection. This collect operation consists of three elements:

creation of the result container by a supplier function

incorporating an input element into a result container by an accumulator function

merging result containers into one by a combiner function
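
The three functions map directly onto the three-argument form of collect(). Below is a minimal sketch, with CollectDemo and upperCased() as made-up names. The third function, which the API calls the combiner, only comes into play when the stream runs in parallel and partial containers have to be merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollectDemo {

    static List<String> upperCased(List<String> words) {
        ArrayList<String> result = words.parallelStream()
                .map(String::toUpperCase)
                .collect(
                        ArrayList::new,      // supplier: creates a result container
                        ArrayList::add,      // accumulator: adds one element to a container
                        ArrayList::addAll);  // combiner: merges two containers into one
        return result;
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("ant", "bee", "cat"))); // [ANT, BEE, CAT]
    }
}
```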

Stream Construction

This part presents a peek into how streams are constructed at a low level. We use the .stream() method to create a stream. Spliterator is an important underlying interface that helps to construct a stream. A Spliterator takes care of key points like describing the collection of elements, traversing the elements, and splitting off a portion of the elements into another spliterator so that the stream can be processed in parallel.
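
To get a feel for these responsibilities, the sketch below takes the Spliterator of an ArrayList, asks it to describe itself, splits it once, and builds a stream from it by hand with StreamSupport.stream(). SpliteratorDemo and its methods are illustrative names, and how evenly a spliterator splits depends on the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

class SpliteratorDemo {

    // Hypothetical source data.
    static final List<Integer> SOURCE =
            new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));

    // Describing the elements: an ArrayList spliterator knows its order and size.
    static boolean isOrderedAndSized() {
        Spliterator<Integer> spliterator = SOURCE.spliterator();
        return spliterator.hasCharacteristics(Spliterator.ORDERED)
                && spliterator.hasCharacteristics(Spliterator.SIZED);
    }

    // Splitting: trySplit() hands part of the elements to a new spliterator.
    static long[] splitSizes() {
        Spliterator<Integer> rest = SOURCE.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        return new long[] { prefix.estimateSize(), rest.estimateSize() };
    }

    // Traversing: a stream built directly on top of a spliterator.
    static int sumViaSpliterator() {
        return StreamSupport.stream(SOURCE.spliterator(), false)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(isOrderedAndSized());
        System.out.println(Arrays.toString(splitSizes()));
        System.out.println(sumViaSpliterator()); // 36
    }
}
```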

In my next tutorial I am planning to write some interesting code to gauge the performance of sequential vs parallel operations in streams. Look out for that; it will be interesting.