Towards the end of 2015, we released our cheat sheet for Java 8 best practices, which got thousands upon thousands of hits and downloads! So we thought we should create another one, and we did just that. This time around, based on your survey feedback, we focus on one of the most important and substantial features in the Java 8 release, the Streams API. You should click on the cheat sheet below and print it out. It will look amazing next to the Java 8 cheat sheet! Incidentally, Venkat Subramaniam, a Java legend, gave a Java 8 Streams masterclass just last week on Virtual JUG, so make sure you check that video out if you want to hear about streams in more depth.

Let’s start off with what a stream actually is and what a stream isn’t! Here are some important points which shape how you should think about a stream:

A Stream is a pipeline of functions that can be evaluated

Streams can transform data

A Stream is not a data structure

Streams cannot mutate data

Most importantly, a stream isn’t a data structure. You can often create a stream from a collection to apply a number of functions to a data structure, but a stream itself is not a data structure. That’s so important, I mentioned it twice! A stream can be composed of multiple functions that create a pipeline that data flows through. The stream operations do not mutate this data. That is to say, the original data structure doesn’t change. However, the data can be transformed and later stored in another data structure or perhaps consumed by another operation.
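To make that concrete, here is a minimal sketch, assuming a plain list of names as the source. The class and method names are made up for illustration. It shows a pipeline producing a new list while the original list stays as it was.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SourceUnchangedDemo {

    // Returns a new list of upper-cased names; the input list is only read.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ada", "grace", "linus");
        List<String> shouted = upperCased(names);
        System.out.println(names);   // [ada, grace, linus]
        System.out.println(shouted); // [ADA, GRACE, LINUS]
    }
}
```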

We stated that a stream is a pipeline of functions, or operations. Each of these operations is classed as either an intermediate operation or a terminal operation. The difference between the two is in the output which the operation creates. If an operation outputs another stream, to which you could apply a further operation, we call it an intermediate operation. However, if the operation outputs a concrete type or produces a side effect, it is a terminal operation. A subsequent stream operation cannot follow a terminal operation, obviously, as a stream is not returned by the terminal operation!

Intermediate operations

Intermediate operations are always lazily executed. That is to say, they are not run until a terminal operation is reached. We’ll look in more depth at a few of the most popular intermediate operations used in a stream.
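The laziness is easy to observe. The sketch below is illustrative only: the class name and the counter are our own additions. It counts how often a filter predicate is called, before and after a terminal operation is attached.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationDemo {

    // Builds a pipeline with an intermediate operation only; nothing runs yet.
    // The counter records every invocation of the filter predicate.
    static Stream<Integer> evens(List<Integer> numbers, AtomicInteger calls) {
        return numbers.stream().filter(n -> {
            calls.incrementAndGet();
            return n % 2 == 0;
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = evens(Arrays.asList(1, 2, 3, 4), calls);
        System.out.println(calls.get()); // 0: no terminal operation yet

        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println(calls.get()); // 4: collect() triggered the filter
        System.out.println(result);      // [2, 4]
    }
}
```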

filter – the filter operation returns a stream of elements that satisfy the predicate passed in as a parameter to the operation. The elements themselves before and after the filter will have the same type, however the number of elements will likely change.

map – the map operation returns a stream of elements after they have been processed by the function passed in as a parameter. The elements before and after the mapping may have a different type, but there will be the same total number of elements.

distinct – the distinct operation is a special case of the filter operation. Distinct returns a stream of elements such that each element is unique in the stream, based on the equals method of the elements.

Here’s a table that summarises this, including a couple of other common intermediate operations.

| Function | Preserves count | Preserves type | Preserves order |
| --- | --- | --- | --- |
| map | ✅ | ❌ | ✅ |
| filter | ❌ | ✅ | ✅ |
| distinct | ❌ | ✅ | ✅ |
| sorted | ✅ | ✅ | ❌ |
| peek | ✅ | ✅ | ✅ |
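As a rough illustration of filter, map, distinct and sorted, here is a small sketch over a list of words. The helper names are hypothetical, and each helper ends in collect() only so that the result can be printed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsDemo {

    // filter: same element type, usually fewer elements.
    static List<String> longerThan(List<String> words, int length) {
        return words.stream()
                .filter(w -> w.length() > length)
                .collect(Collectors.toList());
    }

    // map: same number of elements, possibly a different type (String to Integer).
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    // distinct then sorted: duplicates dropped via equals(), then natural ordering.
    static List<String> uniqueSorted(List<String> words) {
        return words.stream()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "apple", "fig");
        System.out.println(longerThan(words, 3)); // [pear, apple]
        System.out.println(lengths(words));       // [4, 3, 5, 3]
        System.out.println(uniqueSorted(words));  // [apple, fig, pear]
    }
}
```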

Terminal operations

A terminal operation is always eagerly executed. This operation will kick off the execution of all previous lazy operations present in the stream. Terminal operations either return concrete types or produce a side effect. For instance, a reduce operation which is passed Integer::sum would produce an Optional, which is a concrete type. Alternatively, the forEach operation does not return a concrete type, but you are able to add a side effect such as printing out each element. The collect terminal operation is a special type of reduce which takes all the elements from the stream and can produce a Set, Map or List. Here’s a tabulated summary.

| Function | Output | When to use |
| --- | --- | --- |
| reduce | concrete type | to cumulate elements |
| collect | list, map or set | to group elements |
| forEach | side effect | to perform a side effect on elements |
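The three terminal operations in the table could look like this in practice. This is a sketch with made-up helper names, not code from the cheat sheet itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

class TerminalOpsDemo {

    // reduce: combines all elements into one value; an empty stream gives Optional.empty().
    static Optional<Integer> total(List<Integer> numbers) {
        return numbers.stream().reduce(Integer::sum);
    }

    // collect: gathers the elements into a new container, here a Set.
    static Set<Integer> unique(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 3, 5);
        System.out.println(total(numbers));  // Optional[12]
        System.out.println(unique(numbers)); // a set holding 1, 3 and 5

        // forEach: returns nothing and is used purely for its side effect.
        numbers.stream().forEach(System.out::println);
    }
}
```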

This cheat sheet is brought to you by XRebel, a tool to remind you about your app performance when you are actually working on it, not later when your clients think it is already slow. If you're working with any Java web applications, you should try it. It might change your attitude towards performance. Try XRebel!

Stream examples

Let’s take a look at a couple of examples and see what our functional code examples using streams would look like.

Exercise 1: Get the unique surnames in uppercase of the first 15 book authors that are 50 years old or older.

So you know, the source of our stream, library, is an ArrayList. Check out the code and follow along with the description. From this list of books, we first need to map from books to the book authors, which gets us a stream of Authors, and then filter them to keep just those authors that are 50 or over. We’ll map each Author to their surname, which gives us a stream of Strings. We’ll map this to uppercase Strings, make sure the elements are unique in the stream and grab the first 15. Finally we return this as a list using toList from java.util.stream.Collectors.
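A sketch of that pipeline follows. The Book and Author classes and their getters are hypothetical stand-ins, since the post does not show its own model types. Only the chain of stream operations follows the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ExerciseOne {

    // Hypothetical model classes, assumed for illustration only.
    static class Author {
        private final String surname;
        private final int age;
        Author(String surname, int age) { this.surname = surname; this.age = age; }
        String getSurname() { return surname; }
        int getAge() { return age; }
    }

    static class Book {
        private final Author author;
        Book(Author author) { this.author = author; }
        Author getAuthor() { return author; }
    }

    static List<String> seniorSurnames(List<Book> library) {
        return library.stream()
                .map(Book::getAuthor)           // Stream<Book> to Stream<Author>
                .filter(a -> a.getAge() >= 50)  // keep authors aged 50 or over
                .map(Author::getSurname)        // Stream<Author> to Stream<String>
                .map(String::toUpperCase)
                .distinct()                     // unique surnames
                .limit(15)                      // at most 15 of them
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> library = Arrays.asList(
                new Book(new Author("Smith", 60)),
                new Book(new Author("smith", 70)),
                new Book(new Author("Young", 30)),
                new Book(new Author("Jones", 50)));
        System.out.println(seniorSurnames(library)); // [SMITH, JONES]
    }
}
```

Note that the order of distinct() and limit(15) matters. As written, the pipeline yields up to 15 unique surnames. Calling limit(15) before distinct() would instead deduplicate the surnames of the first 15 matching authors.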

The second exercise works on the same original stream: we once again map the elements from Books to Authors and keep just those authors that are female. Next we map the elements from Authors to author ages, which gives us a stream of ints. We filter the ages to just those that are less than 25 and use a reduce operation with Integer::sum to total the ages.
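Here is a comparable sketch for this pipeline. Again, the model classes and the Gender enum are assumptions made purely for illustration. The reduce call uses an identity of 0, so it returns a plain total rather than an Optional.

```java
import java.util.Arrays;
import java.util.List;

class ExerciseTwo {

    // Hypothetical model types, assumed for illustration only.
    enum Gender { FEMALE, MALE }

    static class Author {
        private final int age;
        private final Gender gender;
        Author(int age, Gender gender) { this.age = age; this.gender = gender; }
        int getAge() { return age; }
        Gender getGender() { return gender; }
    }

    static class Book {
        private final Author author;
        Book(Author author) { this.author = author; }
        Author getAuthor() { return author; }
    }

    static int totalAgeOfYoungFemaleAuthors(List<Book> library) {
        return library.stream()
                .map(Book::getAuthor)
                .filter(a -> a.getGender() == Gender.FEMALE)
                .map(Author::getAge)        // Stream<Author> to Stream<Integer>
                .filter(age -> age < 25)
                .reduce(0, Integer::sum);   // 0 is returned for an empty stream
    }

    public static void main(String[] args) {
        List<Book> library = Arrays.asList(
                new Book(new Author(22, Gender.FEMALE)),
                new Book(new Author(24, Gender.FEMALE)),
                new Book(new Author(20, Gender.MALE)),
                new Book(new Author(30, Gender.FEMALE)));
        System.out.println(totalAgeOfYoungFemaleAuthors(library)); // 46
    }
}
```

In this sketch an author who wrote several books is counted once per book. Adding distinct() straight after the map to authors would count each author once, provided Author instances are shared between books or implement equals() appropriately.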

Parallel Streams

You can parallelise the work you do in a stream in a couple of ways. Firstly, you can get a parallel stream directly from your source by calling the parallelStream() method, as shown:

library.parallelStream()...

Alternatively, you can call the parallel() intermediate operation on an existing stream. This marks the whole pipeline for parallel execution once the terminal operation runs, as shown:

IntStream.range(1, 10).parallel()...
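Filled out into complete pipelines, the two approaches might look like the sketch below. The method names and the work done in each pipeline are our own choices for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class ParallelStreamDemo {

    // parallelStream() on a collection yields a parallel stream from the start.
    static int sumOfLengths(List<String> words) {
        return words.parallelStream()
                .mapToInt(String::length)
                .sum();
    }

    // parallel() switches an existing stream over to parallel execution.
    static int sumOfSquares(int fromInclusive, int toExclusive) {
        return IntStream.range(fromInclusive, toExclusive)
                .parallel()
                .map(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(Arrays.asList("a", "bb", "ccc"))); // 6
        System.out.println(sumOfSquares(1, 10));                           // 285
    }
}
```

Both pipelines are stateless and end in an associative reduction, which is why the result is the same as it would be sequentially.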

One important thing to note is that parallel streams achieve parallelism through threads from the existing common ForkJoinPool. As a result, there are possible complications, as we detailed in this previous RebelLabs post. Depending on what you’re doing in your stream, using parallel streams can also cause concurrency issues. Only use a parallel stream when the job is big enough to justify it, rather than using them by default. Also, given that you’re using the common ForkJoinPool, be sure not to run any blocking operations.

A classic example of a potential concurrency issue when using parallel streams is updating a shared mutable variable from a forEach operation. Consider code that adds elements to a shared ArrayList from inside a parallel forEach.

Would you expect multiple threads to concurrently access an ArrayList without issue normally? Of course you wouldn’t! So you shouldn’t expect it to within a parallel stream. In fact, it’s best not to do this even in a non-parallel stream, just in case someone tries to make it parallel in future. It tends to be safer to collect this into a List using the collect() operation.
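The contrast can be sketched as follows. This is an illustrative reconstruction with hypothetical method names, not the snippet from the cheat sheet. The unsafe variant is included only to show the pattern to avoid, and its outcome is deliberately left unspecified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedStateDemo {

    // Pitfall: ArrayList is not thread-safe, so concurrent add() calls from a
    // parallel forEach may lose elements or throw an exception.
    static List<Integer> unsafeEvens(int limit) {
        List<Integer> evens = new ArrayList<>();
        IntStream.range(0, limit).parallel()
                .filter(n -> n % 2 == 0)
                .forEach(n -> evens.add(n));
        return evens;
    }

    // Safer: let collect() build the list; no shared mutable state is involved.
    static List<Integer> safeEvens(int limit) {
        return IntStream.range(0, limit).parallel()
                .filter(n -> n % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(safeEvens(10)); // [0, 2, 4, 6, 8]
        try {
            // The size may be wrong, or the call may fail outright.
            System.out.println(unsafeEvens(100_000).size());
        } catch (RuntimeException e) {
            System.out.println("unsafe version failed: " + e);
        }
    }
}
```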

Anyway, I don’t want to hold you any longer. I know you’re pretty convinced by now that Java 8 streams are paramount to the best practices of development, and you can get all the information about them in this concise one-page cheat sheet!

This is our second cheat sheet, after the Java 8 Best Practices Cheat Sheet. Our next one is already in the works, due to the popularity and demand! This one will be around some of the most useful git commands known to man, and woman! Make sure you keep an eye on the RebelLabs blog so you don’t miss it! You could even subscribe to our mailing list so that you get a gentle nudge when it’s available.

Simon is a Developer Advocate at ZeroTurnaround, a Java Champion since 2014, JavaOne Rockstar speaker in 2014, Virtual JUG founder and organiser, London Java Community co-leader and RebelLabs author. He is an experienced speaker, having presented at JavaOne, JavaZone, Jfokus, DevoxxUK, DevoxxFR, JMaghreb and many more, including many JUG tours. His passion is around user groups and communities. When not traveling, Simon enjoys spending quality time with his family, cooking and eating great food.

I don’t agree with Pitfall #1 (from the cheat sheet itself). The forEach() method doesn’t disallow side-effects, and the semantics of a terminal op taking an action (void-compatible lambda) is usually that of allowing side effects. So, modifying shared mutable state is not uncommon. However, the example is of course a useful pitfall as it modifies the stream’s underlying data source. With a for / foreach loop, this would have produced a ConcurrentModificationException

Still don’t agree. What’s the point of Stream.forEach() then, if not to allow side-effects (which always operate on “shared mutable state”)?

Do note the Javadoc:

[…] If the action accesses shared state, it is responsible for providing the required synchronization.

Simon Maple

Let me ask you another question…

Why would you want to update shared mutable state manually having to take care of potential concurrency issues, when a safe alternative exists using Collectors.toList()?

I appreciate that updating shared mutable state is done often and there are ways of doing it as safely as possible, but when a one-liner provides full safety there is no reason to approach this manually and risk timing issues.

Oleg Šelajev

Well, the point might be for every element to send a message to actors. Or to write them into a database, or to print them out :) Side-effects. Collecting elements into a list is a very specific side effect that requires synchronization and is better modelled as a collect operation.

From an academic point of view, I don’t see the difference. The screen / terminal is a collection of characters, and printing to it is inherently synchronized, so it is safe “by default”. A database is the prototype of a persistent collection. Relations are collections. Databases have gotten synchronization right from the very beginning.

A Java collection is just a simple example of yet another collection. And if you must, you can synchronize on the add() call.

You can insist of course, and there’s a code-smell, indeed because collect() is better in 95% of the cases. But the criticism then makes the wrong points, in my opinion.

I don’t disagree with the fact that collect() is better suited in this example, but the pitfall (and Venkat’s talk) seem to suggest that using forEach() is bad per se. It’s kind of like saying “goto is evil” (or break/continue). In 90% of the cases, there’s a “better” option, but these options have their place.

Note that forEach() is also used in the JDK’s flatMap() implementation, rather than a more functional, recursive implementation. In other words: The flatmapped stream(s) are processed by having them operate on the shared (probably non-mutable) parent stream…

Karol

There is an error in the first of the “Stream examples”. You are getting the first 15 unique authors, but the task is to get the unique surnames of the first 15 authors. If more than one author has the same surname, you will get duplicated values.

Leonóra Dér

I think a .distinct() is missing from the solution of Exercise 2. I guess an author can write more than one book… :)