
Multithreading with Java 8 and RxJava

Multithreaded code is something we’ve been using more and more here at HomeAdvisor. Obviously, running in a modern servlet container we’ve been dealing with multithreading for years. A couple of years ago we began using multiple threads to handle a single HTTP request. Why would we ever want to deal with the complexities of multithreaded code? Well, we can’t say that we were exactly excited to introduce complexity, as we think in simplicity lies elegance. The sole motivation for multithreaded code is performance, and performance is very important to us. There are two main reasons we care about performance:

We love our users and want to keep them happy.

Bots will crawl more of our site the faster it is to respond.

Throughout this article we’ll use a simple scenario to highlight different Java multithreading techniques. We have some code that makes two possibly long network calls to get data, then combines the data into a single list to create a new object for displaying. To illustrate, here is the baseline version that has no regard for blocking or multithreading:
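As a minimal sketch of that baseline (the fetch methods and the "user @ address" display strings are hypothetical stand-ins for the real services and display objects):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BaselineExample {
    // Stand-ins for the two possibly long network calls.
    static List<String> fetchUsers()     { return Arrays.asList("alice", "bob"); }
    static List<String> fetchAddresses() { return Arrays.asList("1 Main St", "2 Elm St"); }

    // The whole method blocks until BOTH calls have completed.
    static List<String> buildDisplays() {
        List<String> users = fetchUsers();         // first network call blocks here...
        List<String> addresses = fetchAddresses(); // ...then the second blocks here
        List<String> displays = new ArrayList<>();
        for (int i = 0; i < users.size(); i++) {
            displays.add(users.get(i) + " @ " + addresses.get(i));
        }
        return displays;
    }

    public static void main(String[] args) {
        System.out.println(buildDisplays());
    }
}
```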

If both network calls take a while, the thread doing all this work will be blocked until both calls complete. We'll spend the rest of this article trying to improve on that using different Java multithreading techniques.

Java Multithreading Basics

But first, some quick background on threading in Java. Java offers several built-in mechanisms that go a long way toward multithreading. The first is threads. Threads are the basic unit of work in the JVM, and your application is almost certainly already using them, whether you know it or not. But creating threads is expensive, so a long time ago, thread pools were created.

But more threads aren't always better. If there were no blocking I/O, the ideal would be #threads == #CPUs, because switching the thread being executed on a CPU has a cost (saving and loading state from memory, etc.), and the operating system and JVM have to do work to enable each context switch. Java 7 introduced the ForkJoinPool implementation of ExecutorService, which understands this cost and keeps its thread count tied to the number of CPUs (the common pool defaults to one less than the number of available processors). As tasks are submitted, each worker thread runs its newest tasks first (LIFO) to minimize context switching, since newer tasks are more likely to refer to data that is still in L1/L2 cache. Idle threads can also steal work from other threads' queues in FIFO order, which prevents old tasks from starving.
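As a quick, hypothetical illustration (not code from the original post), the common ForkJoinPool can be inspected and used like any other ExecutorService:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class CommonPoolExample {
    public static void main(String[] args) {
        // The shared pool sizes itself from the CPU count
        // (by default one less than Runtime.getRuntime().availableProcessors()).
        ForkJoinPool pool = ForkJoinPool.commonPool();
        System.out.println("parallelism: " + pool.getParallelism());

        // It is a normal ExecutorService, so tasks can be submitted directly.
        ForkJoinTask<Integer> answer = pool.submit(() -> 6 * 7);
        System.out.println(answer.join()); // prints 42
    }
}
```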

Additionally, the Future class presents a clean API for the thread executing a Runnable and the thread waiting for the result to communicate, so you don’t have to deal with locks, volatile variables, notify(), etc. (i.e. the super low level concurrency primitives that Java provides). Sadly, the Future class does not provide an easy way to transform or compose Future<T> instances. Consider a quick example:

```java
Future<User> u = getUser(id);
Boolean hasAddress = u.get().hasAddress();
```

Since we can’t transform the Future<User> to a Future<Boolean>, we’re forced to call Future.get(), which blocks until the result is ready. This is the usual pattern you run into with usage of Future, and it’s hard to avoid. Until Java 8, that is.
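As a sketch of the Java 8 approach, assuming getUser() is reworked to return a CompletableFuture<User> (the User type here is a hypothetical stand-in):

```java
import java.util.concurrent.CompletableFuture;

public class ThenApplyExample {
    // Hypothetical user type; only the one method the example needs.
    static class User {
        boolean hasAddress() { return true; }
    }

    // Assumed async lookup; a real version would run the network call off-thread.
    static CompletableFuture<User> getUser(int id) {
        return CompletableFuture.supplyAsync(User::new);
    }

    public static void main(String[] args) {
        // Transform CompletableFuture<User> into CompletableFuture<Boolean>
        // without ever calling the blocking get().
        CompletableFuture<Boolean> hasAddress = getUser(1).thenApply(User::hasAddress);
        System.out.println(hasAddress.join()); // prints true
    }
}
```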

This is awesome. The CompletableFuture.thenApply() method allows you to transform a Future<T> into a Future<U> without blocking. It takes a Function<T, U>, just like the map() method on Java 8 streams. CompletableFuture also has methods to compose futures together, like thenCombine(). Notice that it also plays very nicely with lambdas and with functional interfaces in general.

Getting back to our original example, let’s see if we can improve the creation of display objects using CompletableFuture:
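A sketch of that improvement, with hypothetical fetchUsers()/fetchAddresses() stand-ins for the two network calls, combined via thenCombine():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CombineExample {
    static List<String> fetchUsers()     { return Arrays.asList("alice", "bob"); }
    static List<String> fetchAddresses() { return Arrays.asList("1 Main St", "2 Elm St"); }

    static CompletableFuture<List<String>> buildDisplays() {
        // Each call runs on the common ForkJoinPool instead of blocking the caller...
        CompletableFuture<List<String>> users =
                CompletableFuture.supplyAsync(CombineExample::fetchUsers);
        CompletableFuture<List<String>> addresses =
                CompletableFuture.supplyAsync(CombineExample::fetchAddresses);

        // ...but thenCombine still waits for BOTH lists before building anything.
        return users.thenCombine(addresses, (us, as) -> {
            List<String> displays = new ArrayList<>();
            for (int i = 0; i < us.size(); i++) {
                displays.add(us.get(i) + " @ " + as.get(i));
            }
            return displays;
        });
    }

    public static void main(String[] args) {
        System.out.println(buildDisplays().join());
    }
}
```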

We’ve managed to avoid blocking on the network calls, but we still need both calls to complete before we can build our list of display objects. Luckily, Java 8 gets us a step closer to getting over this hurdle.

Crossing the (Parallel) Streams

The Stream API is a huge step forward for the Java language. We could spend days talking about their utility, but for now let’s look at one aspect of them in particular, since we’re just talking about multithreading: ParallelStream.

```java
// But a better option (performance-wise) is probably to fetch User objects in bulk.
users = userService.getUsersById(userIds).values().stream();
```

This is a prime example of something you could switch over to a parallel stream. Super easy! Just change a single method name: stream() becomes parallelStream(). It's very similar to the canonical online example of using parallelStream(), fetching webpages from multiple URLs. But here there be dragons.
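As a sketch (with a hypothetical per-id lookup standing in for the network call), the one-word switch looks like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelStreamExample {
    static List<String> fetchNames(List<Integer> ids) {
        // The sequential version would be ids.stream();
        // the parallel version is literally a one-word change.
        return ids.parallelStream()
                .map(id -> "user-" + id) // stand-in for the per-id network call
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // collect(toList()) preserves encounter order even in parallel.
        System.out.println(fetchNames(Arrays.asList(1, 2, 3, 4)));
    }
}
```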

Notice there is no way to specify the ExecutorService. Parallel streams always use the common ForkJoinPool, which as we saw earlier always has #threads = #CPUs. Blocking those threads on network I/O doesn’t seem like a great idea. What if you have other usages of parallelStream in your application? You are potentially impacting the performance of all of them by doing I/O in this pool. The only case in which you should consider using parallelStream is if you have CPU intensive work (think of classic fork/join algorithm examples). Thanks to jrebel for pointing this out.

So parallelStream() is too simple to be useful. But how do we work on "streams" or "collections" of objects asynchronously? Our earlier example, where we pass around a CompletableFuture<List<User>>, felt a little clunky: you can't start processing one User until you have retrieved all of them. Is there some other way to combine these concepts of futures and collections to get the benefits of both? Let's try:
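Here's a sketch of one such attempt: a hand-rolled, non-blocking zip over two lists of CompletableFutures (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

public class FutureZipExample {
    // A hand-rolled zip: pair up two same-sized lists of futures element by
    // element, combining each pair without blocking.
    static <A, B, C> List<CompletableFuture<C>> zip(List<CompletableFuture<A>> as,
                                                    List<CompletableFuture<B>> bs,
                                                    BiFunction<A, B, C> combiner) {
        List<CompletableFuture<C>> out = new ArrayList<>();
        for (int i = 0; i < as.size(); i++) {
            out.add(as.get(i).thenCombine(bs.get(i), combiner));
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> users = Arrays.asList(
                CompletableFuture.supplyAsync(() -> "alice"),
                CompletableFuture.supplyAsync(() -> "bob"));
        List<CompletableFuture<String>> addresses = Arrays.asList(
                CompletableFuture.supplyAsync(() -> "1 Main St"),
                CompletableFuture.supplyAsync(() -> "2 Elm St"));

        // Each display object becomes ready as soon as ITS pair of calls
        // completes, instead of waiting for all users and all addresses.
        List<CompletableFuture<String>> displays =
                zip(users, addresses, (u, a) -> u + " @ " + a);
        displays.forEach(d -> System.out.println(d.join()));
    }
}
```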

Well, that works, and it remains non-blocking, but it isn't pretty. We've duplicated the well-known zip method, though it could be written once and hidden away in a static method. We also had to start with two Lists of known size, rather than Streams; depending on what you are doing, this may or may not feel like a limitation. The main takeaway from this example is that Java 8 isn't fully in the functional world yet. This is more a failing of the Stream API than of the CompletableFuture API. We can write code to shim some functional feel into it, but zip is about the simplest example of a common functional method that's missing. Once things get more complicated, we don't want to be writing and maintaining code that is available elsewhere.

RxJava Observables

RxJava describes itself as "a library for composing asynchronous and event-based programs by using observable sequences."

Also worth a read, the reactiveX.io page, which has a great list of benefits and also explains handily where Rx fits in the landscape (reproduced below):

                Single Items            Multiple Items
synchronous     T getData()             Iterable<T> getData()
asynchronous    Future<T> getData()     Observable<T> getData()

The real takeaway here is that the RxJava Observable class perfectly fits what we were trying to do earlier. It combines a generalization of the classic Observer pattern (plus error and completion handling) with the composability of CompletableFuture, and adds all the functional methods you could dream of. Observables are lazy (like streams), compose better (zip, window, merge, groupBy, etc.), and handle errors at the "stream" level rather than per item. They also have advanced options such as backpressure handling, for when the producer emits faster than the consumer can pull items off the observable.

What the table above doesn't stress, but is also pretty awesome, is that you can use Observables for any of its cells: the Observer API is agnostic as to how the data it is pushing completes. This is similar to Iterable and Future; an Iterable can hold a single item, and a Future can be created already completed, so each can behave synchronously or asynchronously. We don't often have a use case for this. But if you choose to adopt RxJava across your codebase, it's good to realize that you only have to know one API really well. You don't need to switch back and forth between CompletableFuture and Observable.
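To ground the comparison, here is a sketch of our running example with Observable.zip, assuming RxJava 1.x (rx.Observable) on the classpath and hypothetical stand-in data:

```java
import java.util.List;
import rx.Observable;

public class RxZipExample {
    // Stand-ins for the two network calls, each emitting items as they arrive.
    static Observable<String> users()     { return Observable.just("alice", "bob"); }
    static Observable<String> addresses() { return Observable.just("1 Main St", "2 Elm St"); }

    static Observable<String> displays() {
        // zip pairs the nth user with the nth address and emits each display
        // object as soon as its pair is available, with no manual Future plumbing.
        return Observable.zip(users(), addresses(), (u, a) -> u + " @ " + a);
    }

    public static void main(String[] args) {
        // toBlocking() is only for the demo; real code would subscribe instead.
        List<String> result = displays().toList().toBlocking().single();
        System.out.println(result);
    }
}
```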

So this is all great and amazing (and if you're not convinced, just go read the RxJava intro linked above). The code is compact and readable, and we're now processing data as efficiently as possible. But everything has a downside, so what exactly are the downsides of RxJava? We see two big ones. The first is that it lives outside the Java core language. Open source libraries are a huge part of the Java ecosystem, and in general we love them (so much that we even write our own). But when a library aligns closely with the way the Java language appears to be headed, it can easily be obsoleted by the next major Java version (think JodaTime). We believe RxJava is highly aligned with the direction of the Java language, so at some point the core language will probably absorb something similar and RxJava will get deprecated, as long as the Java language architects don't totally mess it up.

Reason two not to use RxJava: it's complicated! There are quite literally 300+ methods on the Observable class. Many of them are overloads that allow passing in Schedulers and the like, but there are still roughly 90 genuinely unique methods. In fact, this is a reason not to use multithreading in general: it's complicated! RxJava is cool. CompletableFuture is cool. But both should be used only as needed. Heed these words from the authors of Java Concurrency in Practice, THE Java concurrency book:

Unfortunately, many of the techniques for improving performance also increase complexity, thus increasing the likelihood of safety and liveness failures. Worse, some techniques intended to improve performance are actually counterproductive or trade one sort of performance problem for another. While better performance is desirable, safety always comes first. First make your program right, then make it fast – and then only if your performance requirements and measurements tell you it needs to be faster. In designing a concurrent application, squeezing out the last bit of performance is often the least of your concerns.

Observations on the Future of Java Multithreading

We’re still in the learning phase with both CompletableFutures and RxJava here at HomeAdvisor. We’ll be moving forward with careful adoption of RxJava and/or CompletableFuture as we find places in our codebase where it makes sense (for example, our open source Java API client offers reactive execution). As we move to an architecture with more and more functionality exposed as APIs in microservices and attempt to meet our performance requirements, we expect to find more and more such places. If you’ve already gone through a similar transition, we’d love it if you’d share your insights. Are there other technologies in the Java ecosystem we should be investigating? Are there other gotchas to these technologies we should have pointed out? Please let us know your thoughts. Or even better, come join us!

Based in Golden, CO, HomeAdvisor’s technology group is comprised of nearly 100 Java ninjas, front end gladiators, QA warriors, U/X experts and other rock stars. We build the technology that helps make HomeAdvisor the best place for homeowners to connect with home service professionals.


