flatMap() vs. concatMap() vs. concatMapEager() - RxJava FAQ

There are three seemingly similar operators in RxJava 2.x: flatMap(), concatMap() and concatMapEager(). All of them accept the same argument: a function from an individual item of the original stream to a (sub-)stream of arbitrary type. In other words, if you have a Flowable<T>, you provide a function from T to Flowable<R> for an arbitrary type R. After applying any of these operators you end up with a Flowable<R>. So how are they different?
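To make the shared shape concrete, here is a minimal sketch, assuming RxJava 2.x is on the classpath. The `OperatorShapes` class and its `repeat` mapper are made up for illustration. The substreams here are synchronous, so the sketch only shows that all three operators accept the same function and yield the same element type; the ordering differences discussed in this article only show up with asynchronous substreams.

```java
import io.reactivex.Flowable;
import java.util.List;

public class OperatorShapes {

    // Hypothetical mapper from T (Integer) to Flowable<R> (Flowable<String>):
    // n becomes a substream that emits the text of n exactly n times.
    static Flowable<String> repeat(Integer n) {
        return Flowable.just(String.valueOf(n)).repeat(n);
    }

    static Flowable<Integer> source() {
        return Flowable.just(1, 2, 3);
    }

    // All three operators take the same function and return Flowable<String>.
    static List<String> viaFlatMap() {
        Flowable<String> out = source().flatMap(OperatorShapes::repeat);
        return out.toList().blockingGet();
    }

    static List<String> viaConcatMap() {
        Flowable<String> out = source().concatMap(OperatorShapes::repeat);
        return out.toList().blockingGet();
    }

    static List<String> viaConcatMapEager() {
        Flowable<String> out = source().concatMapEager(OperatorShapes::repeat);
        return out.toList().blockingGet();
    }

    public static void main(String[] args) {
        System.out.println(viaFlatMap());
        System.out.println(viaConcatMap());
        System.out.println(viaConcatMapEager());
    }
}
```

concatMap() and concatMapEager() are specified to keep the source order, so both return 1, 2, 2, 3, 3, 3. flatMap() only promises the same six items.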

Sample project

First let's build a sample application. We will use the Retrofit2 HTTP client wrapper, which has built-in plugins for RxJava2. Our task is to use the GeoNames API to find the population of any city in the world. The interface looks as follows:

The implementation of this interface is auto-generated by Retrofit; scroll down to the appendix to see the glue code. For the time being, just assume we have a function that takes a String with a city name and asynchronously returns a one-element stream with the population of that city. Also assume we have a fixed stream of cities we want to look up:

concatMap(): process upstream sequentially

Before we see the outcome, let's study what concatMap() is doing underneath. For each upstream event (city) it invokes a function that replaces that event with a (sub)stream. In our case it's a one-element stream of Long (Flowable<Long>). So with every operator we are comparing, we conceptually start with a stream of streams of Long (Flowable<Flowable<Long>>). This nested type is internal to the operator and never visible in your code. The real difference lies in how each operator flattens such a nested stream.

concatMap() will first subscribe to the very first substream (the Flowable<Long> representing the population of Warsaw). By subscribing we actually mean making the physical HTTP call. Only when the first substream completes (in our case, emits a single Long and signals completion) will concatMap() continue. Continuing means subscribing to the second substream and waiting for it to complete. The resulting stream completes when the very last substream completes. This leads to the following stream: 1702139, 2138551, 7556900 and 3255944. These happen to be the populations of Warsaw, Paris, London and Madrid, respectively. The order of output is entirely predictable. However, it's also entirely sequential. No concurrency happens at all: we make the second HTTP call only after the first one has completed. The added complexity of RxJava doesn't pay off at all:
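The sequential behaviour can be reproduced without a network. The sketch below assumes RxJava 2.x on the classpath and replaces the HTTP call with a hypothetical `populationOf()` stub that emits the population after a made-up latency. The class name, the latencies and the log format are illustrative assumptions, not the article's original code or output.

```java
import io.reactivex.Flowable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

public class ConcatMapDemo {

    // Records "request"/"response" events in the order they actually happen.
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    // Stand-in for the HTTP call: a one-element stream with a simulated latency.
    static Flowable<Long> populationOf(String city) {
        long population;
        long latencyMs;
        switch (city) {
            case "Warsaw": population = 1702139L; latencyMs = 200; break;
            case "Paris":  population = 2138551L; latencyMs = 300; break;
            case "London": population = 7556900L; latencyMs = 100; break;
            default:       population = 3255944L; latencyMs = 400; break; // Madrid
        }
        return Flowable.just(population)
                .delay(latencyMs, TimeUnit.MILLISECONDS)
                .doOnSubscribe(s -> LOG.add("request " + city))
                .doOnNext(p -> LOG.add("response " + city));
    }

    static List<Long> run() {
        LOG.clear();
        return Flowable.just("Warsaw", "Paris", "London", "Madrid")
                .concatMap(ConcatMapDemo::populationOf) // next substream only after the previous completes
                .toList()
                .blockingGet();
    }

    public static void main(String[] args) {
        System.out.println(run());
        LOG.forEach(System.out::println);
    }
}
```

In this sketch the log strictly alternates between a request and its response, and the whole run takes roughly the sum of all four latencies.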

As you can see, no multithreading occurs: the requests are sequential, each waiting for the previous one. Technically they don't all have to happen in the same thread, but they never overlap, so they take no advantage of concurrency. The big plus is the guaranteed order of resulting events, which is not that obvious once we jump into flatMap()...

flatMap(): processing results on-the-fly, out-of-order

And just like before we start with a stream of streams of Long (Flowable<Flowable<Long>>). However rather than subscribing to each substream one after another, flatMap() operator eagerly subscribes to all substreams at once. This means we see multiple HTTP requests being initiated at the same time in different threads:

When any of the underlying substreams emits a value, it is immediately passed downstream to the subscriber. This means we can now process events on-the-fly, as they are produced. Notice that the resulting stream is out-of-order. The first event we received is 7556900, which happens to be the population of London, third in the initial stream. Unlike concatMap(), flatMap() does not preserve order, so it emits values in "random" order. Well, not really random: we simply receive values as soon as they are available. In this particular execution the HTTP response for London came first, but there is absolutely no guarantee of that. This leads to an interesting problem. We have a stream of population values and an initial stream of cities. However, the output stream can be an arbitrary permutation of events, and we have no idea which population corresponds to which city. We will address this problem in a subsequent article.
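The same effect can be sketched offline, again assuming RxJava 2.x on the classpath. The `populationOf()` stub and its latencies are hypothetical, chosen so that London answers first. The order of the result depends only on these simulated timings, which is precisely the point.

```java
import io.reactivex.Flowable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

public class FlatMapDemo {

    // Records "request"/"response" events in the order they actually happen.
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    // Stand-in for the HTTP call; the latencies are invented for this sketch.
    static Flowable<Long> populationOf(String city) {
        long population;
        long latencyMs;
        switch (city) {
            case "Warsaw": population = 1702139L; latencyMs = 200; break;
            case "Paris":  population = 2138551L; latencyMs = 300; break;
            case "London": population = 7556900L; latencyMs = 100; break;
            default:       population = 3255944L; latencyMs = 400; break; // Madrid
        }
        return Flowable.just(population)
                .delay(latencyMs, TimeUnit.MILLISECONDS)
                .doOnSubscribe(s -> LOG.add("request " + city))
                .doOnNext(p -> LOG.add("response " + city));
    }

    static List<Long> run() {
        LOG.clear();
        return Flowable.just("Warsaw", "Paris", "London", "Madrid")
                .flatMap(FlatMapDemo::populationOf) // all four substreams are subscribed up front
                .toList()
                .blockingGet();
    }

    public static void main(String[] args) {
        System.out.println(run());
        LOG.forEach(System.out::println);
    }
}
```

With these latencies all four requests are logged before any response, and the populations come out in arrival order (London, Warsaw, Paris, Madrid) rather than in source order.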

concatMapEager(): concurrent, in-order, but somewhat expensive

concatMapEager() seems to bring the best of both worlds: concurrency and guaranteed order of output events:

After learning what concatMap() and flatMap() are doing, understanding concatMapEager() is fairly simple. Given a stream of streams, concatMapEager() eagerly (duh!) subscribes to all substreams at the same time, concurrently. However, this operator makes sure that results from the first substream are propagated first, even if it's not the first one to complete. An example will quickly reveal what this means:

We initiate four HTTP requests instantly. From the log output we clearly see that the population of London was returned first. However, the subscriber did not receive it because the population of Warsaw had not arrived yet. By coincidence Warsaw completed second, so at this point the population of Warsaw can be passed downstream to the subscriber. Unfortunately the population of London must wait even longer, because first we need the population of Paris. Once Paris (immediately followed by Madrid) completes, all remaining results are passed downstream.
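The scenario just described can be imitated with simulated latencies. This is a sketch assuming RxJava 2.x on the classpath. The `populationOf()` stub, the latencies and the `ARRIVALS` list are hypothetical helpers, arranged so that responses arrive in the order London, Warsaw, Paris, Madrid.

```java
import io.reactivex.Flowable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

public class ConcatMapEagerDemo {

    // Cities in the order their (simulated) responses arrive.
    static final List<String> ARRIVALS = new CopyOnWriteArrayList<>();

    // Stand-in for the HTTP call; the latencies are invented for this sketch.
    static Flowable<Long> populationOf(String city) {
        long population;
        long latencyMs;
        switch (city) {
            case "Warsaw": population = 1702139L; latencyMs = 200; break;
            case "Paris":  population = 2138551L; latencyMs = 300; break;
            case "London": population = 7556900L; latencyMs = 100; break;
            default:       population = 3255944L; latencyMs = 400; break; // Madrid
        }
        return Flowable.just(population)
                .delay(latencyMs, TimeUnit.MILLISECONDS)
                .doOnNext(p -> ARRIVALS.add(city)); // runs before the operator buffers the value
    }

    // Populations in the order the downstream subscriber receives them.
    static List<Long> run() {
        ARRIVALS.clear();
        return Flowable.just("Warsaw", "Paris", "London", "Madrid")
                .concatMapEager(ConcatMapEagerDemo::populationOf)
                .toList()
                .blockingGet();
    }

    public static void main(String[] args) {
        System.out.println("delivered: " + run());
        System.out.println("arrived:   " + ARRIVALS);
    }
}
```

Here the arrival order differs from the delivery order: London's value arrives first but is held back until Warsaw and Paris have been delivered.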

Notice how the population of London, even though available, must sit idle until Warsaw and Paris complete. So is concatMapEager() the best possible operator for concurrency? Not quite. Imagine we have a list of a thousand cities and for each one we fetch a single 1MB picture. With concatMap() we download pictures sequentially, i.e. slowly. With flatMap() pictures are downloaded concurrently and processed as they arrive, as soon as possible. Now what about concatMapEager()? In the worst case we can end up with concatMapEager() buffering 999 pictures because the picture from the very first city happens to be the slowest. Even though we already have 99.9% of the results, we cannot process them because we enforce strict ordering.

Which operator to use?

flatMap() should be your first weapon of choice. It allows efficient concurrency with streaming behavior. But be prepared to receive results out-of-order. concatMap() works well only when the provided transformation is so fast that sequential processing is not a problem. concatMapEager() is very convenient, but watch out for memory consumption. Also, in the worst case you may end up sitting idle, waiting for a few slow responses.
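One way to keep the cost of eager subscription in check is the set of overloads that take a concurrency level and a prefetch amount, as pointed out in the comments under this post. The sketch below assumes RxJava 2.x on the classpath. `slowCall()` and the in-flight counters are hypothetical instrumentation added only to observe how many substreams overlap.

```java
import io.reactivex.Flowable;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyLimits {

    static final AtomicInteger IN_FLIGHT = new AtomicInteger();
    static final AtomicInteger MAX_IN_FLIGHT = new AtomicInteger();

    // Hypothetical slow call: echoes its input after 100 ms and tracks overlap.
    static Flowable<Integer> slowCall(Integer n) {
        return Flowable.just(n)
                .delay(100, TimeUnit.MILLISECONDS)
                .doOnSubscribe(s ->
                        MAX_IN_FLIGHT.accumulateAndGet(IN_FLIGHT.incrementAndGet(), Math::max))
                .doOnComplete(IN_FLIGHT::decrementAndGet);
    }

    // Highest number of overlapping calls seen when flatMap() is capped at maxConcurrency.
    static int maxOverlapWithFlatMap(int maxConcurrency) {
        IN_FLIGHT.set(0);
        MAX_IN_FLIGHT.set(0);
        Flowable.range(1, 6)
                .flatMap(ConcurrencyLimits::slowCall, maxConcurrency)
                .toList()
                .blockingGet();
        return MAX_IN_FLIGHT.get();
    }

    // concatMapEager() with an explicit concurrency level and prefetch; order is still preserved.
    static List<Integer> eager(int maxConcurrency, int prefetch) {
        return Flowable.range(1, 6)
                .concatMapEager(ConcurrencyLimits::slowCall, maxConcurrency, prefetch)
                .toList()
                .blockingGet();
    }

    public static void main(String[] args) {
        System.out.println("max overlap with flatMap(f, 2): " + maxOverlapWithFlatMap(2));
        System.out.println("concatMapEager(f, 2, 1): " + eager(2, 1));
    }
}
```

A lower concurrency level trades throughput for a bound on how many requests are open at once and how many results can pile up in buffers.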

Appendix: configuring Retrofit2 client

The GeoNames service interface that we used throughout this article in fact looks like this:

The implementation of the non-default method is auto-generated by Retrofit2. Notice that populationOf() returns a one-element Flowable<Long> for simplicity's sake. However, to fully reflect the nature of this API, other return types would be more reasonable in the real world. First of all, the SearchResult class holds an ordered list of results (getters/setters omitted):

After all, there are many Warsaws and Londons in the world. We silently assume the list will contain at least one element and that the first one is the right match. A more appropriate implementation should either return all hits or, even better, a Maybe<Long> to reflect the possibility of no matches:
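A minimal sketch of the Maybe<Long> variant, assuming RxJava 2.x on the classpath. The `search()` method below is a hypothetical in-memory stand-in for the real GeoNames call and simply returns the populations of matching places, best match first. It is not the article's actual client code.

```java
import io.reactivex.Maybe;
import java.util.Collections;
import java.util.List;

public class MaybePopulation {

    // Hypothetical stand-in for the search call: populations of all hits, best match first.
    static List<Long> search(String city) {
        if ("Warsaw".equals(city)) {
            return Collections.singletonList(1702139L);
        }
        return Collections.emptyList();
    }

    // Completes empty when nothing matched, otherwise emits the population of the first hit.
    static Maybe<Long> populationOf(String city) {
        return Maybe.fromCallable(() -> search(city))
                .filter(hits -> !hits.isEmpty())
                .map(hits -> hits.get(0));
    }

    public static void main(String[] args) {
        System.out.println(populationOf("Warsaw").blockingGet());
        // Falls back to -1 because the Maybe completes without a value.
        System.out.println(populationOf("Atlantis").blockingGet(-1L));
    }
}
```

The caller now has to decide explicitly what an unknown city means, instead of hitting an exception on an empty list.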

Disabling FAIL_ON_UNKNOWN_PROPERTIES is often what you want. Otherwise you have to map every field of the JSON response, and your code will break as soon as the API producer introduces new, otherwise backward-compatible fields. Then we set up the OkHttpClient, used underneath by Retrofit:

Sometimes you can skip the configuration of the OkHttp client, but we added a logging interceptor. By default OkHttp logs using java.util.logging, so in order to use a decent logging framework we must install a bridge at the very beginning:

Comments

To clarify further, both flatMap and concatMapEager let you define the concurrency level and prefetch amount. The latter can become important with concatMapEager because even if it starts multiple sources, it prefetches only a limited amount before pausing the sources and only resumes requesting from each source once the source's turn comes up. The default prefetch is 128.

Oh yes, now I get it. What I'm referring to is the fact that flatMap() first performs a map() operation (leading to a Flowable of Flowable of Long) and then merges (flattens) the result. It's just an internal type, never to be seen. I'll try to clarify it.