Wednesday, October 21, 2015

ConnectableObservables (part 2)

Introduction

In the previous post, I've shown how one can write a "simple" ConnectableObservable that uses a Subject to dispatch events to subscribers once it has been connected.

The shortcoming of that solution is that there is no request coordination and everything runs in unbounded mode: developers have to apply onBackpressureXXX strategies per subscriber, which leads to either dropped data or buffer bloat.

If the underlying Observable is cold, there should be a way to make sure it emits only as many elements as the child subscribers can process. To achieve this, we need request coordination.

Request coordination

So far, the operators we were implementing had to deal with a single child subscriber and its request at a time. One had to either pass it through, rebatch it or accumulate it, based on the business logic of said operator.

When there are multiple child Subscribers, the problem space suddenly receives a new dimension. What are the new problems?

Every bit counts

First, different child subscribers may request different amounts. Some may request small amounts, some may request larger amounts and others may want to run in unbounded mode (i.e., request(Long.MAX_VALUE)). In addition, the request calls may happen any time and with any amount.

Given such a heterogeneous request pattern, what should be the request amount sent to the upstream Observable source?

There are two main options:

1) request as much as the smallest child Subscriber requested, and

2) request as much as the largest child Subscriber requested.

Option 1) is essentially the lockstep approach. Its benefit is that there is no need for request re-batching and buffering, since once the upstream emits, everybody can receive the value immediately. (Rebatching and buffering is an option in case the request amounts are really 1s or 10s at a time.) The drawback is that the whole setup slows down to the slowest child Subscriber; if that one "forgets" to request, nobody gets anything.

Option 2) gives more room to individual child Subscribers and allows them to run at their own pace. However, this solution requires unbounded buffering capability (which may be shared or per Subscriber). This means that if there is an unbounded child Subscriber, the operator has to request Long.MAX_VALUE and fill the buffers for everyone. Depending on the operator, this may not be a problem though.
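To make the two options concrete, here is a minimal sketch in plain Java. The class RequestPolicy and its method names are made up for illustration and are not part of RxJava; the sketch only computes what would be requested from upstream given a snapshot of the child request amounts.

```java
import java.util.Arrays;

/** Hypothetical helper: picks the amount to request from upstream. */
final class RequestPolicy {

    /** Option 1 (lockstep): only what the slowest child can take right now. */
    static long lockstep(long... requested) {
        // Without children nothing can be delivered in lockstep.
        return requested.length == 0 ? 0L : Arrays.stream(requested).min().getAsLong();
    }

    /** Option 2: follow the most demanding child; the surplus has to be buffered. */
    static long largest(long... requested) {
        return requested.length == 0 ? 0L : Arrays.stream(requested).max().getAsLong();
    }
}
```

With children requesting 5, 1 and Long.MAX_VALUE, lockstep yields 1 while largest yields Long.MAX_VALUE, which is exactly the unbounded case that forces buffering for everyone else.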

Subscribers may come and go at will

The second problem is that the number of Subscribers may not be constant: new subscribers arrive, old ones leave. This poses another set of problems:

A child Subscriber may request Long.MAX_VALUE then leave after a few (or no) elements.

A child Subscriber may arrive but not request anything, stopping everyone else.

A child Subscriber may leave at any time and thus its request amount "pressure" has to be released.

All child Subscribers leave before the upstream Observable completes. What should happen in this case?

Within the lockstep approach, two sub-options arise. Either one introduces some bounded buffers that hold onto the requested amounts, which now have to be re-batched to fit in, and simply awaits the new Subscribers. Or one slowly "drips" away the source values until a child Subscriber arrives.

Within the unbounded buffering approach, one can simply keep buffering or again, start dropping values.

Approaches taken in RxJava

RxJava has two operators that return a ConnectableObservable: publish() and replay(). For a long time, these were ignoring backpressure completely and behaved just like the MulticastSupplier in the previous part.

These operators were rewritten to support backpressure (in 1.0.13 and 1.0.14 respectively) and had to take the problems mentioned before into account. The solutions were as follows:

Operator publish() does lockstepping with a fixed prefetch buffer: the buffer is only drained (and then replenished) if all known child Subscribers can take a value. If there are no child Subscribers, it "slowly drips away" its source, which means it starts to request 1 by 1 and drops these values.

Operator replay() does unbounded buffering. The reason for this is that both the bounded and unbounded versions of replay() have to buffer and replay all values from the upstream anyway. You may ask why buffer everything when the replay is time and/or size bound. The answer is that these operators, similar to Subjects, have to deliver events continuously and without skips; if there is a child Subscriber that arrived at some time, requested 1 and then went to "sleep", the next time it requests, the bounded replay has to present the next value, no matter how far ahead the other Subscribers went in the meantime.

The effect of disconnection

There is a problem that isn't dealt with in the RxJava operators but has to be mentioned. If one unsubscribes the Subscription returned by the connect() method, the upstream will stop sending further events.

The problem is that this may leave the child Subscribers hanging: they won't receive any further events (beyond those that are already in some buffer of the respective operator). We have similar problems with CompletableFutures in Java 8. One can cancel a Future but what happens to those that were awaiting its result?

The solution in Java 8 is to emit a CancellationException as the result in this case so that the dependent computations can terminate. However, this isn't the case with RxJava (in both 1.x and 2.x branches). The current implementation will just hang the child Subscribers.

This problem may appear outside of a ConnectableObservable as well. For some time, the RxAndroid 0.x library contained an operator that was applied to all sequences and unsubscribed them if the lifecycle required cleanup. The problem was that this left child Subscribers without termination events. I suggested emitting an onError or onCompleted event for this case. There was no resolution of the problem and the operator was removed before 1.0.

On a personal note, I don't remember anyone from the community complaining about this problem and it seems nobody is really affected by this behavior. As with many obscure and corner cases, if I don't mention them, nobody else seems to discover them.

The effect of termination

Upstream Observables may terminate normally, in which case the ConnectableObservable will emit the terminal event to child Subscribers.

At this point, a new Subscriber may subscribe to the terminated ConnectableObservable. What should happen in this case? Does the termination also mean disconnection? Should the child Subscriber get terminated instantly, similar to PublishSubject?

Again, the solution requires a business decision. RxJava chose the approach that a terminal event sent to a ConnectableObservable is considered a disconnect event; late-coming Subscribers won't receive any terminal event but will be remembered until another call to connect() happens.

This has the benefit that the developers can "prepare" child Subscribers before the upstream Observable gets run and thus avoid losing events. The drawback is that one has to remember to call connect() again, otherwise nothing runs and the Subscribers are left hanging.

Family of collectors and emitters

Before we jump into some code, I'd like to sketch out a pattern that is the foundation of almost all operators that deal with either multiple sources or multiple child Subscribers.

I've written dozens of such operators and I've noticed they all use the same set of components and methods:

They all need to track Subscribers, either the child Subscribers or the Subscribers that are subscribed to the source Observables. The tracking structure uses the copy-on-write approach of array-based resource containers.

They all use an emitter loop (synchronized) or drain loop (atomics) which has to be triggered from many places: when an event is emitted from upstream(s), when a new child Subscriber arrives, when a request comes from child Subscribers and sometimes when a child unsubscribes.

The loop has some preprocessing step: figuring out where the Subscribers are at the moment, selecting which source to drain or combining available values from sources in some fashion.

Finally, the events are delivered to Subscriber(s) and replenishments are requested from source Observable(s).

Which operator?

Now that we are aware of the problems, let's implement a ConnectableObservable which does request coordination.

I've been thinking about which operator to implement. My first thought was to show how to implement the operator pair of an AsyncSubject or BehaviorSubject (similar to how publish() is the pair of PublishSubject); however, the former can be implemented using plain composition plus replay().

Implementing the pair of BehaviorSubject is a bit more involved. The naive implementation would use composition such as this:

public ConnectableObservable<T> behave() {
    return replay(1);
}

However, this doesn't properly capture the behavior of a terminated BehaviorSubject: child Subscribers get nothing but a terminal event whereas replay will always replay 1 value and 1 terminal event after it completed.

To minimize brain melting, I'm going to show how to implement a variant of the least complex of the operators: publish().

Publish (or die)

First, let's sketch out all the requirements we want to achieve:

The operator should do a lockstep-based request coordination with prefetching (for efficiency)

The effect of disconnection on the child Subscribers should be parametrizable: no event, signal error or signal completion.

A terminated operator should be considered disconnected, and new subscribers will wait for the next connect().

The operator will allow errors to cut ahead. (Implementing error-delay is an exercise left to the reader.)

The operator will use a power-of-2 prefetch buffer.

With these requirements, we start with the skeleton of the class as usual:

The state object will handle the connection, subscription and reconnection cases:

Because we have to reconnect, we store the current connection in an AtomicReference.

We initialize the source and strategy fields and set up an initial unconnected connection.

The method call() from OnSubscribe will handle the subscribers; I'll show the implementation further down.

The connect method will handle the connection attempts; I'll show the implementation further down.

Finally, once a connection has been terminated on its own or via unsubscribe, we have to replace the old connection with a fresh connection atomically and not overwriting somebody else's fresh connection due to races.
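The atomic replacement described in the last point can be sketched in plain Java as follows. This is only an illustration under the assumption that a connection is an opaque object; the class names State and Connection mirror the text, but the bodies are stand-ins and not the actual operator code.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for one connection attempt. */
final class Connection { }

/** Hypothetical state holder shared by connect() and the child Subscribers. */
final class State {
    final AtomicReference<Connection> current =
            new AtomicReference<>(new Connection());

    /** Installs a fresh connection only if 'old' is still the current one. */
    void replace(Connection old) {
        // A failed CAS means somebody else already installed a fresh
        // connection; theirs is kept and ours is simply dropped.
        current.compareAndSet(old, new Connection());
    }
}
```

Calling replace() twice with the same terminated connection changes the current connection only once, which is the "not overwriting somebody else's fresh connection" property.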

Before going deep into the complicated logic, two more simplistic classes remain. The first is the Subscriber that will be subscribed to the source Observable:

We need to know which connection this class has to deal with for two reasons: 1) it has to notify the connection that the underlying Subscriber can receive values, 2) if the subscriber unsubscribes, it may mean the other Subscribers can now receive further values.

Since request() runs asynchronously, the connection might not be available yet. We have to remember to call drain() once this connection becomes available (shown later on).

Since unsubscribe() runs asynchronously as well, it has to check for non-null and only then remove itself from the array of subscribers (shown later on). Note also the idempotence provided by once.
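The three points above can be sketched without RxJava types. In this hedged example, ChildProducer and DrainTarget are hypothetical names; the real class would implement RxJava's Producer and Subscription, which is left out here to keep the sketch self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical minimal view of the connection. */
interface DrainTarget {
    void drain();
    void remove(ChildProducer p);
}

/** Hypothetical per-child request tracker. */
final class ChildProducer {
    final AtomicLong requested = new AtomicLong();
    final AtomicBoolean once = new AtomicBoolean();
    /** Set by the subscription logic; may still be null when request() runs. */
    volatile DrainTarget connection;

    void request(long n) {
        if (n <= 0) {
            return;
        }
        // Accumulate with a cap so unbounded requests don't overflow.
        for (;;) {
            long r = requested.get();
            long u = r + n;
            if (u < 0) {
                u = Long.MAX_VALUE;
            }
            if (requested.compareAndSet(r, u)) {
                break;
            }
        }
        DrainTarget c = connection;
        if (c != null) {
            c.drain(); // otherwise whoever sets 'connection' has to drain
        }
    }

    void unsubscribe() {
        if (once.compareAndSet(false, true)) { // idempotent
            DrainTarget c = connection;
            if (c != null) {
                c.remove(this);
                c.drain(); // the others may be able to proceed now
            }
        }
    }
}
```

The null checks are the important part: both methods may be called before the producer has been attached to a connection.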

The class has to manage a set of state variables: the current array of Subscribers, the value queue plus the terminal event holders, the connection and disconnection indicators, the work counter for the queue-drain approach, the Subscriber that is subscribed to the Observable and finally the EMPTY and TERMINATED array indicators.

The constructor initializes the various fields.

The subscriber needs some preparations besides creating a new SourceSubscriber, therefore, I factored it out into a separate method.

The copy-on-write handling of the known subscribers is done via add and remove, similar to how we did this with Subjects and with the array-backed Subscription container.

We will handle the source events with these onXXX methods.

Finally, the drain and termination check methods for the queue-drain approach.

The meltdown

So far, the classes and the methods implemented were nothing special. However, the real complexity starts from here on. I'll show the missing implementations one by one and mention the concurrency considerations with them as well.

I suggest you take a small break, drink some power-up and clear your head at this point.

Done? All right, let's do this.

State.call

This method is responsible for handling the incoming child Subscribers. The method has to consider that the connection may terminate on its own or get disconnected concurrently:

First, we create a PublishProducer and set it on the subscriber to react to requests and unsubscription.

Next, we retrieve the current known connection and set it on the PublishProducer so it can call the drain() method if it wishes.

We attempt to add the PublishProducer to the internal tracking array. If this fails, it means the current connection has terminated and we have to try the next connection (once it becomes available) by looping a bit.

Even if the add succeeded, the child might have just unsubscribed and thus the remove might not have found it. By calling it here again, we can make sure the PublishProducer doesn't stay in the array unnecessarily.

Once the add succeeded, we have to call drain since a concurrent call in PublishProducer might not have seen a non-null connection and couldn't notify the connection for more values (or about unsubscription). The call will make sure this PublishProducer is handled as necessary.
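The retry logic from the steps above might look like the following sketch. The interface Conn and the class SubscribeLoop are hypothetical and reduced to what the loop needs; attaching the connection to the producer before the add is only indicated by a comment.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical slice of a connection. */
interface Conn {
    boolean add(Object producer); // false once the connection has terminated
    void remove(Object producer);
    void drain();
}

final class SubscribeLoop {
    /** Registers the producer with whichever connection is current. */
    static Conn register(AtomicReference<Conn> current, Object producer,
                         BooleanSupplier unsubscribed) {
        for (;;) {
            Conn c = current.get();
            // (the real code would also store c in the producer here)
            if (c.add(producer)) {
                if (unsubscribed.getAsBoolean()) {
                    // the child left while it was being added
                    c.remove(producer);
                } else {
                    // pick up requests issued before c was visible
                    c.drain();
                }
                return c;
            }
            // add failed: c has terminated, retry with the next connection
        }
    }
}
```

Note that the loop spins until the terminated connection has been replaced, so it relies on the termination path swapping in a fresh connection promptly.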

State.connect

This method is responsible for triggering a single connection on an unconnected Connection instance and/or returning the Subscription that lets an active Connection get unsubscribed.

This method is also racing with a termination/disconnection and as such, it has to take them into account when attempting to establish a fresh connection.

It works by first retrieving the current connection and, if the current thread is the first, switching it into a connected state. If successful, the doConnect method is called, which will do the necessary subscription work.

Otherwise, check if the current connection is unsubscribed. If not, return it to the callback. Note that there is a small window here where the current connection is determined active but may become disconnected/terminated by the time the method is called. Resolving this issue requires either blocking synchronization between termination and connection or some other serialization approach. In practice, however, this is rarely an issue and can be ignored.

Finally, if the current connection is disconnected, let's replace it with a fresh, not-yet connected Connection and try the loop again.
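A condensed sketch of this connect loop, with the callback and the actual subscription work left out, could look like this. Link and Connector are hypothetical names, and subscribeCount merely counts how often the upstream would have been subscribed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical connection handle with just the flags connect() needs. */
final class Link {
    final AtomicBoolean connected = new AtomicBoolean();
    volatile boolean unsubscribed;
    int subscribeCount;

    void doConnect() {
        subscribeCount++; // placeholder for subscribing to the upstream
    }
}

final class Connector {
    final AtomicReference<Link> current = new AtomicReference<>(new Link());

    /** Returns the link that is, or just became, connected. */
    Link connect() {
        for (;;) {
            Link c = current.get();
            if (!c.connected.get() && c.connected.compareAndSet(false, true)) {
                c.doConnect(); // only the first caller gets here
                return c;
            }
            if (!c.unsubscribed) {
                return c; // already connected and still active
            }
            // disconnected or terminated: swap in a fresh link and retry
            current.compareAndSet(c, new Link());
        }
    }
}
```

Calling connect() twice on an active link returns the same link and subscribes only once; after the link has been unsubscribed, the next connect() works on a fresh one.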

Connection.createParent

The method constructs a SourceSubscriber and sets it up to behave according to the disconnection strategy:

The method will instantiate a SourceSubscriber and add a Subscription to it. This subscription, depending on the disconnection strategy, will either call onCompleted, onError with a CancellationException or set the disconnect flag followed by a call to drain (the onXXX methods call drain()).

We need the disconnected flag because we can't use an isUnsubscribed check: it would always skip the terminal event and appear as if we'd have the NO_EVENT strategy.
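The strategy dispatch can be illustrated with a small enum-based sketch. The names NO_EVENT and SEND_ERROR appear in the text; SEND_COMPLETED and the class DisconnectHandler are my assumptions for this example, and drain() is only a counter standing in for the real loop.

```java
import java.util.concurrent.CancellationException;

/** Strategy names; SEND_COMPLETED is an assumed name for this sketch. */
enum DisconnectStrategy { NO_EVENT, SEND_ERROR, SEND_COMPLETED }

/** Hypothetical slice of the connection that reacts to a disconnect. */
final class DisconnectHandler {
    volatile boolean disconnected;
    volatile boolean done;
    volatile Throwable error;
    int drains;

    void onError(Throwable e) { error = e; done = true; drain(); }

    void onCompleted() { done = true; drain(); }

    void drain() { drains++; } // placeholder for the real drain loop

    /** What the Subscription returned by connect() would run. */
    void disconnect(DisconnectStrategy strategy) {
        switch (strategy) {
            case SEND_COMPLETED:
                onCompleted();
                break;
            case SEND_ERROR:
                onError(new CancellationException("Disconnected"));
                break;
            default: // NO_EVENT: no terminal event, just the flag
                disconnected = true;
                drain();
        }
    }
}
```

Every branch ends up in drain(), either directly or through an onXXX method, so the child Subscribers get handled regardless of the chosen strategy.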

Connection.add, Connection.remove

The algorithms for adding and removing resources to an array-based container with copy-on-write semantics should be quite familiar by now. For completeness, here are the methods anyway:
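What follows is a generic sketch of the copy-on-write add and remove, not the operator's actual listing. SubscriberArray is a hypothetical name, Object stands in for the producer type, and terminate() is an extra helper I added to show where the TERMINATED indicator comes into play.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write container for the known subscribers. */
final class SubscriberArray {
    static final Object[] EMPTY = new Object[0];
    static final Object[] TERMINATED = new Object[0];

    final AtomicReference<Object[]> subscribers = new AtomicReference<>(EMPTY);

    /** Returns false if the container has been terminated. */
    boolean add(Object p) {
        for (;;) {
            Object[] a = subscribers.get();
            if (a == TERMINATED) {
                return false;
            }
            int n = a.length;
            Object[] b = new Object[n + 1];
            System.arraycopy(a, 0, b, 0, n);
            b[n] = p;
            if (subscribers.compareAndSet(a, b)) {
                return true;
            }
        }
    }

    void remove(Object p) {
        for (;;) {
            Object[] a = subscribers.get();
            if (a == EMPTY || a == TERMINATED) {
                return;
            }
            int n = a.length;
            int j = -1;
            for (int i = 0; i < n; i++) {
                if (a[i] == p) {
                    j = i;
                    break;
                }
            }
            if (j < 0) {
                return; // not found, nothing to do
            }
            Object[] b;
            if (n == 1) {
                b = EMPTY;
            } else {
                b = new Object[n - 1];
                System.arraycopy(a, 0, b, 0, j);
                System.arraycopy(a, j + 1, b, j, n - j - 1);
            }
            if (subscribers.compareAndSet(a, b)) {
                return;
            }
        }
    }

    /** Swaps in TERMINATED and hands back whoever was still registered. */
    Object[] terminate() {
        return subscribers.getAndSet(TERMINATED);
    }
}
```

EMPTY and TERMINATED are distinct zero-length arrays compared by identity, which is why add() can tell "nobody here yet" apart from "closed for good".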

The reason we have to drag the Action1 all the way here instead of calling it in State.connect at (2) is that the call must happen before the actual subscription to the underlying Observable to allow synchronous cancellation.

The onNext method offers the value and calls drain to make sure it is delivered if possible. Note that if the queue is full, we reward it with a MissingBackpressureException and unsubscription; it means the upstream doesn't handle backpressure well or at all.

Since we may receive an error as part of the upstream events as well as from a disconnection event, we need an AtomicReference and set only one of them as the terminal event. In this example, the first one wins, the other gets printed to the console. If the CAS succeeded, we set the done flag and call drain to handle things.
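The two behaviors just described, a failed offer turning into an error and the first error winning, can be sketched together. SourceEvents is a hypothetical class; IllegalStateException stands in for RxJava's MissingBackpressureException, and the calls to drain() and unsubscribe() are reduced to comments.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical receiver of the upstream events. */
final class SourceEvents<T> {
    final Queue<T> queue;
    final AtomicReference<Throwable> error = new AtomicReference<>();
    volatile boolean done;

    SourceEvents(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    void onNext(T t) {
        if (!queue.offer(t)) {
            // Stand-in for MissingBackpressureException; the real
            // operator would also unsubscribe from the upstream here.
            onError(new IllegalStateException("Queue full"));
            return;
        }
        // a real operator would call drain() here
    }

    /** Returns true if this error became the terminal event. */
    boolean onError(Throwable e) {
        if (error.compareAndSet(null, e)) {
            done = true;
            // a real operator would call drain() here
            return true;
        }
        System.err.println("Dropped: " + e); // the loser is only reported
        return false;
    }
}
```

With a capacity of 1, the second onNext without any consumption in between terminates the sequence, and any error arriving afterwards is merely logged.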

It is true that onCompleted can also be called from two places, but since it just sets the done flag to true, there is no need for any CAS-ing here. It is also true that due to the disconnection strategy, onError and onCompleted can race with each other. However, since the difference in handling them is just whether error contains null or not, it isn't really a problem. Note also that since we used unsafeSubscribe in onConnect, there shouldn't be any call to SourceSubscriber.unsubscribe coming from upstream and causing trouble if the source terminated normally and the disconnection strategy happens to be SEND_ERROR.

Connection.drain

This is unquestionably the heart of the operator and the most complicated logic due to the effects of the concurrently changing values it has to rely on. I'll explain it piece by piece:

First, it contains a familiar drain loop with wip counter and missed count:
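For reference, the generic shape of such a loop looks like this. DrainLoop is a hypothetical class and the Runnable is a placeholder for the loop body (termination check, request coordination, emission), so this is a sketch of the pattern rather than the operator's drain method.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder of the queue-drain serialization logic. */
final class DrainLoop {
    final AtomicInteger wip = new AtomicInteger();
    final Runnable work; // placeholder for the actual loop body

    DrainLoop(Runnable work) {
        this.work = work;
    }

    void drain() {
        // Only the caller making the 0 -> 1 transition enters the loop;
        // everybody else just leaves a mark in the counter.
        if (wip.getAndIncrement() != 0) {
            return;
        }
        int missed = 1;
        for (;;) {
            work.run();

            // Subtract what has been handled; a non-zero result means
            // more drain() calls arrived in the meantime.
            missed = wip.addAndGet(-missed);
            if (missed == 0) {
                break;
            }
        }
    }
}
```

If drain() is called again while the body runs, even reentrantly from the body itself, the call returns immediately and the active loop performs one more pass.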

Nothing fancy yet. The wip counter doubles as the serialization entry point on a 0 - 1 transition and a missed counter above that.

Once inside the loop, the first thing to do is to check for a terminal condition via checkTerminated (explained later). It checks for the terminal events and the disconnected state and acts accordingly. This is done before the upcoming request coordination since terminal events are not subject to backpressure management and can be emitted before any child Subscriber requests anything.

The next step is to perform request coordination. Since we set out to do a lockstep coordination, we have to ask all known child subscribers for their current requested amount and figure out the minimum amount everybody can receive. Note that this can be zero.

We have to check if the queue is non-empty, consume a value with a single poll() and then ask for replenishment. Note that the "slowness" depends on the speed of the upstream Observable. If one decides to do nothing when there are no subscribers, the if statement can be simplified to if (n != 0) { } but should not be removed!

If we know there are any subscribers and we know the minimum requested amount, we can try draining our queue and emit that amount to everybody.

This should also look familiar. We check the terminal conditions again (1) (optional if you want to be eager). Next, we loop until the minRequested is zero or the queue becomes empty. Inside the loop we do the usual termination checks (2) and emission accounting (3). After the loop, if there were emissions, we ask for replenishment from the SourceSubscriber instance (4).
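Stripped of the termination checks and the concurrency, the emission step boils down to the following single-threaded sketch. Child and LockstepEmitter are hypothetical names, plain long fields replace the atomic request counters, and the returned count is what one would request from the source as replenishment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical child: outstanding requests plus what it received. */
final class Child {
    long requested;
    final List<Integer> received = new ArrayList<>();

    Child(long requested) {
        this.requested = requested;
    }
}

final class LockstepEmitter {
    /** Emits queued values to all children, limited by the slowest one. */
    static long emit(Queue<Integer> queue, List<Child> children) {
        if (children.isEmpty()) {
            return 0L;
        }
        long min = Long.MAX_VALUE;
        for (Child c : children) {
            min = Math.min(min, c.requested); // this can be zero
        }
        long emitted = 0L;
        while (emitted != min) {
            Integer v = queue.poll();
            if (v == null) {
                break; // queue is empty for now
            }
            for (Child c : children) {
                c.received.add(v);
                if (c.requested != Long.MAX_VALUE) {
                    c.requested--; // unbounded children are not decremented
                }
            }
            emitted++;
        }
        return emitted;
    }
}
```

With five queued values, one unbounded child and one child that requested 2, both children receive the first two values and three values stay in the queue until the slow child requests again.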

Lastly, the final piece of the drain method is the publication of each value to all subscribers:

The method takes only a done and an empty indicator but not any individual Subscriber or the array of known subscribers.

Since the disconnected flag is set only if the disconnection strategy was NO_EVENT, we can't do much but swap in the TERMINATED indicator array. Anybody unlucky enough to be still subscribed won't get any further events.

If the done flag is true and there is an error, we first replace the current connection with a fresh one (within the state) so newcomers won't try to subscribe to a terminated connection.

After clearing the queue for any normal values, we swap in the TERMINATED indicator array so ...

... anybody who got in can now receive its terminal event and the drain loop will quit.

The same logic applies in the case when the upstream has completed normally and the queue has become empty.

Testing it out

Finally, we reached the end of one of the most complicated operators in the history of RxJava. Now let's reward ourselves with a small unit test to see if the backpressure and the disconnection strategy really work:

It should print [1, 2, 3, 4, 5] to the console and quit without any AssertionErrors. Neat, isn't it?

Conclusion

In this lengthy and brain-stretching blog post, I've explained the requirements and problems around ConnectableObservables that want to do request coordination between their child Subscribers and their upstream Observable. I then showed an implementation of a publish()-like ConnectableObservable which features a disconnection strategy to avoid hanging its child Subscribers.

This is, however, not the most complicated operator in RxJava. It isn't replay(), even though the bounded version is a bit more complicated than the PublishConnectableObservable (but only due to the boundary management). It is not the most commonly used operator either; in fact, that one is simpler due to fewer state clashes. No, the most complicated operator to date has such intertwined request coordination that even I'm not sure it is possible to write a buffer-bounded version of it.

But enough of mysterious foreshadowing! In the next part, I'm going to detail what it takes to implement a replay()-like ConnectableObservable.