Sunday, June 27, 2010

STOP THE PRESS! This series has now been superseded by the online book www.IntroToRx.com. The new site/book offers far better explanations, samples and depth of content. I hope you enjoy!

So far in this series we have managed to avoid any explicit use of threading or concurrency. Some of the methods we have covered will implicitly introduce a level of concurrency to perform their jobs (e.g. Buffer, Delay and Sample all require a separate thread to do their magic), but most of this has been kindly abstracted away from us. This post will look at the beauty of the Rx API and its ability to effectively remove the need for WaitHandles and any explicit use of Threads, the ThreadPool or the shiny new Task type.
A friend of mine once wisely stated that you should always understand at least one layer below the one you are coding at. At the time he was referring to networking protocols, but I think it is sage advice for all programming. On my current project there are some very savvy developers who are very comfortable working in a multithreaded environment, and the project has both client and server side threading problems that we have had to tackle. I believe the whole team would agree that it has been amazing how much concurrency Rx will handle for you in a declarative way. The code base is virtually free of WaitHandle, Monitor or lock usage, and of any explicit creation of threads. It has evolved into this state over time as we have come to grips with the power of Rx, and the end result is far cleaner code. Having that experience on the team allowed us to find the ways we should and shouldn't be using Rx, which would have been just too hard for me to do alone.
Getting back to my friend's comment about understanding the underlying subsystem, this is especially important when dealing with Rx and scheduling. Just because Rx abstracts some of this away does not mean that you can't still create problems for yourself if you are not careful. Before I scare you too much, let's look at some of the scheduling features of Rx.

Scheduling

In the Rx world, you can control the scheduling of two things:

The invocation of the subscription

The publishing of notifications

As you could probably guess, these are exposed via two extension methods on IObservable<T> called SubscribeOn and ObserveOn. Both methods have an overload that takes an IScheduler and returns an IObservable<T>, so you can chain methods together.

The IScheduler interface is of less interest to me than the types that implement it. Depending on your platform* (Silverlight 3, Silverlight 4, .NET 3.5, .NET 4.0) the appropriate implementations are exposed as static properties on a static class called Scheduler:

Scheduler.Dispatcher will ensure that the actions are performed on the Dispatcher, which is obviously useful for Silverlight and WPF applications. You can imagine that the implementation just delegates any calls to Schedule(Action) straight to Dispatcher.BeginInvoke(Action).

Scheduler.NewThread will schedule all actions onto a new thread.

Scheduler.ThreadPool will schedule all actions onto the thread pool.

Scheduler.TaskPool (which is only available on Silverlight 4 and .NET 4.0) will schedule actions onto the TaskPool.

Scheduler.Immediate will ensure the action is not scheduled but is executed immediately.

Scheduler.CurrentThread just ensures that the actions are performed on the thread that made the original call. This is different to Immediate, as CurrentThread will queue the action to be performed. Note the difference in the output of the following code: one call passes Scheduler.Immediate, the other passes Scheduler.CurrentThread.
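The snippet the paragraph above refers to has not survived here, so the following is a reconstruction of the kind of code that shows the difference; the ScheduleTasks name and the exact messages are my own.

```csharp
private static void ScheduleTasks(IScheduler scheduler)
{
    Action leafAction = () => Console.WriteLine("leafAction");
    Action innerAction = () =>
    {
        Console.WriteLine("innerAction start");
        scheduler.Schedule(leafAction);
        Console.WriteLine("innerAction end");
    };
    Action outerAction = () =>
    {
        Console.WriteLine("outerAction start");
        scheduler.Schedule(innerAction);
        Console.WriteLine("outerAction end");
    };
    scheduler.Schedule(outerAction);
}

//Immediate runs each nested action inline, so the calls nest:
//outerAction start, innerAction start, leafAction, innerAction end, outerAction end
ScheduleTasks(Scheduler.Immediate);

//CurrentThread queues nested actions on a trampoline, so each action
//runs to completion before the next queued one starts:
//outerAction start, outerAction end, innerAction start, innerAction end, leafAction
ScheduleTasks(Scheduler.CurrentThread);
```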

*Sorry Rx for JavaScript, I have not even opened the box on you and don’t know anything about scheduling in JavaScript.

Examples

So those are each of our schedulers; let's see some of them in use. The thing I want to point out here is that the first few times I used these overloads, I was confused as to what they actually did. You should use the SubscribeOn method to describe how you want any warm-up and background processing code to be scheduled, while the ObserveOn method describes where you want your notifications scheduled to. So, for example, if you had a WPF application that used Rx to populate an ObservableCollection<T>, you would almost certainly want to use SubscribeOn with one of the threaded schedulers (NewThread, ThreadPool or maybe TaskPool), and then use ObserveOn with the Dispatcher scheduler to update your collection.
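To make that concrete, here is a minimal sketch of the WPF/Silverlight pattern just described; _items stands in for whatever ObservableCollection<long> your view binds to.

```csharp
Observable.Interval(TimeSpan.FromSeconds(1))
    .SubscribeOn(Scheduler.ThreadPool) //subscription/warm-up work off the UI thread
    .ObserveOn(Scheduler.Dispatcher)   //notifications marshalled back to the Dispatcher
    .Subscribe(i => _items.Add(i));    //safe to touch UI-bound state here
```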

So all of the schedulers just offer a nice abstraction over the various ways we can write concurrent code. Besides saving us from writing the tedious code to get work onto a new thread or the thread pool, it keeps the threading in our Rx queries declarative. I didn't think that any of the schedulers except CurrentThread and Immediate warranted further explanation, but I do think it is worth pointing out some of the "fun" threading problems you can still face, even though the scheduling has been abstracted away from you.

Deadlocks

When writing the current application my team is working on, we found out the hard way that Rx code can most certainly deadlock. When you consider that some calls (like First()) are blocking, and that we can schedule work to be done in the future, it becomes obvious that race conditions can occur. This example is the simplest deadlock I could think of; it is fairly silly, but it will get the ball rolling.

var stream = new Subject<int>();
Console.WriteLine("Next line should deadlock the system.");
var value = stream.First(); //blocks, waiting for a value that can never arrive
stream.OnNext(1);
Console.WriteLine("I can never execute....");

Hopefully we won't ever write code that silly, and if we did, our tests would give us fairly quick feedback that things were wrong. What lets deadlocks slip into the system is when they manifest themselves at integration points. This example may be a little harder to find, but it is only a small step away from the silly first example. Here we block in the constructor of a UI element, which will always be created on the dispatcher. The blocking call is waiting for an event that can only be raised from the dispatcher – deadlock.
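The sample code for this one is missing here, but a sketch of the idea looks like the following; the MyControl type and its Loaded wiring are hypothetical.

```csharp
public partial class MyControl : UserControl
{
    private readonly Subject<Unit> _loaded = new Subject<Unit>();

    public MyControl()
    {
        InitializeComponent();
        Loaded += (s, e) => _loaded.OnNext(new Unit());
        //We are on the dispatcher thread here, and Loaded can only be
        //raised from the dispatcher, which this call is blocking. Deadlock.
        _loaded.First();
    }
}
```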

This odd implementation with explicit scheduling will cause the three OnNext calls to be scheduled to run only once the First() call has finished – and First() is waiting for an OnNext to be called. Deadlock.
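Again the original snippet has not survived here, so this is my own reconstruction of the shape of the problem, assuming the trampoline behaviour of Scheduler.CurrentThread described earlier.

```csharp
var stream = new Subject<int>();
Scheduler.CurrentThread.Schedule(() =>
{
    //We are now inside the CurrentThread trampoline, so these nested
    //Schedule calls are queued rather than executed.
    Scheduler.CurrentThread.Schedule(() => stream.OnNext(1));
    Scheduler.CurrentThread.Schedule(() => stream.OnNext(2));
    Scheduler.CurrentThread.Schedule(() => stream.OnNext(3));
    //First() blocks this thread waiting for an OnNext, but the queued
    //OnNext calls can only run once this outer action finishes. Deadlock.
    var value = stream.First();
});
```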
So far this post has been a bit doom and gloom about scheduling and the problems you could face, but that is not the intent. I just wanted to make it obvious that Rx is not going to solve the age-old concurrency problems; it will, however, make it easier to get them right if you follow this simple rule.

Only the final subscriber should be setting the scheduling.

Avoid using First() – Ed: that is for you, Olivier. We will call this rule 1b.

Where the last example came unstuck is that the service was dictating the scheduling paradigm when really it had no business doing so. Before we had a clear idea of where we should be doing the scheduling in my current project, we had all sorts of layers adding "helpful" scheduling code. What it ended up creating was a threading nightmare. When we removed all of the scheduling code and then located it in a single layer (at least in the Silverlight client), most of our concurrency problems went away. I recommend you do the same. At least in WPF/Silverlight applications, the pattern should be simple: "Subscribe on a background thread; observe on the Dispatcher".
So my challenge to the readers is to add to the comments:

Any other scheduling rules (2 seems quite small, and I was only going to have 1)

Post some nasty Rx race condition code

What rules do you have for subscribing on a background thread? Which scheduler should I use, and when – NewThread, ThreadPool or TaskPool? And so I come full circle to understanding one layer below the one you are working at.

Saturday, June 19, 2010


In the last post we covered some of the flow control features of Rx and how to conceptualise them with Marble diagrams. This post will continue to build on those concepts by looking at different ways of working with multiple streams.
The Concat extension method is probably the simplest of the extension methods; if you have covered the previous flow control post, then most of the error handling constructs are more complex than this method. Concat will simply publish values from the second stream once the first stream completes.
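A minimal sketch of Concat, using Observable.Range for brevity:

```csharp
var stream1 = Observable.Range(0, 3);   //0,1,2
var stream2 = Observable.Range(10, 3);  //10,11,12
//stream2's values are only published once stream1 completes.
stream1.Concat(stream2)
    .Subscribe(Console.WriteLine, () => Console.WriteLine("Completed"));
//0 1 2 10 11 12 Completed
```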

If either stream was to OnError, then the result stream would OnError too. This means that if stream1 produced an OnError, then stream2 would never be used. If you wanted stream2 to be used regardless of whether stream1 produced an OnError, the extension method OnErrorResumeNext would be your best option.
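A sketch of the difference: with Concat the error below would terminate the result stream, while OnErrorResumeNext swallows it and continues with the second stream.

```csharp
var stream1 = Observable.Range(0, 3)
    .Concat(Observable.Throw<int>(new Exception("stream1 failed")));
var stream2 = Observable.Range(10, 3);
//The OnError from stream1 is suppressed and stream2 is used anyway.
stream1.OnErrorResumeNext(stream2)
    .Subscribe(Console.WriteLine);
//0 1 2 10 11 12
```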
Quick video on Concat, Catch and OnErrorResumeNext on Channel9.
The Amb method was a new concept to me. I believe it comes from functional programming and is an abbreviation of "ambiguous". Effectively, this extension method will produce values from whichever stream first produces a value, and will completely ignore the other stream. In the examples below I have two streams that both produce values. In the first example, stream1 wins the race, so the result stream is stream1's values. In the second example, I delay stream1 from producing values, so stream2 wins the race and the result stream carries the values from stream2.
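The examples themselves are not reproduced here, but the losing-stream case can be sketched like this; the intervals are my own choice.

```csharp
var stream1 = Observable.Interval(TimeSpan.FromSeconds(1)).Select(i => i + 100);
var stream2 = Observable.Interval(TimeSpan.FromMilliseconds(250));
//stream2 produces a value first, so the result is stream2's values
//and stream1 is ignored entirely.
stream1.Amb(stream2).Take(5).Subscribe(Console.WriteLine);
Console.ReadLine();
```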

The Merge extension method does a primitive combination of multiple streams that share the same type T. The result will also be an IObservable<T>, with values produced on the result stream as they occur in the source streams. The result stream will complete when all of the source streams complete, or will error when an OnError is published by any stream.

Merge also provides other overloads that allow you to pass more than two source observables, via an IEnumerable or a params array. The overload that takes a params array is great for when we know at compile time how many streams we want to merge, and the IEnumerable overload is better for when we don't know until run time how many streams we need to merge.

//stream1 and stream2 are assumed to follow the same pattern as stream3.
var stream1 = Observable.Interval(TimeSpan.FromMilliseconds(100)).Take(10);
var stream2 = Observable.Interval(TimeSpan.FromMilliseconds(100)).Take(10).Select(i => i + 100);
//Create a third stream
var stream3 = Observable.Interval(TimeSpan.FromMilliseconds(100)).Take(10).Select(i => i + 200);
//Number of streams known at compile time.
Observable.Merge(stream1, stream2, stream3)
    .Subscribe(Console.WriteLine);
Console.ReadLine();
//We can dynamically create a list at run time with this overload.
var streams = new List<IObservable<long>>();
streams.Add(stream1);
streams.Add(stream2);
streams.Add(stream3);
Observable.Merge(streams).Subscribe(Console.WriteLine);
Console.ReadLine();

A quick video on Merge on Channel9.

SelectMany, like its counterpart IEnumerable<T> extension method, will create the Cartesian product of the two streams; for every item in one stream, it will give you every item in the other stream. A primitive way to think of it is a nested for loop that creates a 2D array. If you want more info on SelectMany, I will leave it to you to do a Google search, as it is fairly well documented in the IEnumerable world.
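The Cartesian product idea can be sketched like this, using ToObservable over small arrays:

```csharp
var letters = new[] { "a", "b", "c" }.ToObservable();
var numbers = new[] { 1, 2, 3 }.ToObservable();
//For every letter, publish that letter paired with every number.
letters.SelectMany(letter => numbers.Select(number => letter + number))
    .Subscribe(Console.WriteLine);
//a1 a2 a3 b1 b2 b3 c1 c2 c3 (interleaving can vary with other sources)
```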

A quick video on SelectMany on Channel9.

Zip is another interesting merge feature. Just like a zipper on clothing or a bag, the Zip method will bring together two sets of values as pairs, two by two. Things to note about the Zip function: the result stream will complete when the first of the streams completes, it will error if either of the streams errors, and it will only publish once it has a pair. So if one of the source streams publishes values faster than the other, the rate of publishing will be dictated by the slower of the two streams.
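A small sketch of those pacing and completion rules; the intervals here are my own.

```csharp
var slow = Observable.Interval(TimeSpan.FromMilliseconds(300)).Take(3);
var fast = Observable.Interval(TimeSpan.FromMilliseconds(100));
//Pairs only come out at the slower stream's rate, and the result
//completes when the Take(3) side completes.
slow.Zip(fast, (s, f) => string.Format("{0},{1}", s, f))
    .Subscribe(Console.WriteLine, () => Console.WriteLine("Completed"));
Console.ReadLine();
```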

Here are two short videos on Zip (first, second) on Channel9. Note that the second video is actually incorrect – can you spot why?

CombineLatest is worth comparing to the Zip method. Both methods use a function that takes a value from each stream to produce the result value. The difference is that CombineLatest will cache the last value of each stream, and when either stream produces a new value, that new value and the last value from the other stream are sent to the result function. This example uses the same inputs as the previous Zip example, but note that many more values are produced. This leaves CombineLatest somewhere between Zip and SelectMany :-)
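A sketch of CombineLatest over the same sort of inputs as the Zip sketch above:

```csharp
var stream1 = Observable.Interval(TimeSpan.FromMilliseconds(300)).Take(3);
var stream2 = Observable.Interval(TimeSpan.FromMilliseconds(200)).Take(3);
//Whenever either stream produces a value, it is combined with the other
//stream's most recent value, so more pairs appear than Zip would give.
stream1.CombineLatest(stream2, (s1, s2) => string.Format("{0},{1}", s1, s2))
    .Subscribe(Console.WriteLine);
Console.ReadLine();
```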

Quick video on CombineLatest on Channel9.

ForkJoin, like the last few extension methods, also requires a function to produce the result, but it only uses the last value from each stream. Things to note with ForkJoin: as with the previous methods, if either stream errors, so will the result stream; and if either stream is empty (i.e. completes with no values), then the result stream will also be empty. This example uses the same values as the previous samples and will only produce a pair, from the last values of each stream, once they both complete.
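A sketch of ForkJoin, assuming the two-source-with-selector extension that shipped in the 2010-era Rx builds (ForkJoin was later removed from Rx, so treat this as illustrative):

```csharp
var stream1 = Observable.Range(0, 3);   //last value is 2
var stream2 = Observable.Range(10, 3);  //last value is 12
//Nothing is published until both streams complete; then the pair of
//final values is produced.
stream1.ForkJoin(stream2, (s1, s2) => string.Format("{0},{1}", s1, s2))
    .Subscribe(Console.WriteLine);
//2,12
```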