With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about the materialization and dematerialization operators provided by EnumerableEx:

von Neumann was right

Code and data are very similar animals, much more similar than you may expect them to be. We can approach this observation from two different angles, one being a machine-centric view. Today’s computers are realizations of von Neumann machines where instructions and data are treated on the same footage from a memory storage point of view. While this is very useful, it’s also the source of various security-related headaches such as script or SQL injection and data execution through e.g. stack overruns (Data Execution Prevention is one mitigation).

Another point of view goes back to the foundational nature of programming, in particular the essentials of functional programming, where functions are used to represent data. An example are Church numerals, which are functions that are behaviorally equivalent to natural numbers (realized by repeated application of a function, equal in number to the natural number being represented). This illustrates how something that seems exclusively code-driven can be used to represent or mimic data.

If the above samples seem far-fetched or esoteric, there are a variety of more familiar grounds where the “code as data” paradigm is used or exploited. One such sample is LISP where code and data representation share the same syntactical form and where the technique of quotation can be used to represent a code snippet as data for runtime inspection and/or manipulation. This is nothing other than meta-programming in its earliest form. Today we find exactly the same principle back in C#, and other languages, through expression trees. The core property here is so-called homo-iconicity, where code can be represented as data without having to resort to a different syntax (homo = same; iconic = appearance):

What what does all of this have to do with enumerable sequences? Spot on! The matter is that sequences seem to be a very data-intensive concept, and sure they are. However, the behavior and realization of such sequences, e.g. through iterators, can be very code-intensive as well, to such an extent that we introduced means to deal with exceptions (Catch for instance) and termination (Repeat, restarting after completing). This reveals that it’s useful to deal with all possible states a sequence can go through. Guess what, state is data.

The holy trinity of IEnumerator<T> and IObserver<T> states

In all the marble diagrams I’ve shown before, there was a legend consisting of three potential states an enumerable sequence can go through as a result of iteration. Those three states reflect possible responses to a call to MoveNext caused by the consumer of the sequence:

In the world of IObserver<T>, the dual to IEnumerator<T> as we saw in earlier episodes, those three states are reflected in the interface definition directly, with three methods:

// Summary:
// Supports push-style iteration over an observable sequence.
public interface IObserver<T>
{
// Summary:
// Notifies the observer of the end of the sequence.
void OnCompleted();
//
// Summary:
// Notifies the observer that an exception has occurred.
void OnError(Exception exception);
//
// Summary:
// Notifies the observer of a new value in the sequence.
void OnNext(T value);
}

Instead of having an observer getting called on any of those three methods, we could equally well record the states “raised” by the observable, which turns calls (code) into object instances (data) of type Notification<T>. This operation is called materialization. Thanks to dualization, the use of Notification<T> can be extended to the world of enumerables as well.

Notification<T> is a discriminated union with three notification kinds, reflecting the three states we talked about earlier:

It’s a material dual world

Materialization is the act of taking a plain enumerable and turning it into a data-centric view based on Notification<T>. Dematerialization reverses this operation, going back to the code-centric world. Thanks to this back-and-forth ability between the two worlds of code and data, we get the ability to use LINQ over notification sequences and put the result back into the regular familiar IEnumerable<T> world. A figure makes this clear:

The power of this lies in the ability to use whatever domain is more convenient to perform operations over a sequence. Maybe you want to do thorough analysis of error conditions, corresponding to the Error notification kind, or maybe it’s more convenient to create a stream of notification objects before turning them into a “regular” sequence of objects that could exhibit certain additional behavior (like error conditions). This is exactly the same as the tricks played in various other fields, like mathematics where one can do Fourier analysis either in the time of frequency domain. Sometimes one is more convenient than the other; all that counts is to know there are reliable ways to go back and forth.

(Note: For the Queryable sample, you may want to end up in the bottom-right corner, so the AsQueryable call is often omitted.)

Materialize and Dematerialize

What remains to be said in this post are the signatures of the operators and a few samples. First, the signatures:

An example of materialization is shown below, where we take a simple range generator to materialize. We expect to see OnNext notifications for all the numeric values emitted, terminated by a single OnCompleted call:

A sample where an exception is triggered by the enumerator is shown below. Notice the code won’t blow up when enumerating over the materialized sequence: the exception is materialized as a passive exception object instance in an error notification.

Starting from a plain IEnumerable<T> the grammar of notifications to be expected is as follows:

OnNext* ( OnCompleted | OnError )?

In the other direction, starting from the world of IEnumerable<Notification<T>> one can write a different richer set of sequence defined by the following grammar:

( OnNext | OnCompleted | OnError )*

For example:

var ns = new Notification<int>[] {
new Notification<int>.OnNext(1),
new Notification<int>.OnNext(2),
new Notification<int>.OnCompleted(),
new Notification<int>.OnNext(3),
new Notification<int>.OnNext(4),
new Notification<int>.OnError(new Exception()),
new Notification<int>.OnNext(5),
};

Dematerializing this sequence of notifications will produce an enumerable sequence that will run no further than the first OnCompleted or OnError:

ns.Dematerialize().Run(Console.WriteLine);

This prints 1 and 2 and then terminates. The reason this can still be useful is to create a stream of notifications that will be pre-filtered before doing any dematerialization operation on it. For example, a series of batches could be represented in the following grammar:

( OnNext* OnCompleted )*

If the user requests to run n batches, the first n – 1 OnCompleted notifications can be filtered out using some LINQ query expression, before doing dematerialization.

Finally, a sample of some error-filtering code going back and forth between IEnumerable<T> and IEnumerable<Notification<T>> showing practical use for those operators when doing sophisticated error handling:

Given some input sequences, we materialize and concatenate all of them into sequence xns. Now we write a LINQ query over the notifications to filter out exceptions, unless the exception is a critical OOM one (you could add others to this list). The result is we see 1 through 6 being printed to the screen. (Question: What’s the relationship to OnErrorResumeNext that we saw in the previous post? What’s similar, what’s different?)

BTW - your rss/atom feeds are not showing the images correctly: they're referencing the pictures on bartdesmet.net instead on bartdesmet.info where the web page is referencing them. Try reading your blog with a feed reader, and see for yourself.

This Concat overload is new too. See my subsequent post on the combinators for an explanation. In particular, Concat, Merge and Amb all have the same three overloads (IE<IE<T>>, IE<T>[] with params and one with IE<T> twice).

With Reactive Extensions for .NET (Rx) and .NET Framework 4 a new LINQ operator was introduced – Zip ( Bart De Smet gives excellent explanation about the idea and implementation details behind Zip operator ). In a nutshell it merges two sequences into