More LINQ with System.Interactive – Sequences under construction

With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A wealth of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, System.Interactive, which provides a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment, we’ll talk about the constructor operators provided by EnumerableEx:

Constructing sequences

In order to perform operations over sequences using various combinators and operators, it’s obviously a prerequisite to have such sequences available. While collection types in the .NET Framework implement IEnumerable<T> (or the non-generic counterpart, bridgeable to LINQ using the Cast<T> Standard Query Operator), one often wants to construct sequences on the spot. Moreover, sequences often need to be lazy, since keeping them in memory may be problematic or even infeasible (think of infinite sequences). For all those reasons, constructor operators come in handy.

LINQ to Objects already has a constructor function called Enumerable.Range, which lazily produces a sequence of the requested number of consecutive integers, starting from a given value:
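As a quick illustration using only System.Linq (written as a C# 9 top-level program so it runs on its own), asking for three numbers starting at 5 looks like this; nothing is computed until the loop pulls values:

```csharp
using System;
using System.Linq;

// Enumerable.Range(start, count): count consecutive integers beginning at start.
var xs = Enumerable.Range(5, 3);

// Values are produced only as the loop asks for them.
foreach (var x in xs)
{
    Console.WriteLine(x); // 5, then 6, then 7
}
```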

The lazy nature should not be underestimated, as one could create infinite sequences representing the potential to produce a certain (ordered) set of objects. When combined with other restriction operators, it becomes possible to use composition to limit the produced results in a manner very close to the domain we’re talking about. For example, the natural numbers are the integers greater than or equal to zero. The numbers starting from 5 are the natural numbers with the first five skipped, by means of a Skip operation or something similar. Taking a number of elements can be done using Take. Without deviating too much from today’s blogging mission, here’s what I’m alluding to:
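A minimal sketch of that composition, assuming a hypothetical Naturals iterator (not part of LINQ or Rx) that yields 0, 1, 2, ... without end; laziness is what makes Skip and Take on an infinite source safe:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the natural numbers 0, 1, 2, ... produced on demand.
// (In unchecked arithmetic the counter would wrap after int.MaxValue, but only if enumerated that far.)
static IEnumerable<int> Naturals()
{
    for (var i = 0; ; i++)
    {
        yield return i;
    }
}

// Natural numbers starting from 5, limited to the first 10 of them.
var fromFive = Naturals().Skip(5).Take(10);

Console.WriteLine(string.Join(", ", fromFive)); // 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
```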

I’ll leave it to the reader as a challenge to optimize this in a variety of ways while preserving the declarative nature at the use site.

Back to Rx: in today’s installment we’ll look at various constructor functions in EnumerableEx.

Return and the cruel return of the monad

The simplest constructor function is Return, simply yielding the single value specified on demand. It’s similar to a one-element array and that’s about it from a practical point of view:

public static IEnumerable<TSource> Return<TSource>(TSource value);

You should be able to guess the implementation of the operator for yourself. Use is straightforward as shown below:

EnumerableEx.Return(42).Run(Console.WriteLine);
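For reference, one possible implementation is a one-line iterator. This is a sketch written as a local function in a top-level program, not the actual EnumerableEx source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of Return: a sequence that yields exactly one value.
static IEnumerable<T> Return<T>(T value)
{
    yield return value; // produced on the first MoveNext; the second MoveNext returns false
}

foreach (var x in Return(42))
{
    Console.WriteLine(x); // 42
}
```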

One interesting thing about this constructor function is its signature, going from TSource to IEnumerable<TSource>. This is nothing but the return function (sometimes referred to as unit) used on a monad, with the more general signature T -> M<T>. It’s the little brother of the bind function, which has signature M<T> -> (T -> M<R>) -> M<R> and is known as SelectMany in LINQ. The type constructor M (in LINQ, the particular cases of IEnumerable<T> and IQueryable<T> are used, i.e. not a general type constructor), together with the unit and bind functions, forms a triplet known as a Kleisli triple, which defines a monad.
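To make the monad connection concrete, the sketch below checks two of the monad laws (left and right identity) for IEnumerable<T>, using SelectMany as bind and a hand-written Return as unit. Return and Twice are hypothetical helpers defined just for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical unit: T -> IEnumerable<T>.
static IEnumerable<T> Return<T>(T value)
{
    yield return value;
}

// A sample function of shape T -> M<R>, with M = IEnumerable: n -> [n, n * 10].
static IEnumerable<int> Twice(int n) => new[] { n, n * 10 };

var xs = new[] { 1, 2, 3 };

// Left identity: binding Return(a) to f gives the same sequence as f(a).
bool leftIdentity = Return(5).SelectMany(n => Twice(n)).SequenceEqual(Twice(5));

// Right identity: binding xs to Return gives xs back.
bool rightIdentity = xs.SelectMany(x => Return(x)).SequenceEqual(xs);

Console.WriteLine($"{leftIdentity} {rightIdentity}"); // True True
```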

Throw me an exception please

Another singleton constructor is the Throw function that we’ve seen repeatedly in the previous post on exception handling over sequences. Its role is to provide an enumerable that will throw an exception upon the first MoveNext call during enumeration:
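A minimal sketch of such an operator, assuming an iterator-based implementation (the EnumerableEx code may differ): because the body of an iterator doesn’t run until the first MoveNext call, merely creating the sequence doesn’t throw.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator-based sketch of Throw.
static IEnumerable<T> Throw<T>(Exception exception)
{
    // An iterator body only starts running on the first MoveNext call,
    // so the exception surfaces during enumeration, not at construction.
    throw exception;
    yield break; // unreachable, but turns the method into an iterator (compiler warning CS0162)
}

var xs = Throw<int>(new InvalidOperationException("Oops"));
Console.WriteLine("Sequence created, nothing thrown yet.");

try
{
    foreach (var x in xs)
    {
        Console.WriteLine(x); // never reached
    }
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Thrown during enumeration: {ex.Message}");
}
```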

In fact, this is a lazily thrown exception constructor. Use is simple again:

EnumerableEx.Throw<int>(new Exception()).Run();

Notice you have to specify the element type for the returned (never-yielding) sequence, as we’re constructing an IEnumerable<T> and there’s no information to infer T from. Obviously, the resulting sequence can be combined with other sequences of the same type in various places, e.g. using Concat. Below is a sample of how to use the Throw constructor with SelectMany to forcefully reject even numbers in a sequence (rather than filtering them out):
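A sketch of that idea, using hand-written Return and Throw stand-ins (hypothetical local functions, not the EnumerableEx methods) and a hypothetical RejectEvens helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for EnumerableEx.Return and EnumerableEx.Throw.
static IEnumerable<T> Return<T>(T value)
{
    yield return value;
}

static IEnumerable<T> Throw<T>(Exception exception)
{
    throw exception;
    yield break; // unreachable; only here to make the method a lazy iterator
}

// Odd numbers map to a singleton sequence; an even number maps to a sequence
// that throws as soon as SelectMany starts enumerating it.
static IEnumerable<int> RejectEvens(IEnumerable<int> source) =>
    source.SelectMany(x => x % 2 == 0
        ? Throw<int>(new ArgumentException($"Even number {x} rejected."))
        : Return(x));

try
{
    foreach (var x in RejectEvens(new[] { 1, 3, 4, 5 }))
    {
        Console.WriteLine(x); // 1 and 3 are printed before the exception
    }
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // Even number 4 rejected.
}
```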

Here we use the conditional operator to decide between an exception-throwing sequence and a singleton sequence (in this case, the “Many” in “SelectMany” has “Single” semantics).

Empty completing the triad

Since the introduction of LINQ in .NET 3.5 (thanks to reader Keith for reminding me about my heritage), there’s been an Empty constructor as well, with the following signature and a conceptually equivalent implementation:

public static IEnumerable<TSource> Empty<TSource>()
{
    yield break;
}

There seems to be little use for this, though I challenge the reader to use it to build the Where operator using SelectMany. Empty completes the triad of simple constructors: Return yields a single value, Throw terminates with an exception, and Empty terminates without producing anything.
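One possible answer to the challenge, sketched with a hypothetical WhereViaSelectMany helper: each element maps either to a one-element array (kept) or to Enumerable.Empty (dropped), and SelectMany flattens the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Where built from SelectMany: keep x as a singleton, or drop it via Empty.
static IEnumerable<T> WhereViaSelectMany<T>(IEnumerable<T> source, Func<T, bool> predicate) =>
    source.SelectMany(x => predicate(x) ? new[] { x } : Enumerable.Empty<T>());

var evens = WhereViaSelectMany(Enumerable.Range(1, 10), x => x % 2 == 0);
Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6, 8, 10
```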

StartWith = Snoc (or Cons in disguise)

People familiar with LISP, ML, Scala and many other functional languages will know the concept of cons by heart. Cons is nothing but the abbreviation for “construct”, used to create a bigger list (in LISP lingo) out of an existing list and an element to be prepended:

(cons 1 (cons 2 nil))

The above creates a list with 1 as the head and (cons 2 nil) as the tail, which by itself expands into a cell containing 2 and a tail with the nil (null) value. The underlying pair of the head value and the “reference” to the tail list is known as a cons cell. Decomposition operators exist, known as car and cdr. Their names stem from old IBM machine terminology: cons cells were realized in machine words consisting of a so-called “address” part and “decrement” part, explaining the a and d in car and cdr, while c and r stand for “contents” and “register” respectively:

(car (cons 1 2)) == 1
(cdr (cons 1 2)) == 2

The StartWith operator is none other than Cons in reverse (sometimes jokingly referred to as “Snoc” by functional programmers):

Focus on the second one first. See how the “first” parameter is taken in as the second argument to StartWith. The reason is that putting the extension method’s this modifier on the “first” parameter would be very invasive: since that parameter can be of any type TSource, every type in the framework would be polluted with a “Cons” method:
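A minimal sketch of the single-element case, written as a plain local function (hypothetical, not the EnumerableEx source) so the snippet is self-contained; as an extension method, the this modifier would go on source, leaving every other type untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical StartWith: prepend one element to a sequence, lazily.
static IEnumerable<T> StartWith<T>(IEnumerable<T> source, T first)
{
    // The new head comes first, then the existing sequence acts as the tail.
    yield return first;
    foreach (var item in source)
    {
        yield return item;
    }
}

// The equivalent of (cons 1 (cons 2 nil)), built on top of an empty sequence.
var list = StartWith(StartWith(Enumerable.Empty<int>(), 2), 1);
Console.WriteLine(string.Join(", ", list)); // 1, 2
```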

Generate is your new anamorphism

Generate is the most general constructor function for sequences you can imagine. It’s the dual of Aggregate in various ways. Where Aggregate folds a sequence into a single object by combining elements in the input sequence onto a final value in a step-by-step way, the Generate function unfolds a sequence out of a generator function also in a step-by-step way. To set the scene, let’s show the power of Aggregate by refreshing its signature and showing how to implement a bunch of other LINQ combinators in terms of it:

Given a seed value and a function to combine an element of the input sequence with the current accumulator value into a new accumulator value, the Aggregate function produces the result of (left-)folding all elements in the sequence one by one. For example, a sum is nothing but a left fold, thanks to the left associativity of numerical addition:

1 + 2 + 3 + 4 + 5 = ((((1 + 2) + 3) + 4) + 5)

The accumulated value is the running sum of everything to the left of the current element. Seeing the elements of a sequence being eaten one by one is quite a shocking, catastrophic event for the sequence, hence the name catamorphism. Below are implementations of Sum, Product, Min, Max, FirstOrDefault, LastOrDefault, Any and All:
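Sketches of those operators in terms of the seeded Enumerable.Aggregate overload (Aggregate<TSource, TAccumulate>(source, seed, func)) and, for Min and Max, the unseeded one. The *Agg names are hypothetical, and unlike the real LINQ operators these versions don’t short-circuit: Any and All still walk the whole sequence even once the answer is known.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Seeded folds: running total and running product.
static int SumAgg(IEnumerable<int> source) => source.Aggregate(0, (acc, x) => acc + x);
static int ProductAgg(IEnumerable<int> source) => source.Aggregate(1, (acc, x) => acc * x);

// Unseeded folds: the first element is the seed, so an empty input throws
// InvalidOperationException, like Enumerable.Min and Enumerable.Max on ints.
static int MinAgg(IEnumerable<int> source) => source.Aggregate((acc, x) => Math.Min(acc, x));
static int MaxAgg(IEnumerable<int> source) => source.Aggregate((acc, x) => Math.Max(acc, x));

// The accumulator remembers whether a value was found already and keeps the first one.
static T FirstOrDefaultAgg<T>(IEnumerable<T> source) =>
    source.Aggregate((found: false, value: default(T)),
                     (acc, x) => acc.found ? acc : (found: true, value: x)).value;

// Every element replaces the accumulator, so the last one wins.
static T LastOrDefaultAgg<T>(IEnumerable<T> source) => source.Aggregate(default(T), (acc, x) => x);

// Logical or/and over predicate results; the predicate stops being called once
// the result is settled, but the sequence is still enumerated to the end.
static bool AnyAgg<T>(IEnumerable<T> source, Func<T, bool> predicate) =>
    source.Aggregate(false, (acc, x) => acc || predicate(x));
static bool AllAgg<T>(IEnumerable<T> source, Func<T, bool> predicate) =>
    source.Aggregate(true, (acc, x) => acc && predicate(x));

var xs = new[] { 3, 1, 4, 1, 5 };
Console.WriteLine($"{SumAgg(xs)} {ProductAgg(xs)} {MinAgg(xs)} {MaxAgg(xs)}"); // 14 60 1 5
Console.WriteLine($"{FirstOrDefaultAgg(xs)} {LastOrDefaultAgg(xs)}");          // 3 5
Console.WriteLine($"{AnyAgg(xs, x => x > 4)} {AllAgg(xs, x => x > 0)}");       // True True
```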

As the dual to catamorphisms we find anamorphisms, where one starts from an initial state and generates elements for the resulting sequence. I leave it to the reader to draw parallels with other words starting with ana- (from the Greek “up”). The most elaborate signature of Generate is shown below:

To see that this is the dual of Aggregate, you have to use a bit of imagination, but the parallels are there. Where Aggregate takes in an IEnumerable<TSource> and produces a TResult, the Generate function produces an IEnumerable<TResult> from a given TState (and a bunch of other things). On both sides, there’s room for an initial state and a way to make progress (“func” versus “iterate”), both staying in their respective domains for the accumulation type (TAccumulate and TState). To select the result (that will end up in the output sequence), the overload above allows multiple TResult values to be produced per TState. Finally, there’s a stop condition, which is implicit in the case of a catamorphism: the “remaining tail of the sequence is empty” condition serves that purpose (i.e. MoveNext returns false).
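A minimal sketch of that most general shape, assuming parameters named initial, condition, iterate and resultSelector (hypothetical names, not necessarily the EnumerableEx signature). Written as a for loop, the parts map directly onto initialization, termination condition and update:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Generate: unfold a sequence from a seed state, allowing
// several results per intermediate state.
static IEnumerable<TResult> Generate<TState, TResult>(
    TState initial,
    Func<TState, bool> condition,
    Func<TState, TState> iterate,
    Func<TState, IEnumerable<TResult>> resultSelector)
{
    // Initialization; termination condition; update.
    for (var state = initial; condition(state); state = iterate(state))
    {
        foreach (var result in resultSelector(state))
        {
            yield return result;
        }
    }
}

// Each state n in 1..3 contributes n copies of itself.
var ys = Generate(1, n => n <= 3, n => n + 1, n => Enumerable.Repeat(n, n));
Console.WriteLine(string.Join(", ", ys)); // 1, 2, 2, 3, 3, 3
```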

Another way to look at Generate is to draw the parallel with the three parts of a for loop: initialization, termination condition and update. In fact, Generate is implemented using for loops. Several more overloads exist as well.

We’ll discuss the ones with Notification<T> types in the next episode, titled “Code = Data”, but the remaining three are straightforward to understand. Some lack a terminating condition, while others lack the ability to yield multiple results per intermediate state. Below is a sample of Generate producing the same results as Enumerable.Range:
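A sketch under the same caveats: a hypothetical single-result Generate overload, plus a hypothetical RangeViaGenerate helper whose state is the offset from start (counting offsets rather than comparing against start + count avoids overflow near int.MaxValue):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Generate overload with exactly one result per state.
static IEnumerable<TResult> Generate<TState, TResult>(
    TState initial,
    Func<TState, bool> condition,
    Func<TState, TState> iterate,
    Func<TState, TResult> resultSelector)
{
    for (var state = initial; condition(state); state = iterate(state))
    {
        yield return resultSelector(state);
    }
}

// Range(start, count) as an unfold: the state i runs over the offsets 0 .. count - 1.
static IEnumerable<int> RangeViaGenerate(int start, int count) =>
    Generate(0, i => i < count, i => i + 1, i => start + i);

Console.WriteLine(string.Join(", ", RangeViaGenerate(5, 3))); // 5, 6, 7
```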

Defer what you can do now till later

The intrinsic lazy nature of sequences with regard to enumeration allows us to push more delayed effects into the sequence’s iteration code. In particular, the construction of a sequence can be hidden behind a sequence of the same type. Let’s show a signature to make this clearer:

Here, an IEnumerable<TSource> is created out of a factory function. What’s handed back from the call to Defer is a stub IEnumerable<TSource> that will only call its factory function (getting the real intended result sequence) once an enumeration is triggered. An example is shown below:
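A minimal sketch of Defer and of a Run-like helper, both hypothetical local functions rather than the EnumerableEx methods, assuming an iterator-based implementation; the factory only runs when enumeration starts, and again for every new enumeration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Defer: the factory runs on the first MoveNext of each enumeration,
// not when Defer itself is called.
static IEnumerable<T> Defer<T>(Func<IEnumerable<T>> factory)
{
    foreach (var item in factory())
    {
        yield return item;
    }
}

// Hypothetical stand-in for EnumerableEx.Run: enumerate and apply an action to each element.
static void Run<T>(IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
    {
        action(item);
    }
}

var xs = Defer(() =>
{
    Console.WriteLine("Factory");
    return new[] { 1, 2, 3 };
});

Console.WriteLine("Defer returned");     // printed before any "Factory" line
Run(xs, x => Console.WriteLine(x));      // Factory, 1, 2, 3
Run(xs, x => Console.WriteLine(x));      // Factory again, then 1, 2, 3
```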

Here, the Factory message won’t be printed until something starts enumerating the xs sequence. Both calls to Run do so, meaning the factory will be called twice (and could in fact return a different sequence each time).

Next on More LINQ

More duality, this time between “code and data” views on sequences, introducing Notification<T>.