Sun 18 September 2016

This post is an extended version of a lightning talk I gave at*PyConUK 2016*on 2016-09-17

I want share with you an interesting connection between two programming
paradigms that, at first glance, seem completely different. This is, of
course, not my
idea
but material on the web is relatively difficult to come by, particularly
if you are not into theory. This treatment tries to be more approachable
and look from the Python perspective.

So what is an iterator?

As Python programmers we feel we've got a pretty good understanding of
iterators. They are a stalwart of imperative programming, allowing us to
express many tasks efficiently as loops rather than gathering sets of
things into intermediate data structures. Abstractly we have objects
that can be iterated over, which we will call iteratables and objects
that represent the process of iterating called iterators

iter(iterable)->iteratornext(iterator)->value

The point being that you can continually call next(iterator) to
yield multiple values. Also, if you call iter(iterable) a second
time you will get a fresh iterator which will yield values independently
(usually, but not necessarily, the same values).

If we are going to perform surgery on this definition we need to be a
little more precise. We need to capture all the possible results of
calling these functions. In particular, calling next() have several
outcomes depending on whether it raises an exception and why.

We can simplify this notation by removing our names to just leave an
expression of what is happening. Both iterable and iterator
interfaces have only one method, so we can replace them with single
functions:

iterable=()->iteratoriterator=()->(value|StopIteration|Exception)

resulting in the expression

()->(()->(value|StopIteration|Exception))

Inside out?

We are now in a position to perform our manauver. All we have to do is
reverse the arrows, turning inputs into outputs and vica versa. Since
we have a higher order function in here the result will be a little more
subtle than just going backwards.

Our original expression

()->(()->(value|StopIteration|Exception))

becomes

==>()<-(()<-(value|StopIteration|Exception))

or rewriting it left to right

==>((value|StopIteration|Exception)->())->()

But what does this mean? We have a function that takes another
function as argument, that second function takes either a value,
StopIteration or Exception.

To make what is going on a little more explicit we can notice that any
function which takes one of several types could be replaced by a tuple
of functions taking each type. In our case we can split our inner
function into 3 separate functions:

These functions look a bit odd because they don't return anything, they
only receive a value or, in the case of StopIteration, there is no
need to pass the value at all because there is only one possible reason
for calling it. Presumably, something is done to these values though, in
that they trigger some sort of computation.

In fact, they are exactly like event handlers. What we have done is
replaced an object which allows us to pull values out of it until there
are no more values left with an object which you send values until there
is an error or there are no more values.

Finally, lets put the names back that we removed earlier. Clearly we
don't have iterators/iterables any more so we'll need some different
names. What we want is observer/observable:

Observables came out of C# and LINQ and was subsequently been ported to
the JVM world and many other languages, inluding JavaScript and
Python. They have recently
become popular for creating event-driven web apps and are central to
events in Angular 2. They can be a
higher-level abstraction than Promises supporting a wealth of operators
to combine and filter streams of events.

But they are just iterators inside out!

A full discussion of the power of Rx would require it's own blog post
but I should include at least one example in Python. Consider it a
Hello World for Rx.

Why care?

Whether you are developing dynamic web applications or large-scale data
pipelines, we are being confronted with new paradigms that should solve
our problems. Events, Streams, Asynchronous, Reactive -- all these
things are undoubtably good but how are we going to start using them
with our existing business logic? One thing we are told is that async
and blocking code do not mix and yet everyone knows and understands
imperative, iterator-style code.

It's good to realise events and iterators are not so different. One is
simply the dual of the other. This changes our perspective allowing us
to design our architecture differently. Producers are in control rather
than consumers.

I am just beginning to explore the possibilities of Rx. It's clearly
useful in JavaScript web apps but what about data pipelines? What about
Python?

Sun 24 April 2016

Back when Big Data was a genuinely new concept and Data Scientist's
weren't rock stars, the SciPy ecosystem was
beginning to mature. It felt as though Python deserved to dominate the
data analysis world but uptake was still slow outside the realm of the
physical sciences and engineering ...

Wed 18 June 2014

CEDA was a fairly early adopter of Django, introducing it as the technology for the CMIP5 Questionnaire. Back then we were on Django 1.2 and there have been 3 major releases since then. We have now replaced most of our internal systems with Django apps and it has been ...

Thu 14 March 2013

I have a new blogging engine to play with. This time I'm going with Pelican which generates static HTML, making it very easy to redeploy. So far I've been stunned at how quickly I can pull content into this blogging system. In particular it was very easy to ...

Wed 06 February 2013

This notebook describes the analysis of a snapshot of all replicas in
the CMIP5 archive on February 6th 2013. A list of all dataset versions
at each of the 3 replicating data centres is read, analysed and compared
with the total number of datasets listed in the CMIP5 archive.

Tue 15 January 2013

I am increasingly excited about distributed version control and how it enables easy collaboration between software developers without technical and social barriers such as synchronisation of work and maintenance of control.</p>

The obvious question is how the DVC systems can be applied to scientific collaboration and my particular specialism ...