Pages

Saturday, 9 December 2017

Generators have state

Python’s generator expressions (discussed in two previous posts) are very useful for stream programming. But sometimes, you want some state to be preserved between calls. That is where we need full Python generators.

A generator expression is actually a special case of a generator that can be written inline. The generator expression (<expr> for i in <iter>) is shorthand for the generator:

The yield statement acts somewhat like a return in providing its value. But the crucial difference is what happens on the next call. An ordinary function starts off again from the top; a generator starts again directly after the previous yield. This becomes important if the generator includes some internal state: this state is maintained between calls. That is, we can have memory of previous state on the next call. This is particularly useful for recurrence relations, where the current value is expressed in terms of previous values (kept as remembered state).

Running total

The accumulate() generator provides a running total of its iterator argument. We can write this as an explicit sum:
$$ T_N = \sum_{i=1}^N x_i$$
We can instead write this sum as a recurrence relation: $$ T_0 = 0 ; T_n = T_{n-1} + x_n$$
Expanding this out explicitly we get
$$T_0 = 0 ; T_1 = T_0 + x_1 = 0 + x_1 = x_1 ; T_2 = T_1 + x_2 = x_1 + x_2 ; \ldots$$
+++T_0+++ is the initial state, which is observed directly. The recurrence term +++T_n+++ tells us what needs to be remembered of the previous state(s), and how to use that to generate the current value.

Factorial

Similarly we can write the factorial as an explicit product:$$N! = F_N = \prod_{i=1}^N i$$
And we can write it as a recurrence relation:
$$ F_1 = 1; F_n = n F_{n-1} $$
+++F_1+++ is the initial state, which should be the first output of the generator. We can yield this directly on the first call, then (on the next call) go into a loop yielding the following values.

We can remove the special case, and instead calculate the next value after the yield in the loop. The subsequent call then picks up at that calculation.

deffact():
f = 1for n in count(2):
yield f
f *= n
print_for(fact())

1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, ...

Fibonacci numbers

The perennially popular Fibonacci numbers are naturally defined using a recurrence relation, involving two previous states:$$F_1 = F_2 = 1 ; F_n = F_{n-1} + F_{n-2}$$
This demonstrates how we can store more state than just the result of the previous yield.

Logistic map

Difference equations, discrete-time analogues of differential equations, are a form of recurrence relation: the value of the state at the next time step is defined in terms of its value at previous timesteps.

The logistic map is a famous difference equation, exhibiting a range of periodic and chaotic behaviours, depending on the value of its paramenter +++r+++. $$x_0 \in (0,1) ; x_{n+1} = r x_n(1-x_n)$$

(Here I have used s=1 to get a small point, rather than use a comma marker to get a pixel, because there is currently a bug in the use of pixels in scatter plots.)

Faster +++\pi+++

Many series for generating +++\pi+++ converge very slowly. One that converges extremely quickly is:$$
\pi = \sum_{i=0}^\infty \frac{(i!)^2 \, 2^{i+1}}{(2i+1)!}$$We could use generators for each component of the term to code this us as:

This algorithm is often referred to as the Sieve of Eratosthenes, but the true sieve is somewhat different in its operation. The sieve doesn’t require any square roots or divisions: it uses addition only, moving a marker through the numbers, striking out the multiples. Using generators we can do this lazily, using a dictionary to store which marks have struck out a particular value (for example, when we get to 15, the dictionary entry will show that it has been struck out by 3).

If there are no markers, yield a new prime +++p+++; start a new marker to strike out multiples of +++p+++ (start it at +++p^2+++, for the same reason the previous algorithm only needs to test values up to +++\sqrt n+++).

If there are markers (such as for 15), the value isn’t prime; move each marker on to the next value it strikes out (so for 15, move the 3 marker on 6 places to strike out 21: each marker is moved on by twice its value, since we are optimising by not considering even values)

Delete the current dictionary entry (so that the dictionary grows only as the number of primes found so far, rather than as the number of values checked, speeding up dictionary access).

%time [ i for i in islice(primessqrt(),100000) if i < 0 ]
%time [ i for i in islice(primesieve(),100000) if i < 0 ]

Walltime: 4.69sWalltime: 735ms

So full Python generators are even more fun than generator expressions!
And in fact, there’s yet more fun to be had with generators, because it is possible to send values to them on each call, too; but that’s another story for another time.

No comments:

Post a Comment

about me

an infrequent blogger

This is an eclectic collection of stuff that I might want to remember, but doesn't fit well into my more formal website. That is where I collect my sf and technical reviews, publications, and other less ephemeral things. Also see my Google+ page for bits and bobs.