According to Wikipedia, an iterator is “an object that enables a programmer to traverse a container”. A collection of items (stashed in a container) can be thought of as being “iterable” if there is a logical progression from one element to the next (so a list is iterable, while a set is not). An iterator is then an object for moving through the container, one item at a time.

Iterators are a fundamental part of contemporary Python programming, where they form the basis for loops, list comprehensions and generator expressions. Iterators facilitate navigation of the container classes in the C++ Standard Template Library (STL). They are also to be found in C#, Java, Scala, Matlab, PHP and Ruby.

Although iterators are not part of the native R language, they are implemented in the iterators and itertools packages.

iterators Package

The iterators package is written and maintained by Revolution Analytics. It contains all of the basic iterator functionality.

> library(iterators)

Iterating over a Vector

The iter() function transforms a container into an iterator. Using it we can make an iterator from a vector.
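A minimal sketch of this, assuming a hypothetical vector of names (the vector and variable names are illustrative, not taken from the original listing). Each call to nextElem() hands back the next item.

```r
library(iterators)

# Hypothetical vector of names, used purely for illustration
name <- c("Bob", "Mary", "Jack", "Jane")

# Wrap the vector in an iterator
iname <- iter(name)

# Each call to nextElem() returns the next element in order
first <- nextElem(iname)
second <- nextElem(iname)

print(first)
print(second)
```

Once the elements are exhausted, a further call to nextElem() raises a StopIteration error.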

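When iter() is applied to a data frame it steps through the columns by default; iterating by row requires by = "row". To my mind iterating by row is likely to be the most common application, so I am a little puzzled by the fact that it is not the default. No doubt the creators had a solid reason for this design choice though!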
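The difference between the two modes can be sketched as follows. The data frame here is hypothetical and exists only to show the by argument of iter() in action.

```r
library(iterators)

# Hypothetical data frame for illustration
people <- data.frame(name = c("Bob", "Mary", "Jack"),
                     age = c(35, 42, 28),
                     stringsAsFactors = FALSE)

# Default behaviour: step through the columns
icol <- iter(people)
first_col <- nextElem(icol)    # values of the first column (name)

# by = "row": step through the rows instead
irow <- iter(people, by = "row")
first_row <- nextElem(irow)    # the first row, as a one-row data frame

print(first_col)
print(first_row)
```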

Not surprisingly, we can iterate through the rows and columns of a matrix as well. But, taking things one step further, we will do this in parallel using the foreach package. First we set up the requisites for multicore processing.

> library(foreach)
> library(doMC)
> registerDoMC(cores=4)

Then, for illustration purposes, we create a matrix of normally distributed random numbers. Our parallel loop will use an iterator to run through the matrix, computing the mean for every row. Of course, there are other ways of accomplishing the same task, but this syntax is rather neat and intuitive.
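A minimal sketch of such a loop follows. The matrix dimensions and the names xnorm and row_means are assumptions. The sketch uses %do% so that it runs without a parallel backend; once a backend such as doMC has been registered, %dopar% can be substituted to spread the rows across cores.

```r
library(foreach)
library(iterators)

# Hypothetical 1000 x 5 matrix of normally distributed random numbers
set.seed(1)
xnorm <- matrix(rnorm(5000), ncol = 5)

# The iterator hands one row at a time to the loop body; .combine = c
# gathers the per-row means into a single numeric vector.
row_means <- foreach(row = iter(xnorm, by = "row"), .combine = c) %do% mean(row)

print(head(row_means))
```

The result should agree with rowMeans(xnorm), which is one of the other ways of accomplishing the same task.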

Hmmmm. The code for the loop is a little clunky and it ends in a rather undignified fashion. Surely there is a better way to do this? Indeed there is! More on that later.
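The clunkiness comes from the way nextElem() signals the end of an iterator: it raises a StopIteration error, so a hand-written loop has to trap that error in order to stop. The sketch below illustrates the pattern with a hypothetical character vector; it is not the original listing.

```r
library(iterators)

# Hypothetical data standing in for the items being iterated over
it <- iter(c("alpha", "beta", "gamma"))

seen <- character(0)
while (TRUE) {
  # try() traps the error raised when the iterator runs dry
  # (note that it would also trap any other error)
  item <- try(nextElem(it), silent = TRUE)
  if (inherits(item, "try-error")) break
  seen <- c(seen, item)
}

print(seen)
```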

Filtering with an Iterator

iter() allows you to specify a filter function (the checkFunc argument) which selects particular items from a container. This function should return a logical value (or a value which can be coerced to logical) indicating whether (TRUE) or not (FALSE) an item should be accepted. To illustrate this, we will pick out prime numbers from the first one hundred integers. The gmp package has an isprime() function which is perfectly suited to this job.

> library(gmp)

First we’ll look at the basic functionality of isprime().

> isprime(13)
[1] 2
> isprime(8)
[1] 0
> isprime(2147483647)
[1] 1

A return value of 2 indicates that a number is definitely prime, while 0 indicates a composite number. For small numbers these are the only two options; for larger numbers, however, the Miller-Rabin primality test is applied, in which case a return value of 1 indicates that a number is probably prime.
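Putting the two pieces together, a filtering iterator over the first one hundred integers might look like the sketch below. The name iprime is an assumption; the filter is passed through the checkFunc argument of iter(), and the explicit comparison with zero turns the result of isprime() into a logical value.

```r
library(iterators)
library(gmp)

# Only elements for which checkFunc returns TRUE are yielded.
# isprime() returns 0 for composites and a positive value otherwise.
iprime <- iter(1:100, checkFunc = function(n) isprime(n) > 0)

# Drain the iterator into a plain vector
primes <- unlist(as.list(iprime))

print(primes)
```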

An Iterable Version of split()

The native function split() accepts two arguments: the first is a vector and the second is a factor which dictates how the vector’s elements will be divided into groups. The return value is a list of vectors corresponding to each of the groups. The isplit() function accepts the same arguments and results in an iterator which steps through each of the groups, returning a list with two fields: the key (one of the levels of the factor) and the corresponding values extracted from the vector.
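As a rough illustration, assuming a hypothetical vector of scores and a grouping factor (neither is from the original post), the first element returned by isplit() can be inspected like this:

```r
library(iterators)

# Hypothetical data: six scores assigned to three groups
scores <- c(12, 15, 9, 20, 7, 11)
group <- factor(c("a", "b", "a", "c", "b", "a"))

igroup <- isplit(scores, group)

# Each element carries the group key and the matching values
first <- nextElem(igroup)
first_key <- as.character(unlist(first$key))  # converted to character for display
first_values <- first$value

print(first_key)
print(first_values)
```

The groups arrive in the order of the factor levels, so the first element here corresponds to level "a".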

Here each successive element returned by the iterator is generated by a call to an anonymous function which extracts a single sample from our list of names. I am not quite sure how I would use this facility in a practical situation (why is this superior to simply calling the function?) but I am sure that a good application will reveal itself in good time.
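A sketch of a function-backed iterator of this kind, assuming a hypothetical vector of names: iter() accepts a function, and every call to nextElem() then invokes that function afresh.

```r
library(iterators)

# Hypothetical vector of names
name <- c("Bob", "Mary", "Jack", "Jane")

# Each nextElem() call runs the anonymous function and returns its result
isample <- iter(function() sample(name, 1))

draws <- c(nextElem(isample), nextElem(isample), nextElem(isample))
print(draws)
```

An iterator built this way never runs out unless the function itself raises a StopIteration error.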

Rolling Your Own: A Fibonacci Iterator

An iterator to generate Fibonacci numbers is readily implemented in Python. Such an iterator produces a never-ending Fibonacci sequence because it does not terminate.

You might be wondering what advantages this provides over simply having a function which generates the sequence. Consider a situation where you might need to have two independent sequences. Certainly this could be implemented with a function accepting a variable which stores the state of each sequence. An object-oriented solution would also be natural. However, iterators will do the job very nicely too.
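In R, one way to sketch such an iterator is to follow the custom iterator convention of the iterators package: a list holding a nextElem function, tagged with the classes "abstractiter" and "iter". The state lives in a closure, so every call to fibonacci() yields an independent sequence. The internal variable names are assumptions.

```r
library(iterators)

fibonacci <- function() {
  # State private to this particular iterator
  a <- 0
  b <- 1
  next_el <- function() {
    value <- b
    b <<- a + b   # advance the pair (a, b) one step along the sequence
    a <<- value
    value
  }
  obj <- list(nextElem = next_el)
  class(obj) <- c("fibonacci", "abstractiter", "iter")
  obj
}

# Two independent sequences
fib1 <- fibonacci()
fib2 <- fibonacci()

first_five <- sapply(1:5, function(i) nextElem(fib1))
fib2_first <- nextElem(fib2)   # fib2 is unaffected by the calls on fib1

print(first_five)
print(fib2_first)
```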

itertools Package

The itertools package is written and maintained by Steve Weston (Revolution Analytics) and Hadley Wickham (RStudio). It provides a range of extensions to the basic iterator functionality.

> library(itertools)

Stop When You Reach the End

The Fibonacci iterator implemented above will keep on generating terms indefinitely. What if we wanted it to terminate after a finite number of terms? We could introduce a parameter for the required number of terms and generate a StopIteration error when we try to exceed this limit.
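One possible shape for this, as a sketch: the parameter n and its default of Inf are assumptions, chosen so that fibonacci() without arguments still runs indefinitely.

```r
library(iterators)

fibonacci <- function(n = Inf) {
  a <- 0
  b <- 1
  count <- 0
  next_el <- function() {
    # Signal exhaustion in the way the iterators package expects
    if (count >= n) stop("StopIteration", call. = FALSE)
    count <<- count + 1
    value <- b
    b <<- a + b
    a <<- value
    value
  }
  obj <- list(nextElem = next_el)
  class(obj) <- c("fibonacci", "abstractiter", "iter")
  obj
}

ifib <- fibonacci(3)
terms <- c(nextElem(ifib), nextElem(ifib), nextElem(ifib))

# A fourth request fails; here the error message is captured rather than shown
fourth <- tryCatch(nextElem(ifib), error = function(e) conditionMessage(e))

print(terms)
print(fourth)
```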

… but it generates a rather rude error message if you try to iterate beyond the prescribed number of terms. How can we handle this more elegantly? Enter the ihasNext() wrapper which enables a hasNext() function for an iterator.

So, instead of blindly trying to proceed to the next item, we first check whether there are still items available in the iterator. This pattern can be applied to any situation where the iterator has only a finite number of terms. Iterating through the lines in a file would be a prime example!
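The pattern can be sketched as follows. To keep the block self-contained, a short vector of Fibonacci terms stands in for the finite Fibonacci iterator; any finite iterator could be wrapped in the same way.

```r
library(iterators)
library(itertools)

# Stand-in finite iterator, wrapped so that hasNext() can be used on it
ifib <- ihasNext(iter(c(1, 1, 2, 3, 5)))

terms <- numeric(0)
# Check for remaining items before asking for the next one
while (hasNext(ifib)) {
  terms <- c(terms, nextElem(ifib))
}

print(terms)
```

The loop ends quietly once the items run out, with no error to trap.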

Limiting the Limitless

Adapting our Fibonacci iterator to generate only a finite number of terms was a bit of a kludge. Not to worry: the ilimit() wrapper allows us to stipulate a limit on the number of terms to be generated by any iterator.

> ifib <- ilimit(fibonacci(), 12)
> unlist(as.list(ifib))
[1] 1 1 2 3 5 8 13 21 34 55 89 144

Recycle Everything

We have seen how to truncate an effectively infinite series of iterates. How about replicating a finite series? Using recycle() you can run through the iterable a number of times. As a first example, we create an iterator which will run through an integer sequence (1, 2 and 3) three times.
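A short sketch of this first example, with irec as an assumed variable name. The second argument of recycle() is the number of passes through the iterable.

```r
library(iterators)
library(itertools)

# Run through 1, 2, 3 three times over
irec <- recycle(1:3, 3)
values <- unlist(as.list(irec))

print(values)
```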

Next we generate the first six terms from the Fibonacci sequence and repeat them twice.

> irec <- recycle(ilimit(fibonacci(), 6), 2)
> unlist(as.list(irec))
[1] 1 1 2 3 5 8 1 1 2 3 5 8

Enumerating

The enumerate() function returns an iterator which returns each element of the iterable as a list with two elements: an index and the corresponding value. So, for example, we can number each of the elements in our list of names.
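For instance, assuming a hypothetical vector of names, the first element produced by enumerate() can be unpacked like this:

```r
library(iterators)
library(itertools)

# Hypothetical vector of names
name <- c("Bob", "Mary", "Jack", "Jane")

ienum <- enumerate(name)

# Each element is a list holding an index and the corresponding value
first <- nextElem(ienum)

print(first$index)
print(first$value)
```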

Getting Chunky!

What about returning groups of values from an iterator? The ichunk() wrapper takes two arguments: the first is an iterator, the second is the number of terms to be wrapped in a chunk. Here is an example of generating groups of three successive terms from the Fibonacci sequence.
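A minimal sketch of chunking follows. To stay self-contained it uses the first six Fibonacci terms as a plain vector rather than the Fibonacci iterator itself; ichunk() accepts any iterable.

```r
library(iterators)
library(itertools)

# Stand-in for the Fibonacci iterator: its first six terms
ichunks <- ichunk(c(1, 1, 2, 3, 5, 8), 3)

# Each element is a list of three successive terms
chunk1 <- unlist(nextElem(ichunks))
chunk2 <- unlist(nextElem(ichunks))

print(chunk1)
print(chunk2)
```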

Until I Say “Stop!”

Finally, an iterator which continues to generate terms until a time limit is reached. How many terms will our Fibonacci iterator generate in 0.1 seconds?

> length(as.list(timeout(fibonacci(), 0.1)))
[1] 701

Conclusion

I have already identified a few older projects where the code can be significantly improved by the use of iterators. And I will certainly be using them in new projects. I hope that you too will be able to apply them in your work. Iterate and prosper!