Python Generators

What is a Generator?

A Python generator is a function that produces a sequence of results. It works by maintaining its local state, so that the function can resume exactly where it left off each time it is asked for another value. Calling a generator function returns a generator object, which is an iterator, so you can think of a generator as a convenient way to write one.

The state of the function is maintained through the use of the keyword yield, which has the following syntax:

yield [expression_list]

This Python keyword works much like using return, but it has some important differences, which we'll explain throughout this article.

Generators were introduced in PEP 255, together with the yield statement. They have been available since Python version 2.2.

How do Python Generators Work?

In order to understand how generators work, let's use the simple example below:
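The sketch below is consistent with the description in the next paragraphs. `numberGenerator` takes a limit `n` and counts up from zero in a `while` loop. It is then advanced three times with `next()`. The exact original listing may differ.

```python
# generator_example_1.py (sketch)

def numberGenerator(n):
    """Yield the integers 0, 1, ..., n - 1, one at a time."""
    number = 0
    while number < n:
        yield number      # pause here and hand `number` to the caller
        number += 1       # resume here on the next call to next()

myGenerator = numberGenerator(3)   # creates a generator object; no body code runs yet

print(next(myGenerator))  # 0
print(next(myGenerator))  # 1
print(next(myGenerator))  # 2
```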

The code above defines a generator named numberGenerator, which takes an argument n and uses it as the limit of a while loop. It also defines a variable named number and initializes it to zero.

Passing the "instantiated" generator (myGenerator) to the built-in next() function runs the generator code until the first yield statement, which yields 0 in this case.

After yielding a value, the function is paused rather than terminated. It keeps the value of the variable number, and the next time next() is called it resumes right after the yield, increases number by one, and continues the loop. In other words, it picks up right where it left off.

Calling next() two more times gives us the next two numbers in the sequence, as seen below:

$ python generator_example_1.py
0
1
2

If we were to call next() on this generator a fourth time, we would get a StopIteration exception, since the generator has exited its while loop and returned.

This functionality is useful because generators let us create iterables on the fly. If we wrapped a fresh generator, such as numberGenerator(3), in list(), we'd get back a list of numbers ([0, 1, 2]) instead of a generator object, which is a bit easier to work with in some applications.
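To see both behaviors, the sketch below redefines numberGenerator so the snippet runs on its own. It collects the values with list() and then shows that the exhausted generator raises StopIteration.

```python
def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

gen = numberGenerator(3)
print(list(gen))   # [0, 1, 2]: list() drives the generator to completion
print(list(gen))   # []: the same generator object is now exhausted

try:
    next(gen)      # asking an exhausted generator for another value
except StopIteration:
    print("StopIteration raised")
```

Note that list() must receive a fresh generator to get all the values. A generator that has already been advanced only contributes the values it has not yet produced.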

The Difference Between return and yield

The keyword return returns a value from a function, at which time the function then loses its local state. Thus, the next time we call that function, it starts over from its first statement.

On the other hand, yield preserves the function's state between calls to the built-in next() function. Once the generator reaches a yield, the next time we advance that same generator, execution resumes right after the last yield statement.
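A side-by-side sketch makes the contrast concrete. The two functions below are hypothetical and loop over the same range. One uses return, so every call starts over. The other uses yield, so each next() resumes where the previous one stopped.

```python
def first_with_return(n):
    # `return` ends the function; its local state is discarded
    for i in range(n):
        return i

def all_with_yield(n):
    # `yield` suspends the function; the next call resumes after the yield
    for i in range(n):
        yield i

print(first_with_return(3))   # 0 (the call starts from the first statement)
print(first_with_return(3))   # 0 (and so does every later call)

gen = all_with_yield(3)
print(next(gen))              # 0
print(next(gen))              # 1 (resumed where it left off)
```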

Using return in a Generator

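Before Python 3.3, a generator could use a return statement only without a return value, that is, in the form:

return

When the generator reaches a return statement, it finishes just like any other function, and the next request for a value raises StopIteration. Since Python 3.3 (PEP 380), a generator may also return a value, which is attached to that StopIteration exception rather than yielded.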

As PEP 255 states:

Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions.
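The following is a small sketch of a generator that returns a value in Python 3.3 and later. It uses a hypothetical countdown generator. The returned value never appears among the yielded items. Instead, it is read from the value attribute of the StopIteration exception.

```python
def countdown(n):
    # Hypothetical example: yields n, n - 1, ..., 1, then returns a summary value
    while n > 0:
        yield n
        n -= 1
    return "liftoff"   # a return value in a generator requires Python 3.3+

gen = countdown(2)
print(next(gen))  # 2
print(next(gen))  # 1
try:
    next(gen)
except StopIteration as exc:
    print(exc.value)  # liftoff
```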

Let's modify our numberGenerator example by adding an if-else clause that makes the generator return immediately when the limit n is higher than 20. The code is as follows:
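The sketch below follows the description around it. The if-else clause rejects limits higher than 20, and the generator is called with 30. The exact condition and layout of the original listing may differ.

```python
# generator_example_2.py (sketch)

def numberGenerator(n):
    if n <= 20:
        number = 0
        while number < n:
            yield number
            number += 1
    else:
        return   # finish immediately: nothing is yielded

print(list(numberGenerator(30)))  # []
```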

In this example, we pass 30 as the limit. Since 30 is higher than 20, the generator returns without yielding any values, so wrapping it in list() produces an empty list. Thus, the return statement works similarly to a break statement in this case.

This can be seen below:

$ python generator_example_2.py
[]

Had we passed a value lower than 20, the results would have been similar to the first example.

Using next() to Iterate through a Generator

We can retrieve the values yielded by a generator one at a time using the built-in next() function, as seen in the first example. Each call asks the generator for only the next value of the sequence, and nothing else.

For example, the following code will print on the screen the values 0 to 9.
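This sketch matches the description in the next paragraph. It creates a generator g with a limit of 10 and calls next(g) ten times in a loop. The helper variable counter is an assumption about how the loop is bounded.

```python
# generator_example_3.py (sketch)

def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

g = numberGenerator(10)   # the generator object holds the paused state

counter = 0
while counter < 10:
    print(next(g))        # resume g until its next yield, then print the value
    counter += 1
```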

The code above is similar to the previous ones, but retrieves each value yielded by the generator with the function next(). In order to do this, we must first create a generator object g, which holds the generator's state between calls.

When the function next() is called with the generator as its argument, the Python generator function is executed until it finds a yield statement. Then, the yielded value is returned to the caller and the state of the generator is saved for later use.

Running the code above will produce the following output:

$ python generator_example_3.py
0
1
2
3
4
5
6
7
8
9

Note: There is, however, a syntax difference between Python 2 and 3. The code above uses the Python 3 version. In Python 2 (from version 2.6), you can either call the built-in next() as above or use the generator's own next() method:

print(g.next())
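In Python 3, the method form is spelled g.__next__(). The built-in next() also accepts a default value to return instead of raising StopIteration. The sketch below shows both. It also uses inspect.getgeneratorstate() (available since Python 3.2) to watch the generator's saved state change between calls.

```python
import inspect

def numberGenerator(n):
    number = 0
    while number < n:
        yield number
        number += 1

g = numberGenerator(2)
print(inspect.getgeneratorstate(g))  # GEN_CREATED: the body has not started yet
print(next(g))                       # 0
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED: paused at the yield
print(g.__next__())                  # 1 (Python 3 name for Python 2's g.next())
print(next(g, "done"))               # done: default returned instead of StopIteration
print(inspect.getgeneratorstate(g))  # GEN_CLOSED: the generator has finished
```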

What is a Generator Expression?

Generator expressions are like list comprehensions, but they return a generator instead of a list. They were proposed in PEP 289 and have been part of Python since version 2.4.

The syntax is similar to that of list comprehensions, but instead of square brackets, generator expressions use parentheses.

For example, our code from before could be modified using generator expressions as follows:

# generator_example_4.py
g = (x for x in range(10))
print(list(g))

The resulting values are the same as in the previous example, this time collected into a list:

$ python generator_example_4.py
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Generator expressions are useful with reduction functions such as sum(), min(), or max(), as they reduce the code to a single line. They're also much shorter to type than a full generator function. For example, the following code sums the numbers from 0 to 9:

# generator_example_5.py
g = (x for x in range(10))
print(sum(g))

After running this code, the result will be:

$ python generator_example_5.py
45
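When a generator expression is the only argument to a function, its own parentheses can be dropped. The sketch below passes generator expressions straight to sum(), min(), and max(), including ones with a filtering if clause.

```python
numbers = range(10)

# A generator expression that is the sole argument needs no extra parentheses.
total = sum(x for x in numbers)
squares_total = sum(x * x for x in numbers)
smallest_odd = min(x for x in numbers if x % 2 == 1)
largest_even = max(x for x in numbers if x % 2 == 0)

print(total)          # 45
print(squares_total)  # 285
print(smallest_odd)   # 1
print(largest_even)   # 8
```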

Managing Exceptions

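One important thing to note is that, under the original rules of PEP 255, the yield keyword was not permitted in the try part of a try/finally construct, so generators had to allocate resources with caution. PEP 342 (Python 2.5) lifted this restriction. It also added a close() method that raises GeneratorExit inside a paused generator, so that its finally clause runs even when the generator is not fully consumed.

Today, yield can appear in the try, except, and finally parts of a try statement.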
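The sketch below is consistent with the explanation that follows. The loop sits in a try block, and the finally clause yields the limit n once the loop ends. The original listing may differ in detail.

```python
# generator_example_6.py (sketch)

def numberGenerator(n):
    try:
        number = 0
        while number < n:
            yield number
            number += 1
    finally:
        # Runs once the loop ends; here it yields one extra value, n itself
        yield n

print(list(numberGenerator(10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Yielding inside finally works here because list() consumes the generator completely. If such a generator is closed early, its finally clause runs in response to GeneratorExit. A yield at that point makes close() raise RuntimeError, which is why finally clauses are usually reserved for cleanup.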

In the code above, as a result of the finally clause, the number 10 is included in the output, and the result is a list of numbers from 0 to 10. Without the finally clause, 10 would not appear, since the loop condition is number < n. This can be seen in the output below:

$ python generator_example_6.py
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Sending Values to Generators

Generators have a powerful tool in the send() method for generator-iterators. This method was defined in PEP 342 and has been available since Python version 2.5.

The send() method resumes the generator and sends it a value, which becomes the result of the yield expression at which the generator is currently paused. The method then returns the next value yielded by the generator.

The syntax is send(value); the argument is required. Calling send(None) is equivalent to calling next(). On a just-started generator, either call advances its execution to the first yield expression.

If the generator exits without yielding a new value (for example, by returning), the send() method raises StopIteration.

The following example illustrates the use of send(). In the first and third lines of our generator, the variable number is assigned the value of a yield expression, that is, the value sent into the generator. In the first line after our generator function, we create the generator, and in the next line we call next() to advance it to its first yield. Finally, in the last line we send the value 5, which becomes the result of the paused yield expression. As a result, number is set to 5, and the generator yields it back to us.
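The sketch below matches the walkthrough. The first and third lines of the generator assign number from a yield expression. After creating g, the code primes it with next(g) and then sends 5.

```python
# generator_example_7.py (sketch)

def numberGenerator(n):
    number = yield             # paused here first; receives the first sent value
    while number < n:
        number = yield number  # hand back `number`, then wait for the next sent value
        number += 1

g = numberGenerator(10)   # create the generator
next(g)                   # advance to the first yield (which yields None)
print(g.send(5))          # 5
```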

Note: Because a newly created generator is not yet paused at a yield expression, we must first advance it to one using next() or send(None) before sending a value. In the example above, we execute the next(g) line for just this reason; otherwise we'd get an error saying "TypeError: can't send non-None value to a just-started generator".

After running the program, it prints on the screen the value 5, which is what we sent to it:

$ python generator_example_7.py
5

The third line of our generator from above also shows a new Python feature introduced in the same PEP: yield expressions. This feature allows yield to be used on the right side of an assignment statement. The value of a yield expression is None when the generator is resumed with next() (or send(None)), and is value when it is resumed with send(value).
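A common use of yield expressions is a generator that accumulates state from the values sent to it. The sketch below uses a hypothetical running_average generator. Each send() delivers a new number, and the generator yields the mean of everything received so far.

```python
def running_average():
    """Hypothetical coroutine: receives numbers via send() and yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # None if resumed by next(); the sent value otherwise
        if value is None:
            continue            # a plain next() just re-yields the current average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)              # prime the generator: run it to the first yield
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(30))    # 20.0
```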

Connecting Generators

Since Python 3.3, a generator can delegate part of its work to a sub-generator.

The new expression is defined in PEP 380, and its syntax is:

yield from <expression>

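where <expression> evaluates to an iterable, often another generator. The generator that contains the yield from expression is called the delegating generator. It yields every value produced by that iterable before moving on.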
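The sketch below is consistent with the description that follows. myGenerator1 takes one limit, and myGenerator2 takes a lower and an upper limit. myGenerator3 chains them with yield from. The arguments used in the print calls are assumptions, and the original listing may differ.

```python
# generator_example_8.py (sketch)

def myGenerator1(n):
    # one parameter: the upper limit of the range
    for i in range(n):
        yield i

def myGenerator2(n, m):
    # two parameters: the lower and upper limits of the range
    for j in range(n, m):
        yield j

def myGenerator3(n, m):
    # the delegating generator: re-yields everything its sub-generators produce
    yield from myGenerator1(n)
    yield from myGenerator2(n, m)

print(list(myGenerator1(5)))      # [0, 1, 2, 3, 4]
print(list(myGenerator2(5, 10)))  # [5, 6, 7, 8, 9]
print(list(myGenerator3(5, 10)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```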

The code above defines three different generators. The first, named myGenerator1, has one input parameter, which sets the limit of a range. The second, named myGenerator2, is similar, but takes two input parameters, which set the lower and upper limits of the range. Finally, myGenerator3 delegates to myGenerator1 and myGenerator2 with yield from to yield their values.

The last three lines of code print three lists, one generated by each of the three generators. When we run the program, myGenerator3 yields the values obtained from myGenerator1 and myGenerator2 in turn, so its list combines the values produced by its sub-generators.

The example also shows an important application of generators: the capacity to divide a long task into several separate parts, which can be useful when working with big sets of data.

As you can see, thanks to the yield from syntax, generators can be chained together, so a larger generator can be built from smaller, reusable ones.
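Besides passing values through, yield from also evaluates to the sub-generator's return value (Python 3.3+). This lets the delegating generator combine results from its parts. The sketch below uses hypothetical read_chunk and read_all generators. It drives the outer generator by hand so that its own return value can be read from StopIteration.

```python
def read_chunk(items):
    """Hypothetical sub-generator: yields its items, then returns how many it produced."""
    count = 0
    for item in items:
        yield item
        count += 1
    return count

def read_all(*chunks):
    """Hypothetical delegating generator: collects each sub-generator's return value."""
    counts = []
    for chunk in chunks:
        produced = yield from read_chunk(chunk)  # evaluates to read_chunk's `return count`
        counts.append(produced)
    return counts

gen = read_all("ab", "xyz")
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        counts = stop.value      # read_all's return value
        break

print(values)  # ['a', 'b', 'x', 'y', 'z']
print(counts)  # [2, 3]
```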

Benefits of Generators

Simplified code

As seen in the examples shown in this article, generators simplify code in a very elegant manner. This simplicity and elegance is even more evident in generator expressions, where a single line of code replaces an entire block of code.

Better performance

Generators produce values lazily (on demand). This results in two advantages. The first is lower memory consumption, since values are produced one at a time instead of all being stored at once. However, this saving only works in our favor if we iterate over the values once, because a generator is exhausted after a single pass. If we need the values several times, it may be worthwhile to generate them all at once and keep them, for example in a list, for later use.
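The sketch below shows the single-pass limitation. A second pass over the same generator expression sees nothing. A list built once can be reused as often as needed, at the cost of holding every value in memory.

```python
squares = (x * x for x in range(5))   # values are computed only when requested

print(sum(squares))   # 30: this pass consumes the generator
print(sum(squares))   # 0: nothing left, the generator is exhausted

# If the values are needed more than once, materialize them once and reuse the list.
squares_list = [x * x for x in range(5)]
print(sum(squares_list), max(squares_list))  # 30 16
```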

The second advantage is that, because values are produced on demand, we never spend cycles generating values that are never used. It also means your program can start using the first values without having to wait until all of them have been generated.
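Laziness even allows a generator to describe an infinite sequence, as long as the consumer stops early. The sketch below uses itertools.count and itertools.islice from the standard library with a hypothetical squares generator. Only the values that are actually requested get computed.

```python
from itertools import count, islice

def squares():
    """Hypothetical infinite generator: yields 0, 1, 4, 9, ... forever."""
    for n in count():
        yield n * n

# Only the first five squares are ever computed.
print(list(islice(squares(), 5)))  # [0, 1, 4, 9, 16]

# Stop as soon as the first square above 50 is found.
print(next(s for s in squares() if s > 50))  # 64
```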

When to use Generators

Generators are an advanced tool present in Python. There are several programming cases where generators can increase efficiency. Some of these cases are:

Processing large amounts of data: generators provide calculation on-demand, also called lazy evaluation. This technique is used in stream processing.

Piping: stacked generators can be used as pipes, in a manner similar to Unix pipes.

Concurrency: generators can be used to simulate concurrency by interleaving several generators as cooperative tasks on a single thread.
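Here is a compact sketch of the last two cases, using only the standard library. Every function name in it (read_lines, grep, to_upper, task, run_round_robin) is hypothetical. The first part chains generators like a Unix pipeline, so each line flows through all stages before the next one is read. The second part interleaves generator "tasks" with a simple round-robin loop, which is cooperative multitasking on one thread rather than true parallelism.

```python
from collections import deque

# --- Piping: each stage lazily consumes the previous one, like `cat | grep | tr` ---
def read_lines(text):
    for line in text.splitlines():
        yield line

def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

def to_upper(lines):
    for line in lines:
        yield line.upper()

log = "ok: start\nerror: disk full\nok: retry\nerror: timeout"
pipeline = to_upper(grep("error", read_lines(log)))
print(list(pipeline))  # ['ERROR: DISK FULL', 'ERROR: TIMEOUT']

# --- Simulated concurrency: a round-robin scheduler over generator "tasks" ---
def task(name, steps):
    for i in range(steps):
        yield f"{name} step {i}"   # each yield hands control back to the scheduler

def run_round_robin(tasks):
    queue = deque(tasks)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))
            queue.append(current)   # not finished: put it back in line
        except StopIteration:
            pass                    # task finished: drop it
    return trace

print(run_round_robin([task("A", 2), task("B", 3)]))
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```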

Wrapping Up

Generator functions produce a sequence of values, and the generator objects they return are iterators. Using them results in more elegant code and improved performance, particularly lower memory use.

These aspects are even more evident in generator expressions, where one line of code can summarize a sequence of statements.

The capabilities of generators have been extended over time with new methods, such as send(), and new expressions, such as yield from.

As a result of these properties, generators have many useful applications, such as building pipelines, concurrent programming, and processing large amounts of data as streams.

As a consequence of these improvements, Python is becoming more and more the language of choice in data science.