Clone this wiki locally

CEP 307: Generators and PEP 342 Coroutines

Author: Stefan Behnel

Status: planning, undecided

Coroutines (as defined in PEP 342) are a major feature of the Python language, often considered a killer feature. Their support for suspending and resuming the execution at virtually any point inside of a function body enables various interaction patterns between function objects.

As of version 0.13, Cython does not support this feature. This CEP is designed to outline the path towards an efficient implementation.

Overview

The implementation is expected to require the following major steps:

finish up closure support

refactoring to merge the implementation of Python (def) functions and C (cdef) functions

support for writing utility code in Cython instead of C

implement a generic, builtin coroutine type in Cython code

support temporary variables in closures

implement generator functions based on the coroutine type and the yield keyword

Closure support

Closures enclose the state of a function, which usually involves local variables and potentially execution state. Closures are used to implement inner functions in Python which can refer to names in outer scopes.

Implementation: Closure support is mostly done in the cython-closures branch. Closures are realised as extension classes with fields for all contained names. The generator implementation will be based on the code in that branch.

Python function refactoring

The major difference between Python functions and C functions is the calling convention, including the argument passing as argument tuple and keywords.

Currently, Cython has two partly separated implementations for both types of functions that have grown their own differences and specialties over time. The idea is to merge the two and to reduce Python functions to a Python wrapper around a C function. This has the additional advantage that the current argument unpacking code, which is rather lengthy, will get moved out into the Python wrapper function, thus leaving the real function body inside of the more readable C function.

Implementation: refactor def functions into a Python wrapper and a static C function

the Python wrapper
* has a normal Python call signature (args/kwargs, METH_O, METH_NOARGS, etc.)
* does all argument unpacking
* calls the C function to execute the function body
* passes the return value or a raised exception on to its caller

the C function
* contains the complete body of the original function
* has a signature as written in the Cython code

The idea is to let DefNode generate its C function when analysing the declarations, move its body over to the new function, and replace it with a simple call to that function. The rest should be handled by the normal analysis phases. Note that the C function will not be declared in the symbol table.

In functions that have a closure, the Python wrapper gets called with the closure in the self argument, which has to get passed on to the C function.

Currently, cpdef functions do the opposite to this approach, in that they create a DefNode for an existing C function. The new scheme puts the DefNode first and declares the C function at declaration analysis time. The cpdef function call semantics can then be represented by adding a trivial C function wrapper around the C function that passes the override flag as a constant. This wrapper function should be declared inline, so that the C compiler can simply replace the calls to the original function with the call to the real C function. Note that the C function wrapper needs to be declared in the symbol table.

Comments: it can be argued that the additional call to the C function from within the Python wrapper might introduce a performance regression. Only benchmarking will allow a final decision here, but it can also be argued that the existing overhead for calling the Python function and unpacking/converting the arguments is high enough to render the additional C call overhead negligible. Also, for very short functions, where the call overhead really matters, the C compiler might end up inlining the entire C function into the Python function wrapper, so that the call overhead gets removed entirely.

Also note that this refactoring is not strictly required for the rest of this CEP. It would, however, clean up the current function implementation considerably and make it easier to fit the generator function transformations into a common scheme.

Writing utility code in Cython

This is required to simplify the implementation of the generator type. The idea is to compile regular Cython code inside of a separate scope (with its own name mangling) and merge it into the generated module.

This is a generally useful feature and it appears that this has already been implemented by Kurt Smith as part of the 2009 Google SoC. Similar functionality was needed in this context to implement the integration with external Fortran code.

Implementing a custom 'coroutine' type

Once it is possible to implement utility code in Cython code (see above), it becomes trivial to write a dedicated coroutine type that implements PEP 342. The idea is to put the generator function code into a C function and provide the coroutine type with a function pointer. Values that get sent into the coroutine will be passed on as arguments to the generator function.

Note that this type needs no __dealloc__ method and no further specialisation. All cleanup (including the cleanup of temp variables) should be done by the specialised closure class.

Note also that the exception raising happens by actually setting the exception outside of the generator function code, and then sending NULL into the generator. This relieves the generator function from having to know the exception values itself, and makes it easy for the generated yield code to run the normal exception handling code after a quick NULL test.

A possibility for later optimisation would be a dedicated coroutine type for each generator function. This would then call the generator function directly, instead of going through an indirect call to a type field. However, the expected gain will be small compared to the major overhead of generating the code for a complete type with various methods each time.

It may or may not be possible to make the coroutine type inherit from CPython's GeneratorType. This would enhance the interoperability, but it would also mean some unnecessary instance size overhead. Additionally, it is required to prevent the base-type from doing anything, including initialisation and final cleanup. This is the main reason why this will be hard to do.

Potentially, the closure class for a generator function could also be a subtype of the above coroutine type. That would remove the need for a dedicated closure reference.

Support temporary variables in closures

To support generators, as presented below, it is required to extend closure classes with fields that store temporary variables. As those are allocated at code generation time, the final closure type struct will not be known before the C code for the function body has been generated.

It might be possible to delay the code generation for the closure class until after the generation of the function body, so that the additional fields that store temporary variables can be normally declared on the closure type. This would enable the generation of normal XDECREF cleanup code and GC traversal code also for temporary variables that contain Python objects. Note that these references need to participate in garbage collection traversal, as a generator can potentially hold a reference to itself or participate in other kinds of reference cycles.

Comments: As an optimisation, the closure could store temp variables as a union of structs, one struct for each set of temps that was alive during generator suspension. This would reduce the size of the closure to the maximum number of temp variables that are alive during suspension, which are expected to be very few, compared to an arbitrary number of temps used inside of the complete function body. However, this would also require to store the additional information which set of temp variables is the currently active one, and it would require specialised cleanup code for each such set of temps. It is not currently clear how this can be made to work together with the garbage collector (see the PEP 342 section on the generator's __del__() method).

A simplified optimisation could just apply the above union-of-structs scheme to non-Python temps. Here, no additional state is needed as the knowledge about the active temp struct would be held by the YieldNode, i.e. it would be completely local to a single parse tree node. This would still require all Python temp variables to occupy space in the closure, which can be quite a few, e.g. when tuple packing is used anywhere in the generator code. The gain can therefore be expected to be much lower.

A better approach could split the temps into Python references and non-Python types. The union-of-structs scheme would only apply to the non-Python types, whereas the entire set of active Python references would be stored in a regular tuple, the unused elements of which will be None (or even NULL if that turns out to work well). From the point of view of the garbage collector, the closure would only contain one Python reference to that tuple, which would be easy to handle. Saving and recovering the values would simply fill the tuple from the beginning, stealing the references of the original temps.

Implementing generator functions

With the above preparation, implementing generator functions becomes easy. The main theme is to implement the yield keyword as a YieldNode and let sendval = (yield [expr]) emit the following code:

store away all current temp values in the closure

set closure._resume_label to the resume label number (int)

return the expression result (or None) - return immediately without cleanup (the temp that holds the expression result must be unmanaged to prevent DECREF()-ing on resume; INCREF()-ing the return value will keep it alive for too long)

here goes the resume label (__Lxyz_resume_from_yield:)

reset all saved temp values from the closure

if the sent value is NULL, an exception is to be raised (gen.throw() was called, which has already set the exception externally) => use the normal exception handling path

set the result temp of the yield node to the send value argument that was passed (INCREF or not, as for parameters)

The generator C function itself basically implements coroutine.send(x). It takes two parameters, the closure and the current send value. All original parameters must be moved into the closure. At the start of the function body, the following code is inserted:

if the label is 0, check x
* if None, execute the function body normally (next() was called on an uninitialised generator/coroutine)
* if NULL, take the normal exception handling path (.throw() was called, or the generator was terminated before even starting)
* otherwise, raise TypeError: can't send non-None value to a just-started generator

final termination label goes here

raise StopIteration

This header is followed by the original function body, without further modifications. When normally returning from the generator function or when raising an uncaught exception (i.e. when the generator terminates in any way other than a yield), the resume label must be set to the final termination label generated above, so that subsequent calls to the generator will continue to raise StopIteration. This case could also be handled by the coroutine class.

The definition of the public Python generator function is replaced by the following pseudo code:

Note that all parameters of the original generator function have been moved into the closure and are set up here. Thus, closures must provide the following additional content:

the set of function parameters

the set of temps

the resume label number (int)

Also note that it is a compile time syntax error if a generator function contains a return statement with an explicit value (even None is not allowed). CPython raises SyntaxError: 'return' with argument inside generator.