Code Generation from Python Syntax Trees

The peak.rules.codegen module extends peak.util.assembler (from the
"BytecodeAssembler" project) with additional AST node types to allow generation
of code for simple Python expressions (i.e., those without lambdas,
comprehensions, generators, or yields). It also provides "builder"
classes that work with the peak.rules.ast_builder module to generate
expression ASTs from Python source code (creating an end-to-end compiler
tool chain), along with common-subexpression caching support and a
state-machine interpreter generator.

This document describes the design (and tests the implementation) of the
codegen module. You don't need to read it unless you want to use
this module directly in your own programs, or to create specialized add-ons
to PEAK-Rules. If you do want to use it directly, keep in mind that it
inherits the limitations and restrictions of both peak.util.assembler and
peak.rules.ast_builder, so you should consult the documentation for those
tools before proceeding.

ExprBuilder instances are created using one or more namespaces. The first
namespace maps names to arbitrary AST nodes that will be substituted for any
matching names found in an expression. The second and remaining namespaces
will have their values wrapped in Const nodes, so they can be used for
constant-folding. For our examples, we'll define a base namespace containing
arguments named "a" through "g":
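The convention can be sketched in plain Python, using stand-in classes rather than the real assembler node types (the actual `Local` and `Const` nodes come from peak.util.assembler; the `lookup` helper here is invented for illustration):

```python
# Hedged sketch, not the real API: stand-in Local/Const classes to show
# the namespace convention described above.
from collections import namedtuple

Local = namedtuple('Local', 'name')   # stand-in for an assembler AST node
Const = namedtuple('Const', 'value')  # stand-in for a constant node

def lookup(name, first_ns, *other_namespaces):
    # The first namespace yields its nodes verbatim; names found in any
    # later namespace are wrapped in Const for constant-folding.
    if name in first_ns:
        return first_ns[name]
    for ns in other_namespaces:
        if name in ns:
            return Const(ns[name])
    raise NameError(name)

# A base namespace mapping arguments "a" through "g" to Local nodes:
argmap = dict((name, Local(name)) for name in 'abcdefg')

print(lookup('a', argmap, {'pi': 3.14159}))   # Local(name='a')
print(lookup('pi', argmap, {'pi': 3.14159}))  # Const(value=3.14159)
```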

ASTs generated using ExprBuilder can be used directly with
BytecodeAssembler Code objects to generate bytecode, complete with
constant-folding. Note that the node types not demonstrated below (e.g.
And, Or, Compare, Call) are not defined by the codegen
module, but instead are imported from peak.util.assembler:
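To give a rough idea of what constant-folding means here, a minimal sketch with a stand-in node type (the real Add and Const nodes belong to peak.util.assembler; this invented version only shows the build-time folding idea):

```python
# Hedged sketch of build-time constant folding with stand-in nodes;
# the real node types live in peak.util.assembler.
from collections import namedtuple

Const = namedtuple('Const', 'value')

def Add(left, right):
    # If both operands are constants, fold them now instead of
    # emitting bytecode to add them at run time.
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value + right.value)
    return ('Add', left, right)

print(Add(Const(2), Const(3)))        # Const(value=5) -- folded
print(Add(('Local', 'a'), Const(3)))  # unfolded node tuple
```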

PEAK-Rules often processes fairly large dispatch trees that would take a long
time to generate if translated entirely to bytecode. Plus, they would need to
be regenerated every time rules were added to a dispatch tree.

So, instead of generating bytecode that encodes the entire dispatch tree,
PEAK-Rules uses a "state machine interpreter" approach. The dispatch tree
is represented as a tree of objects. Each node consists of an "action" and
an "argument". The generated code is simply an interpreter with inlined
bytecode to implement the actions associated with the nodes. To minimize
interpretation overhead, actions are encoded in the dispatch tree as jump
offsets into the generated bytecode.
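The control flow can be modeled in pure Python (a toy sketch: the action names below are invented, whereas the real interpreter inlines the actions as bytecode and encodes them as jump offsets):

```python
# Toy model of a dispatch tree: each node is an (action, argument) pair,
# and a small loop interprets the actions.
def run(node, value):
    while True:
        action, arg = node
        if action == 'exit':
            return arg(value)                 # argument is a callback
        elif action == 'test':                # branch on a predicate
            predicate, if_true, if_false = arg
            node = if_true if predicate(value) else if_false

positive = ('exit', lambda v: 'positive')
other    = ('exit', lambda v: 'not positive')
tree = ('test', (lambda v: v > 0, positive, other))

print(run(tree, 5))    # positive
print(run(tree, -1))   # not positive
```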

Interpreter functions are generated using the SMIGenerator class,
instantiated with a function whose calling signature will serve as a template
for the interpreter function:

To generate the interpreter function, you call the generate() method with
a root node -- an (action, argument) tuple:

>>> exit_node = (0, interpreter)
>>> gfunc = smig.generate(exit_node)

The action must either be zero, or a value returned by the action_id()
method (described below). When the generated interpreter encounters
action zero, it will treat the argument as a callback. The callback must
accept the same number and type of arguments as the interpreter function, and
it will be called with the values of the corresponding local variables. The
interpreter will invoke the callback, and then exit, returning whatever value
or exception was provided by the exit callback:

>>> gfunc(23)
23

Now let's use the same generator, but add some more actions to it. Actions are
added using the action_id() method, which takes a code generation target
and returns an action ID for use in the interpreter.

The code generation target will execute with no values on the stack, and must
finish execution with one value on the stack -- another (action, argument)
pair. It can use the generator's ARG attribute to refer to the action
argument, and the generator's NEXT_STATE attribute to
jump back to the action dispatch loop. A NEXT_STATE jump is automatically
generated after each action, so you don't need to include it.

For demonstration and testing, we'll create two new actions: an action that
sets the input local variable to its argument, and an action that simply
treats the argument as the next state -- a sort of "pass" action. We'll start
with the "pass" action:

>>> pass_id = smig.action_id(smig.ARG)

This is about the simplest possible action that meets the requirements of an
action: it takes no values on the stack, and puts one value on the stack -- in
this case, the argument part of the current state.
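A rough pure-Python analogue of this protocol (the names here are invented; the real actions are inlined bytecode that uses ARG and NEXT_STATE rather than Python callables):

```python
# Hedged analogue: each action takes the current argument (ARG) and
# returns the next (action, argument) state, mirroring "leave one
# (action, argument) pair on the stack".
def interpret(state, input, actions):
    while True:
        action, arg = state
        if action == 0:
            return arg(input)   # action zero: argument is the exit callback
        state = actions[action](arg)

# The "pass" action: the argument *is* the next state.
actions = {1: lambda arg: arg}

exit_state = (0, lambda input: input)
print(interpret((1, exit_state), 23, actions))   # 23
```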

Whenever you add new actions, you must regenerate the interpreter function
in order to be able to use them in the dispatch tree. So we'll regenerate
our input function, this time using the set_input action:

The peak.rules.codegen module includes a common-subexpression caching
extension of peak.util.assembler, used to implement "at most once"
calculation of any intermediate results during rule evaluation. It works
by setting aside a local variable ($CSECache) to hold a dictionary of
temporary values, keyed by strings.

Any time a cached value is needed, the dictionary is checked first. However,
the local variable is initially set to None, to avoid creating a dictionary
unnecessarily. In this way, only those portions of the dispatch tree that
require intermediate expression evaluation will incur the cost of creating or
accessing the dictionary.
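The pattern can be sketched as ordinary Python (illustrative names only; the real version inlines equivalent logic as bytecode around the $CSECache local):

```python
# Hedged sketch of the lazy cache: the cache slot starts as None and a
# dictionary is only created on the first cached lookup.
def make_frame():
    return {'CSECache': None}   # stands in for the $CSECache local

def cached(frame, key, compute):
    cache = frame['CSECache']
    if cache is None:
        cache = frame['CSECache'] = {}   # created only when first needed
    if key not in cache:
        cache[key] = compute()           # "at most once" evaluation
    return cache[key]

calls = []
frame = make_frame()
cached(frame, 'a+b #1', lambda: calls.append(1) or 42)
cached(frame, 'a+b #1', lambda: calls.append(1) or 42)
print(len(calls))   # 1 -- the expression was computed at most once
```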

Note that this caching mechanism is not primarily aimed at improving the
performance of the underlying code, although in some cases it might have this
effect. It is also not aimed at producing compact code; the code it generates
may be considerably larger than the unadorned code would be!

Rather, the goal is to provide the desired semantics (i.e. no duplicated
calculations) with better performance than the RuleDispatch package
provides for the same operations. In RuleDispatch, expressions are
calculated using partial functions and a cache dictionary similar to this one,
whereas here the functions are effectively inlined as Python bytecode.

Generating a cached object results in extra code being added to ensure that
the cache variable is initialized and to retrieve the cached value, if present.
The resulting code looks complex, but each of the possible code paths is
actually fairly short. The cache keys are the string forms of the cached
expressions, with an added number to ensure uniqueness:

While the cache() method marks an expression as definitely cacheable, the
maybe_cache() method allows the code object to decide for itself whether
the expression should be cached. Specifically, the given expression and all
its subexpressions are evaluated against a dummy code object, and its tree
structure is examined. Any non-leaf node that appears as a child of two
or more parents, or twice or more as a child of the same parent, is considered
suitable for caching.
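That heuristic can be sketched over a toy tuple-based tree (the real version examines assembler nodes via a dummy code object; this counting scheme is a simplified stand-in):

```python
# Hedged sketch of the maybe_cache() heuristic: count occurrences of each
# non-leaf subtree; any subtree seen more than once is a caching candidate.
from collections import Counter

def count_subtrees(node, counts):
    if isinstance(node, tuple):          # non-leaf node
        counts[node] += 1
        for child in node[1:]:
            count_subtrees(child, counts)

expr = ('Mul', ('Add', 'a', 'b'), ('Add', 'a', 'b'))   # (a+b)*(a+b)
counts = Counter()
count_subtrees(expr, counts)
cacheable = [node for node, n in counts.items() if n > 1]
print(cacheable)   # [('Add', 'a', 'b')]
```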

In our first example, the expression (a+b)/c*d is cached, because it's
passed to maybe_cache() twice -- once by itself, and once as a child of
((a+b)/c*d)%3:

And in this example, we also compute (a+b)*(a+b), but this time only
inspecting that one expression for recurrences. We still find the recurrence,
because (a+b) occurs more than once under the parent expression: