Building Expressions from Python Syntax Trees

The ast_builder module allows you to quickly navigate a Python syntax
tree and perform operations on it. While Python 2.5 has a new "AST" feature
that provides a high-level syntax tree, older Python versions offer a very
low-level interface that provides complex tuple trees with lots of redundant
information. The ast_builder module simplifies these trees dramatically,
without creating an intermediate AST data structure (the way the stdlib
compiler package does). Instead, it allows you to effectively "visit"
a virtual AST structure and generate your desired output directly. In
addition, it allows you to skip, delay, or repeat traversals of arbitrary
subtrees.

This document describes the design (and tests the implementation) of the
ast_builder module. You don't need to read it unless you want to use
this module directly in your own programs. If you do want to use it directly,
keep in mind that it currently implements only a subset of Python expression
syntax: it does not support lambdas, yield expressions, or statements of any
kind.

ast_builder operates on parse tuple trees, as created by the standard
library parser module. The two API functions it provides are build
and parse_expr:

>>> from peak.rules.ast_builder import build, parse_expr

The build() function accepts two arguments, a "builder" and a "nodelist".
A "builder" is an object that you supply that will perform actions on nodes in
the parse tree. The "nodelist" is a parse tuple tree. As a shortcut, you
can use parse_expr() to parse a string into a nodelist and invoke
build() in one step.

Most builder methods accept nodelists as arguments. These nodelists can be
recursively passed to build() in order to process expression subtrees.
This is not done automatically, because it's possible you might want to skip
processing of a particular subtree, or need to process a subtree with a
different builder than the one currently in use, or even process a subtree with
more than one builder (e.g. a builder that sees what names are bound within
a function body, and a second builder to generate code).
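
The pattern described above can be illustrated with a minimal sketch. It does
not use ast_builder or the parser module: the tuple-shaped nodes, the
toy_build() dispatcher, and both builder classes are hypothetical stand-ins,
assumed here only to show why leaving recursion to the builder is useful. Each
builder method receives its child nodes unprocessed and decides for itself
whether to recurse, so the same tree can be walked by two different builders.

```python
def toy_build(builder, node):
    """Dispatch on the node's type name (hypothetical stand-in for build()).

    Child nodes are passed to the builder method unprocessed.
    """
    kind, args = node[0], node[1:]
    return getattr(builder, kind)(*args)


class NameCollector:
    """First pass: record which names appear in the tree."""

    def __init__(self):
        self.names = []

    def Name(self, name):
        self.names.append(name)

    def Const(self, value):
        pass  # constants contain no names, so there is nothing to do

    def Add(self, left, right):
        # Recursion is explicit: this builder chooses to visit both sides.
        toy_build(self, left)
        toy_build(self, right)


class Renderer:
    """Second pass: render the same tree as text."""

    def Name(self, name):
        return name

    def Const(self, value):
        return repr(value)

    def Add(self, left, right):
        return "Add(%s,%s)" % (toy_build(self, left), toy_build(self, right))


# x + (1 + y), written as a toy tuple tree (not a real parser nodelist)
tree = ("Add", ("Name", "x"), ("Add", ("Const", 1), ("Name", "y")))

collector = NameCollector()
toy_build(collector, tree)
rendered = toy_build(Renderer(), tree)
```

After both passes, collector.names holds ["x", "y"] and rendered is
"Add(x,Add(1,y))". A builder that wanted to skip a subtree would simply not
call toy_build() on it.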

For convenience in the rest of this document, we'll use a shorthand function,
pe(), to create a Builder(), parse an expression, and print the result.

The Compare method receives two arguments: a node for the first expression
to be compared, followed by a list of (op,expr) tuples for subsequent
comparisons. The op value is a string representing the comparison operator
used, and each expr is a node.
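
As a rough sketch of how a builder might consume those two arguments, the
following toy example expands a chained comparison into pairwise comparisons.
The tuple-shaped nodes, toy_build(), and the ChainExpander class are
hypothetical and stand in for real nodelists and build(); only the shape of
the Compare arguments (a first node, then a list of (op,expr) tuples) is taken
from the description above.

```python
def toy_build(builder, node):
    """Hypothetical stand-in for build(): dispatch on the node's type name."""
    kind, args = node[0], node[1:]
    return getattr(builder, kind)(*args)


class ChainExpander:
    """Renders 'a < b <= c' as text of the form '(a < b) and (b <= c)'."""

    def Name(self, name):
        return name

    def Compare(self, first, comparisons):
        left = toy_build(self, first)
        parts = []
        for op, expr in comparisons:
            right = toy_build(self, expr)
            parts.append("(%s %s %s)" % (left, op, right))
            # The right operand becomes the left operand of the next pair.
            left = right
        return " and ".join(parts)


# a < b <= c, written as a toy tuple tree
node = ("Compare", ("Name", "a"),
        [("<", ("Name", "b")), ("<=", ("Name", "c"))])
result = toy_build(ChainExpander(), node)
```

Here result is "(a < b) and (b <= c)". The expansion is purely textual: a real
code generator would also have to make sure the middle operand is evaluated
only once.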

The ListComp and GenExpr methods receive two arguments: a node for the
output expression, and a list of (op,node) tuples, where op is the name
of an operator (either "for", "in", or "if"), and node is the node
corresponding to the operator's argument.
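
A minimal sketch of a ListComp method that walks such a clause list might look
like the following. The tuple-shaped nodes, toy_build(), and the Printer class
are hypothetical stand-ins for real nodelists, build(), and a real builder;
the only thing assumed from the description above is the argument shape: an
output expression node plus a list of (op,node) tuples.

```python
def toy_build(builder, node):
    """Hypothetical stand-in for build(): dispatch on the node's type name."""
    kind, args = node[0], node[1:]
    return getattr(builder, kind)(*args)


class Printer:
    """Renders a list comprehension by visiting each clause in order."""

    def Name(self, name):
        return name

    def ListComp(self, expr, clauses):
        parts = [toy_build(self, expr)]
        for op, node in clauses:
            # op is "for", "in", or "if"; node is that operator's argument
            parts.append("%s %s" % (op, toy_build(self, node)))
        return "ListComp(%s)" % " ".join(parts)


# [x for x in y if z], written as a toy tuple tree
node = ("ListComp", ("Name", "x"),
        [("for", ("Name", "x")), ("in", ("Name", "y")), ("if", ("Name", "z"))])
result = toy_build(Printer(), node)
```

With this toy tree, result is "ListComp(x for x in y if z)".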

Note, by the way, that when you are building the "for" clause assignments,
you'll need to handle arbitrary assignment targets (e.g. tuple unpacking,
attributes, or subscripts):

>>> pe("[x for y, x in z]")
ListComp(x for Tuple(y,x) in z)
>>> pe("[x.y for x.y in z]")
ListComp(Getattr(x,'y') for Getattr(x,'y') in z)
>>> pe("[x[y] for x[y] in z]")
ListComp(Subscript(x,y) for Subscript(x,y) in z)

(Normally, you would handle this by passing the "for" clauses to a different
builder instance that's set up to handle calls to Name, Getattr,
Tuple, etc. by generating assignments instead of lookups.)
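
One way to picture that two-builder arrangement is the sketch below. It is an
illustration under stated assumptions, not ast_builder's implementation: the
tuple-shaped nodes, toy_build(), and the LookupBuilder and TargetBuilder
classes are all hypothetical. The lookup builder hands each "for" clause to a
second builder whose Name, Tuple, and Getattr methods describe assignments
instead of lookups.

```python
def toy_build(builder, node):
    """Hypothetical stand-in for build(): dispatch on the node's type name."""
    kind, args = node[0], node[1:]
    return getattr(builder, kind)(*args)


class TargetBuilder:
    """Treats nodes as assignment targets rather than lookups."""

    def __init__(self, lookups):
        self.lookups = lookups  # builder used for ordinary expressions

    def Name(self, name):
        return "store(%s)" % name

    def Tuple(self, items):
        return "unpack(%s)" % ",".join(toy_build(self, i) for i in items)

    def Getattr(self, expr, attr):
        # The object itself is looked up; only the attribute is assigned.
        return "setattr(%s,%r)" % (toy_build(self.lookups, expr), attr)


class LookupBuilder:
    """Ordinary expression builder that delegates "for" clauses."""

    def __init__(self):
        self.targets = TargetBuilder(self)

    def Name(self, name):
        return name

    def ListComp(self, expr, clauses):
        parts = [toy_build(self, expr)]
        for op, node in clauses:
            builder = self.targets if op == "for" else self
            parts.append("%s %s" % (op, toy_build(builder, node)))
        return "ListComp(%s)" % " ".join(parts)


x, y, z = ("Name", "x"), ("Name", "y"), ("Name", "z")
unpacking = toy_build(LookupBuilder(),
                      ("ListComp", x, [("for", ("Tuple", [y, x])), ("in", z)]))
attribute = toy_build(LookupBuilder(),
                      ("ListComp", x, [("for", ("Getattr", x, "y")), ("in", z)]))
```

In this sketch, unpacking is "ListComp(x for unpack(store(y),store(x)) in z)"
and attribute is "ListComp(x for setattr(x,'y') in z)".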

>>> import sys
>>> if sys.version>='2.4':
...     pe("a(x for x in y if z)")
...     pe("a(x for x in y if z, q)")
... else:
...     print "Call(a,Tuple(GenExpr(x for x in y if z)),{},None,None)"
...     print "Call(a,Tuple(GenExpr(x for x in y if z),q),{},None,None)"
Call(a,Tuple(GenExpr(x for x in y if z)),{},None,None)
Call(a,Tuple(GenExpr(x for x in y if z),q),{},None,None)