Thursday, January 26, 2012

Comparing Partial Evaluation and Tracing, Part 1

As part of writing my PhD I am currently thinking about the relationship
between PyPy's meta-tracing approach and various previous ideas for
automatically getting a (JIT-)compiler from only an interpreter of a language. One
of the most-researched ideas along these lines is that of partial evaluation.
Partial evaluation has basically the same goals as PyPy when it comes to
compilers: Write an interpreter, and get a compiler for free. The methods for
reaching that goal are a bit different. In this series of blog posts, I am
trying to explore the similarities and differences of partial evaluation and
PyPy's meta-tracing.

A Flowgraph Language

To be able to clearly understand what "partial evaluation" is and what
"meta-tracing" is I will show an "executable model" of both. To that end, I am
defining a small imperative language and will then show what a partial evaluator
and a tracer for that language look like. All this code will be
implemented in Prolog. (Any pattern-matching functional language would do, but I
happen to know Prolog best. Backtracking is not used, so you can read things
simply as functional programs.) In this post I will start with
the definition of the language and a partial evaluator for it. The complete code
for this blog post can be found here: http://paste.pocoo.org/show/541004/

The language is conceptually similar to PyPy's flow graphs, but a bit more
restricted. It does not have function calls, only labelled basic blocks
that consist of a series of linearly executed operations, followed by a
conditional or an unconditional jump. Every operation assigns a value to a
variable; the value is computed by applying some operation to some arguments.

A simple program to raise x to the yth power in that language looks like
this:
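The listing is not reproduced here, so the following is only a sketch of what such a program could look like. The label power and the variables x, y and res come from the text; the labels power_rec and power_done, the operation names same, mul and sub, and the convention that if takes a bare variable name and jumps to the first label when that variable is non-zero are assumptions.

```prolog
% block/2 is declared dynamic so that more blocks can be added at runtime.
:- dynamic block/2.

% power: res = 1, then enter the loop if y is non-zero (assumed if semantics)
block(power, op1(res, same, const(1),
             if(y, power_rec, power_done))).
% power_rec: res = res * x; y = y - 1; loop again while y is non-zero
block(power_rec, op2(res, mul, var(res), var(x),
                 op2(y, sub, var(y), const(1),
                 if(y, power_rec, power_done)))).
% power_done: print the result and stop
block(power_done, print_and_stop(var(res))).
```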

Every rule of block declares one block by first giving the label of the
block, followed by the code. Code is a series of op1 or op2 statements
terminated by a jump, an if or a print_and_stop. op1 statements
are operations with one argument, of the form op1(res_variable,
operation_name, argument, next_statement); op2 statements are the same but
take two arguments. Arguments can be either variables in the form var(name)
or constants in the form const(value).

To run programs in this flowgraph language, we first need some helper
functionality. The first few helper functions are concerned with the handling of
environments, the data structures the interpreter uses to map variable
names occurring in the program to the variables' current values. In Python,
dictionaries would be used for this purpose, but in Prolog we have to emulate
these by lists of key/value pairs (not very efficient, but good enough):
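A minimal sketch of such helpers, assuming environments are lists of Key/Value terms and that the result is always the last argument. The argument order and the key_not_found error term are assumptions, not necessarily what the linked code uses.

```prolog
% lookup(Key, Env, Value): find the value stored under Key, or throw.
lookup(Key, [], _) :- throw(key_not_found(Key)).
lookup(Key, [Key/Value | _], Value) :- !.
lookup(Key, [_ | Rest], Value) :- lookup(Key, Rest, Value).

% write_env(Env, Key, Value, NewEnv): set Key to Value, replacing an old entry.
write_env([], Key, Value, [Key/Value]).
write_env([Key/_ | Rest], Key, Value, [Key/Value | Rest]) :- !.
write_env([Pair | Rest], Key, Value, [Pair | NewRest]) :-
    write_env(Rest, Key, Value, NewRest).

% remove_env(Env, Key, NewEnv): drop the entry for Key, if there is one.
remove_env([], _, []).
remove_env([Key/_ | Rest], Key, Rest) :- !.
remove_env([Pair | Rest], Key, [Pair | NewRest]) :-
    remove_env(Rest, Key, NewRest).

% resolve(Arg, Env, Value): turn const(X) or var(X) into a plain value.
resolve(const(X), _, X).
resolve(var(X), Env, Y) :- lookup(X, Env, Y).
```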

The implementation of these functions is not too important. The lookup
function finds a key in an environment list, the write_env function adds a
new key/value pair to an environment, and remove_env removes a key. The
resolve function takes either a constant or a variable and returns
a value: if it's a constant, the value of that constant is returned; if it's a
variable, it is looked up in the environment. Note how the last argument of
lookup and resolve is actually a return value, which is the typical
approach in Prolog.

So far we have not specified what the primitive operations that can occur in the
program actually mean. For that we define a do_op function which
executes primitive operations:
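As an illustration, do_op could be defined along these lines. The set of operations shown (same, mul, add, sub, eq, ge) and the encoding of comparison results as 1 and 0 are assumptions made for this sketch.

```prolog
% do_op(Op, Arg, Result): operations with one argument
do_op(same, X, X).

% do_op(Op, Arg1, Arg2, Result): operations with two arguments
do_op(mul, X, Y, Z) :- Z is X * Y.
do_op(add, X, Y, Z) :- Z is X + Y.
do_op(sub, X, Y, Z) :- Z is X - Y.
% comparisons produce 1 for true and 0 for false
do_op(eq, X, Y, Z) :- ( X =:= Y -> Z = 1 ; Z = 0 ).
do_op(ge, X, Y, Z) :- ( X >= Y -> Z = 1 ; Z = 0 ).
```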

Now we can start executing simple operations. For that an interp predicate
is defined. It takes as its first argument the current environment and as the
second argument the operation to execute. E.g. to execute primitive operations
with one or two arguments:
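A sketch of these two rules, following the argument order given above (environment first, operation second). It relies on the helper predicates resolve, write_env and do_op that the text describes, and their exact signatures here are assumptions.

```prolog
% interp(Env, Op): execute Op and then everything that follows it.
interp(Env, op1(ResultVar, Op, Arg, Rest)) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(NEnv, Rest).
interp(Env, op2(ResultVar, Op, Arg1, Arg2, Rest)) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(NEnv, Rest).
```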

First the arguments are resolved into values. Afterwards the operation is executed,
and the result is written back into the environment. Then interp is called on
the rest of the program. Similarly easy are the unconditional jump and
print_and_stop:
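These cases could look roughly as follows. The conditional jump is included for completeness. Its semantics (continue at the second label if the variable is 0, otherwise at the first) are an assumption, as is the use of block/2 to fetch the code of a label.

```prolog
% Unconditional jump: fetch the target block and continue executing there.
interp(Env, jump(L)) :-
    block(L, Block),
    interp(Env, Block).

% print_and_stop: resolve the argument, print it, and finish.
interp(Env, print_and_stop(Arg)) :-
    resolve(Arg, Env, Val),
    write(Val), nl.

% Conditional jump (assumed semantics): go to L2 if V is 0, else to L1.
interp(Env, if(V, L1, L2)) :-
    lookup(V, Env, Val),
    ( Val == 0 -> block(L2, Block) ; block(L1, Block) ),
    interp(Env, Block).
```

With a power program loaded as block/2 facts, a query of the form `?- block(power, Block), interp([x/10, y/10], Block).` would then run it and print x raised to the yth power.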

Partial Evaluation of the Flowgraph Language

Let's look at what a partial evaluator for this simple flowgraph language would
look like. Partial evaluation (PE), also called specialization, is a program
manipulation technique. PE takes an input program and transforms it into a
(hopefully) simpler and faster output program. It does this by assuming that
some variables in the input program are constants. All operations that act only
on such constants can be folded away. All other operations need to remain in the
output program (called the residual program). Thus the partial evaluator proceeds
much like an interpreter, except that it cannot actually execute some operations.
Also, its output is not just a value, but also a list of remaining operations that
could not be optimized away.

The partial evaluator cannot use normal environments because, unlike the
interpreter, it does not know the values of all variables. It will therefore work on
partial environments, which store just the known variables. For these partial
environments, some new helper functions are needed:
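A sketch of the two helpers, assuming partial environments use the same Key/Value list representation as normal environments. The argument order is an assumption.

```prolog
% plookup(Key, PEnv, Result): const(Value) if Key is known, var(Key) otherwise.
plookup(Key, [], var(Key)).
plookup(Key, [Key/Value | _], const(Value)) :- !.
plookup(Key, [_ | Rest], Result) :- plookup(Key, Rest, Result).

% presolve(Arg, PEnv, Result): constants stay constants, variables are
% replaced by their value when the partial environment knows it.
presolve(const(X), _, const(X)).
presolve(var(V), PEnv, Result) :- plookup(V, PEnv, Result).
```

Unlike lookup, plookup never throws: an unknown variable is a normal outcome during partial evaluation, not an error.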

The function plookup takes a variable and a partial environment and returns
either const(Value) if the variable is found in the partial environment or
var(Key) if it is not. Similarly, presolve is like resolve,
except that it uses plookup instead of lookup.

With these helpers we can start writing a partial evaluator. The following two
rules are where the main optimization in the form of constant folding happens.
The idea is that when the partial evaluator sees an operation that involves
only constant arguments, it can constant-fold the operation, otherwise it
can't:
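The two rules could be sketched like this, using the helpers presolve, do_op, write_env and remove_env described in the text. The argument order of pe (partial environment, operation, residual operation) and the variable names are assumptions.

```prolog
% pe(PEnv, Op, NewOp): partially evaluate Op and everything after it.
pe(PEnv, op1(ResultVar, Op, Arg, Rest), NewOp) :-
    presolve(Arg, PEnv, RArg),
    (   RArg = const(C)
    ->  % Argument known: execute the operation now, emit nothing.
        do_op(Op, C, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;   % Argument unknown: emit a residual operation, forget ResultVar.
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op1(ResultVar, Op, RArg, RestResidual)
    ),
    pe(NEnv, Rest, RestResidual).

pe(PEnv, op2(ResultVar, Op, Arg1, Arg2, Rest), NewOp) :-
    presolve(Arg1, PEnv, RArg1),
    presolve(Arg2, PEnv, RArg2),
    (   RArg1 = const(C1), RArg2 = const(C2)
    ->  % Both arguments known: constant-fold.
        do_op(Op, C1, C2, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;   % At least one argument unknown: keep the operation. Arguments
        % that are known are embedded as constants.
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op2(ResultVar, Op, RArg1, RArg2, RestResidual)
    ),
    pe(NEnv, Rest, RestResidual).
```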

The pe predicate takes a partial environment and the current operation, and
potentially returns a new operation. To partially evaluate a simple operation,
its arguments are looked up in the partial environment. If all the arguments
are constants, the operation can be executed, and no new operation is produced.
Otherwise, we need to produce a new residual operation which is exactly like
the one currently being looked at. Also, the result variable needs to be removed
from the partial environment, because it was just overwritten by an unknown value.

The potentially generated residual operation is stored into the output argument
NewOp. The output argument of the recursive call is the last argument of
the newly created residual operation, which will then be filled by the
recursive call. This is a typical approach in Prolog, but may look strange if
you are not familiar with it.

Note how the first case of these two rules is just like interpretation. The
second case doesn't really do anything; it just produces a residual operation.
This relationship between normal evaluation and partial evaluation is very
typical.

To partially evaluate an unconditional jump we again produce a jump. The target
label of that residual jump is computed by asking the partial evaluator to
produce residual code for the label L with the given partial environment.
print_and_stop is simply turned into a print_and_stop. We will see the
code for do_pe soon.
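The two cases just described could be sketched as follows. The argument order of pe and do_pe is an assumption; do_pe is taken to map a label and a partial environment to the label of the residual block.

```prolog
% Unconditional jump: specialize the target block, then jump to the
% residual label that do_pe hands back.
pe(PEnv, jump(L), jump(LR)) :-
    do_pe(L, PEnv, LR).

% print_and_stop stays in the residual code; its argument becomes a
% constant if the partial environment knows the variable.
pe(PEnv, print_and_stop(Arg), print_and_stop(RArg)) :-
    presolve(Arg, PEnv, RArg).
```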

To partially evaluate a conditional jump, we first look up the value of the
condition variable. If it is a constant, we can produce better code, because we
know statically that only one path is reachable. Thus we produce code for that
path, and then emit an unconditional jump there. If the condition variable is
not known at partial evaluation time, we need to partially evaluate both paths
and produce a conditional jump in the residual code.
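A sketch of the rule for conditional jumps, under the assumption that if(V, L1, L2) names its condition variable directly and goes to L2 when the variable is 0:

```prolog
pe(PEnv, if(V, L1, L2), NewOp) :-
    plookup(V, PEnv, VVal),
    (   VVal = const(Val)
    ->  % Condition known: only one target is reachable.
        ( Val == 0 -> L = L2 ; L = L1 ),
        do_pe(L, PEnv, LR),
        NewOp = jump(LR)
    ;   % Condition unknown: specialize both targets and keep the if.
        do_pe(L1, PEnv, L1R),
        do_pe(L2, PEnv, L2R),
        NewOp = if(V, L1R, L2R)
    ).
```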

This rule is the one that causes the partial evaluator to potentially do much
more work than the interpreter, because after an if sometimes both paths
need to be explored. In the worst case this process never stops, so a real
partial evaluator would need to ensure somehow that it terminates. There are
many algorithms for doing that, but I will ignore this problem here.

Now we need to understand what the do_pe predicate is doing. Its most
important task is to make sure that we don't do the same work twice by
memoizing code that was already partially evaluated in the past. For that it
keeps a mapping from (Label, Partial Environment) pairs to the label of the
residual code:
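A sketch of do_pe, assuming SWI-Prolog (gensym/2 and assert/1 are available there) and assuming the cache is stored as code_cache(Label, PEnv, ResidualLabel) facts. Both block/2 and code_cache/3 have to be dynamic, because rules are added to them while the partial evaluator runs.

```prolog
% This declaration should come before any block/2 facts in the file.
:- dynamic block/2, code_cache/3.

% do_pe(L, PEnv, LR): LR is the label of the residual code for block L
% specialized under the partial environment PEnv.
do_pe(L, PEnv, LR) :-
    (   code_cache(L, PEnv, LPrevious)
    ->  % Already specialized for exactly this environment: reuse it.
        LR = LPrevious
    ;   % New combination: pick a fresh label and record it *before*
        % specializing, so that loops back to L find the cache entry.
        gensym(L, LR),
        assert(code_cache(L, PEnv, LR)),
        block(L, Code),
        pe(PEnv, Code, Residual),
        assert(block(LR, Residual))
    ).
```

Note that the cache lookup compares partial environments structurally, so two environments with the same bindings in a different order count as different entries in this sketch.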

If the code cache indicates that label L was already partially evaluated
with partial environment PEnv, then the previous residual code label
LPrevious is returned. Otherwise, a new label is generated with gensym, the
code cache is informed of that new label with assert, then the block is
partially evaluated and the residual code is added to the database.

For those who know partial evaluation terminology: This partial evaluator is a
polyvariant online partial evaluator. "Polyvariant" means that for every label,
several specialized versions of the block can be generated. "Online" means that
no preprocessing is done before the partial evaluator runs.

Partial Evaluation Example

With this code we can look at the classical example of partial evaluation (it's
probably the "Hello World" of partial evaluation). We
can ask the partial evaluator to compute a power function, where the exponent
y is a fixed number, e.g. 5, and the base x is unknown:
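A console session for this could look as follows, assuming the entry block of the program is labelled power and that do_pe takes the label, the partial environment and an output variable for the residual label. The listing/1 calls print the current code_cache and block rules; their output is omitted here.

```prolog
% Specialize the block labelled power, assuming y is 5 and x is unknown.
?- do_pe(power, [y/5], LR).

% Inspect which residual labels were created and what code they contain.
?- listing(code_cache/3).
?- listing(block/2).
```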

The code_cache tells which residual labels correspond to which original
labels under which partial environments. Thus, power1 contains the code of
power under the assumption that y is 5. Looking at the block listing,
the label power1 corresponds to code that simply multiplies res by x
five times without using the variable y at all. The loop that was present
in the original program has been fully unrolled, and the loop variable y has
disappeared. Hopefully this is faster than the original program.

Conclusion

In this blog post we saw an interpreter for a simple flow graph language in
Prolog, together with a partial evaluator for it. The partial evaluator
essentially duplicates every rule of the interpreter. If all the arguments of
the current operation are known, it acts like the interpreter; otherwise it
simply copies the operation into the residual code.

Partial evaluation can be used for a variety of applications, but the most
commonly cited one is that of applying it to an interpreter. To do that, the
program that the interpreter runs is assumed to be constant by the partial
evaluator. Thus a specialized version of the interpreter is produced that does
not use the input program at all. That residual code can be seen as a compiled
version of the input program.

5 comments:

For those not too familiar with Prolog: assert(foo(..)) is not at all like the "assert" of Python or C code. Instead, it adds the rule 'foo(..)' in the database of rules. In other words, it is as if 'foo(..)' was added to the currently running program, as an extra rule.

@Anonymous: block(L, O) lists all the labels and operations corresponding to the labels that exist in the user program. See the very beginning of the post. Also, when partial evaluation creates new code it adds new cases to block(L, O), with the statement assert(block(..., ...)).