Thursday, September 4, 2008

Now it starts getting interesting.

I have a Scheme system that can interpret the MIT-Scheme S-Code, but
it doesn't use the underlying MIT-Scheme runtime system. It can read
the fasl binaries that MIT Scheme produces, but the object format is
whatever the .NET system is providing. How does it perform? Not
particularly well.

It's pretty easy to figure out why. Even the simplest piece of code
is creating and invoking continuations at an enormous rate. Each
continuation is being allocated on the heap. Now it is the case with
modern generational garbage collection that heap allocation has almost
no cost, but even the very tiny cost of allocation adds up quickly
when you do it for nearly every node you traverse in the abstract
syntax tree.

But the real problem is this: the CPU has specialized hardware
that is specifically designed to efficiently allocate, deallocate and
manipulate continuations, and we aren't using it. Again, it is
sitting there managing the continuations used by the virtual functions
in our interpreter, while we use the main data paths and the memory
management software library to mimic it's functionality for the
virtual machine. In fact, we go out of our way to get it to
discard these useless continuations so we can preserve tail
recursion.

Why do we do this? First, the C# continuations don't quite do what we
want. C#, like most other languages, unconditionally allocates
continuations on procedure entry and deallocates them on exit.
Scheme, on the other hand, allocates continuations only if they
logically needed to regain control. If a routine has completed its
work and is simply going to call another routine to get the answer, it
just jumps there and the callee transmits the answer directly back to
whatever routine initiated the call chain.

Second, C# continuations are too opaque for our purposes. Scheme's
call-with-current-continuation primitive reifies the continuation
— it takes the stack and turns it into a first-class object
that can be manipulated by the user. C#, like Java, does not permit
programs to randomly access the stack and certainly doesn't let
programs unload and reload the contents of the stack.

Note how the new continutation takes as one of its arguments the
existing continuation. We put that into the interpreter continuation
register, set the interpreter expression register to the predicate of
the conditional and then return control to the caller (which is the
trampoline).

The point is to evaluate the predicate and pass the answer on to the
newly created ConditionalDecide continuation. It has code like this:

This is not only easier to write, it performs much better because we
are co-opting the continuations that are used for C# subroutines for
our own sub-evaluations. Unfortunately, however, the tail calls to
EvalStep are allocating a continuation that destroys tail recursion.
We need to transfer control to one of these routines without
allocating a new continuation.

Here's the trick: we'll get the caller to do the transfer for us.
We'll return to the caller with some indication that we're not really
done, but that we intend for someone else to finish our job.
Essentially, we are taking some of the code that logically belongs in
our routine and placing it in the caller. This isn't possible in
general because the caller is unknown and the callees can be
arbitrary, but in the context of the interpreter, we have an
extraordinary special case: Nearly every call that is necessary to be
tail recursive is a call to EvalStep. (There are a couple of
primitives that call back into the interpreter that we'll deal with in
a bit.)

We'll define a new calling convention that requires the caller to
handle the transfer to EvalStep for us. It's a little unorthodox, but
no more so than requiring every routine to return to a trampoline.
The new signature of EvalStep will be this:

bool EvalStep (out object answer, ref SCode expression);

The boolean return value indicates whether we've returned control with
an answer or whether we've returned with a new expression we wish to
tail call. The caller must invoke EvalStep like this:

SCode expr = <expression to evaluate>;
object answer;
while (expr.EvalStep (out answer, ref expr)) { };
// When the while exits, we have an answer.

And the callee should do one of two things. Either assign an answer
and return false, or assign a new expression and return true. The
EvalStep handler for the conditional expression above would therefore
look like this:

This isn't quite the whole story. Evaluation needs an environment and
that is missing. The EvalStep routines will need to provide this to
the caller as well, so it will handled with a ref argument, too. Here
is the code for conditionals:

If you look at this code you'll notice that what we are doing is
severely abusing the language features to change the calling
conventions. We allocate some stack for EvalStep in the caller's
frame. We then pass control to EvalStep and give it the addresses of
the allocated variables. EvalStep eventually returns control to us,
but it isn't logically finished. Recall that we've moved some of
the code in the callee to the caller. If the return value is true,
the while loop performs a method call to EvalStep on the new value of
expr. This is the call that the callee would have performed, if he
was allowed to clear the stack prior to the call. Since he is not
allowed to do this, we just move the code around a little so that it
runs after the return.

With the code in this form, the interpreter continuations and the C#
continuations are one and the same object, and we can take advantage
of the processor's stack hardware for performance. This solves the
first problem we have with C# continuations. The second problem is
harder to solve.