This document describes the compilation process used by the CHICKEN Scheme-to-C compiler by walking through the different compilation stages of a simple example program.

CHICKEN uses a compilation strategy called Cheney-on-the-MTA, after a paper by Henry Baker[1]. The basic idea is quite simple: compile Scheme to C by first transforming the program into Continuation Passing Style (CPS) and then generating C code in CPS directly, as functions that never return. Instead of returning, each function calls a continuation held in a stack-allocated record (which also holds a pointer to the code of the continuation procedure). Allocation is done simply by creating data structures on the stack: since functions never return, the allocated data stays "live". As this would build up stack frames endlessly, the stack pointer is checked at regular intervals (currently on every function entry) to see whether a predetermined limit has been reached. Once the limit is exceeded, the current arguments and continuation are saved and all live data is copied into the heap (effectively the second generation in a generational garbage collection scheme). The C stack is then unwound (Baker's paper uses longjmp for this) and execution resumes with the saved arguments and continuation. This copying traverses all data that can be reached from the current dynamic state of the program (which is just the continuation, the arguments passed to the current procedure, plus the current closure). Unreachable data is never touched, so the time required for copying is proportional to the amount of live data on the stack.

Allocation can be extremely fast in this scheme, as we can basically use the machine's stack pointer as a dedicated allocation-pointer register. Another advantage is that, through the CPS conversion (combined with a flat closure representation), a minimum of garbage is retained: only data in free variables that are guaranteed to be used is stored in continuation closure records, at a sub-procedure level. As continuations are explicit and inherent in this strategy, code that uses continuations heavily pays no performance penalty. This is particularly important when threads are implemented on top of continuations.

A disadvantage is that a lot of allocation takes place and that the CPS representation puts certain constraints on interfacing to foreign code (especially when callbacks are involved).

Ok, let's start with an example: the well-known N-queens problem. We present the code as it is transformed by the various compilation stages:

You'll note some administrative forms at the start and the end of the program: they ensure that the required setup code is run and that cleanup code is called on termination. Toplevel definitions are also replaced by assignments, and lexical identifiers are renamed (alpha-converted).

The next step is the conversion to CPS. As you can see, this generates quite a lot of code (slightly re-formatted here to fit onto the page). Note that the compiler now operates on an abstract syntax tree; the s-expression notation shown is reconstructed by the -debug 3 option given to the compiler:

Next round. This time k141, k147 and dec-to1 have been contracted, which means that procedures called only once have been inlined (an optimization that guarantees the program will not grow). Some unnecessary variables and bindings have also been removed:

Here the trampolines are declared and defined. For every C function generated from the CPS representation we need a trampoline function that has a fixed calling convention and can be passed to the garbage collector when the stack is exhausted (the allocation limit mentioned above). The trampoline for a given function calls the original function with the restored arguments, continuation and closure record. Trampolines whose names start with trf_ are custom: their associated functions were detected during optimization to be customizable, so they follow a slightly different calling convention and need a specific trampoline. Trampolines whose names start with tr_ are general ones and can be re-used for all functions with the matching number of arguments.

void C_ccall C_toplevel(C_word c,C_word t0,C_word t1){
  C_word tmp;
  C_word t2;
  C_word t3;
  C_word *a;
  /* was this compilation unit already executed? then return to caller: */
  if(toplevel_initialized) C_kontinue(t1,C_SCHEME_UNDEFINED);
  /* else note entry (this will output the start of executing the compilation
     unit toplevel when debug mode is enabled with the "-:d" runtime option): */
  else C_toplevel_entry(C_text("toplevel"));
  /* resize the nursery (first heap generation) to the value given to the
     compiler (here it is the default): */
  C_resize_stack(131072);
  /* check whether the nursery has, in general, at least enough space for all
     literals we create in this unit: */
  C_check_nursery_minimum(3);
  /* is the current level (as opposed to the total capacity) of the nursery ok? */
  if(!C_demand(3)){
    /* no - save temporaries and invoke a minor (nursery) garbage collection,
       passing the proper trampoline to re-enter the function: */
    C_save(t1);
    C_reclaim((void*)toplevel_trampoline,NULL);
  }
  /* otherwise mark as initialized: */
  toplevel_initialized=1;
  /* check whether the second-generation heap is big enough for all literals
     defined here: */
  if(!C_demand_2(30)){
    /* no - invoke a major GC with the minimum space required: */
    C_save(t1);
    C_rereclaim2(30*sizeof(C_word), 1);
    /* restore temporaries (pop from the temporary stack): */
    t1=C_restore;
  }
  /* allocate storage for the data we create in the nursery (on the stack).
     This is only the closure; the rest is already created in the second
     generation (in the heap), since it will live forever anyway: */
  a=C_alloc(3);
  /* initialize a literal frame record: */
  C_initialize_lf(lf,8);
  /* intern symbols that we are going to use as toplevel variables: */
  lf[0]=C_h_intern(&lf[0],7,"nqueens");
  lf[1]=C_h_intern(&lf[1],6,"append");
  /* these "lambda-info" strings are used to show more meaningful output
     when printing a procedure: */
  lf[2]=C_static_lambda_info(C_heaptop,16,"(try x9 y10 z11)");
  lf[3]=C_static_lambda_info(C_heaptop,27,"(ok\077 row12 dist13 placed14)");
  lf[4]=C_static_lambda_info(C_heaptop,12,"(loop i6 l7)");
  lf[5]=C_static_lambda_info(C_heaptop,12,"(nqueens n0)");
  lf[6]=C_h_intern(&lf[6],25,"\003sysimplicit-exit-handler");
  lf[7]=C_static_lambda_info(C_heaptop,10,"(toplevel)");
  /* register the literal frame globally, to be traversed on every major GC.
     This also creates the "procedure table" for serialization (if enabled): */
  C_register_lf2(lf,8,create_ptable());
  /* allocate our first closure record - there are going to be more of those...
     (note the store of temporary "t1", which is the continuation of the call
     to this toplevel procedure). "f_29" is "k27" below: */
  t2=(*a=C_CLOSURE_TYPE|2,a[1]=(C_word)f_29,a[2]=t1,tmp=(C_word)a,a+=3,tmp);
  /* invoke the "library" unit (done by default by compiled code, unless
     the "-explicit-use" option is given): */
  C_library_toplevel(2,C_SCHEME_UNDEFINED,t2);
}

Next comes the code for the user procedures and for the continuation closures introduced by the CPS conversion:

No, you are not quite through yet. Let's look at a simpler example, with full optimizations turned on and with safety checks disabled. The example program is the takl benchmark from the Gabriel benchmark suite[2] (translated to Scheme by Will Clinger):