Decoding The Racket Implementation

I’ve always wanted to know how big programs work, and to me the Racket implementation appears big. It’s pretty old and feels clunky. Nevertheless, let’s dive in to see if we can make sense of the code.

1 Setup

I start by pulling the code from the git repository hosted at https://github.com/racket/racket. I cd to my projects directory and run the following:

git clone https://github.com/racket/racket racket

Great! Now I enter the directory and see three directories:

build

pkgs

racket

There’s a README.md and INSTALL.txt as well, but a quick search does not reveal where the entry point is nor what language the project is in.

After looking around for a bit I think racket contains the code. So I enter it.

bin collects doc etc include lib man share src

Oh! That looks familiar, almost like the file system on modern Linux systems. The obvious place for source is src/:

Oh no... it seems like there are many entry points, but I want to find the interpreter entry point.
Eventually we find something that looks just too nice not to be the interpreter entry point:

...

racket/main.c:int MAIN(int argc, MAIN_char **MAIN_argv)

...

Alright, let’s see what’s up.

vim racket/main.c

The header explains the following

This file defines Racket's main(), which is a jumble of
platform-specific initialization. The included file "cmdline.inc"
implements command-line parsing. (GRacket also uses "cmdline.inc".)

This is actually very useful to know so we know where command line parsing happens.

The rest of the source code resides in the `src' subdirectory
(except for the garbage collector, which is in `gc', `sgc', or
`gc2', depending on which one you're using).

Great!

The relevant code comes down to:

int MAIN(int argc, MAIN_char **MAIN_argv)
{
#if defined(DOS_FILE_SYSTEM) && !defined(__MINGW32__)
  load_delayed();
#endif
  return main_after_dlls(argc, MAIN_argv);
}

Now when compiling this on my machine (Fedora 26), the #if won’t trigger, so I’m curious whether we can get some output by adding a print right inside this main function.

int MAIN(int argc, MAIN_char **MAIN_argv)
{
#if defined(DOS_FILE_SYSTEM) && !defined(__MINGW32__)
  load_delayed();
#endif
  printf("Hello from main :]");
  return main_after_dlls(argc, MAIN_argv);
}

To compile, I go back to the README file I saw earlier and note that I can configure and make:

For example, if you want to install into "/usr/local/racket" using
dynamic libraries, then run:

  [here]configure --prefix=/usr/local/racket --enable-shared

Excellent. I’d like to put the results into a build directory, and the README informs me that I can create one and call configure as ../configure. Note that --enable-shared uses shared libraries.

mkdir build
cd build
../configure --enable-shared

Ok, no problems.

Configure with --prefix if you wanted to install somewhere else.
The --prefix option also makes the installed files better conform
to Unix installation conventions. (The configure script will show
you specific installation paths when --prefix is used.)

Thank you. Having this information available at many stages is a good thing (though not too much of it).

Let’s see what build/ contains:

config.log config.status foreign/ gracket/ lt/ Makefile racket/ rktio/

Okay, there is the familiar makefile, now we can run make and see what happens!

So maybe it’s libtool’s doing? I’m not sure. Let’s try using PRINTF instead of printf. Nope, doesn’t work. Let’s just put in the expression ‘"Hello";‘. Does that work? YES! IT DOES, with a warning that the statement has no effect.

So my hypothesis is that libtool is actually scanning for printf and doing something strange with it.

I create a simple function to see if this at all works:

void x(const char *p) { }

...

x("Hello");

Okay, this actually compiles, which strengthens my idea about printf. Could it be... that the print is taken as a make command? That libtool runs the code and uses it to compile further parts of racket? That’s absurd. By removing the call to ‘x‘ but leaving ‘void x...‘ in, we can check whether the failure depends on run time or compile time. It seems to be run time, as this works just fine.

To add another test, let’s create an infinite loop inside main ‘for (;;);‘

Unsurprisingly, it doesn’t stop. So we now know what happens. The compilation process uses the racket interpreter to compile stuff. That surely makes debugging harder for us. What can we do?
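Since the build appears to run the freshly built binary, any unconditional print in main also fires during make. One way to keep poking at the code without disturbing the build might be to guard debug output behind an environment variable. The sketch below is an assumption of mine, not something Racket provides: the variable name RKT_TRACE and the helpers trace_enabled and trace_msg are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Racket: report whether tracing was
   requested through the made-up RKT_TRACE environment variable. */
static int trace_enabled(void)
{
  const char *v = getenv("RKT_TRACE");
  return v != NULL && v[0] != '\0';
}

/* Print a trace line to stderr only when tracing is enabled.
   Returns 1 if the message was printed and 0 otherwise. */
static int trace_msg(const char *msg)
{
  if (!trace_enabled())
    return 0;
  fprintf(stderr, "[trace] %s\n", msg);
  return 1;
}
```

With something like this, make would run the binary silently, while running it by hand with RKT_TRACE=1 set would show the messages.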

I try ‘git reset --hard‘ to undo every change I’ve made to check if that’s the cause. Nope, same error.

I ask the Racket IRC channel on freenode.net (#racket) to see what’s up.

1.2 Finding the real entry point

Meanwhile, let’s see if the print to standard error is closest to the real entry point.
We can do this by running ‘objdump -d racket/racketcgc | less‘, unfortunately: file format not recognized. ‘file racket/racketcgc‘ gives

This really should be in its own function. My overall sense of ‘cmdline.inc‘ is that it’s a mess that is poorly abstracted into smaller functions (something that can be done).

At the end of the function, the results are moved from local variables into a structure called FinishArgs (fa).

fa->a->use_repl = use_repl;

This seems like a code flaw to me. Since this is located in the same function in which the local variable is stored, why not either separate the function or just immediately assign to fa?

Anyway, let’s continue.

At the end of the function we see ‘return cont_run(fa)‘. Alright, so where is ‘cont_run‘? Well that comes as an argument into ‘run_from_cmd_line‘ from ‘main_after_stack‘. Looking at main.c again, we see ‘cont_run‘ declared and it calls

racket/main.c: cont_run(fa)

racket/cmdline.inc: finish_cmd_line_run(fa, do_scheme_rep)
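To make the shape of this call chain concrete, here is a minimal sketch of the pattern as I understand it: the command-line parser receives a function pointer and calls it with the parsed arguments when it is done. Everything here is a simplified stand-in (Args, Cont, run_from_cmd_line_sketch and cont_run_sketch are hypothetical names, and the ‘-i‘ flag is only an illustrative choice). It also shows the alternative I suggested above, writing results straight into the structure instead of copying locals at the end.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for the FinishArgs structure. */
typedef struct {
  int use_repl;
} Args;

/* The "what to do next" function handed to the parser. */
typedef int (*Cont)(Args *a);

/* Parse flags straight into the result structure, then hand control
   to the continuation supplied by the caller. */
static int run_from_cmd_line_sketch(int argc, char **argv, Cont cont)
{
  Args a = { 0 };
  int i;

  for (i = 1; i < argc; i++) {
    if (strcmp(argv[i], "-i") == 0)
      a.use_repl = 1;   /* assigned directly, no intermediate local */
  }
  return cont(&a);
}

/* Example continuation: returns 1 when a REPL was requested. */
static int cont_run_sketch(Args *a)
{
  return a->use_repl;
}
```

Passing the continuation in as an argument is what lets two different front ends share one parser and still finish start-up differently.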

Again, ‘finish_cmd_line_run‘ is located in cmdline.inc. What does it do? For starters, it’s about 250 lines long. Okay, so it’s manageable...

From reading it, it simply issues scheme commands to a fresh environment. Basically it prepares the environment for the program, setting a base namespace and so on, and then calls ‘do_scheme_rep‘.

Inside ‘do_scheme_rep‘ we require ‘racket/base‘ and the ‘read-eval-print-loop‘ symbol, then we do a ‘scheme_apply‘ on the resulting object. The code path looks like this:

Now using ‘git grep‘ doesn’t yield much information, so I’m going to run gdb on racket and break on ‘scheme_apply‘:

Well shit, we get a SIGSEGV in ‘scheme_gmp_tls_unload‘ for some reason. I bet it has to do with threads.

Anyway, I asked around on IRC and it turns out I just needed ‘handle SIGSEGV nostop noprint‘ in GDB. The reason is that the GC causes SIGSEGVs during normal operation and handles them itself. In our case gdb was interfering with that handling.

<kefin> Anyone able to run gdb on racket? Getting a SIGSEGV in scheme_gmp_tls_unload

This last procedure is a bit hard to understand. It sets up continuation frames/barriers and then calls ‘apply_k‘, however, it may also call ‘apply_again_k‘. This function appears to check the arity and/or return values.

I put fprintf inside ‘apply_again_k‘ and recompile. Let’s see what happens... Nothing. It’s never printed. The comment on the code states

an abort to the thread start; act like the default prompt handler but remember to jump again

Ok...? What does the thread start mean?

We don’t know (yet) so let’s look further, finally ‘apply_k‘ is called

racket/src/fun.c: apply_k();

Depending on the current thread ku.k.p2, calls either

_scheme_apply_multi_wp or _scheme_apply_wp

To find out what the number means, I put a print, recompiled, and ran some statements. I’m not quite sure, when typing 1 in the REPL and running, we get a sequence of 1000101110001. Not sure how to decode this.

Oh, this is quite nasty: the include in ‘scheme_do_eval‘ ends in a condition for the return.

Alright, let’s re-evaluate then.

racket/src/eval.c: scheme_do_eval(obj, num_rands, rands, get_value)

handles each type of obj (apply top)

The first type check is to see if a type is ‘scheme_primitive‘. Alright, I add fprintf to the if to see what happens.
When writing ‘1‘ in the REPL, primitive type shows up twice. Inside the if, a cast is made

prim = (Scheme_Primitive_Proc *)obj;

The type ‘Scheme_Primitive_Proc‘ looks like this (found using git grep)

typedef struct {
  Scheme_Prim_Proc_Header pp;
  Scheme_Primitive_Closure_Proc *prim_val;
  const char *name;
  mzshort mina;
  /* If mina < 0; mina is negated case count minus one for a case-lambda
     generated by mzc, where the primitive checks argument arity
     itself, and mu.cases is available instead of mu.maxa. */
  union {
    mzshort *cases;
    mzshort maxa;   /* > SCHEME_MAX_ARGS => any number of arguments */
  } mu;
} Scheme_Primitive_Proc;

Okay great! What do we do here? Oh look, it has ‘name‘, let’s print it!

check_location_fields

void

Okay interesting,... what is the location field then?

Instead of looking for that, I looked for ‘scheme_print‘, and found ‘scheme_print_to_string‘, maybe this can enlighten us! There is also one where you can print to a port but that is a ‘Scheme_Object‘, and I don’t know how to specify stderr as a scheme object.

This resulted in a compilation error for some reason, with the reason being a bad startup script, erroring with a SIGSEGV. Hmmm. This perhaps indicates that the string is null and that fprintf should not print it. Let’s try it.

Now it does not appear as if it gets back to the main ‘scheme_do_eval‘ function, so I’m not entirely sure what goes wrong here. Unfortunately, gdb can’t be used either since the program doesn’t compile fully. This is in my opinion a problem for newbies like myself. We can’t fully mess with the code because messing with it makes it uncompilable.

I remove static from the ‘print_to_string‘ function that ‘print_to_string_k‘ eventually calls and forward-declare it in ‘eval.c‘. Let’s see what happens next... but I can’t. There’s a variable coming from the environment called ‘qq_depth‘ and I have no idea what this means. Instead, I’ll write to stderr BEFORE calling the string conversion to see if anything indicates an infinite loop.

Interestingly, it’s only printed once. Is it because we can’t just reinterpret from within the main interpreter? That would certainly make sense, as it would mess up the state of the machine.

I’ll have to get back to the function ‘print_to_string‘ instead... wait. I might’ve given the wrong input. I see it takes an ‘intptr_t *len‘, and I gave it 100. Obviously that’s going to be a problem, so I declare a local variable and pass its address instead.

prim = (Scheme_Primitive_Proc *)obj;
intptr_t len;
GC_CAN_IGNORE char *x = scheme_print_to_string(obj, &len);
if (x) {
  x[99] = '\0';
  fprintf(stderr, "primitive type %s\n", x);
} else {
  fprintf(stderr, "primitive unknown\n");
}
free(x);

The first line is not my code, but is there to show where the rest of the code is located. This results in a double-free, so I remove my free and ‘GC_CAN_IGNORE‘.

This time I get an xform error. xform is the program that checks variables in the C code. I’ll need to move len and x to the beginning of the block.

This also fails, let’s see what’s going on:

SIGSEGV MAPERR si_code 1 fault on addr 0x7f04c6363ff8

Okay, interesting. Why is that? I add in a free and get a ‘munmap_chunk‘ error, so it’s definitely not free that’s needed.
Here is the revised code:

#include <assert.h>
...
intptr_t len;
char *x;
...
prim = (Scheme_Primitive_Proc *)obj;
x = scheme_print_to_string(obj, &len);
if (x) {
  assert(x[len-1] == '\0');
  fprintf(stderr, "primitive type %s\n", x);
} else {
  fprintf(stderr, "primitive unknown\n");
}

It might have been the x[99] that caused the sudden SIGSEGV. Let’s try and run this.

Assertion `x[len-1] == '\0'' failed

What about len instead of len-1? Maybe that’s the way things are here. The build just fails spontaneously. Perhaps we’re writing to parts of memory that aren’t supposed to be zeroed. Hmm... the prints work just fine without the null. Perhaps it’s something else. I remove the assert.h include because it’s causing some funky business with other asserts.

Now we get SIGSEGV MAPERR again. I’m setting len-1 to be ’\0’ JUST to be safe here, maybe that cuts off a letter... and it does :O.
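What I observed (the assertion on ‘x[len-1]‘ failing, and a letter disappearing when I overwrite that index) is consistent with a convention where the length excludes the terminator and the callee keeps ownership of the buffer. The sketch below shows that convention with a made-up function, int_to_string; it is an assumption about the general pattern and says nothing definite about how ‘scheme_print_to_string‘ is implemented.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a "print to string" routine: formats v into
   a buffer owned by the callee and reports the length through an
   out-parameter. The caller must not free() the result, loosely
   mirroring memory that belongs to a garbage collector. */
static char *int_to_string(int v, intptr_t *len)
{
  static char buf[32];
  int n = snprintf(buf, sizeof(buf), "%d", v);

  *len = (intptr_t)n;   /* length excludes the trailing '\0' */
  return buf;
}
```

The caller declares ‘intptr_t len;‘ and passes ‘&len‘; passing a plain number such as 100 would make the callee write through a bogus address. With this convention the last character is at index len-1 and the terminator at index len.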

How very interesting. But alas, again the MAPERR. Again the idea of a stack overflow crosses my mind and I add a print right before ‘scheme_print_to_string‘ to see if that’s true.

There doesn’t seem to be any indication of an infinite loop here at all. There are all sorts of primitives coming through. If it were a loop we would see a pattern of the same primitives, besides adding a print after the statement confirms that we go in-and-out instead of looping.

Seg fault (internal error during gc) at 0x7fec405b09b8

SIGSEGV SEGV_ACCERR SI_CODE 2 fault on 0x7fec405b09b8

I’ll put a ‘GC_CAN_IGNORE‘ on len to see if that helps.
...
Same problem. Let’s add one on the ‘char* x‘ whilst having top open to make sure memory isn’t exhausted.

The same thing happened. I’m not sure where to progress to now. I know! Instead of checking for x == NULL, let’s instead check for len != 0.

Still the same error. It’s so weird, it errs at about the same place every time. Let’s try and not print the specific object on which it fails. Let’s put prints around the ‘scheme_print_to_string‘ procedure, and, interestingly enough, the MAPERR error ONLY shows up AFTER this function has exited. What?! Let’s instead assign x = ""; to see if there is any difference at all. This in fact compiles just fine...

Using a static int to only run the print ONCE doesn’t make anything crash.

BUT it still compiles sufficiently to run the binary. Hurrah! This unfortunately doesn’t shed much light on the subject; we just get a bunch of ‘#<procedure>‘ being printed.

1.6 What does the data look like?

While we can’t advance much on that problem, let’s look at what the data structure ‘Scheme_Object‘ looks like. To understand any code, one must understand the most important data structures.

# define MZ_HASH_KEY_EX short keyex;

...keyex

typedef struct Scheme_Object
{
  Scheme_Type type; /* Anything that starts with a type field
                       can be a Scheme_Object */

  /* For precise GC, the keyex field is used for all object types to
     store a hash key extension. The low bit is not used for this
     purpose, though. For string, pair, vector, and box values in all
     variants of Racket, the low bit is set to 1 to indicate that
     the object is immutable. Thus, the keyex field is needed even in
     non-precise GC mode, so such structures embed
     Scheme_Inclhash_Object */

  MZ_HASH_KEY_EX
} Scheme_Object;

Alright, so it’s basically just a placeholder pointer for other types. Great. So we have no type safety here at all. All that’s the same for every object is that the first few bytes are allocated to its type.

typedef short Scheme_Type;

Okay, so the first 2 bytes (on the machine I’m using) define the type of the ‘Scheme_Object‘.
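The "everything starts with a type field" idea can be sketched in a few lines. This is my own toy version, not Racket's definitions: Obj, BoxedLong, Flag and the tag values are invented for illustration. The point is that any structure whose first member is the common header can be passed around as a pointer to the header and cast back after checking the tag.

```c
#include <stdlib.h>

typedef short Type;                      /* plays the role of Scheme_Type */
enum { T_BOXED_LONG = 1, T_FLAG = 2 };   /* made-up type tags */

typedef struct Obj { Type type; } Obj;   /* common header */

/* Concrete object kinds: each one starts with the header. */
typedef struct { Obj so; long val; } BoxedLong;
typedef struct { Obj so; int on; } Flag;

#define IS_BOXED_LONG(o) (((Obj *)(o))->type == T_BOXED_LONG)

static Obj *make_boxed_long(long v)
{
  BoxedLong *b = malloc(sizeof(BoxedLong));
  if (!b) return NULL;
  b->so.type = T_BOXED_LONG;
  b->val = v;
  return (Obj *)b;
}

static Obj *make_flag(int on)
{
  Flag *f = malloc(sizeof(Flag));
  if (!f) return NULL;
  f->so.type = T_FLAG;
  f->on = on;
  return (Obj *)f;
}

/* Check the tag before downcasting; return fallback on a mismatch. */
static long boxed_long_value(Obj *o, long fallback)
{
  return IS_BOXED_LONG(o) ? ((BoxedLong *)o)->val : fallback;
}
```

Nothing stops a caller from casting without checking the tag, which is the lack of type safety I complained about: the compiler only ever sees the header type.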

typedef struct Scheme_Inclhash_Object
{
  Scheme_Object so;
  MZ_OPT_HASH_KEY_EX
} Scheme_Inclhash_Object;

typedef struct Scheme_Simple_Object
{
  Scheme_Inclhash_Object iso;

  union
  {
    struct { mzchar *string_val; intptr_t tag_val; } char_str_val;
    struct { char *string_val; intptr_t tag_val; } byte_str_val;
    struct { void *ptr1, *ptr2; } two_ptr_val;
    struct { int int1; int int2; } two_int_val;
    struct { void *ptr; int pint; } ptr_int_val;
    struct { void *ptr; intptr_t pint; } ptr_long_val;
    struct { struct Scheme_Object *car, *cdr; } pair_val;
    struct { mzshort len; mzshort *vec; } svector_val;
    struct { void *val; Scheme_Object *type; } cptr_val;
  } u;
} Scheme_Simple_Object;

Okay, so this is pretty standard stuff, we have a string type, a byte-string type, two pointers, two integers (probably rationals), and so on.
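To see how one structure with a union can serve several kinds of values, here is a cut-down sketch with only two of the union members. Cell, the K_ tags, and the helper functions are hypothetical names of mine; the member names two_int_val and pair_val are borrowed from the struct above purely for familiarity.

```c
#include <stdlib.h>

enum { K_NULL = 0, K_INT = 1, K_PAIR = 2 };  /* made-up type tags */

typedef struct Cell {
  short type;
  union {
    struct { int int1, int2; } two_int_val;
    struct { struct Cell *car, *cdr; } pair_val;
  } u;
} Cell;

/* A single shared empty-list object. */
static Cell the_null = { K_NULL, { { 0, 0 } } };

static Cell *make_int(int v)
{
  Cell *c = malloc(sizeof(Cell));
  if (!c) return NULL;
  c->type = K_INT;
  c->u.two_int_val.int1 = v;
  c->u.two_int_val.int2 = 0;
  return c;
}

static Cell *cons(Cell *car, Cell *cdr)
{
  Cell *c = malloc(sizeof(Cell));
  if (!c) return NULL;
  c->type = K_PAIR;
  c->u.pair_val.car = car;
  c->u.pair_val.cdr = cdr;
  return c;
}

/* Count the pairs in a proper list ending in the_null. */
static int list_length(Cell *l)
{
  int n = 0;
  while (l->type == K_PAIR) {
    n++;
    l = l->u.pair_val.cdr;
  }
  return n;
}
```

The type tag is the only thing that says which union member is valid, so every accessor has to consult it first.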

Wow! There’s our symbol. How does ‘mzFLEX_ARRAY4_DECL‘ work? Well it can be either 4 or nothing, and if it’s nothing, then it’s a flexible array. I suppose the string is simply zero terminated and extends out of the struct itself.
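My guess about the string extending past the end of the struct corresponds to a C99 flexible array member. Here is a minimal sketch of that idiom with a made-up Sym type and make_sym helper; the fields are my own and are not meant to match the real symbol struct.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
  short type;
  size_t len;   /* number of characters, excluding the terminator */
  char s[];     /* C99 flexible array member: storage follows the struct */
} Sym;

/* Allocate the header and the characters in one block. */
static Sym *make_sym(const char *name)
{
  size_t len = strlen(name);
  Sym *sym = malloc(sizeof(Sym) + len + 1);

  if (!sym)
    return NULL;
  sym->type = 0;
  sym->len = len;
  memcpy(sym->s, name, len + 1);  /* copies the trailing '\0' too */
  return sym;
}
```

Older compilers without flexible array members typically get a small fixed-size array in the same position instead, which would explain a macro that expands to either a number or nothing.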

1.7 apply top

By adding an fprintf under the ‘apply_top‘ label we see that every character typed results in an application of the object ‘#<procedure:readline/rktrl.rkt:188:14>‘. This happens to be the following lambda:

(lambda (_)
  (define next-byte (read-byte real-input-port))
  (if (eof-object? next-byte) -1 next-byte)))

This is interesting, how does a keypress issue this command?

We see inside of ‘pkgs/readline-lib/readline/pread.rkt‘ the history, prompt, and so on

(define current-prompt (make-parameter #"> "))
(define max-history (make-parameter 100))
(define keep-duplicates (make-parameter #f))

I change the prompt to see what’s up,... and it works! So this package is used to read input, however, how is it initialized? Remember that the read-eval-print-loop is used, so I suspect this to be the culprit.

Inside main.c:

/*************************do_scheme_rep*****************************/
/*Finally, do a read-eval-print-loop*/

static void do_scheme_rep(Scheme_Env *env, FinishArgs *fa)
{
  /* enter read-eval-print loop */
  Scheme_Object *rep, *a[2];
  int ending_newline = 1;

#ifdef GRAPHICAL_REPL
  if (!fa->a->alternate_rep) {
    a[0] = scheme_intern_symbol("racket/gui/init");
    a[1] = scheme_intern_symbol("graphical-read-eval-print-loop");
    ending_newline = 0;
  } else
#endif
  {
    a[0] = scheme_intern_symbol("racket/base");
    a[1] = scheme_intern_symbol("read-eval-print-loop");
  }

  rep = scheme_dynamic_require(2, a);

  if (rep) {
    scheme_apply(rep, 0, NULL);
    if (ending_newline)
      printf("\n");
  }
}

Right right... so we intern two symbols, call dynamic require, and then apply it. What does apply mean and how does ‘scheme_dynamic_require‘ work? Let’s find out.

    /* create symbol in symbol table unless a place local symbol table has been created */
    /* once the first place has been create the symbol_table becomes read-only and
       shouldn't be modified */
    Scheme_Object *newsymbol;
    Scheme_Hash_Table *create_table;
#if defined(MZ_USE_PLACES) && defined(MZ_PRECISE_GC)
    create_table = place_local_table ? place_local_table : table;
#else
    create_table = table;
#endif
    newsymbol = make_a_symbol(name, len, kind);
    /* we must return the result of this symbol bucket call because another
     * thread could have inserted the same symbol between the first
     * symbol_bucket call above and this one */
    sym = symbol_bucket(create_table, name, len, newsymbol, type);
  }
  return sym;
}

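Well this causes more confusion than I wanted. Apparently this picks a symbol table (the place-local one if it exists), creates a new symbol, and stores it in that table through ‘symbol_bucket‘, returning whatever symbol ends up in the bucket.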
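Stripped of places, threads and GC, the core idea of interning is small enough to sketch: look the name up in a hash table, and only create a new symbol if it is not there yet, so that equal names always yield the same pointer. The code below is my own toy version with hypothetical names (Symbol, intern, hash_name, a fixed table of 64 buckets); it is not how Racket's table is laid out.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64

typedef struct Symbol {
  struct Symbol *next;  /* chain of symbols in the same bucket */
  char name[];          /* the symbol's characters, '\0'-terminated */
} Symbol;

static Symbol *table[TABLE_SIZE];

/* Simple string hash reduced to a bucket index. */
static unsigned hash_name(const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33 + (unsigned char)*s++;
  return h % TABLE_SIZE;
}

/* Return the unique Symbol for name, creating it on first use. */
static Symbol *intern(const char *name)
{
  unsigned h = hash_name(name);
  size_t len;
  Symbol *sym;

  for (sym = table[h]; sym; sym = sym->next) {
    if (strcmp(sym->name, name) == 0)
      return sym;       /* already interned: reuse it */
  }

  len = strlen(name);
  sym = malloc(sizeof(Symbol) + len + 1);
  if (!sym)
    return NULL;
  memcpy(sym->name, name, len + 1);
  sym->next = table[h];
  table[h] = sym;
  return sym;
}
```

Because interned symbols are unique, comparing two symbols becomes a pointer comparison. The comment in the real code about another thread inserting the same symbol is about keeping that uniqueness when two lookups race, which this single-threaded sketch ignores.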

So back to ‘scheme_dynamic_require‘, the equivalent command being run is

(dynamic-require 'racket/base 'read-eval-print-loop)

This means that ‘read-eval-print-loop‘ is exported by the ‘racket/base‘ module; dynamic-require instantiates that module and returns the value of the export. The returned scheme object ‘rep‘ is the repl function, which we apply. This is when the main REPL starts.

What I’d like to know is how the function is stored inside racket/base. Let’s go back to the initial list of functions.

It just gets the symbol associated with the object from ‘scheme_startup_env->all_primitives_table‘. Here is the startup env in schpriv.h:

/* A Scheme_Startup_Env holds tables of primitives */
struct Scheme_Startup_Env {
  Scheme_Object so; /* scheme_startup_env_type */
  Scheme_Hash_Table *current_table; /* used during startup */
  Scheme_Hash_Table *primitive_tables; /* symbol -> hash table */
  Scheme_Hash_Table *all_primitives_table;
  Scheme_Hash_Table *primitive_ids_table; /* value -> integer */
};

We also find ‘make_startup_env‘ in env.c, which is called from ‘init_startup_env‘. Adding an fprintf does show that this is run, but from where is it called? From ‘scheme_basic_env‘, which is given to ‘run_from_cmd_line‘ by ‘main_after_stack‘ in main.c.

Then inside ‘run_from_cmd_line‘ we do:

/* Creates the main kernel environment */

global_env = mk_basic_env();

Great! And ‘global_env‘ is inside FinishArgs, and indeed, ‘global_env‘ is used inside ‘finish_cmd_line_run‘.

scheme_init_process_globals();

scheme_init_true_false();

Not much interesting happens, a lock is made and some objects assigned.

scheme_init_symbol_table();

Let’s look at this one

symbol_table = init_one_symbol_table();

Not very interesting

init_startup_env();

OOOOH that looks more like it! Let’s go!

static void init_startup_env(void)
{
  Scheme_Startup_Env *env;
#ifdef TIME_STARTUP_PROCESS
  intptr_t startt;
#endif

  REGISTER_SO(kernel_symbol);
  kernel_symbol = scheme_intern_symbol("#%kernel");

I haven’t even looked further but I already know it’s going to be good.

Alright, interesting, so it binds ‘procedure_p‘ to the symbol "procedure?", I assume. Let’s look at ‘procedure_p‘.

static Scheme_Object *
procedure_p (int argc, Scheme_Object *argv[])
{
  return (SCHEME_PROCP(argv[0]) ? scheme_true : scheme_false);
}

Oh, quite interesting, note how it takes in an array of pointers to ‘Scheme_Object‘. Perhaps this is the actual stack (instead of the heap). I wonder if simply making a single list of ‘Scheme_Object‘ would be faster. What’s also interesting is that the function itself doesn’t appear to check that argc == 1. What does? ‘SCHEME_PRIM_IS_UNARY_INLINED‘? Let’s try to remove the flag to see what happens. Well, that’s not it: it still expects a single input.

So that’s that. Now I’d like to know how ‘scheme_make_folding_prim‘ works.

Scheme_Object *
scheme_make_folding_prim(Scheme_Prim *fun, const char *name,
                         mzshort mina, mzshort maxa,
                         short folding)
{
  /* A folding primitive is an immediate primitive, and for constant
     arguments the result must be the same on all runs and platforms. */
  return make_prim_closure(fun, 1, name, mina, maxa,
                           (folding
                            ? SCHEME_PRIM_OPT_FOLDING
                            : 0),
                           1, 1,
                           0, 0, NULL);
}

mina and maxa represent the minimum and maximum number of arguments, respectively.

o = scheme_make_folding_prim(procedure_p, "procedure?", 1, 1, 1);

There it is, the 1, 1 are mina and maxa respectively. The last value is "folding". What does this do? It sends a flag to ‘make_prim_closure‘. Let’s change maxa (the second number) to 2 and see if we can give ‘procedure?‘ two arguments. And it works! Great.
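The experiment suggests that the arity check lives with the primitive's record rather than in the C function itself. Here is a toy model of that arrangement as I picture it: a table entry carries the name, the function pointer and the mina/maxa bounds, and a generic apply routine checks argc before calling. All names (Prim, PrimFn, nonzero_p, lookup_prim, apply_prim) are hypothetical and the arguments are plain ints to keep it short.

```c
#include <stddef.h>
#include <string.h>

typedef int (*PrimFn)(int argc, const int *argv);

typedef struct {
  const char *name;
  PrimFn fun;
  short mina, maxa;   /* minimum and maximum argument counts */
} Prim;

/* Toy primitive: like procedure_p above, it never inspects argc and
   trusts the caller to have checked the arity. */
static int nonzero_p(int argc, const int *argv)
{
  (void)argc;
  return argv[0] != 0;
}

static Prim prims[] = {
  { "nonzero?", nonzero_p, 1, 1 },
};

/* Find a primitive by name, or return NULL if there is none. */
static Prim *lookup_prim(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof(prims) / sizeof(prims[0]); i++) {
    if (strcmp(prims[i].name, name) == 0)
      return &prims[i];
  }
  return NULL;
}

/* Check the arity, then call. Returns 0 and stores the result on
   success, or -1 on an arity mismatch. */
static int apply_prim(const Prim *p, int argc, const int *argv, int *result)
{
  if (argc < p->mina || argc > p->maxa)
    return -1;
  *result = p->fun(argc, argv);
  return 0;
}
```

Raising maxa in the record is enough to let a second argument through, and the function simply ignores it, which matches what happened when I changed the registration of ‘procedure?‘.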

So we now have an idea of how primitives - at least primitive functions - are implemented.

The other initialization function ‘scheme_init_list‘ does similar things but for ‘null‘, ‘pair?‘, and so on. ‘cons‘ for instance is very simple, a call to ‘scheme_make_pair‘ which simply creates a ‘Scheme_Object‘:

All that remains is dynamic-require. The only places where dynamic-require is mentioned in C code are cmdline.inc and eval.c, both of which use ‘scheme_builtin_value‘ or ‘scheme_get_startup_export‘. ‘scheme_startup_instance‘ is the environment that is used for this, so let’s see how it is manipulated. From env.c: