Have you written the lexer, parser, code generator and the runtime
system for your programming language, and come to the realization that
you are going to need a memory manager too? If so, you’ve come to the
right place.

In this guide, I’ll explain how to use the MPS to add incremental,
moving, generational garbage collection to the runtime system for a
programming language.

As a running example throughout this guide, I’ll be using a small
interpreter for a subset of the Scheme programming language.
I’ll be quoting the relevant sections of code as needed, but you may
find it helpful to experiment with this interpreter yourself, in either
of its versions:

Each of these types is a structure whose first word is a number
specifying the type of the object (TYPE_PAIR for pairs,
TYPE_SYMBOL for symbols, and so on). For example, pairs are
represented by a pointer to the structure pair_s defined as
follows:

typedef struct pair_s {
    type_t type;        /* TYPE_PAIR */
    obj_t car, cdr;     /* first and second projections */
} pair_s;

Because the first word of every object is its type, functions can
operate on objects generically, testing TYPE(obj) as necessary
(which is a macro for obj->type.type). For example, the
print() function is implemented like this:

static void print(obj_t obj, unsigned depth, FILE *stream)
{
    switch (TYPE(obj)) {
    case TYPE_INTEGER:
        fprintf(stream, "%ld", obj->integer.integer);
        break;
    case TYPE_SYMBOL:
        fputs(obj->symbol.string, stream);
        break;
    /* ... and so on for the other types ... */
    }
}
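
The generic dispatch described above relies on every object structure starting with the same type word. As a minimal sketch of that layout (the union obj_u, the two-member enum and the helper type_name are stand-ins assumed for illustration, not the interpreter's actual definitions):

```c
/* Stand-in definitions, assumed for illustration; the interpreter
 * itself has more object types than these two. */
typedef union obj_u *obj_t;
typedef int type_t;

enum { TYPE_PAIR, TYPE_INTEGER };

typedef struct type_s { type_t type; } type_s;
typedef struct pair_s { type_t type; obj_t car, cdr; } pair_s;
typedef struct integer_s { type_t type; long integer; } integer_s;

/* Every member begins with a type_t, so obj->type.type can be read
 * whichever member was actually stored. */
typedef union obj_u {
    type_s type;
    pair_s pair;
    integer_s integer;
} obj_s;

#define TYPE(obj) ((obj)->type.type)

/* Return a printable name for an object by dispatching on its
 * first word. */
static const char *type_name(obj_t obj)
{
    switch (TYPE(obj)) {
    case TYPE_PAIR:    return "pair";
    case TYPE_INTEGER: return "integer";
    default:           return "unknown";
    }
}
```

Any function written this way only needs to look at TYPE(obj) before deciding which member of the union to use.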

Each constructor allocates memory for the new object by calling
malloc. For example, make_pair is the constructor for pairs:

static obj_t make_pair(obj_t car, obj_t cdr)
{
    obj_t obj = (obj_t)malloc(sizeof(pair_s));
    if (obj == NULL) error("out of memory");
    obj->pair.type = TYPE_PAIR;
    CAR(obj) = car;
    CDR(obj) = cdr;
    return obj;
}

Objects are never freed, because it is necessary to prove that they
are dead before their memory can be reclaimed. To
prove that they are dead, we need a tracing garbage collector, which the MPS will provide.

You’ll recall from the Overview of the Memory Pool System that the functionality of
the MPS is divided between the arenas, which request memory
from (and return it to) the operating system, and pools, which
allocate blocks of memory for your program.

The client arena is intended for use on embedded systems where there
is no virtual memory, and has a couple of disadvantages (you have to
decide how much memory you are going to use; and the MPS can’t return
memory to the operating system for use by other processes) so for
general-purpose programs you’ll want to use the virtual memory arena.

You’ll need a couple of headers: mps.h for the MPS interface, and
mpsavm.h for the virtual memory arena class:

#include "mps.h"
#include "mpsavm.h"

There’s only one arena, and many MPS functions take an arena as an
argument, so it makes sense for the arena to be a global variable
rather than having to pass it around everywhere:

static mps_arena_t arena;

Create an arena by calling mps_arena_create(). This function
takes a third argument when creating a virtual memory arena: the size of
virtual address space (not RAM),
in bytes, that the arena will reserve initially. The MPS will ask for
more address space if it runs out, but the more times it has to extend
its address space, the less efficient garbage collection will become.
The MPS works best if you reserve an address space that is several times
larger than your peak memory usage.

mps_arena_create() is typical of functions in the MPS
interface in that it stores its result in a location pointed to by an
out parameter (here, &arena) and returns a result
code, which is MPS_RES_OK if the function succeeded, or
some other value if it failed.
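
The calling convention itself can be shown without the MPS. In the sketch below every name (res_t, RES_OK, RES_MEMORY, counter_t, counter_create, counter_destroy) is hypothetical and chosen only to illustrate the pattern; none of them belongs to the MPS interface:

```c
#include <stdlib.h>

/* Hypothetical stand-ins: these are NOT MPS names. */
typedef enum { RES_OK = 0, RES_MEMORY } res_t;
typedef struct counter_s { long value; } *counter_t;

/* Store the new object through the out parameter and return a result
 * code. The out parameter is left untouched on failure. */
static res_t counter_create(counter_t *counter_o, long initial)
{
    counter_t c = malloc(sizeof *c);
    if (c == NULL)
        return RES_MEMORY;
    c->value = initial;
    *counter_o = c;
    return RES_OK;
}

static void counter_destroy(counter_t c)
{
    free(c);
}
```

A caller checks the result code first and only then uses the value stored through the out parameter.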

Note

The MPS is designed to co-operate with other memory managers, so
when integrating your language with the MPS you need not feel
obliged to move all your memory management to the MPS: you can
continue to use malloc and free to manage some of your
memory, for example, while using the MPS for the rest.

The toy Scheme interpreter illustrates this by continuing to use
malloc and free to manage its global symbol table.

The section Choosing a pool class in the Pool reference contains a procedure
for choosing a pool class. In the case of the toy Scheme interpreter,
the answers to the questions are (1) yes, the MPS needs to
automatically reclaim unreachable blocks; (2) yes, it’s acceptable for
the MPS to move blocks in memory and protect them with barriers; and
(3) the Scheme objects will contain exact references
to other Scheme objects in the same pool.

In order for the MPS to be able to automatically manage your objects,
you need to tell it how to perform various operations on an object
(scan it for references; replace it with a
forwarding or padding object, and
so on). You do this by creating an object format. Here’s the
code for creating the object format for the toy Scheme interpreter:

The structure mps_fmt_A_s is the simplest of several object
format variants that are appropriate for moving pools like AMC.

The first element of the structure is the alignment of objects
belonging to this format. Determining the alignment is hard to do
portably, because it depends on the target architecture and on the way
the compiler lays out its structures in memory. Here are some things
you might try:

Some modern compilers support the alignof operator:

#define ALIGNMENT alignof(obj_s)

On older compilers you may be able to use this trick:

#define ALIGNMENT offsetof(struct {char c; obj_s obj;}, obj)

but this is not reliable because some compilers pack structures
more tightly than their alignment requirements in some
circumstances (for example, GCC if the -fpack-struct option is
specified).

The MPS interface provides the type mps_word_t, which is
an unsigned integral type that is the same size as the platform’s
object pointer types.

So if you know that all your objects can be word-aligned, you can
use:

#define ALIGNMENT sizeof(mps_word_t)
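
Whichever definition of ALIGNMENT you choose, object sizes then have to be rounded up to a multiple of it. A minimal sketch of that rounding, assuming word alignment with void * standing in for a word-sized type (the function name align_up is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: word alignment, with void * standing in
 * for a word-sized type. */
#define ALIGNMENT sizeof(void *)

/* Round size up to the next multiple of ALIGNMENT. The bit mask only
 * works when ALIGNMENT is a power of two, which the assertion checks. */
static size_t align_up(size_t size)
{
    assert((ALIGNMENT & (ALIGNMENT - 1)) == 0);
    return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
}
```

Sizes that are already a multiple of the alignment are returned unchanged, so applying the rounding twice is harmless.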

The other elements of the structure are the format methods,
which are described in the following sections. (The NULL in the
structure is a placeholder for the copy method, which is now
obsolete.)

The scan method is a function of type
mps_fmt_scan_t. It is called by the MPS to scan a
block of memory. Its task is to identify all references within the
objects in the block of memory, and “fix” them, by calling the macros
MPS_FIX1() and MPS_FIX2() on each reference (possibly
via the convenience macro MPS_FIX12()).

“Fixing” is a generic operation whose effect depends on the context in
which the scan method was called. The scan method is called to
discover references and so determine which objects are alive and which are dead, and also to update references
after objects have been moved.

Here’s the scan method for the toy Scheme interpreter:

static mps_res_t obj_scan(mps_ss_t ss, mps_addr_t base, mps_addr_t limit)
{
    MPS_SCAN_BEGIN(ss) {
        while (base < limit) {
            obj_t obj = base;
            switch (TYPE(obj)) {
            case TYPE_PAIR:
                FIX(CAR(obj));
                FIX(CDR(obj));
                base = (char *)base + ALIGN(sizeof(pair_s));
                break;
            case TYPE_INTEGER:
                base = (char *)base + ALIGN(sizeof(integer_s));
                break;
            /* ... and so on for the other types ... */
            default:
                assert(0);
                fprintf(stderr, "Unexpected object on the heap\n");
                abort();
            }
        }
    } MPS_SCAN_END(ss);
    return MPS_RES_OK;
}

The scan method receives a scan state (ss) argument, and
the block of memory to scan, from base (inclusive) to limit
(exclusive). This block of memory is known to be packed with objects
belonging to the object format, and so the scan method loops over the
objects in the block, dispatching on the type of each object, and then
updating base to point to the next object in the block.

For each reference in an object, obj_scan fixes it by calling
MPS_FIX12() via the macro FIX, which is defined as
follows:

When the MPS calls your scan method, it may be part-way through
moving your objects. It is therefore essential that the scan
method only examine objects in the range of addresses it is
given. Objects in other ranges of addresses are not guaranteed
to be in a consistent state.

Scanning is an operation on the critical path of the
MPS, which means that it is important that it runs as quickly
as possible.

If your reference is tagged, you
must remove the tag before fixing it. (This is not quite true,
but see Tagged references for the full story.)

The “fix” operation may update the reference. So if your
reference is tagged, you must make sure that the tag is
restored after the reference is updated.

The “fix” operation may fail by returning a result code
other than MPS_RES_OK. A scan function must
propagate such a result code to the caller, and should do so as
soon as practicable.
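
The three points about tags and result codes can be combined into one pattern: strip the tag, fix the bare address, propagate any failure, and restore the tag on success. The sketch below assumes a tag stored in the low three bits of a reference, and uses a hypothetical mock_fix in place of the real fix operation (which in a scan method is performed by the MPS_FIX macros); it pretends every object has moved by delta bytes:

```c
#include <stdint.h>

/* Assumed for illustration: the tag occupies the low three bits. */
#define TAG_BITS 3
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

/* Hypothetical stand-in for the "fix" operation: pretends that the
 * referenced object has moved up by delta bytes. Returns 0 for
 * success, playing the role of a result code. */
static int mock_fix(uintptr_t *addr_io, uintptr_t delta)
{
    *addr_io += delta;
    return 0;
}

/* Fix a tagged reference: remove the tag, fix the address, and put
 * the tag back. On failure, return the result code at once and leave
 * the reference unchanged. */
static int fix_tagged(uintptr_t *ref_io, uintptr_t delta)
{
    uintptr_t tag = *ref_io & TAG_MASK;
    uintptr_t addr = *ref_io & ~TAG_MASK;
    int res = mock_fix(&addr, delta);
    if (res != 0)
        return res;
    *ref_io = addr | tag;
    return 0;
}
```

The reference is only written back after the fix has succeeded, so a failed fix never leaves a half-updated value behind.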

The skip method is a function of type
mps_fmt_skip_t. It is called by the MPS to skip over an
object belonging to the format, and also to determine its size.

Here’s the skip method for the toy Scheme interpreter:

static mps_addr_t obj_skip(mps_addr_t base)
{
    obj_t obj = base;
    switch (TYPE(obj)) {
    case TYPE_PAIR:
        base = (char *)base + ALIGN(sizeof(pair_s));
        break;
    case TYPE_INTEGER:
        base = (char *)base + ALIGN(sizeof(integer_s));
        break;
    /* ... and so on for the other types ... */
    default:
        assert(0);
        fprintf(stderr, "Unexpected object on the heap\n");
        abort();
    }
    return base;
}

The argument base is the address of the base of the object. The
skip method must return the address of the base of the “next object”:
in formats of variant A like this one, this is the address just past
the end of the object, rounded up to the object format’s alignment.
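
To see why returning the address of the “next object” is enough to walk a block of memory, here is a self-contained sketch that counts the objects packed into a range by skipping from one to the next. The object layouts, the ALIGN macro, and the functions skip and count_objects are simplified stand-ins assumed for illustration; they do not use the MPS types:

```c
#include <stddef.h>

/* Stand-in object layouts, assumed for illustration. */
typedef int type_t;
enum { TYPE_PAIR, TYPE_INTEGER };
typedef struct pair_s { type_t type; void *car, *cdr; } pair_s;
typedef struct integer_s { type_t type; long integer; } integer_s;

#define ALIGNMENT sizeof(void *)
#define ALIGN(size) (((size) + ALIGNMENT - 1) & ~(ALIGNMENT - 1))

/* Return the address just past the object at base, rounded up to the
 * alignment, or NULL if the type word is not recognized. */
static void *skip(void *base)
{
    switch (*(type_t *)base) {
    case TYPE_PAIR:    return (char *)base + ALIGN(sizeof(pair_s));
    case TYPE_INTEGER: return (char *)base + ALIGN(sizeof(integer_s));
    default:           return NULL;
    }
}

/* Count the objects packed into [base, limit) by skipping from each
 * one to the next. Returns (size_t)-1 on an unrecognized object. */
static size_t count_objects(void *base, void *limit)
{
    size_t n = 0;
    while (base != NULL && (char *)base < (char *)limit) {
        base = skip(base);
        ++n;
    }
    return base == NULL ? (size_t)-1 : n;
}
```

The walk only terminates correctly if every object's skipped size is a multiple of the alignment, which is why each case rounds the size up.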

The forward method is a function of type
mps_fmt_fwd_t. It is called by the MPS after it has moved an
object, and its task is to replace the old object with a
forwarding object pointing to the new location of the object.

The forwarding object must satisfy these properties:

It must be scannable and skippable, and so it will need to have a
type field to distinguish it from other Scheme objects.

It must contain a pointer to the new location of the object, so that
references to the old location can be updated.

The scan method and the skip method will both need to know the length of the
forwarding object. This can be arbitrarily long (in the case of
string objects, for example) so it must contain a length field.

This poses a problem, because the above analysis suggests that
forwarding objects need to contain at least three words, but Scheme
objects might be as small as two words (for example, integers).

This conundrum can be solved by having two types of forwarding object.
The first type is suitable for forwarding objects of three words or
longer:

Objects that consist of a single word present a problem for the
design of the forwarding object. In the toy Scheme interpreter, this
happens on some 64-bit platforms, where a pointer is 8 bytes long,
and a character_s object (which consists of a 4-byte int
and a 1-byte char) is also 8 bytes long.

There are a couple of solutions to this problem:

Allocate the small objects with enough padding so that they can
be forwarded. (This is how the problem is solved in the toy
Scheme interpreter.)

Use a tag to distinguish between the client object and
a forwarding object that replaces it. It might help to allocate
the small objects in their own pool so that the number of types
that the scan method has to distinguish is minimized. Since
these objects do not contain references, they could be
allocated from the AMCZ (Automatic Mostly-Copying Zero-rank) pool, and so the cost of
scanning them could be avoided.

A padding object must be scannable and skippable, and not confusable
with a forwarding object. This means they need a type and a
size. However, padding objects might need to be as small as the
alignment of the object format, which was specified to be a single
word. As with forwarding objects, this can be solved by having two
types of padding object. The first type is suitable for padding
objects of two words or longer:

typedef struct pad_s {
    type_t type;        /* TYPE_PAD */
    size_t size;        /* total size of this object */
} pad_s;

while the second type is suitable for padding objects consisting of a
single word:
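
As a sketch of how two padding variants might work together, the code below picks the one-word variant when exactly one word has to be filled and the sized variant otherwise. It is self-contained and makes several assumptions for illustration: type_t is taken to be one word wide, and the names pad1_s, TYPE_PAD1, write_padding and padding_size, along with the numeric type codes, are hypothetical rather than quoted from the interpreter:

```c
#include <assert.h>
#include <stddef.h>

typedef size_t type_t;               /* assumed: one word wide */
enum { TYPE_PAD = 100, TYPE_PAD1 };  /* hypothetical type codes */

typedef struct pad_s {               /* two words or longer */
    type_t type;                     /* TYPE_PAD */
    size_t size;                     /* total size of this object */
} pad_s;

typedef struct pad1_s {              /* exactly one word */
    type_t type;                     /* TYPE_PAD1 */
} pad1_s;

/* Fill size bytes at addr with a single padding object. */
static void write_padding(void *addr, size_t size)
{
    assert(size >= sizeof(pad1_s));
    if (size == sizeof(pad1_s)) {
        ((pad1_s *)addr)->type = TYPE_PAD1;
    } else {
        pad_s *pad = addr;
        assert(size >= sizeof(pad_s));
        pad->type = TYPE_PAD;
        pad->size = size;
    }
}

/* Size of the padding object at addr, as a skip method would need. */
static size_t padding_size(const void *addr)
{
    type_t type = *(const type_t *)addr;
    assert(type == TYPE_PAD || type == TYPE_PAD1);
    return type == TYPE_PAD1 ? sizeof(pad1_s)
                             : ((const pad_s *)addr)->size;
}
```

Because the one-word variant has no room for a size field, its size is implied by its type, which is what lets the skip method handle both variants.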

You create a generation chain by constructing an array of structures
of type mps_gen_param_s, one for each generation, and
passing them to mps_chain_create(). Each of these structures
contains two values: the capacity of the generation in
kilobytes, and the mortality, the proportion of
objects in the generation that you expect to die in a collection of
that generation.

These numbers are hints to the MPS that it may use to make decisions
about when and what to collect: nothing will go wrong (other than
suboptimal performance) if you make poor choices. Making good choices
for the capacity and mortality of each generation is not easy, and is postponed to the chapter Tuning the Memory Pool System for performance.

Here’s the code for creating the generation chain for the toy Scheme
interpreter:

Note that these numbers have been deliberately chosen to be
small, so that the MPS is forced to collect often, so that you can see
it working. Don’t just copy these numbers unless you also want to see
frequent garbage collections!

The object format tells the MPS how to find references from one object to another. This allows the MPS to
extrapolate the reachability property: if object A is
reachable, and the scan method fixes a reference from
A to another object B, then B is reachable too.

But how does this process get started? How does the MPS know which
objects are reachable a priori? Such objects are known as
roots, and you must register them with the MPS,
creating root descriptions of type mps_root_t.

In the case of the toy Scheme interpreter, the root scanning function
for the special objects and the predefined symbols could be written
like this:

static mps_res_t globals_scan(mps_ss_t ss, void *p, size_t s)
{
    MPS_SCAN_BEGIN(ss) {
        FIX(obj_empty);
        /* ... and so on for the special objects ... */
        FIX(obj_quote);
        /* ... and so on for the predefined symbols ... */
    } MPS_SCAN_END(ss);
    return MPS_RES_OK;
}

but in fact the interpreter already has tables of these global
objects, so it’s simpler and more extensible for the root scanning
function to iterate over them:

The fourth argument is the root mode, which tells the MPS
whether it is allowed to place a barrier on the root. The
root mode 0 means that it is not allowed.

The sixth and seventh arguments (here NULL and 0) are passed
to the root scanning function where they are received as the
parameters p and s respectively. In this case there was no
need to use them.

What about the global symbol table? This is trickier, because it gets
rehashed from time to time, and during the rehashing process there are
two copies of the symbol table in existence. Because the MPS is
asynchronous, it might be
scanning, moving, or collecting, at any point in time, and if it is
doing so during the rehashing of the symbol table it had better scan
both the old and new copies of the table. This is most conveniently
done by registering a new root to refer to the new copy, and then
after the rehash has completed, de-registering the old root by calling
mps_root_destroy().

It would be possible to write a root scanning function of type
mps_root_scan_t, as described above, to fix the references in
the global symbol table, but the case of a table of references is
sufficiently common that the MPS provides a convenient (and optimized)
function, mps_root_create_table(), for registering it:

The root must be re-registered whenever the global symbol table
changes size:

static void rehash(void)
{
    obj_t *old_symtab = symtab;
    unsigned old_symtab_size = symtab_size;
    mps_root_t old_symtab_root = symtab_root;
    unsigned i;
    mps_addr_t ref;
    mps_res_t res;

    symtab_size *= 2;
    symtab = malloc(sizeof(obj_t) * symtab_size);
    if (symtab == NULL) error("out of memory");

    /* Initialize the new table to NULL so that "find" will work. */
    for (i = 0; i < symtab_size; ++i)
        symtab[i] = NULL;

    ref = symtab;
    res = mps_root_create_table(&symtab_root, arena, mps_rank_exact(), 0,
                                ref, symtab_size);
    if (res != MPS_RES_OK) error("Couldn't register new symtab root");

    for (i = 0; i < old_symtab_size; ++i)
        if (old_symtab[i] != NULL) {
            obj_t *where = find(old_symtab[i]->symbol.string);
            assert(where != NULL);    /* new table shouldn't be full */
            assert(*where == NULL);   /* shouldn't be in new table */
            *where = old_symtab[i];
        }

    mps_root_destroy(old_symtab_root);
    free(old_symtab);
}

Notes

The old root description (referring to the old copy of the
symbol table) is not destroyed until after the new root
description has been registered. This is because the MPS is
asynchronous: it might
be scanning, moving, or collecting, at any point in time. If
the old root description were destroyed before the new root
description was registered, there would be a period during
which:

the symbol table was not reachable (at least as far as the
MPS was concerned) and so all the objects referenced by it
(and all the objects reachable from those objects) might
be dead; and

if the MPS moved an object, it would not know that the
object was referenced by the symbol table, and so would not
update the reference there to point to the new location of
the object. This would result in out-of-date references in
the old symbol table, and these would be copied into the new
symbol table.

The root might be scanned as soon as it is registered, so it is
important to fill it with scannable references (NULL in
this case) before registering it.

The order of operations at the end is important: the old root
must be de-registered before its memory is freed.

In order to scan the control stack, the MPS needs to know where the
bottom of the stack is, and that’s the role of the marker
variable: the compiler places it on the stack, so its address is a
position within the stack. As long as you don’t exit from this
function while the MPS is running, your program’s active local
variables will always be higher up on the stack than marker, and
so will be scanned for references by the MPS.

static obj_t make_pair(obj_t car, obj_t cdr)
{
    obj_t obj;
    mps_addr_t addr;
    mps_res_t res;
    res = mps_alloc(&addr, pool, sizeof(pair_s));
    if (res != MPS_RES_OK) error("out of memory in make_pair");
    obj = addr;
    /* What happens if the MPS scans obj just now? */
    obj->pair.type = TYPE_PAIR;
    CAR(obj) = car;
    CDR(obj) = cdr;
    return obj;
}

Because the MPS is asynchronous, it might scan any reachable object at any time, including
immediately after the object has been allocated. In this case, if the
MPS attempts to scan obj at the indicated point, the object’s
type field will be uninitialized, and so the scan method
may abort.

static obj_t make_pair(obj_t car, obj_t cdr)
{
    obj_t obj;
    mps_addr_t addr;
    size_t size = ALIGN(sizeof(pair_s));
    do {
        mps_res_t res = mps_reserve(&addr, obj_ap, size);
        if (res != MPS_RES_OK) error("out of memory in make_pair");
        obj = addr;
        obj->pair.type = TYPE_PAIR;
        CAR(obj) = car;
        CDR(obj) = cdr;
    } while (!mps_commit(obj_ap, addr, size));
    return obj;
}

The function mps_reserve() allocates a block of memory that
the MPS knows is uninitialized: the MPS promises not to scan this
block or move it until after it is committed by calling
mps_commit(). So the new object can be initialized safely.

However, there’s a second problem:

        CAR(obj) = car;
        CDR(obj) = cdr;
        /* What if the MPS moves car or cdr just now? */
    } while (!mps_commit(obj_ap, addr, size));

Because obj is not yet committed, the MPS won’t scan it, and that
means that it won’t discover that it contains references to car
and cdr, and so won’t update these references to point to their
new locations.

In such a circumstance (that is, when objects have moved since you
called mps_reserve()), mps_commit() returns false, and
we have to initialize the object again (most conveniently done via a
while loop, as here).
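
The control flow of this retry loop can be exercised without the MPS by mocking the allocation point. In the sketch below, mock_ap_s, mock_reserve, mock_commit and make_long are all hypothetical names and are not part of the MPS interface; the field fail_next_commits simulates commits that fail because objects moved in the meantime:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of an allocation point: NOT the MPS interface. */
typedef struct mock_ap_s {
    long buffer[8];          /* memory handed out by mock_reserve */
    int fail_next_commits;   /* number of commits that will fail */
} mock_ap_s;

static void *mock_reserve(mock_ap_s *ap)
{
    return ap->buffer;
}

/* Return false while simulated failures remain, true afterwards. */
static bool mock_commit(mock_ap_s *ap)
{
    if (ap->fail_next_commits > 0) {
        --ap->fail_next_commits;
        return false;
    }
    return true;
}

/* Reserve, initialize, and commit, repeating the initialization until
 * the commit succeeds. *attempts_o receives the number of attempts. */
static long *make_long(mock_ap_s *ap, long value, int *attempts_o)
{
    long *p;
    *attempts_o = 0;
    do {
        p = mock_reserve(ap);
        *p = value;          /* initialization must be safe to repeat */
        ++*attempts_o;
    } while (!mock_commit(ap));
    return p;
}
```

With one simulated failure the body of the loop runs twice, which is why everything between reserving and committing has to be safe to repeat.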

Notes

When using the Allocation point protocol, it is up
to you to ensure that the requested size is aligned, because
mps_reserve() is on the MPS’s critical path,
and so it is highly optimized: in nearly all cases it is just
an increment to a pointer and a test.

It is very rare for mps_commit() to return false, but
in the course of millions of allocations even very rare events
occur, so it is important not to do anything you don’t want to
repeat between calling mps_reserve() and
mps_commit(). Also, the shorter the interval, the less
likely mps_commit() is to return false.

The MPS is asynchronous:
this means that it might be scanning, moving, or collecting, at any
point in time (potentially, between any pair of instructions in your
program). So you must make sure that your data structures always obey
these rules:

A root must be scannable by its root scanning function as
soon as it has been registered.

When your program is done with the MPS, it’s good practice to tear
down all the MPS data structures. This causes the MPS to check the
consistency of its data structures and report any problems it
detects. It also causes the MPS to flush its telemetry stream.

MPS data structures must be destroyed or deregistered in the reverse
order to that in which they were registered or created. So you must
destroy all allocation points created in a
pool before destroying the pool; destroy all roots and pools, and deregister all threads, that
were created in an arena before destroying the arena, and so
on.

But in the more likely event that things don’t work out quite as
smoothly for your language as they did in the toy Scheme interpreter,
then you’ll be more interested in the chapter Debugging with the Memory Pool System.