Continuations: The cousin of closures. Unlike a closure, you aren't simply calling a function; you are continuing a saved execution state. Still, if you have closures and tail calls, as Haskell does, continuations can be built.

Yes, closures, thunks and continuations are all very similar. One implementation can capture them all; the different terms simply capture the different use cases.
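As a small sketch of the claim above (hand-written illustration, not anything GHC-specific): in continuation-passing style, a continuation is just a closure that captures "the rest of the computation", and control transfers are tail calls.

```haskell
-- A continuation is a closure taking the "returned" value.
-- addCPS never returns in the usual sense; it tail-calls its continuation.
addCPS :: Int -> Int -> (Int -> r) -> r
addCPS x y k = k (x + y)

mulCPS :: Int -> Int -> (Int -> r) -> r
mulCPS x y k = k (x * y)

-- Compute (1 + 2) * 10 purely by jumping from continuation to continuation.
example :: Int
example = addCPS 1 2 (\s -> mulCPS s 10 id)

main :: IO ()
main = print example  -- prints 30
```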

Statistics of GHC

GHC is written in Haskell

Compiler: 227,000 lines of Haskell (including comments)

Libraries: 242,000 lines of Haskell (including comments)

The run-time system is written in C

87,000 lines of C

Started in 1989

23 developers contributed current release (7.4, in development since Aug 6th) with over 500 commits

Pipeline of GHC

Core

We will start though with a quick look at Core, the main intermediate language used by GHC:

Functional lazy language

It consists of only a handful of constructs!

variables, literals, let, case, lambda abstraction, application

In general think, let means allocation, case means evaluation
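A hand-written approximation of what this looks like (simplified; real Core from `ghc -ddump-simpl` carries types and coercions, and in Core, unlike source Haskell, a case always evaluates its scrutinee):

```haskell
-- Source:   f xs = length xs + 1
-- Core-ish shape: the 'let' allocates a thunk for (length xs);
-- the 'case' is where evaluation to weak head normal form happens.
f :: [a] -> Int
f xs =
  let n = length xs   -- let: heap allocation of a thunk
  in case n of        -- case: evaluation (in Core; lazy in source Haskell)
       n' -> n' + 1
```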

For the curious, Core is technically a variant of System FC (which is itself a variant of System F)

Basic idea of Core (and the various System <X> languages, which are extensions of the simply typed lambda calculus) is to be the smallest language needed to capture the source language. Easier to study, reason about, optimise...

Some standard optimisations

A large set of simple, local optimisations (e.g. constant folding) are done in one pass called the simplifier. It is run repeatedly until no further changes can be made (up to a fixed maximum number of iterations).
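A toy model of that iterate-to-a-fixed-point structure (illustrative only, not GHC's simplifier): apply local rewrites over the tree, repeat until nothing changes or an iteration cap is hit.

```haskell
-- A tiny expression language and one local rewrite: constant folding.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- One simplifier pass: fold constants, recurse into subterms.
simplifyOnce :: Expr -> Expr
simplifyOnce (Add (Lit a) (Lit b)) = Lit (a + b)
simplifyOnce (Mul (Lit a) (Lit b)) = Lit (a * b)
simplifyOnce (Add a b)             = Add (simplifyOnce a) (simplifyOnce b)
simplifyOnce (Mul a b)             = Mul (simplifyOnce a) (simplifyOnce b)
simplifyOnce e                     = e

-- Run passes until a fixed point, bounded by a maximum iteration count.
simplify :: Int -> Expr -> Expr
simplify 0 e = e
simplify n e =
  let e' = simplifyOnce e
  in if e' == e then e else simplify (n - 1) e'
```

For example, `simplify 10 (Add (Lit 1) (Mul (Lit 2) (Lit 3)))` reduces to `Lit 7` in two passes.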

These are only the basic, big-win ones. All the other standard stuff (e.g. strength reduction, loop induction...) is missing.

We get a lot of this for free though if we use the LLVM backend.

The rest of the optimisations GHC does are fairly specific to a functional language. Let's look at a few of them.

When entry code doesn't make sense, the code will either be code that simply returns or code that throws an error

Stack:

The stack consists of a sequence of frames

Each frame has the same layout as a heap object! So the stack and the heap can often be treated uniformly

Stacks until very recently were a single contiguous block of memory. They are now a linked list of stack chunks.

Chunked stacks can be grown far more easily, and are also quicker to traverse during GC, since we can skip entire chunks of the stack if they haven't been touched since the last GC.

TSO (thread state object):

Represents the complete state of a thread, including its stack

Are ordinary objects that live in the heap

Important benefit of this approach is that the GC can detect when a blocked thread is unreachable and so will never be runnable again

Terminology 105

activation record: An alternative name for a stack frame

forcing: In the context of a thunk it means evaluating it

entering: In the context of a closure it means evaluating it

node: In the context of the entry code for a closure, Node is a pointer to the closure's environment

Call Convention

GHC compiles code into a form called Continuation Passing Style:

The idea here is that no function ever returns

Instead a function returns by jumping to the closure at the top of the stack

Basically the code is always jumping from closure to closure, so before calling a function we simply set up the stack to hold the control chain we want.

Call convention is simple: first n arguments in registers, rest on the stack

When entering a closure (a common case), the first argument is always a pointer to the closure's heap object (node) so it can access its environment

Return convention is also simple: a return is made by jumping to the entry code associated with the info table of the topmost stack frame, OR in some cases we set the R1 register to point to the returned closure
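A toy model of "returning by jumping to the top of the stack" (a sketch of the idea, not the real STG machine): each stack frame is a closure expecting the returned value, and returning means handing the value to the topmost frame rather than unwinding a native call stack.

```haskell
-- A stack frame, modelled as a closure over an Int result.
type Frame = Int -> Int

-- "Returning" a value: pop the top frame, jump to it with the value,
-- and keep going until the stack is empty.
returnTo :: [Frame] -> Int -> Int
returnTo []       v = v
returnTo (k : ks) v = returnTo ks (k v)
```

For example, to compute `(3 + 1) * 2` we push the two frames and then "return" 3: `returnTo [(+ 1), (* 2)] 3` gives `8`.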

Call Convention

Here we don't call the function directly, as we don't statically know the arity of the function.

To deal with this, the STG machine has several pre-compiled functions that handle 'generic application'

Generic application has three cases to deal with:

The function's arity and the number of arguments match! So we simply make a tail call to the function's entry code.

The function's arity is greater than the number of supplied arguments. In this case we build a PAP (partial application) closure and return that closure to the continuation at the top of the stack

The function's arity is less than the number of supplied arguments. Here we push the arguments matching the function's arity onto the stack, followed by a new continuation that uses another generic apply function to apply the remaining arguments to the function returned by the first call.
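The three cases can be sketched in a tiny model (illustrative only; the real RTS does this with PAP closures and precompiled generic-apply entry points) where a function value records its arity and the arguments collected so far:

```haskell
-- A "function value": arity, arguments collected so far, and the code.
data Fun = Fun { arity :: Int, collected :: [Int], run :: [Int] -> Int }

-- Generic application: Left = a PAP (too few args),
-- Right (result, leftover) = a call was made (leftover is non-empty
-- when the call was oversaturated and the result must be applied again).
apply :: Fun -> [Int] -> Either Fun (Int, [Int])
apply f args
  | missing == 0 = Right (run f (collected f ++ args), [])     -- exact: tail call
  | missing > 0  = Left f { collected = collected f ++ args }  -- build a PAP
  | otherwise    =                                             -- oversaturated
      let (now, later) = splitAt (arity f - length (collected f)) args
      in Right (run f (collected f ++ now), later)
  where missing = arity f - length (collected f) - length args
```

For example, with `plus = Fun 2 [] (\[a, b] -> a + b)`: `apply plus [1, 2]` makes the exact call, `apply plus [1]` builds a PAP, and `apply plus [1, 2, 3]` calls with two arguments and hands back `[3]` to apply to the result.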

RTS & Garbage Collection

Basic idea of a generational collector is to divide objects up into generations (by how long they've been alive), since young objects have a higher probability of becoming garbage. We can then GC one generation at a time, in an incremental fashion, to speed up GC.

Basic idea of a copying collector is that you have two heaps, one of which is current. During GC you start with a list of known live objects ("roots"), recursively trace their references (finding more live objects) and copy all found objects to the other heap. Anything not copied isn't referenced by anything and so is dead. Then switch heaps.

RTS & Garbage Collection

Uses a linked list of blocks, where within a block we allocate using a simple bump pointer (both heap and stack are managed this way)

Bump-pointer allocation means we simply have a current block with a pointer to the next free space in it. To allocate, we check there is enough space left in the block and, if so, bump the pointer

Block size is chosen such that it's rare we need to allocate an object larger than a block
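A toy bump-pointer allocator over a single fixed-size "block" (a sketch of the idea, not GHC's block allocator): the block is just a size and a free pointer, and allocation checks for space and bumps the pointer.

```haskell
import Data.IORef

-- A block: its size in words, plus the offset of the next free word.
data Block = Block { blockSize :: Int, freePtr :: IORef Int }

newBlock :: Int -> IO Block
newBlock sz = Block sz <$> newIORef 0

-- Allocate n words: return the offset of the new object, or Nothing if
-- the block is full (in GHC this is when a fresh block would be taken
-- from the block allocator's free list).
alloc :: Block -> Int -> IO (Maybe Int)
alloc blk n = do
  p <- readIORef (freePtr blk)
  if p + n <= blockSize blk
    then do writeIORef (freePtr blk) (p + n)
            return (Just p)
    else return Nothing
```

For example, with a 16-word block, allocating 8 then 8 succeeds at offsets 0 and 8, and a third allocation fails.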