Personal tools

Views

Yhc/RTS/Heap

From HaskellWiki

The Yhc heap is a garbage collected area of memory in which the running program introduces heap nodes. In this document I will try to use 'heap' to exclusively mean this garbage collected part of the heap. Any other memory used by Yhc will be refered to as 'memory'.

The node flags (fl) occupy bits 0 and 1 of the header, with the rest taken by the info ptr.

1.1 Node flags

To understand the meaning of the node flags it's necessary to consider the life cycle of a heap node. There are two types of heap nodes 'constructors'(CON) and 'applications'. Applications are either partial applications (PAP) or thunks awaiting evaluation (THUNK). Constructors are simple data items created by the program.

Constructors have a very simple life cycle, they are introduced into the heap and eventually they are no longer needed and are garbage collected. Applications are more complicated:

When an application is introduced into the heap it is either a partial application or a thunk.

If it is a partial application it may be applied to new arguments to give a new heap node with the additional arguments. If it is a thunk it may be evaluated.

When a thunk is evaluated it becomes a black hole. The purpose of this is cycle detection: if a black hole is evaluated then the application must be dependent on its own value and thus the program wouldn't terminate.

After evaluation a thunk is updated with an indirection to its result (either a constructor or another application).

Eventually the garbage collector will run and remove the indirections removing the application from the heap.

The node flags of a node reflect this life cycle: there are four possible values for the 2 flag bits, indicating the different states of the node.

0 0 IND - This node is an indirection, 'info ptr' points to the result.

0 1 NORMAL - This is a normal node, 'info ptr' points to the info table.

1 1 HOLE - This is a heap node which is currently under evaluation, 'info ptr' points to info table.

Ignoring the GC flag, a constructor always has 'NORMAL' flag and an application goes from NORMAL -> HOLE -> IND.

1.2 Node Info

Every heap node which is a 'NORMAL' or a 'HOLE' has an info pointer. This pointer points to information (stored in memory, not the heap) about the heap node (representing by the structure Info in node.h). Most importantly it is the Info structure which identifies the type of the heap node: CON, THUNK or PAP.

Concurrency slightly complicates the issue here because for nodes which are "blocked" the info pointer may point to a list of processes
chained together (with the original info at the end of the chain). These are the processes which are waiting on the result of the node being evaluated.

The Info structure for a Node is defined as.

struct _Info {
HUInt tag;
};

Tag is a half-word indicating what type of info this is, there are four possible values:

CINFO information about a constructor (node is a CON)

PINFO information about an application (node is THUNK/PAP)

FINFO information about a function (no node will have this info directly, it'll be a PAP instead)

XINFO information about an external function (again, no node will have this info directly)

If the tag of an 'info ptr' is a CINFO that the info ptr actually points to a CInfo structure, if the tag is PINFO it points to a PInfo structure, etc.

Which function this is an application to is not encoded directly in the PInfo structure and is instead stored by laying out things careful in memory. The information about a function (both PInfo and FInfo structures) is always layed out as:

Every function has a constant table which holds references to global objects that can be referenced by a function. For example a function such as:

f = g (1 : [])

would need a reference to the FInfo for 'g', as well as the CInfo for ':' and the heap node that represents '[]'. These references are stored in the constant table.

There are two types of object in the constant table: references to nodes in the heap (C_NODE) and references to other Info structures (C_INFO). The types of each constant are given by the bytes of 'constTypes': each byte
gives the type of consequetive constant table items.

The constTable itself is an array of ConstItems which are pointers to either heap nodes or Info structures (obviously depending on the constant type).

NOTE: it is by recursively scanning the constant table of a functions that the module loading system knows what entries to load from the bytecode file (see Yhc/RTS/Modules)

1.2.4 XInfo

XInfo is very similar to a FInfo except it refers to an external (i.e. primitive or FFI) function. The only difference is that instead of a code pointer and constant table it has a C function pointer to the external function. The function is assumed to be of the form: