Garbage Collection in C Programs

LISP and Java programmers take garbage collection for granted. With the Boehm-Demers-Weiser library, you easily can use it in C and C++ projects, too.

into the appropriate include files. This code substitutes only the explicit
calls contained in your code, leaving startup and library allocations
to traditional malloc/free calls.

A different approach is to hook malloc and friends to functions of your
own, which in turn would call the GC versions. Listing 1
shows how to do it, and it can be linked directly to an existing program. See
my article “Advanced Memory Allocation”
[LJ, May 2003] for details on these hooks. With this
method, any heap allocation is guaranteed to go through libgc, even if it is
not performed directly by your code.

As a third alternative, you can pass --enable-redirect-malloc to
configure before compiling the libgc library. Doing so
provides the library with wrapper functions that have the same names as the standard glibc
malloc family. When linking with your code, the functions in libgc
override the standard ones, with a net effect similar to using malloc
hooks. In this case, though, the effect is system-wide, as any program
linked with libgc is affected by the change.

Do not expect to endow huge programs with GC easily using any of these
methods. Some simple tricks are needed in order to exploit GC functions
and help the collector algorithm work efficiently. For example,
I tried to recompile gawk (version 3.1.1) using GC and obtained
an executable ten times slower than the original. With some adjustments,
such as setting each pointer to NULL after having freed it, the execution
time improved significantly, even if it was still greater than the
original time.

Garbage Collection in New Programs

If you are developing a new program and would like to take advantage
of automated memory management, all you need to do is use the
GC_malloc() family in place of the plain malloc() one and link with
libgc. Memory blocks no longer needed simply can be disposed
of by setting any referencing pointers to NULL. Alternatively, you can call
GC_free() to free the block immediately.

Always remember that your whole heap is scanned periodically by
the collector to look for unused blocks. If the heap is large, this
operation may take some time, causing the performance of the program to
degrade. This behavior is suboptimal, because large blocks
of memory often are guaranteed never to contain pointers, including buffers
used for file or network I/O and large strings. Typically, pointers are
contained only in fixed positions within small data structures, such
as list and tree nodes. Were C and C++ strongly typed languages,
the collector could have decided whether to scan
a memory block, based on the type of pointer. Unfortunately,
this is not possible because it is perfectly legal in C to have a char
pointer reference a list node.

For optimal performance, the programmer should try
to provide some basic runtime type information to the collector.
To this end, the BDW library has a set of alternative functions
that can be used to allocate memory. GC_malloc_atomic() can be used
in place of GC_malloc() to obtain memory blocks that will never contain
valid pointers. That is, the collector skips those blocks when looking
for live memory references. Furthermore, those blocks do not need
to be cleared on allocation. GC_malloc_uncollectable() and
GC_malloc_stubborn() also can be used to allocate fixed and rarely
changing blocks, respectively. Finally, it is possible to provide
some rough type information by using GC_malloc_explicitly_typed()
and building block maps with GC_make_descriptor().
See gc_typed.h on the Linux Journal FTP site for more
information [available at ftp.linuxjournal.com/pub/lj/listings/issue113/6679.tgz].

The collector's behavior also can be controlled by the user through a
number of function calls and variables. Among the most useful ones are
GC_gcollect(), which forces a full garbage collection on the whole
heap; GC_enable_incremental(), which enables incremental mode
collection; and GC_free_space_divisor, which tunes
the trade-off between frequent collections (high values, causing low
heap expansion and high CPU overhead) and time efficiency (low values).

Heap status and debug information are available through a number of
functions, including GC_get_heap_size(), GC_get_free_bytes(),
GC_get_bytes_since_gc(), GC_get_total_bytes() and
GC_dump(). Many of these parameters and functions are not documented at all,
not even in the source code itself. As always, a good editor is your friend.