Introduction

The inspiration for writing this code came from a paper titled "A Real Time Garbage
Collector Based on the Lifetimes of Objects" by H. Lieberman and C. Hewitt. In the paper,
the authors describe a heap storage framework that makes storage for short-lived objects
cheaper than storage for long-lived objects. Although the algorithm was presented for LISP and
similar systems, the same concept is implemented here in C++.
The paper is available at:
http://lieber.www.media.mit.edu/people/lieber/Lieberary/GC/Realtime/Realtime.html

Dynamic memory management refers to requesting memory from the operating system at different
points during program execution. All dynamic memory requests are satisfied from an area of
memory called the heap. In C++ this is done through the new operator, which is typically
implemented on top of a call to malloc; it returns a pointer to the object after allocating
its memory on the heap and constructing it. However, this flexibility of acquiring memory
dynamically comes at a price: it becomes the programmer's responsibility to return dynamically
allocated memory to the free pool by using the delete operator. The delete operator in turn
typically calls free, which reclaims the memory allocated on the heap. Once delete has been
called, the object's destructor has run and the object no longer exists, so any further access
through the pointer has unpredictable (undefined) results.
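Each of these operators does two jobs, and they can be written out by hand. The sketch below is only illustrative: it uses malloc and free to mirror the description above, although a particular compiler's operator new is not required to be built on them, and Widget, CreateWidget and DestroyWidget are hypothetical names.

```cpp
#include <cstdlib>
#include <new>

// A small class that counts how often it is constructed and destroyed.
struct Widget {
    static int constructed;
    static int destroyed;
    int value;
    explicit Widget(int v) : value(v) { ++constructed; }
    ~Widget() { ++destroyed; }
};
int Widget::constructed = 0;
int Widget::destroyed = 0;

// Roughly what "new Widget(v)" does, written out as two explicit steps.
Widget* CreateWidget(int v) {
    void* raw = std::malloc(sizeof(Widget));   // 1. obtain raw memory from the heap
    if (raw == nullptr) throw std::bad_alloc();
    return new (raw) Widget(v);                // 2. run the constructor in that memory
}

// Roughly what "delete w" does, written out as two explicit steps.
void DestroyWidget(Widget* w) {
    w->~Widget();   // 1. run the destructor
    std::free(w);   // 2. return the memory to the free pool
}
```

Forgetting the second call in either pair is exactly the kind of bookkeeping error a garbage collector is meant to remove.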

Garbage collection is the automated process of finding previously allocated memory
that is no longer reachable by the program and reclaiming that memory for future use.
A collector can do this in several ways; one is to trace every pointer reachable from the
program's roots (such as pointer variables on the stack) and treat any object that is never
reached as garbage whose memory can be recovered. In simple terms, a garbage collector relieves
the programmer of having to call delete every time new is called. Automated garbage collection
can shorten development cycles for large-scale software by approximately 30% and also reduce
memory leaks, resulting in a more stable system.
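A minimal sketch of the tracing idea, assuming a toy heap in which each Object simply lists the indices of the objects it references. Object, MarkReachable and FindGarbage are hypothetical names, and this is the mark phase of a simple mark-and-sweep collector rather than the copying scheme this article uses. Everything reachable from the roots is marked; everything else is garbage.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy heap object: the indices of the objects it points to.
struct Object {
    std::vector<std::size_t> refs;
};

// Marks every object reachable from the roots (iterative depth-first search).
std::unordered_set<std::size_t> MarkReachable(const std::vector<Object>& heap,
                                              const std::vector<std::size_t>& roots) {
    std::unordered_set<std::size_t> live;
    std::vector<std::size_t> pending(roots.begin(), roots.end());
    while (!pending.empty()) {
        std::size_t i = pending.back();
        pending.pop_back();
        if (!live.insert(i).second) continue;   // already marked
        for (std::size_t r : heap[i].refs) pending.push_back(r);
    }
    return live;
}

// Any object that was not marked is unreachable, i.e. garbage.
std::vector<std::size_t> FindGarbage(const std::vector<Object>& heap,
                                     const std::vector<std::size_t>& roots) {
    std::unordered_set<std::size_t> live = MarkReachable(heap, roots);
    std::vector<std::size_t> garbage;
    for (std::size_t i = 0; i < heap.size(); ++i)
        if (live.count(i) == 0) garbage.push_back(i);
    return garbage;
}
```

With a single root at object 0, two objects that point only at each other are never reached and are both reported as garbage, which is the case reference counting cannot handle.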

Some systems use reference counting to implement garbage collection instead; however, it has
unnerving disadvantages of its own:

It cannot reclaim circular structures: objects in a cycle can keep non-zero counts even when
they are garbage.

It often results in memory fragmentation.

It is expensive, since every time a reference is created or destroyed a count must be
incremented or decremented.
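The first drawback is easy to reproduce with std::shared_ptr, the standard library's reference-counted smart pointer (it is not used by the code in this article). In this sketch, Node and WeakNode are illustrative types: two Nodes that own each other keep each other's count at 1 after the last outside reference disappears, so neither destructor ever runs. Making the back reference a non-owning std::weak_ptr breaks the cycle.

```cpp
#include <memory>

struct Node {
    static int destroyed;
    std::shared_ptr<Node> next;   // owning reference: contributes to the count
    ~Node() { ++destroyed; }
};
int Node::destroyed = 0;

void MakeCycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;   // b is now owned twice
    b->next = a;   // a is now owned twice
}   // locals released: both counts drop to 1, never 0, so both Nodes leak

struct WeakNode {
    static int destroyed;
    std::shared_ptr<WeakNode> next;   // owning reference
    std::weak_ptr<WeakNode> prev;     // non-owning: does not keep the target alive
    ~WeakNode() { ++destroyed; }
};
int WeakNode::destroyed = 0;

void MakeBrokenCycle() {
    auto a = std::make_shared<WeakNode>();
    auto b = std::make_shared<WeakNode>();
    a->next = b;
    b->prev = a;
}   // a's count reaches 0, a is destroyed, which releases b as well
```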

Because of these problems, reference counting is not a viable primary answer to memory
management problems, especially as a program's code base grows.
However, very few garbage collector implementations are available in
the public domain. One of them is by Silicon Graphics; it is intended to be a general-purpose,
garbage-collecting storage allocator for the gcc compiler. The algorithms it uses are described in:

Design Approach

The heap that this garbage collector maintains is made up of several generations. The user
creates pointers through a wrapper class, Pointer<T>. When a memory allocation
request is made, the garbage collector returns a pointer to the object (created in its own heap space)
and also records the address of the pointer (remember that the pointer itself is created on the stack) in a
vector<void**>. The size of the object is recorded as well, for later generational copying.
Whenever a wrapper pointer goes out of scope or a pointer assignment is made, the garbage collection
algorithm runs to verify the integrity of all pointers. The algorithm moves all accessible objects
out of the current generation, evacuating them to another generation, and then iterates over the heap
to resolve every pointer to the objects that have just been relocated. Finally, the memory of the
current generation is reclaimed, including any unreachable data left in it.
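The evacuation step can be sketched as follows, under several simplifying assumptions: a generation is a single bump-allocated buffer, only the recorded root slots are patched (the article's collector also resolves pointers stored inside heap objects), and objects are moved with memcpy, which is only valid for trivially copyable types. Generation and Evacuate are hypothetical names; roots and sizes play the roles of the recorded pointer addresses and object sizes. A forwarding table makes sure an object reached through two roots is copied only once.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <new>
#include <unordered_map>
#include <vector>

// Hypothetical generation: one contiguous buffer filled by bump allocation.
struct Generation {
    std::vector<unsigned char> buffer;
    std::size_t used = 0;

    explicit Generation(std::size_t capacity) : buffer(capacity) {}

    void* Allocate(std::size_t size) {
        const std::size_t a = alignof(std::max_align_t);
        size = (size + a - 1) / a * a;   // keep every object suitably aligned
        if (used + size > buffer.size()) throw std::bad_alloc();
        void* p = buffer.data() + used;
        used += size;
        return p;
    }

    bool Contains(const void* p) const {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        return std::greater_equal<const unsigned char*>()(b, buffer.data()) &&
               std::less<const unsigned char*>()(b, buffer.data() + used);
    }
};

// Copies every object reachable from a root out of 'from' into 'to',
// redirects the roots, then reclaims 'from' in one step.
void Evacuate(Generation& from, Generation& to,
              const std::vector<void**>& roots,
              const std::vector<std::size_t>& sizes) {
    std::unordered_map<void*, void*> forwarded;   // old address -> new address
    for (std::size_t i = 0; i < roots.size(); ++i) {
        void* old = *roots[i];
        if (old == nullptr || !from.Contains(old)) continue;   // lives elsewhere
        auto it = forwarded.find(old);
        if (it == forwarded.end()) {                // first root to reach this object
            void* fresh = to.Allocate(sizes[i]);
            std::memcpy(fresh, old, sizes[i]);      // bitwise copy: trivially copyable only
            it = forwarded.emplace(old, fresh).first;
        }
        *roots[i] = it->second;                     // redirect the root to the copy
    }
    from.used = 0;   // whatever was not copied is unreachable: reclaim it all at once
}
```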

The garbage collector functionality is implemented by a class named GC, all of whose member
functions are static. Garbage collection can be forced at any point during program execution by
calling GC::Collect(). The maximum number of generations can be queried by calling
GC::GetMaxGeneration(). If you are curious about the total number of bytes allocated in
the memory managed by the garbage collector, call GC::GetTotalBytesAllocated().

#include <cstddef>
#include <vector>

class Generation; // Defined elsewhere in the collector's source

class GC
{
private:
    // Addresses of the Pointer objects that are made on the stack
    static std::vector< void** > _PointersOnStack;
    // Holds the size of the object each of those pointers refers to
    static std::vector<unsigned int> _SizeOfObjects;
    // Holds all the generations
    static std::vector< Generation* > _Generations;
    // Holds total bytes allocated on the heap
    static int BytesAllocated;
public:
    // Invokes the GC for all generations
    static void Collect();
    // Invokes the GC only up to and including the generation specified
    static void Collect( Generation*& pGeneration );
    // Call this to allocate memory from the garbage collector
    static void* operator new( std::size_t, void** pStackPtr );
    // Gets maximum number of generations that have been made
    static int GetMaxGeneration();
    // Gets the total memory (bytes) that has been allocated on the heap
    static int GetTotalBytesAllocated() { return BytesAllocated; }
    // Returns the total number of generations in the GC
    static int GetGenerationCount() { return static_cast<int>( _Generations.size() ); }
    // Sets the total bytes that have been allocated by the garbage collector
    static void SetTotalBytesAllocated( int Value ) { BytesAllocated = Value; }
};

The pointer addresses that are iterated during garbage collection must refer to objects of type Pointer<T>.
This class implements the functionality of smart pointers. It overloads several operators, including the
assignment operator and automatic conversion operators. The garbage collection process is invoked whenever
a Pointer object goes out of scope or an assignment is made.
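A minimal sketch of how a Pointer<T> might record its own address and trigger collection, assuming a global root list g_rootSlots and a Collect() function that only counts calls (both hypothetical stand-ins for the GC class shown above). The raw pointer is stored as void* so that its address really is a void**. The sketch also removes the slot again in its destructor, so addresses of dead stack variables do not accumulate in the list.

```cpp
#include <algorithm>
#include <vector>

std::vector<void**> g_rootSlots;   // hypothetical stand-in for GC::_PointersOnStack
int g_collections = 0;             // counts how often Collect() ran

// Hypothetical stand-in for GC::Collect(); a real collector would scan g_rootSlots.
void Collect() { ++g_collections; }

template <typename T>
class Pointer {
public:
    Pointer() : raw_(nullptr) { Register(); }
    Pointer(T* p) : raw_(p) { Register(); }
    Pointer(const Pointer& other) : raw_(other.raw_) { Register(); }

    Pointer& operator=(const Pointer& other) {
        raw_ = other.raw_;
        Collect();   // a collection is triggered by every assignment ...
        return *this;
    }

    ~Pointer() {
        // Unregister this variable before it disappears from the stack.
        g_rootSlots.erase(std::find(g_rootSlots.begin(), g_rootSlots.end(), &raw_));
        Collect();   // ... and whenever a Pointer goes out of scope
    }

    T* get() const { return static_cast<T*>(raw_); }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    operator T*() const { return get(); }   // automatic conversion to a raw pointer

private:
    void Register() { g_rootSlots.push_back(&raw_); }
    void* raw_;   // stored as void* so that &raw_ is a genuine void**
};
```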

The various generations of the heap are implemented by a class Generation. Each generation wraps a table
of contiguous memory locations; keeping newly created objects close together means fewer page faults and
a better chance that the objects reside in the processor cache. There are many generations, and the
generation with the highest generation number contains the most recently created objects. Each generation
has a fixed capacity, and when the objects on the heap overrun that capacity, a new generation is
created automatically. The process of condemning a generation and reclaiming the memory of objects that
are no longer referenced is called scavenging.
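A minimal sketch of a chain of fixed-capacity generations, assuming bump allocation within each generation. Generation, GenerationalHeap and RoundUp are hypothetical names, not the article's class. Consecutive allocations land next to each other, and when the youngest generation cannot satisfy a request, a new generation with the next number is opened.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Rounds a request up so every object stays aligned for any fundamental type.
inline std::size_t RoundUp(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) / a * a;
}

// Hypothetical generation: a fixed-capacity block of contiguous memory.
class Generation {
public:
    explicit Generation(std::size_t capacity)
        : slots_((capacity + sizeof(std::max_align_t) - 1) / sizeof(std::max_align_t)),
          storage_(new std::max_align_t[slots_]),
          capacity_(slots_ * sizeof(std::max_align_t)) {}

    // Bump allocation: objects created one after another sit side by side.
    void* TryAllocate(std::size_t size) {
        size = RoundUp(size);
        if (used_ + size > capacity_) return nullptr;   // this generation is full
        void* p = reinterpret_cast<unsigned char*>(storage_.get()) + used_;
        used_ += size;
        return p;
    }

private:
    std::size_t slots_;
    std::unique_ptr<std::max_align_t[]> storage_;
    std::size_t capacity_;
    std::size_t used_ = 0;
};

class GenerationalHeap {
public:
    explicit GenerationalHeap(std::size_t capacity) : capacity_(capacity) {
        generations_.emplace_back(capacity_);
    }

    void* Allocate(std::size_t size) {
        if (RoundUp(size) > capacity_) return nullptr;   // can never fit in one generation
        if (void* p = generations_.back().TryAllocate(size)) return p;
        generations_.emplace_back(capacity_);            // youngest generation overflowed
        return generations_.back().TryAllocate(size);
    }

    // The last generation has the highest number and holds the newest objects.
    std::size_t GenerationCount() const { return generations_.size(); }

private:
    std::size_t capacity_;
    std::vector<Generation> generations_;
};
```

Moving a Generation inside the vector only moves its unique_ptr, so addresses already handed out stay valid when a new generation is added.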

How to use it

Objects allocated with the built-in ::operator new are not collectable. Only objects allocated with the overloaded
new operator that takes the address of a pointer as its second argument are collectable. For ease of programming,
I have also written an automatic conversion function from Pointer to void**:
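A minimal sketch of how such an allocation function and conversion might fit together, assuming a global list g_roots of recorded pointer addresses and plain malloc instead of a generation (both hypothetical). Because Pointer<T> converts to void**, the expression new (p) T selects the overload below rather than the standard placement new, which takes a void*. The sketch assigns in a separate statement, since naming a variable inside its own initializer, as in Pointer<int> pInt = new(pInt) int;, uses it before it has been constructed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <new>
#include <vector>

// Hypothetical registry of the pointer variables that refer to collectable objects.
std::vector<void**> g_roots;

// Allocation function with an extra argument: the address of the pointer
// variable that will receive the result, recorded so a collector can patch it.
void* operator new(std::size_t size, void** slot) {
    void* memory = std::malloc(size);   // a real collector would carve this from a generation
    if (memory == nullptr) throw std::bad_alloc();
    g_roots.push_back(slot);
    return memory;
}

// Matching deallocation function: only called if the constructor throws.
void operator delete(void* memory, void** slot) noexcept {
    auto it = std::find(g_roots.rbegin(), g_roots.rend(), slot);
    if (it != g_roots.rend()) g_roots.erase(std::next(it).base());
    std::free(memory);
}

template <typename T>
class Pointer {
public:
    Pointer() : raw_(nullptr) {}
    Pointer(T* p) : raw_(p) {}   // lets "p = new (p) T(...)" store the result

    // Automatic conversion used by "new (p) T": the address of the stored pointer.
    operator void**() { return &raw_; }

    T* get() const { return static_cast<T*>(raw_); }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }

private:
    void* raw_;   // stored as void* so that &raw_ really is a void**
};
```

After Pointer<int> p; p = new (p) int(42);, g_roots holds the address of p's stored pointer, so a collector that moves the int can rewrite p in place.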

Run the demo program and watch the memory allocated in the Task Manager of WinNT/2k. You should observe a considerable
difference in system performance with and without the garbage collector. To demonstrate the quality
of garbage collection, I have also overloaded the global new and delete operators to increment and decrement a
memory allocation counter. This counter can be a valuable indicator for detecting memory leaks in a program.
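A minimal sketch of such a counter, assuming replacement of the global single-object operator new and operator delete (g_liveAllocations and LiveAllocations are hypothetical names, not the demo's). By default the array forms forward to these functions, so new[] and delete[] are counted too; the sized delete is forwarded explicitly so both deletion paths keep the counter in sync.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of blocks handed out by operator new and not yet returned.
static std::atomic<long> g_liveAllocations{0};

void* operator new(std::size_t size) {
    void* p = std::malloc(size == 0 ? 1 : size);   // a zero-byte request still needs a unique pointer
    if (p == nullptr) throw std::bad_alloc();
    ++g_liveAllocations;
    return p;
}

void operator delete(void* p) noexcept {
    if (p == nullptr) return;   // deleting a null pointer is a no-op
    --g_liveAllocations;
    std::free(p);
}

// Sized form (C++14), forwarded to the unsized one above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

long LiveAllocations() { return g_liveAllocations.load(); }
```

Comparing LiveAllocations() before and after a piece of code shows whether that code released every block it allocated.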

Earlier I had many problems calling virtual functions through pointers allocated on the GC's heap, but
all of those have now been solved. Memory allocation for arrays of objects has not been implemented yet, but that
will be done in the near future. I would welcome any suggestions for improving the design and functionality
of the garbage collector.

Revision History

26 Jun 2002 - Resized and reformatted code to prevent scrolling

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

You don't comment on still having to call delete to run the object's destructor. I have no idea if your implementation works or is any good, but any discussion of GC for C that says you don't have to delete objects just because you clear up memory after them gives me the willies.

Destructors are important in far more things than recovering memory of member objects, they're a general purpose tool for managing resources.

Oh and...

- new and delete don't have to call malloc and free - at least one compiler I use calls _sbrk to grab more memory and doesn't go anywhere near the C heap
- traditional C++ practice for managing resources has been:

1_ the std::vector stackObjects and objectSizes never ever decrease in size. Since the Pointer<T> destructor doesn't remove itself from these lists, they grow, accumulating more and more dead pointers (or, even more dangerously, pointers reallocated at the same stack location in a different call chain).

2_ It completely ignores data members, which are very likely candidates! If you need a Foo * at some point, chances are that you're going to store it in some field of your object ...

3_ The killer ... Because it has no root set (only a "stackObject" list), this implementation does create unreachable cycles, like reference counting does! With a root set, you can mark/reach all the active objects and deem "the rest" unreachable and thus collectable. With this "stackObject" construct, every single object ever constructed (and then some! see 1_ above!!) is reachable, so if Pointer<T1> x1 refers to Pointer<K1> y1 that refers back to x1, then both are reachable from the "stackObject" list and both refer to each other, yet there might not be any other reference anywhere else ...

The icing on the cake, of course, is that ownership is not even correctly dealt with! Don't try this at home!

Pointer<X>
getX() {
    Pointer<X> a = new(a) X ;
    return a ;
}

But there's a positive side: the paper referred to at the beginning of the article is worth reading, though its implementation (like this C++ version) is not entirely realistic (think for a minute about what the word "pointer" means and how you could "point" to areas/generations rather than copying!)

Well, the code presented here is not a generational garbage collector (as everything is collected every time collection is invoked); it's just a copying garbage collector, and in fact memory is not reclaimed until the whole generation is emptied (i.e. one stale object prevents freeing the memory).

Did you check the performance? Don't you think this will cause performance issues if it is used in large projects with lots of new and delete? Do you have any alternatives in that case?

I have no idea if this implementation even works, but I've found that using garbage collection (e.g. Hans Boehm's collector) can actually speed up C++ programs that use a lot of threads. There's less contention for the heap, less blocking and more speed.

Having said that, it doesn't half snarf memory. The one big project I worked on which used GC ate about 3 times the virtual memory, and when the collector actually kicked in there was a flurry of paging on low-end machines. It sounded like someone machine-gunning my disk drive.

You should probably check for self-assignment (the target object being the source itself) in the implementation of Pointer<>::operator= as a quick and simple optimisation.
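A minimal sketch of that optimisation, assuming a Pointer<T> whose assignment triggers a collection through a Collect() function that here only counts calls (Collect and g_collections are hypothetical). Self-assignment returns immediately, so p = p no longer costs a full collection.

```cpp
int g_collections = 0;                // hypothetical counter
void Collect() { ++g_collections; }   // hypothetical stand-in for GC::Collect()

template <typename T>
class Pointer {
public:
    explicit Pointer(T* p = nullptr) : raw_(p) {}

    Pointer& operator=(const Pointer& other) {
        if (this == &other) return *this;   // p = p changes nothing: skip the collection
        raw_ = other.raw_;
        Collect();
        return *this;
    }

    T* get() const { return raw_; }

private:
    T* raw_;
};
```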

Also, I personally prefer to use a namespace containing functions rather than a class with only static methods. The usage and syntax are identical, but I feel that the subtle semantic distinction is worthwhile. I put what would otherwise be private static member variables into an anonymous namespace within the .cpp, and thus hide the implementation from users of the class.
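A minimal sketch of that layout, assuming the same operation names as the GC class (AddBytesAllocated is a hypothetical helper added for the example). Call sites such as GC::Collect() look exactly as before, but the state lives in an anonymous namespace with internal linkage instead of in private static members. The two parts would normally sit in gc.h and gc.cpp; they are shown together here.

```cpp
// --- gc.h: the interface users see ---
namespace GC {
void Collect();
int GetTotalBytesAllocated();
void AddBytesAllocated(int bytes);   // hypothetical helper for this sketch
}

// --- gc.cpp: the implementation ---
namespace {
// Would have been private static members; the anonymous namespace gives them
// internal linkage, so no other translation unit can name them.
int bytesAllocated = 0;
int collectionCount = 0;
}

namespace GC {
void Collect() { ++collectionCount; }
int GetTotalBytesAllocated() { return bytesAllocated; }
void AddBytesAllocated(int bytes) { bytesAllocated += bytes; }
}
```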

I've already submitted an article on GC in C++ using smart pointers. The implementation used there was a simple mark-and-sweep instead of generational, but there are some design decisions that I think are better than what's presented here. For instance, the syntax:

Pointer<int> pInt = new(pInt) int;

is not very intuitive. The syntax used by gc_ptr is a bit nicer:

gc_ptr<int> p(new(gc) int);

However, there's a flaw in overloading new that was pointed out to me by several users of gc_ptr. Many MFC types don't work here because they overload operator new causing ambiguities. For instance:

gc_ptr<CStringArray> p(new(gc) CStringArray);

fails with: error C2665: 'new' : none of the 3 overloads can convert parameter 2 from type 'struct gc_detail::gc_t'. I've not had the time to hack around with this to know if there's a ready solution, but if there isn't, it may be better to go with syntax like:

gc_ptr<CStringArray> p(gc::allocate<CStringArray>());

(Note that this would require several templated overloads of allocate() to handle construction of objects with parameters.)
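Since C++11, variadic templates and std::forward let a single template cover every constructor signature, instead of one overload per argument count. The sketch below is hypothetical (it is not gc_ptr's API, and release_all only stands in for a real collection): it uses a plain new-expression with no placement argument, so a class that declares its own operator new, like the LegacyArray stand-in here, still compiles.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <utility>
#include <vector>

namespace gc {
namespace detail {
// One clean-up action per managed object (hypothetical bookkeeping).
inline std::vector<std::function<void()>>& Registry() {
    static std::vector<std::function<void()>> registry;
    return registry;
}
}  // namespace detail

// A single variadic template forwards any constructor arguments to T.
template <typename T, typename... Args>
T* allocate(Args&&... args) {
    T* object = new T(std::forward<Args>(args)...);   // plain new: no placement argument
    detail::Registry().push_back([object] { delete object; });
    return object;
}

inline std::size_t managed_count() { return detail::Registry().size(); }

// Stand-in for a collection: destroys every managed object.
inline void release_all() {
    for (auto& destroy : detail::Registry()) destroy();
    detail::Registry().clear();
}
}  // namespace gc

// Hypothetical class with its own operator new, like the MFC types mentioned above.
struct LegacyArray {
    int size;
    explicit LegacyArray(int n) : size(n) {}
    static void* operator new(std::size_t bytes) { return ::operator new(bytes); }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
};
```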

Another thing to consider is when to call the garbage collector. With gc_ptr this is configurable in several ways. Read the article for info on this.

It might be nice to collaborate on a generational form of gc_ptr if you're interested. If so, send me private e-mail.