Introduction

One of the biggest complaints leveled at C++ is its lack of garbage
collection. Programmers used to more modern languages such as Java find that
programming in C++ is difficult and error-prone because you have to manually
manage memory. To some extent they are right, though most die-hard C++
programmers would hate to give in entirely to a garbage collecting system.

The idea behind garbage collection is that a runtime system will manage
memory for you. When you need a new object you explicitly create it, but you
never have to worry about destroying it. When there are no more references
left to the object it is eventually automatically destroyed by the runtime
system. This makes complex systems that must dynamically create objects
frequently much easier to code, since you never have to worry about destroying
the objects. This sounds very appealing, so why did I say "most die-hard C++
programmers would hate to give in entirely to a garbage collecting system"?

The problem is that memory is just one type of resource. There are numerous
other resource types, such as thread or database locks, file handles, Win32 GDI
resources, etc. A GC (garbage collection) system only addresses memory, totally
ignoring the other resource types. In most modern languages that support GC
the programmer is still left to manually manage these other resources. In
contrast, the C++ programmer can easily manage these other resources by wrapping
the resource in an object that uses the RAII (Resource Acquisition Is Initialization)
idiom. When the resource wrapper goes out of scope the destructor will automatically
release the wrapped resource.
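
As a minimal sketch of the RAII idea for a non-memory resource, the wrapper below holds a thread lock for exactly the lifetime of a scope. The class name `scoped_lock_sketch` and the helper function are hypothetical and exist only for illustration; `std::mutex` comes from later C++ standards than the one this article was written against.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical RAII wrapper: acquires the mutex in the constructor and
// releases it in the destructor, whatever path leaves the scope.
class scoped_lock_sketch {
public:
    explicit scoped_lock_sketch(std::mutex& m) : m_(m) { m_.lock(); }
    ~scoped_lock_sketch() { m_.unlock(); }
    scoped_lock_sketch(const scoped_lock_sketch&) = delete;
    scoped_lock_sketch& operator=(const scoped_lock_sketch&) = delete;
private:
    std::mutex& m_;
};

// Returns true if the mutex is free again after the guarded scope is
// left by an exception.
bool lock_released_after_throw() {
    std::mutex m;
    try {
        scoped_lock_sketch guard(m);              // lock acquired here
        throw std::runtime_error("work failed");  // scope exits abnormally
    } catch (const std::runtime_error&) {
        // guard's destructor already ran during stack unwinding
    }
    bool is_free = m.try_lock();                  // succeeds only if released
    if (is_free) m.unlock();
    return is_free;
}
```

The release point is deterministic: it is the closing brace of the scope, not some later collector run.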

At first you might think that adding destructors to objects in a GC language
would be enough to allow the best of both worlds. This is almost true. Most
GC languages have no concept of "stack data", so all objects are created on the
heap and are destroyed only when the collector runs. This is non-deterministic,
meaning that you have no idea *when* the collector will run. The result is that
you won't know when the wrapped resource is released, which can be critical for
many resource types. For instance, a thread lock should be held for the absolute
shortest possible time. Hold it longer and you may starve other threads, or even
theoretically cause a deadlock. You simply can't allow the lock to be released
in a non-deterministic manner. So what you really want is a language that supports
GC, stack data and destructors. That's why I said "most die-hard C++
programmers would hate to give in entirely to a garbage collecting system."

Smart Pointers

C++ programmers have become accustomed to using the RAII
idiom to manage memory. They wrap a pointer to an object created with operator
new up in a class that emulates a regular pointer by defining certain operators
such as '->' and '*'. These wrappers are
known as "smart pointers" because they emulate a regular pointer while adding
more intelligent behavior. A classic example of such a smart pointer comes with
the standard C++ library and goes by the name of std::auto_ptr.
This smart pointer was specifically designed to ensure that memory is freed if
an exception occurs, but it's not really usable for more complex memory
management. With std::auto_ptr there can be only one "owner",
or pointer referring to an object instance. Copying or assigning an
std::auto_ptr transfers ownership, setting the original
std::auto_ptr to NULL in the process. This
prevents the std::auto_ptr from being used in a standard
container such as std::vector, for instance.
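
The single-owner transfer described above can be sketched with std::unique_ptr. This is a substitution, not the class the article discusses: std::auto_ptr was deprecated in C++11 and removed in C++17, and std::unique_ptr makes the same ownership transfer explicit through std::move. The `widget` type and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct widget {
    int value;
    explicit widget(int v) : value(v) {}
};

// Returns true if the source pointer is empty once ownership has moved.
bool source_is_null_after_transfer() {
    std::unique_ptr<widget> first(new widget(42));
    std::unique_ptr<widget> second(std::move(first));  // ownership moves
    return first == nullptr && second->value == 42;
}

// Because the transfer is an explicit move rather than a disguised copy,
// a unique_ptr (unlike auto_ptr) can be stored in a standard container.
int value_stored_in_vector() {
    std::vector<std::unique_ptr<widget>> widgets;
    widgets.push_back(std::unique_ptr<widget>(new widget(7)));
    return widgets.front()->value;
}
```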

Many other smart pointer implementations attempt to allow multiple "owners", or
pointers referring to the same object instance, through reference counting. A
reference count is maintained that's incremented when the smart pointer is copied and
decremented when the smart pointer is destroyed. If the ref-count ever reaches
zero the object is finally destroyed. CodeProject includes a couple of smart
pointer implementations that use this
idea.1,2
There's also a version that's undergone thorough peer review that can be found
in the Boost library.3

Reference counted smart pointers are quite handy and work for many situations,
but they aren't as flexible as true garbage collection. The problem is with
circular references. If object A has a ref-counted pointer that points at object B
which has a ref-counted pointer that points back at A then neither object will
ever be destroyed since their ref-counts will never reach zero. It can be difficult
to prevent such circular references.
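
The cycle problem can be reproduced with any ref-counted pointer. The sketch below uses std::shared_ptr, which is an assumption on my part (it postdates this article and descends from the Boost pointer); the `node` type and `run_cycle_demo` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node {
    static int destroyed;              // counts destructor calls
    std::shared_ptr<node> other;       // ref-counted link to a peer
    ~node() { ++destroyed; }
};
int node::destroyed = 0;

struct cycle_result {
    int destroyed_while_cyclic;
    int destroyed_after_break;
};

cycle_result run_cycle_demo() {
    node::destroyed = 0;
    node* raw_a = nullptr;
    {
        std::shared_ptr<node> a = std::make_shared<node>();
        std::shared_ptr<node> b = std::make_shared<node>();
        a->other = b;                  // A -> B
        b->other = a;                  // B -> A, closing the cycle
        raw_a = a.get();
    }   // both counts drop from 2 to 1, never to 0

    cycle_result result;
    result.destroyed_while_cyclic = node::destroyed;
    {
        // Break the cycle by hand so the sketch itself does not leak.
        std::shared_ptr<node> b = std::move(raw_a->other);
    }   // B dies here, which releases the last reference to A
    result.destroyed_after_break = node::destroyed;
    return result;
}
```

Nothing is destroyed until the cycle is broken manually, which is exactly the bookkeeping a collector is meant to remove.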

One approach to fixing this problem is to use a "weak
pointer". A weak pointer is a pointer that references an object, but does not
increment or decrement the ref-count. A regular C++ pointer is a "weak pointer",
but using one this way has a drawback. When the ref-count goes to zero the object is
deleted, which may leave regular pointers used as weak pointers "dangling". For
this reason another smart pointer type is often used for a weak pointer. This
other smart pointer can cooperate with the ref-counting smart pointer so that
when the object is deleted the "weak pointer" will be automatically set to
NULL, thus preventing dangling pointers.
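
A cooperating weak pointer of this kind can be sketched with std::weak_ptr, again an assumption since it is a later standard-library descendant of the idea rather than anything this article ships. The `parent` and `child` types are hypothetical. Instead of being set to NULL, a std::weak_ptr reports `expired()` once the object is gone.

```cpp
#include <cassert>
#include <memory>

struct child;

struct parent {
    static int destroyed;
    std::shared_ptr<child> kid;        // owning, ref-counted link
    ~parent() { ++destroyed; }
};

struct child {
    static int destroyed;
    std::weak_ptr<parent> owner;       // non-owning back reference
    ~child() { ++destroyed; }
};

int parent::destroyed = 0;
int child::destroyed = 0;

// Returns true if both objects are destroyed at scope exit and an
// outside weak pointer notices that the parent is gone.
bool weak_back_reference_demo() {
    parent::destroyed = 0;
    child::destroyed = 0;
    std::weak_ptr<parent> observer;
    {
        std::shared_ptr<parent> p = std::make_shared<parent>();
        p->kid = std::make_shared<child>();
        p->kid->owner = p;             // back link does not bump the count
        observer = p;
        if (observer.expired()) return false;  // parent still alive here
    }
    return parent::destroyed == 1 && child::destroyed == 1 &&
           observer.expired();
}
```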

This works, but still leaves the programmer with the responsibility of spotting
circular references and correcting them. This is not always easy, or even possible,
to do. So, in some cases the C++ programmer is still left wishing that C++ had
a GC system on top of the other facilities it already has.

Conservative GC Libraries

There are several free and commercial implementations of garbage collectors
written for C/C++ that replace malloc and new respectively. All of these collectors
are known as "conservative" collectors. This means that they err on the side of
caution when trying to determine if there are any remaining pointers referring to
an object. The reason for this is that in C and C++ it's possible to manipulate
a pointer by storing it in other data types, such as unions or integral
types.4 Unfortunately, this means that sometimes
they fail to collect objects that are really no longer in use.
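
To see why a collector for plain C++ pointers has to be conservative, consider a pointer disguised as an integer. The sketch below assumes the platform provides the optional `std::uintptr_t` type (a later addition to the standard); the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a pointer survives a round trip through an integer.
bool pointer_round_trips_through_integer() {
    int* object = new int(7);
    std::uintptr_t disguised = reinterpret_cast<std::uintptr_t>(object);
    object = nullptr;   // no pointer-typed reference to the object remains

    // A collector that only trusted pointer-typed variables would free the
    // object at this point, yet the program can still recover and use it.
    int* recovered = reinterpret_cast<int*>(disguised);
    bool ok = (*recovered == 7);
    delete recovered;
    return ok;
}
```

A conservative collector avoids this trap by treating any word that looks like an address as a possible reference, at the price of occasionally keeping dead objects alive.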

Another problem with such libraries is that they only work on objects
created with special forms of malloc and/or new. Memory that was allocated by third
party libraries, for instance, won't be collected.

gc_ptr

So, is there a better solution for C++? This article and
the accompanying code attempt to define one by marrying the
concept of smart pointers with the concept of traditional garbage collection.
The result is a class template called gc_ptr.

The gc_ptr class addresses the first issue with
traditional GC libraries, namely their conservatism. The only
type that needs to be evaluated as a candidate for referring to an object is the
gc_ptr itself. All other types of references to the object are treated as "weak pointers"
and are ignored. Another benefit of the smart pointer approach is that the
implementation is much simpler: the collector doesn't have to search for references
to the objects, because every such reference is registered with the system when it is
created.

The second issue is harder to deal with. The implementation of gc_ptr was originally
based on the implementation for a similar class, circ_ptr, submitted for consideration
to Boost by Thant Tessman.3,5 This original implementation allowed the
circ_ptr to be created with a pointer to any object allocated
by operator new. Unfortunately this implementation left a serious hole open for
misuse of the class. The size of the object was registered with the first
creation of a circ_ptr that referred to the object. This size
was used to determine if an object contained other circ_ptrs in
order to find circular references. If a circ_ptr<base>
were initialized with a pointer to a derived type, it would register
the size as sizeof(base), so the code could fail to recognize that
the object contains another circ_ptr and thus prematurely collect
an object. Using a templatized constructor in the original implementation would have made
this less likely, since the size could be calculated from the type of the pointer passed
to the constructor, but it would not have eliminated the possibility of error, as illustrated by the
following code:
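
The original listing is not available here, so what follows is only a hedged reconstruction of the kind of misuse described. `size_seen_by_constructor` is a hypothetical stand-in for a templatized circ_ptr constructor, and the `void*` member stands in for an embedded circ_ptr.

```cpp
#include <cassert>
#include <cstddef>

struct base {
    virtual ~base() {}
    int id;
};

struct derived : base {
    void* link;   // stands in for an embedded circ_ptr member
};

// Hypothetical stand-in for a templatized constructor: all it can see is
// the static type U of the pointer it is handed.
template <typename U>
std::size_t size_seen_by_constructor(U*) {
    return sizeof(U);
}

// Returns true if the registered size is sizeof(base) and therefore too
// small to cover the member that derived adds.
bool static_type_hides_derived_size() {
    base* p = new derived;   // static type is base*, dynamic type is derived
    std::size_t registered = size_seen_by_constructor(p);
    bool too_small = registered < sizeof(derived);
    delete p;
    return registered == sizeof(base) && too_small;
}
```

Even with the template, the size comes from the static type of the pointer, so a scan of `registered` bytes would never reach `link`.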

To actually eliminate this possibility we need to provide a custom way of allocating objects that
are to be garbage collected. The implementation for gc_ptr does this by
providing an overloaded new operator that takes an instance of
gc_detail::gc_t in addition to the size_t. In this overload
the size of the object allocated is stored to enable proper detection of whether or not
an object contains another gc_ptr. With this overload the above example can be coded
safely as (note the unusual syntax for new):
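
The original listing is not available here either. The sketch below only illustrates the mechanism: a placement-style operator new selected by a tag object, which records the true allocation size. The namespace `gc_sketch`, the `sizes()` table and the helper function are hypothetical and are not the library's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>

namespace gc_sketch {
struct gc_t {};               // tag type that selects the overload
const gc_t gc = gc_t();       // tag object, written as new(gc)

// Size recorded for every block handed out by new(gc).
inline std::map<void*, std::size_t>& sizes() {
    static std::map<void*, std::size_t> table;
    return table;
}
}  // namespace gc_sketch

void* operator new(std::size_t size, const gc_sketch::gc_t&) {
    void* block = std::malloc(size);
    if (!block) throw std::bad_alloc();
    gc_sketch::sizes()[block] = size;   // size of the most-derived object
    return block;
}

// Matching placement delete; the compiler calls it if a constructor throws.
void operator delete(void* block, const gc_sketch::gc_t&) noexcept {
    gc_sketch::sizes().erase(block);
    std::free(block);
}

struct base { int id; };
struct derived : base { void* link; };

// Returns true if the size recorded at allocation is sizeof(derived).
bool new_gc_records_full_size() {
    using gc_sketch::gc;
    derived* d = new (gc) derived;      // note the unusual syntax for new
    bool ok = gc_sketch::sizes()[static_cast<void*>(d)] == sizeof(derived);
    d->~derived();                      // manual cleanup for this sketch only
    operator delete(d, gc);
    return ok && gc_sketch::sizes().empty();
}
```

Because the size is captured inside operator new, it no longer matters which static pointer type the object is later viewed through.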

Objects pointed to by gc_ptr now must be allocated only through the
syntax "new(gc)". The variable "gc" passed to operator new using this syntax is declared in an unnamed
namespace by the library as a convenience and to prevent violations of the "one definition rule."
Constructing a gc_ptr from an object allocated through the standard new operator throws an immediate
std::invalid_argument exception. This makes it easy to find such
errors at run time and prevents misuse. Because of this requirement a gc_ptr is not a
direct replacement for regular pointers, but it's still a relatively simple replacement.
All that's needed is for you to change the code that allocates the object.

Interface

The interface for using gc_ptr is relatively simple. There are two global methods
that are used to control garbage collection and the gc_ptr class itself.

void gc_collect()

This is the heart of the collector. When invoked, it runs a
mark-and-sweep algorithm to find all objects that are no
longer in use and destroys them. The programmer is free to make use of this
method to invoke garbage collection at any point. However, there isn't a
requirement that the programmer ever call this method. The
new(gc) operator will invoke this method automatically if
either a programmer specified threshold (see below) of memory has been allocated
since the last call to gc_collect or if memory has been
exhausted. In most cases this automatic collection should be enough.
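
For readers unfamiliar with mark-and-sweep, here is a toy version of the algorithm over a heap of indexed objects. It is a sketch of the general technique only, not the collector shipped with gc_ptr; `toy_object`, `toy_heap` and the demo function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct toy_object {
    std::vector<std::size_t> refs;   // indices of the objects it points to
    bool alive = true;
    bool marked = false;
};

struct toy_heap {
    std::vector<toy_object> objects;
    std::vector<std::size_t> roots;  // objects reachable from outside the heap

    // Marks everything reachable from the roots, then sweeps the rest.
    // Returns the number of objects collected.
    std::size_t collect() {
        for (toy_object& o : objects) o.marked = false;

        std::vector<std::size_t> pending(roots.begin(), roots.end());
        while (!pending.empty()) {               // mark phase
            std::size_t i = pending.back();
            pending.pop_back();
            if (!objects[i].alive || objects[i].marked) continue;
            objects[i].marked = true;
            for (std::size_t r : objects[i].refs) pending.push_back(r);
        }

        std::size_t swept = 0;
        for (toy_object& o : objects) {          // sweep phase
            if (o.alive && !o.marked) {
                o.alive = false;
                o.refs.clear();
                ++swept;
            }
        }
        return swept;
    }
};

std::size_t collect_unreachable_cycle() {
    toy_heap heap;
    heap.objects.resize(4);
    heap.objects[0].refs.push_back(1);   // 0 -> 1, reachable from a root
    heap.objects[2].refs.push_back(3);   // 2 <-> 3 form a cycle
    heap.objects[3].refs.push_back(2);   // that no root can reach
    heap.roots.push_back(0);
    return heap.collect();
}
```

Objects 2 and 3 keep each other's reference counts above zero, yet neither is marked, so the sweep reclaims both.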

void gc_set_threshold(size_t bytes)

This method sets the threshold at which
gc_collect is automatically called by operator
new(gc). Initially the threshold is set to 1024 bytes but may
be changed by this method to optimize performance. There are two interesting
values that may be used in a call to this method. Passing in zero will result in
memory being collected for every invocation of operator
new(gc). This will degrade performance but will ensure optimal
use of memory. Passing std::numeric_limits<size_t>::max() will have the opposite effect, turning automatic
collection off entirely, though collect will still be called if memory is exhausted.
This will give the best possible performance but will result in the worst possible use of
memory.
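
The threshold behavior described above can be modeled with a small accounting class. This is a sketch under assumptions: the name `threshold_sketch`, the `>=` comparison and the reset-to-zero rule are my guesses at reasonable semantics, not details taken from the library, and the out-of-memory path is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model of the bookkeeping behind gc_set_threshold.
class threshold_sketch {
public:
    threshold_sketch() : threshold_(1024), since_last_(0), collections_(0) {}

    void set_threshold(std::size_t bytes) { threshold_ = bytes; }

    // Called once per simulated allocation. Returns true if this
    // allocation would have triggered an automatic collection.
    bool on_allocate(std::size_t bytes) {
        since_last_ += bytes;
        const std::size_t never = std::numeric_limits<std::size_t>::max();
        if (threshold_ == never || since_last_ < threshold_) return false;
        ++collections_;      // stands in for a call to gc_collect()
        since_last_ = 0;
        return true;
    }

    std::size_t collections() const { return collections_; }

private:
    std::size_t threshold_;
    std::size_t since_last_;
    std::size_t collections_;
};

// Counts collections for `count` allocations of `bytes_each` bytes.
std::size_t collections_for(std::size_t threshold, std::size_t count,
                            std::size_t bytes_each) {
    threshold_sketch tracker;
    tracker.set_threshold(threshold);
    for (std::size_t i = 0; i < count; ++i) tracker.on_allocate(bytes_each);
    return tracker.collections();
}
```

With a threshold of zero every allocation collects, with the maximum value none does, and the default of 1024 bytes sits in between.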

An interesting technique could be applied to create a
"parallel garbage collector" by passing in
std::numeric_limits<size_t>::max() to this method to shut
off automatic collection and then coding a thread that periodically calls
gc_collect itself. This may result in
optimal performance and memory use in multithreaded programs.

gc_ptr::gc_ptr()

Constructs a gc_ptr with a NULL reference.

explicit gc_ptr::gc_ptr(T* p)

Constructs a gc_ptr to point at an object constructed with operator new(gc).

template <typename U> explicit gc_ptr::gc_ptr(const gc_ptr<U>& other)

Constructs a gc_ptr from another gc_ptr. The template definition allows related
types to be copied, though only single inheritance relationships may be used safely
with this implementation. If a solution can be found for multiple inheritance a
future version will do so. For now you should avoid using types that use multiple
inheritance in gc_ptr.

Pros

Provides a flexible interface that allows the programmer to control when garbage is to
be collected, including a simple way to implement "parallel garbage collection".

Easy to use.

Thread safe. When used in a multithreaded program, calls
to gc_collect, and thus calls to operator
new(gc), may block if another thread is currently collecting garbage. In practice this
shouldn't cause any noticeable speed difference, since both operations are (relatively) slow
anyway.

Cons

Not as efficient as manual memory management or ref-counted smart pointers.

Requires the use of an overloaded operator
new(gc) for allocation of the objects that are to
be garbage collected. This syntax is non-standard, but the implementation conforms to
the C++ Standard.

Will not work properly with types that use multiple inheritance. This is the most
serious hole in the implementation and simply results in undefined behavior, not in
a compile time error. A solution to this problem is needed so if any programmers
know of one I'd love to hear it.

Pointers to objects returned from operator
new(gc) that are never pointed to by
a gc_ptr will leak. There's no way for the collector to know how to delete such
objects since the object's type is not registered until the first assignment of
a gc_ptr to the object.

Possible Future Enhancements

Some enhancements that may be provided in the next version include the ability
to use types that use multiple inheritance and to include ref-counting.

The multiple inheritance problem will be hard to solve. The garbage collector must save
a void pointer to the object and the gc_ptr uses only this void pointer internally.
Casting the void pointer to the templated pointer type using
static_cast<> is safe for types that
don't use inheritance or that use only single inheritance, but it's not safe for types that
use multiple inheritance. If any programmers know of a solution to this problem I would
love to hear from them.
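
The hazard can be shown without any collector at all. In the sketch below the types `left`, `right` and `both` are hypothetical, and the claim that the second base sits at a non-zero offset holds on the common ABIs but is not guaranteed by the standard.

```cpp
#include <cassert>

struct left  { int l; };
struct right { int r; };
struct both : left, right { int b; };

// Returns true if casting the stored void* straight to right* yields a
// different address than the properly adjusted conversion.
bool void_round_trip_misses_second_base() {
    both object;
    object.l = 1;
    object.r = 2;
    object.b = 3;

    void* stored = &object;       // what a collector would save
    right* correct = &object;     // compiler adjusts the address here
    right* via_void = static_cast<right*>(stored);  // no adjustment possible

    // via_void is only compared, never dereferenced: using it would be
    // undefined behavior because it does not point at the right subobject.
    return static_cast<void*>(correct) != stored && via_void != correct;
}
```

With single inheritance the base subobject normally starts at the same address as the whole object, which is why the void pointer round trip happens to work there.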

The ref-counting would help to optimize memory usage by destroying objects that don't participate in
circular references immediately instead of only when gc_collect is called. This shouldn't be too difficult
to add to this implementation and was left out only because such uses should be coded with ref-counted
smart pointers instead of gc_ptr in order to optimize the performance of both. This makes ref-counting
in gc_ptr only a minor benefit.

Special Acknowledgement

The code presented here was based on the implementation of circ_ptr5
written by Thant Tessman and submitted to Boost.3 The implementation
was refactored to minimize compile time dependencies, address the usage issue that led to new(gc), and to
provide a more efficient collection algorithm. I'm indebted to Thant not only for the original circ_ptr but
for input during the refactoring for my implementation. The code presented here is original, but would not
exist without the original efforts and later input given by him.

Footnotes

3 Boost, an organization of programmers working to produce
quality, extensively peer-reviewed libraries for the public domain. Many of these
libraries are hoped to be considered for inclusion in later C++ standards.

4 The standard doesn't guarantee which, if
any, integral type a pointer can be placed into, but in practice most platforms allow this, and
it's a common technique.

5 circ_ptr.zip. This
link requires you to be a member of the Boost eGroup mailing list. See the
Boost3 web site for instructions on joining.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Achilleas didn't you write that other GC library posted here on CodeProject a few years ago? If you're knowledgeable I'd urge you to develop this library into something useable. I'd like to use it but it's no use if it's dead-drop slow.

Some more info: as far as I remember, Boehm is conservative.
I have read your article in parts, so I have some questions.
Could the collection be done ONLY on my signal?
Will it have any memory leaks?
I am trying to create a C++ framework/subset/implementation pattern with GC for use in video game production with good productivity (a quality that normal C/C++ does not have at all).
So I would like to collect only during some "loading times" or "low process" times.
And the second question is, of course, because of the high create/destroy frequency in video games.
Thanks in advance.

First, this is a "proof of concept" intended for learning, not a production-worthy solution. This solution isn't optimized. I covered most of the pros/cons and comparisons in this article and the follow-up to it. I'd say for your particular case, this is probably not the solution you want. It might be a starting point, but there's certainly work to be done with it. Boehm may be a better choice, though there are issues there as well. More likely, a more traditional ref-counted smart pointer may be enough for your purposes. Unless you know there are going to be cycles that you can't break, the ref-counted smart pointer is going to be the most efficient and simplest solution.

One of the problems with GC is destructor call order. For example, suppose A creates B and holds a pointer to it, and then passes itself to B so that B holds a pointer to A as well. If either of the two assumes that the other will still be alive during its own destruction, A might use B (or vice versa) in its destructor, and if the GC destroys B before A this might cause a problem.
Do you handle destructor call order, and how?

No, I don't handle it at all, but in practice it's not likely to be a problem for *most* programs. It's a rare object that would make use of another object on the GC heap (my terms) within its destructor. In general, to do so would be a violation of good design.

You are correct to point out this danger when both GC and destructors exist in a system, however. If this smart pointer were to be added to the standard there'd be some sort of disclaimer indicating that this results in undefined behavior.

In my limited development experience so far, smart pointers have been the
best thing that's happened to my code.

As far as the circular reference problem is concerned, however I have always
found that the argument is solved by considering which object is responsible
for the other object's lifetime, ie. which object "owns" the other.

In the case where A aggregates B via a smart pointer, and B requires a reference
back to A, B must use a "dumb" pointer. A controls B's lifetime, yet B can navigate
back to A. If B must outlive A for some reason, A must transfer "ownership" to another
object to keep B alive, and B's reference to A must be also adjusted to the new owner
as part of the transfer of "ownership".

In a doubly linked list scenario one direction could be the ownership-direction
and the other direction a mere "dumb" pointer reference. Establishing an ownership pattern
and ensuring proper semantics for ownership transfer avoids the problem of garbage.

By considering ownership, and establishing an "ownership-direction" you avoid these
problems without having to collect garbage. I've found that the "ownership" idea
while being simple, usually helps sort out all of these problems.

This does rule out truly "circular" collections, but does not rule out collections that
can be iterated in a circular fashion. For example, an adapter can be written so that
an STL list or vector can be iterated in a circular fashion. While not ultra-high
performance, it does improve reliability.

In general, judicious use of smart pointers and ref-counted pointers has addressed
all of my needs.

Of course there is bound to be something I have not come across yet, but would be an
interesting challenge to smart pointer addicts everywhere.

Your "dumb" pointer approach was documented here as a "weak" pointer. In a lot of cases this approach will resolve the issue of circular references when using ref-counted smart pointers. However, there are two major problems here. First, it's not always possible to assign this sort of strict ownership, or at least not without compromising a simple design with a much more complex one. Second, even when you can assign such strict ownership it will often require a lot more programmer discipline, which can lead to serious problems during maintenance. Are you sure that another programmer who's assigned to maintain your code is going to understand the strict ownership rules you've set up?

I've lived through a lot of years of programming in C++. In the beginning we managed memory manually, and despite what the doomsayers who worship GC languages would like you to believe, this didn't cause me too many problems. However, when it did, the problems were hard to correct, and I spent way too much time dealing with this issue. Then along came ref-counted smart pointers and a lot of my programming became easier. However, I learned several things about them. First, they have their own sets of problems. People such as Scott Meyers have done a much better job explaining the subtle issues than I could here, so I'll just direct you to research the subject yourself. Second, I learned that there were cases in which I still spent way too much time dealing with memory management because it was impossible to deal with directly and the ref-counted smart pointers required very careful use and redesign of my interfaces. The GC smart pointer will still have some of the issues of other smart pointers, but at least in those cases where I'd spent too much time working around the deficiencies in the ref-counted pointers I'll now be able to go with the more elegant design and spend less time with the details. However, as pointed out in the second part of this article, I don't plan to either give up on ref-counted smart pointers, or even on manual memory management. Each is best for different cases. The second article tries to detail when to choose which. I'm just glad I've got the options.

This is the approach we utilized over 8 years ago when developing a transaction processing
framework used in a large-scale manufacturing execution system (MES) for the semiconductor
industry. Our systems require highly complex hierarchies of objects which can easily have recursive
(circular) relationships. As you have probably determined, this is not a trivial problem to solve,
especially if you don't know in advance that the pointer you are attaching to a network is being
recursively accessed. That is where the biggest piece of work, and biggest impact upon runtime
performance, occurs. If you do know in advance that the pointer needs to be "dumb", as you
call it, the problem is simple. If you do not, then you must recurse the structure in order to determine
if it needs to be made such.

Firstly, I did not use the term "dumb", I used the term "weak". It was the original poster who used "dumb". In any event, neither term is perfectly accurate. What's really being used here are built in pointer types.

Secondly, I'm not sure I know what you're trying to say here. The "recursing" of the structure to "determine if needs to be made dumb" is a cost of programming time, not a cost at run time. It's this cost that GC addresses. So when you talk about "that is where the biggest piece of work, and biggest impact upon runtime performance, occurs", I don't know which form of memory management you think causes a runtime performance cost here. It appears that you mean manual management has a cost here, and this is not precisely true. If, on the other hand, you mean that GC has a cost here, this also is not precisely true. In fact, there's evidence that GC systems often perform faster than non-GC systems in this regard.

In the end, no, there is no explicit need for GC. Everything that you can do under a GC system you can also do through manual memory management, or with simple ref-counted smart pointers. However, doing so generally just leads to "reinventing the wheel", since the result is a form of GC hard coded to a system, and as pointed out in my previous reply, often results in complex or poor designs. (Complex and poor in terms of programmer time required to create, maintain and expand the system.)

Hello.
Could you show it with an example in C++?
I have implemented smart pointers with reference counting, but I also need to handle circular references and I have no idea how to do that.
I have read lots of articles that propose a graph-based implementation, but I have no idea how to start with that.

I've got a new implementation ready that addresses the problems with polymorphic types. I'm looking for people interested in helping me test this code before I publish a second article on this subject. If you're interested send me an e-mail at williamkempf@hotmail.com. I need people who can evaluate the code for design errors and bugs and who can put the gc_ptr through some rigorous testing.

I believe dynamic_cast will solve your problems with multiple inheritance. dynamic_cast returns a pointer to the beginning of the memory occupied by the object, which is what you are looking for. The only problem is that it requires the class to have at least 1 virtual function. In practice, this should not be a problem since virtually all classes that would need garbage collection have at least a virtual destructor.

Actually, dynamic_cast won't work. From the standard, 5.2.7.2, the operand must be a pointer to a complete class type, and void* is not a complete type. In VC++ you can use a hack where you combine both static_cast<> and dynamic_cast<> like this:

dynamic_cast(static_cast(pv));

Provided T is a polymorphic type this will work, but the standard doesn't guarantee this and I wouldn't want to make use of it in real code. Further, I don't agree that most types that you'd want to garbage collect will be polymorphic types. I think it would be a worse implementation that forces polymorphic types than the current implementation that doesn't work with types that use multiple inheritance.

If dynamic_cast solves the problem, then I think you should use it. I mean, the 4-byte overhead for a virtual function table is negligible compared to the benefits of garbage collection. Maybe offer two versions of the garbage collector, one using dynamic_cast and the other not?

Too many times I've witnessed programmers worry about the tiny internal details, like this will cost 4 bytes extra in some objects, instead of getting a reliable product developed and to market. Heap allocators waste more memory than 4 bytes on every block allocated.

Removing the worry of memory management for the majority of objects is a great benefit, and you shouldn't lose sight of that! Any other memory allocation can be wrapped into a garbage-collected object which will eventually be collected and the non-GC memory released. Then memory leaks can be consigned to history.

Most types, in my experience, are not polymorphic types. Restricting usability of gc_ptr only to polymorphic types will therefore result in a very large class of types that can't be used. I won't accept a cost like this. The issue isn't "a tiny internal detail... like this will cost 4 bytes extra in some objects", it's that using dynamic_cast<> fully excludes the majority of types from being used. After all, if I were that concerned with overhead I'd not be looking at GC in the first place. Just look at how much overhead there is in the tracking of nodes and pointers.

Not to worry though... the main reason for posting this article was to get feedback on this issue and to get some (hopefully quick) real world testing. I've gotten enough feedback that the solution has dawned on me (though I'm not sure why, since no one really gave the solution in their comments). I'll have something available soon that won't have the problems that this implementation does. So if the problem is of immediate concern, just hold on for a bit. Otherwise make use of the code as is (the interface won't change).

I still disagree, William. You say you are not concerned with overhead, hence the reason you are investigating a GC solution. Yet the 4-byte polymorphic overhead, for some objects, is too much overhead for your personal taste? Mmmm.

Please do not take this as criticism. The collector you have come up with so far is excellent and goes a long way towards usable GC in C++. I would be very interested to hear a brief outline of your solution to the multi-inheritance problem though, without using any kind of type identification. I thought of a solution involving replacing the new/delete operators in each class (can be done with a #define macro), but I am stumped as to how to make it simpler than that.

Please let me in on the secret, the solution you've found, so I can sleep again and stop thinking about this blessed problem!

No, I am NOT concerned with overhead. An extra four bytes for "polymorphic overhead" doesn't concern me. What concerns me is reversing the problem... from "can't accept polymorphic types" to "can only accept polymorphic types." As I stated, in my experience most types are not polymorphic, so this reversal of the problem is actually worse, IMHO.