Introduction

This article deals with refactoring the code originally presented in
"A garbage collection framework for C++"
in order to allow polymorphic types to be used. If you've not already done so you should
read that article first. This article mostly deals with the process in which the code
was refactored to illustrate to new programmers how code is maintained in real life situations.
If all that you're interested in is using the gc_ptr<> smart pointer in your code you do not
need to read this article. Just download the code from the link above to replace the implementation
found in the previous article. You may still find a few sections in here worth reading, specifically
the section on compiler bugs and on recommended usage.

The Problem

If you've read the original article you'll remember that although I provided a smart pointer
that would give you true garbage collection in C++ the implementation had a serious problem. As
coded it would not handle polymorphic types. To illustrate the problem you'll have to understand
something about how pointers behave when cast to a base type. The following code will help
to illustrate the problem and will be used in the rest of my description here.
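
A minimal sketch of the kind of code in question is shown here. The class names A, B and C and the helper same_address are hypothetical; p1 and p2 are named to match the discussion. The standard does not mandate this layout, but on typical compilers the B subobject sits at a non-zero offset inside C.

```cpp
#include <cassert>

struct A { virtual ~A() {} int a; };
struct B { virtual ~B() {} int b; };
struct C : A, B { int c; };   // a C holds an A subobject and a B subobject

// True when both pointers hold the same address once converted to void*.
bool same_address(C* p1, B* p2)
{
    return static_cast<void*>(p1) == static_cast<void*>(p2);
}

// Usage sketch:
//   C* p1 = new C;
//   B* p2 = p1;                     // the implicit conversion adjusts the address
//   assert(same_address(p1, p2));   // fails on typical compilers
```

Comparing p1 == p2 directly still yields true, because the compiler converts p1 to B* before comparing. Only the void* views differ.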

What will surprise many with the example above is that p1 and p2 will not point at the same
address in memory. This means the assert above will fail. The original implementation relied
on casting to "a pointer to cv void" returning the same value as the start of the
memory block allocated by new(gc). Obviously this wouldn't always be the case for polymorphic
types.

Looking for a Solution

While trying to think of a solution for this problem it occurred to several people that
the following code would work.
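
A minimal sketch of that idea, assuming hypothetical types Base1, Base2 and Derived and a hypothetical helper most_derived_address:

```cpp
#include <cassert>

struct Base1 { virtual ~Base1() {} int x; };
struct Base2 { virtual ~Base2() {} int y; };
struct Derived : Base1, Base2 { int z; };

// Address of the most derived object that p points into.
// This only compiles when T is a polymorphic type.
template <class T>
void* most_derived_address(T* p)
{
    return dynamic_cast<void*>(p);
}
```

Given a Derived object d and a Base2* pointing at it, most_derived_address returns the address of d itself, even though a plain static_cast<void*> of the Base2* would not.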

The reason this works is that dynamic_cast<void*>(p) will return the address of the most derived type of
'p' (ISO/IEC 14882, section 5.2.7/7). At first this seems to be the perfect solution
to our problem. However, dynamic_cast<> can't be used on non-polymorphic types. Modifying
gc_ptr<> to use dynamic_cast<> instead of static_cast<> would allow
polymorphic types to be used, but would then no longer allow non-polymorphic types. In
my experience most types are non-polymorphic, so I could not live with this exchange of
problems.

This still seemed to be the obvious solution, however, so I set off to find a way
to cast a fully typed pointer to "a pointer to cv void" referencing the most
derived type regardless of whether or not the original type were polymorphic. One of the
first things I did was to pose this as a question on comp.lang.c++. I made a mistake in
wording my question, however. Instead of saying "to the address of the most derived type"
I said "to the beginning of the object." Someone quickly pointed out that the standard doesn't
dictate how an object is to be laid out and that padding may actually exist in the object before
the address of any pointer to the object. I already knew this, so being reminded of it
shouldn't have been of any help. Surprisingly, however, it was.

You see, I'd been too close to the problem. I'd been grappling with the complexities
of mark-and-sweep, calculating algorithm complexities, addressing corner cases in the
language such as the lifetime of global data, etc. So I had blinders on. I focused on
the "obvious" solution without considering other possibilities. The little reminder
about object layout concepts for some reason sparked my thinking enough to break out
of this narrow view and the solution dawned on me.

What's really surprising is how simple and obvious the real solution was. In fact, when
I told the solution to Thant Tessman, the author of a template class called circ_ptr on which
I based my original code, he responded with "Duh! I'm embarrassed I didn't think of it." I
know how he felt. After all, if we go back to the origins of this code, specifically to
traditional garbage collection libraries, we'd find that they use the solution themselves.

So, what's the solution? Stop tracking only pointers to the beginning of the allocated
memory block and start tracking pointers that reference any memory within the entire block!
Pointers cast to base types will still point within this block. So modifying the code
that looks up the "node" in the implementation to search all known nodes until we find one
that "contains" the address pointed to takes care of most of this problem. As an added
bonus we can now have gc_ptr<>'s that point at members of objects allocated with new(gc).
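
The containment test at the heart of this idea can be sketched as follows. The helper block_contains and the types Left, Right and Whole are hypothetical and are not taken from the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

struct Left  { virtual ~Left() {}  int l; };
struct Right { virtual ~Right() {} int r; };
struct Whole : Left, Right { int member; };

// True when p points into the half-open range [base, base + size).
bool block_contains(const void* base, std::size_t size, const void* p)
{
    const char* first = static_cast<const char*>(base);
    const char* addr  = static_cast<const char*>(p);
    std::less<const char*> before;   // a total order, even for unrelated pointers
    return !before(addr, first) && before(addr, first + size);
}
```

For a Whole object w, both a Right* obtained from &w and the address of w.member fall inside [&w, &w + 1), while the one-past-the-end address does not.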

I said that modifying the code that searches for registered nodes took care of most
of the problem. What's missing? There are two things missing at this point. The first is
the easiest to solve. The gc_ptr<>::get() method casts from "a pointer to cv void"
that references the base of the allocated memory block back to the actual type. This has the
same problem as the cast to "a pointer to cv void", and the solution turns out to
be even easier. I simply added a real pointer to gc_ptr<> that's set to the pointer value
passed in to the constructor and modified the appropriate assignments to handle this as well.
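
A minimal sketch of the idea, using a hypothetical class typed_handle rather than the real gc_ptr<> (node bookkeeping is left out entirely):

```cpp
#include <cassert>

struct Shape  { virtual ~Shape() {} int id; };
struct Named  { virtual ~Named() {} int name; };
struct Circle : Shape, Named { int radius; };

template <class T>
class typed_handle
{
public:
    explicit typed_handle(T* p = 0) : ptr_(p) {}

    // Converting copy: the compiler applies any base-class offset here, once.
    template <class U>
    typed_handle(const typed_handle<U>& other) : ptr_(other.get()) {}

    // No cast from void* is needed, so the result is correct for any T.
    T* get() const { return ptr_; }

private:
    T* ptr_;   // the "real pointer", exactly as typed by the caller
};
```

Because the typed pointer is stored directly, get() never has to guess how far the T subobject lies from the start of the block.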

The second problem left to be addressed is deletion of the object once there's no longer
any "root pointers" to it. Again, the implementation relied on casting the base pointer back
to a real type in order to call operator delete on it. The solution was to add a second pointer
to the node's state. This pointer will point at the object that was used to "register" the
destructor with the node (for details on this please see the implementation... this bit
is next to impossible for me to explain in text here).
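
One way such a scheme could look is sketched below. This is not the article's implementation: the node layout and the names destroy_as, register_destructor, finalize, counted and demo_finalize are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct node
{
    void* base;               // start of the raw memory block
    std::size_t size;         // size of the block in bytes
    void* object;             // the pointer used when the destructor was registered
    void (*destroy)(void*);   // knows the real type behind `object`
};

// Runs T's destructor on an object that was registered as a T*.
template <class T>
void destroy_as(void* p)
{
    static_cast<T*>(p)->~T();
}

// Remember the typed pointer (erased to void*) and a function that restores its type.
template <class T>
void register_destructor(node& n, T* p)
{
    n.object = p;
    n.destroy = &destroy_as<T>;
}

// What a collector might do once no root pointer reaches the node.
void finalize(node& n)
{
    if (n.destroy)
        n.destroy(n.object);
    n.destroy = 0;
    ::operator delete(n.base);
    n.base = 0;
}

// Demonstration type: counts how many times its destructor runs.
struct counted
{
    static int destroyed;
    ~counted() { ++destroyed; }
};
int counted::destroyed = 0;

// Allocates one object in a raw block, registers it, then finalizes the node.
int demo_finalize()
{
    node n = { ::operator new(sizeof(counted)), sizeof(counted), 0, 0 };
    counted* obj = new (n.base) counted;
    register_destructor(n, obj);
    finalize(n);
    return counted::destroyed;
}
```

The point of the second pointer is that the void* handed to destroy_as<T> is exactly the T* that was registered, so the cast back is always valid regardless of where that subobject sits inside the block.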

With these changes made we've now got an implementation that handles polymorphic types!

Speed Issues

The first time I modified the code to allow polymorphic types everything worked without a
hitch. However, the code used to find the nodes was not efficient. So another round of refactoring
ensued. Several things were moved from the header file to the implementation file instead to
reduce compile times for clients, since there was no longer a need for them outside of the
implementation. Several functions were broken out into multiple functions. The containers
used to track pointers and nodes were changed. And finally, code that searched for nodes was
broken out into a single function that returned an iterator.

I'm not sure that I've got the fastest possible implementation for find_node, the function
that searches for a node that contains a given address. The first implementation simply
stepped through all of the nodes until one was found that contained the address. The final
implementation only steps through until either a node is found or the address is greater
than the current node's base pointer. This relies on the fact that std::set<> is a
sorted container. It would be nice if we could use the set's built in searching capabilities to
find our node, but all of them (find, lower_bound, upper_bound and equal_range) use an exact
search based on the set's predicate template parameter. So if the address isn't exactly
equal to the node's base pointer then all of them will simply return an iterator to the
container's end(). The standard algorithms aren't much help either. None of them will
be any more efficient than the hand-coded for loop, and in fact are likely to be worse. The
std::binary_search algorithm won't even help us, because even though the container is sorted,
it doesn't support random access iterators.
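
A hand-coded loop with an early exit might look like the sketch below. It assumes the set is ordered by descending base address, which is one ordering under which the stopping condition described above holds, and it assumes blocks never overlap. The names node, by_base_descending and find_node_sketch are hypothetical and are not the real find_node.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

struct node
{
    const char* base;   // start of the allocated block
    std::size_t size;   // size of the block in bytes

    bool contains(const void* p) const
    {
        const char* c = static_cast<const char*>(p);
        return !std::less<const char*>()(c, base)
            && std::less<const char*>()(c, base + size);
    }
};

// Assumed ordering: highest base address first.
struct by_base_descending
{
    bool operator()(const node& x, const node& y) const
    {
        return std::greater<const char*>()(x.base, y.base);
    }
};

typedef std::set<node, by_base_descending> node_set;

// Linear scan that gives up as soon as no later node can contain p.
node_set::const_iterator find_node_sketch(const node_set& nodes, const void* p)
{
    const char* c = static_cast<const char*>(p);
    for (node_set::const_iterator i = nodes.begin(); i != nodes.end(); ++i)
    {
        if (i->contains(p))
            return i;
        // p lies above this block's base but outside the block. Every
        // remaining block starts lower still, so none of them can contain p.
        if (std::greater<const char*>()(c, i->base))
            break;
    }
    return nodes.end();
}
```

The early exit only trims the average cost. The worst case is still a walk over every node.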

Compiler Bugs

This section of the article will be a bit controversial. There are many people who
think that VC++ has adequate support of the standard and aren't too concerned with the fact
that VC++ 7 won't be fully compliant. Well, gc_ptr<> illustrates nicely some areas in which
VC++ 6 isn't compliant enough to handle simple code. In this case, I expect VC++ 7 will
likely handle the code much better, but this will still illustrate why compliance isn't
just a nice feature, but something we should demand. I know that the team in charge
of the VC++ compiler is concerned with this and that they are working hard on it, so I won't
harp on this subject too much. However, because it's impossible to completely work around
the bugs, at least to my satisfaction, I must tell you about the problems I encountered.

After fixing the polymorphic type problem I could start to address the interface in
more detail than I had with the first version. I wanted gc_ptr<> to follow the same
general style as std::auto_ptr<>, both for familiarity for users as well as to ensure
I didn't revisit mistakes that the standards committee addressed along the way. One of the
things that was changed was to provide two different constructors and assignment operators
for use with other gc_ptr<> instances. In my first implementation there was only a
templated form that allowed conversions from one gc_ptr<> type to another. The standard
contains versions for assignment and construction from identical gc_ptr<> types as well.
I'm not sure exactly why this was necessary, but I'm not going to second guess them. I
added the non-templated versions for identical types. Surprisingly, the compiler complained
about duplicate definitions. Turns out that VC++ can only handle this if the templated
forms come first, which is a parsing bug since the standard does not require this.
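
The pairing of templated and non-templated forms can be sketched with a hypothetical class handle. The converted() flag exists only to show which constructor or assignment operator ran and is not part of gc_ptr<>.

```cpp
#include <cassert>

template <class T>
class handle
{
public:
    explicit handle(T* p = 0) : ptr_(p), converted_(false) {}

    // Non-template copy constructor: chosen for handle<T> -> handle<T>.
    handle(const handle& other) : ptr_(other.get()), converted_(false) {}

    // Member template: chosen only for handle<U> -> handle<T> with U != T.
    // A template constructor is never a copy constructor, so without the
    // overload above the compiler would generate its own copy constructor.
    template <class U>
    handle(const handle<U>& other) : ptr_(other.get()), converted_(true) {}

    handle& operator=(const handle& other)
    {
        ptr_ = other.get();
        converted_ = false;
        return *this;
    }

    template <class U>
    handle& operator=(const handle<U>& other)
    {
        ptr_ = other.get();
        converted_ = true;
        return *this;
    }

    T* get() const { return ptr_; }
    bool converted() const { return converted_; }

private:
    T* ptr_;
    bool converted_;
};

struct Animal { virtual ~Animal() {} };
struct Dog : Animal {};
```

Copying a handle<Dog> into another handle<Dog> goes through the non-template form, while building a handle<Animal> from a handle<Dog> goes through the member template.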

Another change was to remove assignment to a "real pointer" and to add a reset() method
instead. Assignment to a "real pointer" can result in some subtle and unexpected results in
some cases, so it should not be allowed. After making this change I recompiled the code
and ran the test harness. Surprisingly an object was being collected prematurely. Or at
least that's what appeared to be happening. Stepping through the code revealed that what
was really happening was much worse. I had failed to correct a line in the test harness
that assigned a gc_ptr<> to a "real pointer". The compiler should have caught this and
given me an error at compile time, but it did not. Instead, it produced nothing for this
command (the line was skipped while stepping through in the debugger) and continued on
as if nothing had happened. I got no error or warning, but instead got an executable that
behaved differently from what was expected. I tried to distill this into a simpler
example to send to someone in e-mail and found that in the simpler form I received the dreaded
INTERNAL COMPILER ERROR. The test harness included in the link above includes the
code needed to reproduce both compiler behaviors, as well as a "fix" that, if you
absolutely must have one, will prevent this on VC++ 6. I don't like the fix because it produces
a strange compiler error that will confuse anyone who doesn't know why the fix is there.
Since I expect VC++ 7 will fix this, I've left the fix commented out. If you want it
for VC++ 6, uncomment it.
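
For comparison, here is a sketch of what a conforming compiler is expected to do. The class resettable is hypothetical, and the check uses std::is_assignable from <type_traits>, so it needs a C++11 or later compiler, which is much newer than the compilers discussed here.

```cpp
#include <cassert>
#include <type_traits>

template <class T>
class resettable
{
public:
    explicit resettable(T* p = 0) : ptr_(p) {}   // explicit: no silent T* -> resettable<T>
    void reset(T* p = 0) { ptr_ = p; }           // the only way to rebind to a raw pointer
    T* get() const { return ptr_; }

private:
    T* ptr_;
};

// With an explicit constructor and no operator=(T*), `h = raw` must not compile.
static_assert(!std::is_assignable<resettable<int>&, int*>::value,
              "assignment from a raw pointer should be rejected");
```

The explicit keyword matters here: without it, the assignment would compile through an implicit conversion followed by the copy assignment operator.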

Usage Notes

Now that I've got an implementation that's fully functional I think it's appropriate
to address when you should use this smart pointer. It's not a good drop in replacement
for all pointers, nor even a good replacement for ref-counted smart pointers in all situations.
First, if you want to use garbage collection exclusively, you're probably using the wrong
language. There is no language support for garbage collection, so all available solutions
have the problem that they don't work with memory that's allocated by other libraries,
including the standard library.

If you don't need to use it everywhere, but would prefer
to use it most of the time, it may be more appropriate for you to use a
traditional C/C++ garbage collection library. They totally replace the allocation logic instead
of building on top of the normal routines. This allows them some memory and speed optimizations
that are impossible here (or at least undesirable). However, remember the drawbacks discussed
with this solution in the previous article. The conservative nature of these libraries may be
undesirable despite the benefits in speed and memory use.

Even after you decide to use it only for cases where manual memory management is too
difficult and error prone, you may want to opt for a more traditional ref-counted smart pointer
instead. There is some definite overhead involved with this implementation. There's extra memory
that's going to be allocated for every object you allocate and every pointer that exists. The
allocation will be slower and calls to collect may cause some definite speed problems if not
managed carefully. In contrast, a ref-counted pointer doesn't use much more memory than a standard
pointer and will operate nearly as efficiently. For this reason, if speed and/or memory is important
to you then the gc_ptr<> class should be used sparingly. Only use it in places where you know
there will be circular references, or where you fear there might be. When you know there won't
be any circular references, use the ref-counted pointer instead.
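
The circular reference problem can be demonstrated with any ref-counted pointer. The sketch below uses std::shared_ptr (C++11) as a stand-in; the type peer and the function destroyed_after_release are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct peer
{
    static int destroyed;          // counts destructor calls
    std::shared_ptr<peer> other;
    ~peer() { ++destroyed; }
};
int peer::destroyed = 0;

// Creates two objects, drops the program's own references to them, and
// returns how many destructors had run at that point.
int destroyed_after_release(bool make_cycle)
{
    peer::destroyed = 0;
    std::shared_ptr<peer> a = std::make_shared<peer>();
    std::shared_ptr<peer> b = std::make_shared<peer>();
    a->other = b;                  // a refers to b
    if (make_cycle)
        b->other = a;              // b refers back to a: a cycle
    peer* raw_b = b.get();

    a.reset();                     // the program lets go of both objects
    b.reset();
    int seen = peer::destroyed;

    if (make_cycle)
    {
        // Break the cycle by hand so the demonstration itself does not leak.
        std::shared_ptr<peer> tmp;
        tmp.swap(raw_b->other);
    }
    return seen;
}
```

Without the cycle both objects are destroyed as soon as the last reference goes away. With the cycle neither is, and that is the situation a collected pointer is meant to handle.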

Now that I've talked about the inefficiencies of gc_ptr<> I should point out that every
effort was made to ensure the best possible performance. When used sparingly I'd expect
that you'll not notice any problems in this area.

I'd love to hear of real world experiences that you have using this class. If you've got the time,
post it in the comments below so that others will know what your experiences are.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

I see that you use the windows.h header for the "CriticalSection" functions. This makes the library impossible to use on platforms other than Windows. Do you plan to implement your own "CriticalSection" functions?

There is one serious performance limiter in the code presented here. The mark function looks through all the registered pointers, while in the typical case only a few are inside the memory range controlled by the node being marked. This is a serious waste, and it slows the mark phase of collection to an unacceptable rate. But it is in fact quite simple to change it to look only at probable candidates. Just realise that ptr_map is a sorted collection and node->begin tells us what to look for! Here comes the new mark function code:

Instead of checking everything, the code looks for the pointer equal to (or preceding) node->begin. Optimal would be to look for equal or following, but there is no such method in the map template. If the pointer is not equal but is at a lower address, then it won't be covered by the node. But anything following it is either covered by the node's area or is above it. So the code then just looks for pointers covered by the node's area (from node->begin up to, but excluding, node->begin+size) and stops any further search when it finds the first pointer above that area (>= node->begin+size), since ptr_map is sorted and there is no possibility of finding yet another fitting pointer further down the sequence. And that's the whole trick.
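
The range scan described in this comment can be sketched as follows. This is an illustration rather than the commenter's code: it assumes ptr_map is keyed by the address at which each tracked pointer lives, and it starts from std::map::lower_bound, which returns the first key that is not less than its argument. The function pointers_in_block is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Key: address at which a tracked pointer object lives. Value: an id for the demo.
typedef std::map<const char*, int> ptr_map;

// Collects the ids of all tracked pointers stored inside [begin, begin + size).
std::vector<int> pointers_in_block(const ptr_map& ptrs,
                                   const char* begin, std::size_t size)
{
    std::vector<int> found;
    const char* end = begin + size;
    std::less<const char*> before;
    // Start at the first key >= begin and stop at the first key >= end.
    for (ptr_map::const_iterator i = ptrs.lower_bound(begin);
         i != ptrs.end() && before(i->first, end); ++i)
    {
        found.push_back(i->second);
    }
    return found;
}
```

Only the pointers that actually live inside the block are visited, plus one map lookup to find the starting position.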

Another minor thing is that the threshold of 1KB is vastly too small for any reasonable load (i.e. when performance matters), and in today's times, when a typical computer has >128MB of RAM, it makes little sense to set it so low. My first shot at 256KB (262144) gave me a speedup by a factor of 6 in code making heavy use of gc_ptrs.

My point of view is that we don't need it. Putting back what you took is true discipline in life as well as in software engineering. Instead, a HIGH PERFORMANCE BUFFERING MECHANISM would be more valuable than garbage collection, and it can be truly cross platform as well.

Well, C++ is one of the last general purpose industrial languages that lacks GC. GC simplifies dataflow and allows for shorter and in fact more efficient programs. Look at the STL: it practically requires copying of potentially large objects everywhere. This is vastly inefficient. The other option is to explicitly free objects, which complicates program logic, makes programs larger, and is a potential source of hard-to-hunt errors (memory leaks are the domain of C, C++ and Object Pascal applications). Besides, studies have shown that the number of programming errors per line of code is mostly independent of the language (among high level languages), but C++ programs are typically about twice as long as those in most other languages.

Back to your point: making a high performance buffering mechanism is easy with GC and hard and error prone without it (as it requires transfer of ownership and thus a strict policy of who is allowed to destroy objects, and when). Especially in multithreaded apps it becomes almost infeasible.

I agree that returning what you've possessed is good discipline in programming and even more so in life. And buffering of GC is just one additional feature, almost "for free", along with the general functionality, which is to guarantee durability of the application.
Just think of a server working constantly, with as few restarts as possible. In a complex software application such as an application server, for example, can you guarantee 100% that there will be absolutely no memory leaks?

I saw your confusion in the article. I'd recently read Josuttis & Vandevoorde's _C++_Templates_, where they happen to mention the reason.

The standard declares that a compiler should prefer an autogenerated function to a template instantiation. There are 4 functions that the compiler should automatically generate if you don't supply them: default & copy c'tors, d'tor, and copy assignment. Thus, if given the choice, the compiler should choose not to instantiate your templated version of any of these 4 functions, and instead automatically generate a trivial version.

This is not usually a problem with the default c'tor or d'tor, as they are not generally expected to be generated from a member template. However, if you write a templated copy c'tor or assignment operator, then you have not yet overridden the trivial versions that the compiler will generate. Thus, you must explicitly override them as well, using a non-templated version.

I really like the idea, and would like to test it out. The one thing I'm wondering is if there is support of any kind for downcasting.

class A {...};

class B : public A {...};

gc_ptr<A> test = new(gc) B; //should be fine...

gc_ptr<B> test2;

Now I want to assign test to test2. How do I do this with gc_ptrs? One smart pointer implementation I saw let any smart pointer be copied to any other smart pointer (by a plain C-style cast to T*), resulting in possible catastrophe with no compiler warnings. I can't see how this implementation would deal with this.

I guess I better go and try it out first before I assume there is a problem though. Just thoughts.

One drawback with this solution is that the new and delete operators of the class have no chance to be called. Moreover, if the class has delete and new declared there will be a syntax error. Do you have any suggestion as to what the most elegant solution for this case would be?

Also, do you plan to add reference counting? It should be quite easy to add, and the overhead would be small, just a few bytes more for the counter, and it would immediately collect all objects that are not participating in any cycle (which is true for most objects).

Assuming you are using a comparison function that is x->base < y->base,
here is how you can find the node containing the address void* ptr:

node_t temp(ptr,0);
node_set::iterator i(data().nodes().lower_bound(&temp)), end(data().nodes().end()), begin(data().nodes().begin());
// i now points to a node that is >= temp otherwise end

// if it is not end, and it contains the ptr, we are done
if(i != end && (*i)->contains(ptr)) return i; // the set stores node pointers, so dereference the iterator first

// Otherwise it is greater than ptr, so we need to back up 1 and check
// NOTE: we have to make sure it is not begin or otherwise, we will have an invalid iterator
if(i != begin && (*--i)->contains(ptr)) return i; // step back one node, then dereference to the stored node pointer
return end;

The above code should be faster than a linear search, and it handles pointers that are not exactly equal to base.

Yes, I have egg on my face on this one. I tried the above and found that it didn't work (obviously I made some serious error in my attempt), which led me to read the documentation of lower_bound, where I managed to misinterpret what the documentation said. So I was totally wrong on this subject. However, if you look at the implementation file, which was updated on Jan 29th here (or should have been... I'll have to double check that), you'll notice that I'd already fixed this error in the code, if not in the article.

1) I need to create a global variable (yes, I know they're evil, but it's just for a test program, and I don't want to write all sorts of nonsense to have a global wrapper). Is it possible using this library? If I have:
gc_ptr<Kernel> kernel;
int WinMain()
{
kernel = new(gc) Kernel;

inside my WinMain(...), I get an unhandled exception pointing at Kernel::Initialize(). However, if I have:
gc_ptr<Kernel> kernel = new(gc) Kernel;
global, it all executes before I start my program. Any ideas???

2) When I have it running as a global, the program starts and runs ok (It's a framework for 3D graphics under DirectGraphics and OpenGL). I've only written the OpenGL drivers so far, but when I call wglDeleteContext(HGLRC hRC); I trigger a "User Breakpoint" in NTDLL, "HEAP[k2.exe]: Invalid Address specified to RtlFreeHeap( 130000, 194ac0 )". It appears that the overridden 'delete' is interfering with the standard 'delete'. Step out a few times, and the program completes. I'm stumped!!!

Apart from this, it's a nice library (albeit the code is a little messy).

I took great pains to ensure that global instances will work correctly. If you've got code where it doesn't I'd really like to have you break it down into a simple program that illustrates the error so that I can fix it.

As for "It appears that the overridden 'delete' is interfering with the standard 'delete'"... that's simply not possible. The "placement delete" operator should only be called if an exception is thrown while constructing an object created with the counterpart "placement new". In fact, if you try and call the "placement delete" operator you'll find that you can't.

That said, I must admit that I don't follow your description well enough to help here. You shouldn't be calling delete on garbage collected objects, so you shouldn't have thought there was interference between the two. So, again, I'd like a simple example program so that I can debug the problem.

I found out that for some reason, my OpenGL wrapper couldn't initialize properly with the system in place (I was also using Kurt Miller's memory system [Available at www.flipcode.com], which was entangled in the wrapper). So, as a result, I threw out the whole lot and started again from scratch. It all appears to be working properly, but I have yet to compare the speed both with and without the GC system.

What if an object is being constructed, and its constructor creates a new gc object itself, but the new operator invokes garbage collection? Won't the first object, still being constructed and not yet assigned to a gc_ptr<>, be collected?

You are quite correct, this does cause premature collection and an eventual crash. I've fixed the problem as well as made a speed enhancement to "find_node". Later today I'll update this site with the new code.

It's still a problem when running gc_collect in a separate thread. Also, preventing gc only for the object that is currently being constructed is not enough. What happens when the object being constructed contains several other gc objects?