Foreword

Language May Offend: This article contains language that may be offensive to some people, in particular, to programmers that strongly prefer languages that do not expose pointers. Reader's discretion is advised.

Introduction

You're at work. Work, for you, includes maintenance of a legacy behemoth with decades of accumulation of global variables, C-style strings, C-style arrays, do-it-yourself containers, functions with obscure names, and many other things that were written before anyone thought of 'anti patterns'.

The phone rings. You answer. You know that voice.

- "It sometimes crashes."

You know what "it" means: a DLL deep in the heart of the product. The one that implements the core business logic. You know where it crashes: at the newest machine of your company's most valued customer.

And you know what "sometimes" means, but you need a few seconds to evaluate a course of action.

- "Please define 'sometimes'.", you say.

Nine times out of ten, when you're told that a program 'sometimes crashes', those crashes are random, and apparently they do not depend on a particular user's action. This one is no exception.

The caller is one of your top customer support people. You both have already been through this conversation: this program works fine in thousands of installations, but some machines seem to suffer from some kind of allergy to it, and in those few machines the program will crash three times out of seven runs. Customers, who have paid for the product, get angry.

You make a mental list of your options:

Have the customer call an exorcist. Since computers are
deterministic, and the program crashes are random, their cause
is obviously supernatural: maybe the server room is built on top of an ancient
cemetery, or something like that. That may be fine, if you don't mind
people thinking that you believe in ghosts.

Blame a virus. You will look like a clown when the
customer notices that the alleged 'virus' only seems to affect one
program.

Recommend upgrading the hardware. The program works fine
in almost all the other places where it is installed, so there must be a
hardware contribution to the crashes. Since hardware costs money,
business customers will have to go through some kind of approval
process: this would send the ball to the customer's field. Some people
actually see this as a good thing.

Roll up your sleeves and actually try to solve the problem.

If I was in your shoes, I'd rule the first three options out. I wouldn't be a programmer if I didn't prefer solving problems by fixing whatever is wrong with the code: and if a program crashes, there's always something wrong with the code. I'd start by implementing a structured error translator that writes relevant information to the Application Event Log, and by reviewing the code.

Let's assume your choice is the same as mine.

Random crashes are usually the result of resource management issues, among others:

A memory allocation (malloc, calloc, strdup, and realloc) returns NULL, and this return value is ignored.

A resource allocation (fopen, or CreateFile, for instance) returns NULL (or INVALID_HANDLE_VALUE), and this return value is ignored.

A race condition: one thread is trying to use a resource that was released by another thread.

Background

Dangling pointers are one aspect of a bigger subject: resource ownership.When the ownership rules are clear, dangling pointers are scarce.When you have legacy, C-style code, sprinkled with global variables, and rich in stack allocations, the ownership rules are seldom (if ever) clear.

You may expect this code to crash one time in a hundred. In that case, you would be wrong. Run the attached project (compile it in release mode) and see.

The simplest fix for code like this: use a stack variable (see below). Ownership is clear: when the stack unrolls, and only when the stack unrolls, the destructor for the stack variable mClass will be called.

Sometimes you cannot use a stack variable. For instance, when you use a factory method, that returns a pointer to something that implements an interface (inherits an abstract class, actually).

In this case, the simple fix is to use a smart pointer (like std::auto_ptr, boost::shared_ptr, the C++ 11 std::unique_ptr, and others). Choose your smart pointer carefully: std::unique_ptr is the cheapest and fastest, if you can use it.

Dangling pointers are the result of explicit heap memory allocation. Stay away from explicit memory allocation, and you're safe.

There are three ways I know of to steer clear from explicit heap memory allocation:

Use the stack.

Use RAII (like auto_ptr, shared_ptr, unique_ptr, and all other smart pointers).

Use a language that manages memory for you (Java, C#, VB, COBOL).

If you can't choose your language, that makes two.

- So, we're done, ain't we? Can we have our cofee break now?

- Well, if you are thirsty, by all means do; but we're not done yet. Not by a long shot: Googling for 'dangling pointer' brings about 1,340,000 results; Bing found 298,000 results. That's a lot of results for a non-issue...

The real mess begins when functions start passing pointers around, and some function that received a pointer as a parameter deletes it. const doesn't help us there, and in legacy code you will find sometimes beautiful, brilliant inventions - and other times horrible, terrible things that you can only hope did not go unpunished.

A few cases not so simple

Managing resource ownership, in short, is making clear which part of a program is in charge of allocating resources, and which part is in charge of releasing them.The bigger your program, the messier this may get: as usual, we'll learn from the experience of others, and use the patterns that other people have found.
There are several patterns of ownership, the ones I've met most frequently are:

Single owner: the same part of the program (class, function) that allocated a resource is in charge of releasing it. RAII is a good example: allocate on initialization, release in the destructor; also the const std::auto_ptr idiom, that Sutter refers to.

The producer-consumer (or source-sink) pattern: one part of the program allocates a resource, another releases it. I've used this pattern frequently in multi-threaded programming: one thread allocates a new Job instance, then adds it to a queue; a worker thread pops the Job from the queue, calls pJob->Execute() and deletes the Job pointer.

Shared resources (like shared_ptr, or COM AddRef()/Release()) have multiple owners: the last one to release the resource actually deletes the pointer, closes the handle, etc. There is a danger of leaks if there are circular references (a -> b -> c -> a), which usually imply that the pointers will never be released.

In real life, I seldom use explicit memory allocation. I use STL containers and smart pointers, that manage the nasty little details for me. The uses of new() that I couldn't avoid are the following:

The producer-consumer pattern, mentioned above: the producer either is or uses a factory, that will call new.

A legacy API requires a char *, and the buffer size is not known until run time.

Run time polymorphism, which often requires choosing the implementation of an interface at run time.

Let's see some examples.

Calling a legacy API

ShFileOperation (for instance) receives a zero-delimited list of strings in its parameter pFrom, with a double-zero to end the list.
The problem is: allocation and deallocation of that parameter.

When confronted with something this ugly, we programmers don't sweep it under the rug: we encapsulate it in a class.

In this case, there is a reasonably easy solution: encapsulate the call to the legacy API in a class that manages the buffer as an instance member. The member may be an auto_ptr; the class may even expose a method that receives a std::list<std::string> and does all the allocation and resource management inside it.

The producer-consumer pattern

The source/producer is usually a function that allocates a pointer, passes it to another function, and exits without deleting it. This is not a problem until, during maintenance, someone notices that the original author 'forgot a delete', and 'fixes that leak'.
Fortunately, deleting twice the same pointer will raise an immediate exception, so this particular error can be caught in tests.
On the other hand, avoiding is better than fixing: let's see if we find a solution.

Job *pShootAndForget = new Job(params);
m_workerThread.AddToQueue(pShootAndForget);
// DO NOT DELETE pShootAndForget! Will be deleted by m_workerThread whenever it's done with it!
pShootAndForget = NULL; // DO NOT DELETE pShootAndForget! Will be deleted by m_workerThread whenever it's done with it!// DO NOT DELETE pShootAndForget! Will be deleted by m_workerThread whenever it's done with it!

Elegant and refined, right?

Approach 1.2: Don't declare a temporary variable, do it inline.

// The pointer will be deleted by m_workerThread whenever it's done with it!
m_workerThread.AddToQueue(new Job(params));

The risk here, again, is that someone may find a 'leak', and 'fix' it.

Approach 2: Move the burden of creation to m_worker. Ugly.

m_worker.AddToQueue(params);

I don't like this approach, for several reasons:

It breaks the single responsibility principle (m_workerThread is now both a factory and a consumer of jobs); formerly, it was in charge of consuming (executing and deleting) them only.

Adds coupling: In the original version, Job could be any abstract class with a virtual Execute() method, and m_worker 'knew' nothing about its creation. You could use the same worker class with any number of different clients, without changing one line of code.

Approach 3: Use std::auto_ptr, it yields ownership.

If your development environment supports unique_ptr (C++ 11) you can use unique_ptr instead of auto_ptr.

Those of us still using compilers that do not support C++ 11, in business environments where you're not allowed to use the Boost libraries, will have no option but to stick with auto_ptr.

// In WorkerThread.hclass WorkerThread
{
// ...void AddToQueue(std::auto_ptr<job> pJob)
{
// Add the job to the actual queue. Ownership is passed to the queue.
m_queue.push(pJob.release());
}
// ...
};
// In the client CPP file
std::auto_ptr<job> pShootAndForget ( new Job(params) );
m_workerThread.AddToQueue(pShootAndForget); // m_worker takes ownership of the auto_ptr.// The code 'looks' right: if there is an auto_ptr, there is no leak to 'fix'.

This approach is not foolproof, but it doesn't make tripping too easy.

Choosing an implementation at run time

Enter run-time polymorphism.
Let's say you have a program that connects to a database, and let's say your program may use SQL Server, Oracle, or MySQL, depending on configuration: you implemented a data access layer that connects to the right database, generates and executes all the SQL required, and stores and retrieves objects, leaving client code oblivious of all the nasty database details.You did so by defining a data access interface, and implementing a different concrete class for each database engine.You also implemented a factory class, with a static method that returns an instance of the right implementation of your interface.
Hey, you did a very nice job there! Let's see some pseudo-code.

At the end, it generates a fault - and offers the 'Please tell Microsoft' dialog.

Look at the fifth line of the output: X::Foo(00000000);
We actually see a call to (NULL)->Foo(). And it seems to work (at least, as long as you don't use the 'this' pointer inside Foo()). Does this make any sense? Actually, yes.
This is how C++ works: a member function is actually a regular, global, C function that receives a pointer to 'this' as its first parameter.

How to make a mess with stack variables

Now, for another riddle: what do you think this will do?

void Producer3()
{
X x;
Consumer(&x); // Tries to delete a pointer to a stack variable. Will this compile?
}

It compiles fine. You won't know there is a problem until run time.
In debug mode (VS2008), you'll get an assertion, with this content:

In release mode, it runs to the end.

What about Linux?

In Linux, it's about the same. Compiling the code in Release mode with Eclipse/GCC, and running it on a terminal, this is what we see:

We see here:

Instead of X::Foo(00000000), it prints X::Foo(nil)

Instead of showing a message box, the message Segmentation fault is written to stderr.

Trying to delete a pointer to a variable in the stack caused an immediate crash.

This similarity in behavior is not a surprise: there is only one C++ standard.

Points of Interest

Dangling pointers will not crash a program immediately. If the memory they formerly occupied is still there, they may even work. The crash will be, according to one of Murphy's laws, when and where it will hurt the most.

Avoiding explicit memory allocation whenever possible may save us quite a few headaches.

Standard smart pointers are a useful tool. Smart people have spent time designing, writing and testing them for us. Not using them is just plain rude.

When compelled to use explicit memory allocation (like when calling ShFileOperation or FormatMessage), encapsulating each function in a class and testing this class thouroughly may save some pain.

The ways I've found so far to avoid and fix dangling pointers are: for new code, good design (with clear ownership), and for legacy code, code reviews. And always, everywhere, unit tests.
Code reviews, in particular, shed light on otherwise obscure areas of code, and let programmers who are not the original authors of the code get familiar with it. There are many misconceptions about code reviews, one of them being that the only way to review code is to have a senior programmer go over the code of a junior programmer, but that issue belongs in another article.

Share

About the Author

To make all that work easier, he uses some C++ libraries: STL, ATL & WTL (to write Windows applications), and code generation.

Pablo was born in 1963, got married in 1998, and is the proud father of two wonderful girls.

Favorite quotes:
"Accident: An inevitable occurrence due to the action of immutable natural laws." (Ambrose Bierce, "The Devil's Dictionary", published in several newspapers between 1881 and 1906).
"You are to act in the light of experience as guided by intelligence" (Rex Stout, "In the Best Families", 1950).

But,
When I tried to copy&paste the code snippet with the Producer() and Producer2() and compile it (using VS2005), I got compile errors, because there is "std::auto_ptr<x> px" instead of "std::auto_ptr<X> px". (the upper case "X").

Jozef, thanks for taking the time to read the article!
In order to compile the code samples, try downloading the attached project (it contains them all).
The .sln provided is for VS2008, but you can open the .dsw (VC6 workspace) and Visual Studio 2005 will convert it.

Best wishes,
Pablo.

Pablo.

"Accident: An inevitable occurrence due to the action of immutable natural laws." (Ambrose Bierce, circa 1899).

since you also mention linux; using valgrind will pretty clear tell you what you did wrong.
Using code coverage analysis and unit/blackbox tests and then the same with valgrind as wrapper you can catch most of these situations without much hassle.