An introduction to memory damage problems

Welcome to the world of memory damage. Or, as we experienced programmers
like to call it, A World Of Hurt.

There are no bugs more insidious, more frustrating, or more dangerous than
memory damage problems. They are without a doubt among the worst bugs you
will ever have to track down.

They are implicated in all kinds of nasty behavior. For most apps, they
merely result in the app crashing with some unpleasant access fault, which is
generally an unrecoverable error. But in some cases, such as in network
stacks, IE, SQL Server, IIS, and other famous (or infamous) applications, they
result in the ability of malware to gain control of the application. If
the application is privileged, the malware then runs with the privileges of the
application, and this allows for unlimited damage to be done to the hosting
computer, including the insertion of Trojans, spyware, viruses, and other
seriously evil pieces of code.

But for most of us, it just means the program took a nasty access fault and
died.

There is one absolutely critical, non-negotiable rule about dealing with
memory damage problems. When I was setting development policies in a
software company, this is one that I set and everyone immediately bought into.

WHEN A MEMORY DAMAGE BUG IS OBSERVED, IT TAKES PRIORITY OVER ALL OTHER BUG
FIXES, ENHANCEMENTS, OR ANY OTHER DEVELOPMENT ACTIVITY. ALL DEVELOPMENT
CEASES UNTIL IT IS FOUND.

Why is this? Because memory damage bugs are sensitive to patterns of
memory allocation and deallocation. If you continue to do any sort of
development, it may appear that the bug has gone away, but in fact it
hasn't. It has merely confined itself to damaging memory that is not
critical at the moment. But unless you find and fix it, it
will reappear. If you do not find and fix it right now, in the
development cycle, it will only reappear after the product is delivered.
You do not want to have this happen. Look at what it's done for
Microsoft's reputation.

There is a simple rule for avoiding memory damage bugs: Do not let them
happen. Do not ever code anything that could possibly allow them to happen.

That sounds like good advice, but it goes deeper than that. It involves
disciplines of programming that make it all-but-impossible to have memory damage
bugs.

For example, I've really lost some of my edge on this. This is because
I've been using C++ for so many years now, and using it in highly-structured
ways, so that memory damage bugs are impossible to code. And that's where
you want to be.

Part of this essay is going to be a tutorial for people who don't know what a
heap is, or what we're really talking about here. If you already
understand heaps, you can skip this tutorial introduction.

What's a heap?

The "heap", or "pool" (the actual terminology is somewhat cultural, and depends
on where you went to school and first learned about this) is essentially a block
of memory which can be dynamically subdivided into smaller pieces. For
example, suppose we have a block of a megabyte of memory. We would like to use
parts of it temporarily to hold smaller chunks of information. So we have
a "heap allocator", a subroutine library package, that handles the details of
this for us. So if I need 1000 bytes of memory, I call the allocator and
say "Please give me 1000 bytes of memory". It looks at the data structure,
and initially sees that it has 1,000,000 bytes of memory available (I'll use
powers of ten just to simplify the discussion), so it identifies a sequence of
1000 bytes which are unused (such as the first 1000 bytes of this million-byte
array), hands me back a pointer to it, and says "Here. Use the bytes at
this address". It then makes sure that it won't use those again for
another purpose; for example, by simply adding 1000 to the starting address of
the block of memory, leaving 999,000 bytes available for future use.

The problem is that this is always a finite amount of memory. We
can't just keep asking for memory indefinitely, because we only have 1,000,000
bytes in this heap. So after 1000 allocations of 1000 bytes, we have no
more memory left. We're out of memory. We're probably in trouble.

So it becomes our responsibility to return to the heap any block of
memory we aren't using. We go to the heap allocator library and call a
function and say "Remember that 1000 bytes of memory I was using? Well,
here it is back again."

The heap is our "bank" and we "borrow" and "repay" memory locations.

But it's worse than that. Money is "fungible", meaning it can be freely
interchanged. If I get 1000 $1 bills (or pick your currency of choice that
has bills), I get 1000 essentially "random" serial numbers. Any bill is worth
$1. But for memory allocation, those serial numbers have to be sequential.
So when I repay the money, I can't repay part of it; I have to repay all of it
in a block, using the same bills I borrowed. And if you want $2000, you
need 2000 sequentially-numbered bills.

So we get into all kinds of problems that the bank-borrow-repay analogy can't
really handle well.

Now, somebody has to keep track of what memory has been allocated and what
memory is available for allocation. When I return a block of memory,
someone has to notice that it is now available again.

This is not going to be a tutorial on first-fit, best-fit, quick-fit, and other
algorithms that tend to enhance allocator performance. We're not going to
really look too deeply inside how an allocator works (like the making of sausage
and laws, you're often better off not knowing). Let me say that I've spent
a nontrivial part of my career writing storage allocators, and I've written some
of the best and fastest around [IDL: The Language and its Implementation,
Nestor, Newcomer, Gianinni, and Stone, Prentice-Hall, 1990], and you can devote
an entire semester to studying this topic. I discuss some of the details
of allocators in an accompanying essay on memory
allocation.

But the way we handle it is that we embed, in the data structures, some
"overhead" bytes which help us keep track of what is going on. So to get
your 1000 bytes, the allocator might actually give up 1008, or 1016, where the
additional bytes are used as part of the bookkeeping. A typical allocator
might have two values for each overhead block, such as a length (the nominal
length of the block) and a pointer to the start of the next block, and a bit
that says whether the current block is allocated or available. In
addition, if you ask for 5 bytes, you might actually get a block whose length is
8, because you get better performance if these blocks are allocated on, say,
DWORD boundaries.

Now, suppose I want to allocate a block of bytes for a string of length 5.
I might get a memory layout which looks like the one below. This assumes
that my block of 1,000,000 bytes starts at memory address 100000 (it doesn't
matter if it's in decimal or hex, and I'm going to use decimal here in the
tutorial). Note that a string of length 5 takes 6 bytes to store, because
we always have that 0 byte at the end. (No, I'm not going to worry about
Unicode and 16-bit characters; that's just an unnecessary complication here.
Pretend we're limited to 8-bit characters for this discussion.)

address   value       state   meaning
100000    100016      A       pointer to next block
100004    6                   length
100008    'ABCD'              first 4 characters
100012    'E\0' ? ?           5th character, NUL
100016    100096      F       pointer to next block
100020    76                  length

This shows that the block at location 100000 is allocated (A), and the block
starting at 100016 is free (F). Note that because blocks are always
allocated on a power-of-2 basis, we can use the low-order bit(s) to hold these
flags; we just have to remember to mask them off before using the address.
So the value stored in location 100000 might actually be 100017, where the
low-order bit indicates the block is allocated. Before we try to use it as
a pointer, we'll mask off that low-order bit and get 100016 back as a result.
These are clever little implementation tricks. C++ structures have even richer
detail, but you have to read the C runtime sources to see all the details if you
care.

But these overhead bytes are "invisible". That is, if you did

char * p = new char[6];

the value you would see in p, in this example, would be 100008, not
100000. Those first eight bytes are reserved and you can't see them at
all.

Heap Integrity

If we believe we have 1,000,000 bytes, then given the above structure of
information we should be able to start at location 100000 and, by following the
pointers, come to the end of the heap. Each time we pick up a pointer and
follow it, we should get a new, valid pointer at that location.

Note that various heap implementations will handle this differently. In the one
shown above, every block points to the next block which is physically adjacent.
Other heap allocators maintain much more complex lists, but ultimately it should
be true that every block has a valid length and a valid "next" pointer, even if
the block which is logically "next" is not physically next.

Buffer Overruns

Now let's look at how a data overrun can damage the heap. If you allocate
a block of 6 bytes and write 7, what happens?

char * p = malloc(6);
strcpy(p, "ABCDEF");

The answer is, nothing much. You copied 7 bytes (6 8-bit characters and
the terminating NUL) into a 6-byte buffer, and you got

address   value       state   meaning
100000    100016      A       pointer to next block
100004    6                   length
100008    'ABCD'              first 4 characters
100012    'EF\0' ?            5th, 6th characters, NUL
100016    100096      F       pointer to next block
100020    76                  length

Note that because the allocator actually gave you two DWORD chunks for
your storage, nothing happens. You can see that if you do

strcpy(p, "ABCDEFG");

you're still safe. Note that both of these strcpy calls are
actually erroneous, because they are violating the specification of how
many bytes are stored, but you don't notice.

But suppose we do

strcpy(p, "ABCDEFGHIJKLP");

Now we're in seriously deep trouble. That value which was supposed
to be a pointer to the next block has been overwritten:

address   value       state   meaning
100000    100016      A       pointer to next block
100004    6                   length
100008    'ABCD'              first 4 characters
100012    'EFGH'              5th-8th characters
100016    'IJKL'      F       pointer to next block
100020    'P\0' 0 0           length

So the next time the allocator tries to pick up that pointer and do something
with it, instead of a valid pointer, it has the value 0x4C4B4A49, which is (a)
the actual bit pattern that represents 'IJKL', stored low-order byte first
(0x49, the 'I') when seen as an integer, and (b) not the expected value.

This will almost certainly cause your program to take an access fault. The
length, which used to be 76, is now seen as the value 0x00000050 ('P' is 0x50)
or decimal 80. We are well and truly screwed at this point. You
don't even want to think about what is going to happen when the length
value is 'PQRS'!

The difference between the "release" heap and the "debug" heap is that the
"release" libraries assume that you haven't made any such blunder; that your
code is perfect, and there is no reason to waste valuable computing resources in
trying to figure out that you screwed up. It trusts you, with a pure form
of hero-worship unmatched in human beings. You can truly Do No Wrong.
Unfortunately, the hardware is there to detect when you did. Sometimes.

In the debug heap, a lot of extra effort goes into checking those values.
The most common form of failure is that you get some sort of debug assertion
failure in the heap allocator. If you trace back to the source code,
you'll see that it is very unhappy about some value somewhere.

Sometimes, by examining the nearby locations, you can see what an incredibly
stupid mistake you made. Addresses that look like characters (ask the
debugger to show them in character format!) may contain recognizable characters
and you can say "Doh! I did that allocation wrong!" and go on with life.

This is the rare case. The rest of the time, you've overwritten the heap
with data that is very hard to interpret, but the results are equally fatal.

In addition, the debug heap initializes heap storage. It stores
0xCD in every byte that is validly allocated. So for example, when you
initially allocate the string, you will see the memory contains a lot of "Í"
characters (that's an upper-case I with an acute accent on it in the standard
ISO-8859-1 font that is the most common font used for Windows in Western
culture). So the memory would be

address   value       state   meaning
100000    100016      A       pointer to next block
100004    6                   length
100008    'ÍÍÍÍ'              allocated
100012    'ÍÍýý'              allocated, no man's land
100016    100096      F       pointer to next block
100020    76                  length of this block

The six bytes we allocated are initialized to 0xCD, which displays as Í.
But what about those last two?

They are in what is called "no man's land". They are initialized to 0xFD,
the "no man's land" constant, which appears in the ISO-8859-1 font as "ý".

When you free storage, the debug heap checks to see that you haven't
accidentally overwritten the "no man's land" values with anything else. If
you do, you will get an assert failure at the time you free the storage.
You will know that you have had a "harmless" data overrun. But the next
time your error might not be harmless!

The debug heap also tends to put "trailer" blocks on the storage areas.
For example, a redundant allocator might store values such as

address   value       state   meaning
100000    100024      A       pointer to next block
100004    6                   length of this block
100008    'ÍÍÍÍ'              allocated
100012    'ÍÍýý'              allocated, no man's land
100016    100000              pointer to this block
100020    6                   length of this block
100024    100096      F       pointer to next block
100028    68                  length of this block

When you go to free a block, it checks that the pointer at
100000+ceil4(length)+8 contains a reference to 100000. If it doesn't,
it's been clobbered, and this is reported when the block is freed. (In this
example, ceil4 is a function which takes whatever the value is and, if it
is not an exact multiple of 4, adds 1, 2, or 3 to obtain a value
which is a multiple of 4; that is, ceil4(9) == ceil4(10) == ceil4(11) ==
ceil4(12) == 12.)

Similarly, during the allocation process, as it is searching for a useful block,
it checks all the blocks it passes over to make sure they are intact.

In any case, whatever algorithm your allocator uses, it detects that somebody
has seriously damaged the integrity of the heap, and you get an assertion
failure. It doesn't happen when the damage is done; it happens when the
damage is detected. If, in the process of allocation, you don't trip over
a damaged block, you won't notice it. So you could do tens, hundreds, or
thousands of heap operations before the nature of your request causes a damaged
block to be examined.

Freeing storage

When you free storage, in the release version, nothing much happens to it.
The overhead values are changed, of course, to indicate that the storage is now
free. But that's pretty much all that happens. But in the debug
heap, the contents of the heap are erased, set to the magical value 0xDD
(character 'Ý'). So if the storage is freed, you would see

address   value       state   meaning
100000    100016      F       pointer to next block
100004    8                   length
100008    'ÝÝÝÝ'              freed pattern
100012    'ÝÝÝÝ'              freed pattern
100016    100096      F       pointer to next block
100020    76                  length

except you probably wouldn't actually see this. Allocators are free to
practice what is called "adjacent free block coalescing", that is, if there are
two free blocks, they are merged to form a single free block:

address   value       state   meaning
100000    100096      F       pointer to next block
100004    88                  length
100008    'ÝÝÝÝ'              freed pattern
100012    'ÝÝÝÝ'              freed pattern
100016    'ÝÝÝÝ'              freed pattern
100020    'ÝÝÝÝ'              freed pattern

This is another place where you will often get assertion failures in the debug
allocator: when it goes to free storage. It not only checks that you
haven't overwritten the "no man's land" area; it also checks that the adjacent
block is also well-formed. If you clobbered it, you'll get an error.

Some allocators do "aggressive coalescing"; that is, they always try to combine
blocks on every free operation (which means they have to look at the previous
block as well). Others do what is called "lazy coalescing"; they don't
combine blocks until they feel like it. Each of these has different
performance tradeoffs.

I freed my blocks but my program didn't get smaller!

There is a myth that freeing storage makes your program smaller. This is
unlikely in the extreme. It just depends on what you mean by "smaller".

There's total memory, and there's active memory.

Total memory is the amount of address space which is allocated to your process.
It can be huge. See my essay on "how large is my program?" But at
any given instant, practically none of it is used. In fact, for any given
instruction, the average number of pages that might be touched will be some
number like 1.4 or so (some instructions don't touch memory, such as
register-to-register instructions). But we normally care about slightly
coarser granularity.

Over a small block of execution, such as a time slice you get from the
scheduler, typically some tens of pages might be touched. In rarer cases,
hundreds of pages. This is normal. The set of most-recently-used
(MRU) pages forms what is called the "working set". If you don't use
memory, whether you free it or not, it drifts out of the working set and
therefore you aren't touching a lot of pages.

But if you don't free memory, each allocation is going to consume more total
memory. And if you build a linked list, it means that you get more and
more likely to have the next element on the list be in a different page than its
predecessor or successor. This leads to a problem of "unbounded working
set". Each time you walk along the list, you are touching a different
page. This means you could be getting not tens, or even hundreds, of page
faults per timeslice, but thousands.

This means your program requires a lot of physical memory to run. This
means it will potentially take lots of page faults as you demand more and more
physical memory, and other applications also demand that memory. So when
you force their pages out, they take lots of page faults, and when they run,
they force your pages out, and you may not even have as much physical memory on
your machine (512M) as you have data pages (2000M), so you are constantly taking
page faults. This is called "page thrashing".

Just to give you an idea about performance: if you get
your data in the L1 cache, it costs you 0 CPU clock cycles to get it. If
it is in the L2 cache, it costs you 1 or 2 clock cycles to get it (full-speed
[Xeon] or half-speed [Celeron] caches). If it isn't in either cache, and
you have to go to main memory to fetch it, it costs you a delay of 20 CPU clock
cycles. But if you take a page fault, it costs you 20,000,000 clock
cycles! That's a six-orders-of-magnitude performance loss.
Sometimes dynamic memory allocation in small chunks is more destructive to your
performance than allocating a single large chunk!

Some allocators try to optimize allocation by keeping the storage compactly
allocated. This can be challenging, but it can be done. The C/C++
allocators do not do this.

But suppose you have memory allocated as shown below (in the original figure,
red is allocated, yellow is free):

[figure: a heap with allocated and free blocks interleaved, including two
adjacent free blocks]

Note that this allocator uses lazy coalescing, because there are two adjacent
free blocks.

Now you free up a lot of space:

[figure: the same heap after most of the blocks have been freed]
What, exactly, can be "returned to the operating system"?

Answer: nothing. While there may be "free pages" in the middle, you can't
return them to the operating system because they contain those headers about the
block sizes and pointers to the next block. If they were "returned to the
operating system", any attempt of the allocator to use those blocks would take
access faults! All that really happens is that those blocks, since they
are not used, will fade from the working set and the working set will become
smaller (small working set is a Good Thing). But you won't see anything in
any of the tools that talk about program size (other than working set size) that
would indicate your program got "smaller" because you freed memory!

Stale Pointers

One other major way of getting heap damage is that you keep a pointer to storage
that has been freed, and erroneously use it to store something in that space or
read something from that space. Such a pointer is called a "stale pointer" or a
"dangling pointer". You are writing to space you no longer own.

Because the storage might have been re-allocated for some other purpose, storing
through the pointer can cause some other data structure to be damaged.
That data structure is an "innocent bystander". Whoever allocated it
legitimately believed that they, and they alone, were in charge of the contents
of that structure. Now your erroneous code comes along and just jams some
random value into one of those memory locations, thus completely invalidating
what was going on.

If you are very, very lucky, the field was a pointer and you've clobbered it
with some value which will lead very quickly to an access fault as the value you
stored there could not be interpreted as a valid pointer. The hardware
will catch you.

But given how prevalent pointers are, you may well store a pointer to something
that makes sense to you, but to the actual owner of the structure, they
interpret it as a pointer to something else entirely, and the world
begins to disintegrate as that structure is damaged, which can then propagate
the damage to someplace else, and it may be a very long time before the
manifestations become evident.

Premature Freeing

One cause of stale pointers is when one part of the program frees up a block of
memory while other parts are still assuming their pointers are valid. The
result is that pointers which are supposed to be valid become stale.
Premature freeing is one of the common errors.

Uninitialized pointers

When you execute a constructor, you should initialize every pointer in your
class, preferably to some useless value like NULL. In the release
version, you simply inherit whatever bits were left on the heap. This
means if there had formerly been a pointer there, it is still a pointer, and if
you access what it points to, you can happily write all over something, or read
some value that makes no sense but use it anyway. If that value is a
pointer, then again we have all kinds of possibilities for collateral damage.

The debug heap and debug stack avoid this because all heap pointers are
initialized to the illegal value 0xCDCDCDCD when an allocation is done (or the
no-man's land value), all values in the block (except the invisible overhead
bytes) are set to 0xDDDDDDDD when storage is freed, and all stack variables are
initialized to 0xCCCCCCCC in the debug stack. But you shouldn't rely on
this because, while it helps, it isn't good enough. If you made an error
that is not caught during development because that particular sequence of
code was not encountered, then when it finally is encountered in the
release code, you will get heap damage. Instead, you should always
initialize to NULL which has the advantage that it fails in both the
debug and release versions. Even though it will cause the release version
to crash with an access fault, the cause is immediate and proximate to the
actual error, not the consequence of seventh-level storage damage detected
billions of instructions later.

This is another danger of memory allocators. Let's look at that block of
memory, only this time we'll assume all those rectangles represent 1000 bytes
and we have 20,000 bytes total, and all allocations are in multiples of 1,000
bytes just to simplify the discussion

[figure: a 20,000-byte heap; allocated blocks numbered 1 through 8, some of
them 2000 or 3000 bytes long, with free gaps between them]

The numbered blocks represent blocks that have been allocated. Note that
some of them are 2000 or 3000 bytes in length in this example.

So you add up: 1 + 2 + 2 + 3 + 2 + 1 + 3 + 1 = 15, and realize that you have 5K
bytes available. So you try to allocate 3000 bytes, and you get back a
NULL. It should be obvious why: although there are 5000 bytes
available, there are not 5000 contiguous bytes available. Your memory has
fragmented. You can't allocate the space you need.

Doing lots of allocations, especially with an "aggressive splitter" allocator
("first-fit", the classic C/Unix algorithm, is an aggressive splitter) means
that large blocks are preferentially split into small blocks.

So the obvious solution is to "repack" the memory to maximize the hole size:

[figure: the same heap after repacking; blocks 1 through 8 are now contiguous,
leaving a single 5000-byte hole]

which would allow a 3000 byte allocation to run. But it turns out this
isn't possible. In C and C++, you had a pointer to block 2. It now
points to the middle of block 2, not the start. You had a pointer
to block 5. It now points to the contents of block 6. In C and C++,
you can't do memory compaction because to do so you would have to find every
pointer to every block and adjust them to represent the new storage layout.
This isn't going to happen.

Languages like Java, C#, and Managed C++, however, do know where all the
pointers are, and can fix them up.

But that copying can generate a lot of page faults, and it can take a lot of
time even if it doesn't page fault. So compaction doesn't happen
all the time, only when the situation is desperate. So, in a compacting
allocator, the allocator knows that it has 5K, it can't find a 3K hole, so it
compacts, but only at that point. We can forget storage compaction as a
workaround for C/C++ unless we're willing to go through a lot of work to
do it ourselves!

Freeing memory does not necessarily create "more" space. It does mean that
the space that has been freed is available for reallocation.

Don't be surprised if you "run out of storage" even though you think some should
be there. This also means that massive allocations, such as for a giant
bitmap image, may not work. Asking for 100MB may not work because
the allocator might not be able to find a 100MB hole in your heap. This is
why large image processing tends to break the picture up into smaller,
manageable chunks. It also adds complexity to some algorithms.

Managed languages also make it impossible to do buffer overruns because all
accesses are checked, all the time. There's no "debug" or
"release" difference in how they work. Because they are garbage-collected,
there's no explicit delete operation, so you can't get stale pointers.
If a reference exists, the managed system guarantees that it refers to
legitimate storage, and that storage will not be freed until every
reference to it has gone away.

Pointers are not permitted in managed code. In managed languages, they are
replaced with a pointer-like concept, references, but a reference is a
pointer on which you can't do pointer arithmetic. This saves you from the
possibility of writing something through a bad pointer, because the concept does
not exist. There is no such concept as an uninitialized reference (they
are all initialized to NULL, or the managed language's equivalent
thereof), so you can't have uninitialized pointer errors. Managed
languages protect you from most of the ugliness of storage allocation.

(In Managed C++, however, you can do pointer arithmetic. This is
considered very dangerous).

But we're talking unmanaged C/C++ here. The storage allocator is In
Your Face, All The Time.

Storage fragmentation is deadly to programs like System Services. If you
are running 24/7/365, and you reboot your server once every two years, even a
tiny amount of fragmentation can add up to a massive loss of usable storage.
This is one reason I don't favor C++/MFC in System Services: you can't
control the memory allocation closely enough. Managed languages would be
much nicer for System Services.

So what about all those assertions in the allocator?

A common observation I see far too often is "There's something wrong with the
debug heap. My program keeps crashing in the debug version, but not in the
release version. It runs fine in the release version".

This shows a deep misunderstanding of the whole concept of debugging.
You do not even consider creating or running a release version if there is
any failure in the debug version.

This is not a negotiation. This is not even worthy of discussion. Until the debug version works, the release
version is meaningless.

Some people seem to think that those assertions represent bugs in the debug
runtime, because they never get assertions in the release version.
Hello, people! That's what a release version does. The
assertions are not compiled in a release version, so of course they can't
happen. It doesn't mean the problem has gone away; what it means is the
problem is not being reported!

What these people fail to realize is that their code does not "run fine"
in the release version. In fact, the only statement they can make
is "in the release version, the fundamental screwups in my code aren't detected,
so I can pretend they aren't really there". This is known as the "ostrich
approach" to programming. "If I don't see the bug, it isn't there".

This is also what is known in the trade as a "delusional dysfunction".

[Ostriches, by the way, don't bury their heads to avoid looking at danger.
This is a myth, but one that has become entrenched in popular folklore].

The problem with those nasty assertions is that people will report "I don't see
anything wrong with my code" because they are looking only at the call site,
where in fact everything may be fine. The damage was done billions and
billions of instructions ago, and you are just now discovering it. The
call you are making is quite possibly an innocent bystander.

Let's look at that storage. Suppose we had our example

[figure: the 20,000-byte heap again, blocks 1 through 8 with free gaps
between them]

Now, somebody who is writing block 6 writes 1200 bytes, and we get

[figure: the same heap; block 6 has overrun 200 bytes into the free block
that follows it]

Note here that the free block between 6 and 7 has now been partially
overwritten. Your next four allocations are for 1000 bytes each, and we'll
assume "first fit" here as the method of allocation:

[figure: the heap after blocks 9, 10, 11, and 12 are allocated first-fit into
the free gaps]

So blocks 9, 10, 11 and 12 are all successfully allocated. Next, we free
block 8:

[figure: the same heap after block 8 has been freed]

Even if we were doing aggressive coalescing, block 8 only has to look back to
block 7 (which is still intact) to see that it is still allocated and a coalesce
can't happen. So the allocator is still happy, thinking everything is
intact.

Now we go to allocate another 1000 bytes. Using "first fit", we go
traipsing across the storage, seeing that blocks 1, 9, 2, 3, 10, 4, 5, 11, 12
are all fine. Even block 6 looks good, because it has a pointer to the
free block that used to follow it. Except when we get there, we find that
it does not look like a valid heap header block. And that's
when you get the assertion failure in the allocator! (The release version,
trusting that everything is good, will simply pick up the pointer and try to use
it. If you are lucky, this will cause an immediate access fault.)
This detection comes a long time after the damage was actually done, and we
were able to do new and delete without any ill effects even
after the damage had been done. You might be tempted to wonder what
had gone wrong in your call to allocate a new block, but that isn't where the
problem is.

A similar error would occur if you tried to free block 6, or if you freed block
7 and, under aggressive coalescing, it tried to see if it could merge with the
block in front of it, only to find there wasn't a valid block header there. So
you see, where the damage is detected is not related to where it is
caused.

[If you are familiar with the roadrunner cartoons, imagine the following scene:
the roadrunner has undercut the cliff. The cliff falls down. The
coyote is left standing in midair. He doesn't start to fall until he looks
down and sees that there is nothing to stand on. That new, malloc,
delete,
or free call is the point where the coyote looks down. The assertion
failure is that little puff of dust as he hits the ground a thousand feet
below...but the real damage had been done long before when the cliff fell away.]

So when you get one of these errors, it is almost, but not quite, a certainty
that the call you just made on the allocator is not the call that has
caused the damage. It is the call that has detected the damage.

Now, for simplicity, let's assume that we allocated block 2 as 2000 bytes, that
is, 500 32-bit pointers, to objects of type A. We allocated block 3 as
2000 bytes, that is, 500 32-bit pointers, to objects of type B.

Then, due to a programming error, we wrote 750 pointers (3000 bytes) into block
2, pointing to objects of type A.

We have just clobbered half of the pointers in block 3, which now, instead of
pointing to objects of type B, point to objects of type A (well, a bit under
half because we also clobbered the control blocks of block 3 as well). But
the code that is working on the array in block 3 assumes that these pointers are
pointing to blocks of type B, and happily uses them, thus corrupting the
information in the blocks of type A. Then the code that uses those
damaged blocks of type A picks up information that is also erroneous, and
clobbers something else. This can continue for many levels,
and you don't see the failure until the ultimate damaged location is used in a
way that the hardware refuses to process; for example, you stored a small
integer into what was supposed to be a pointer field and somebody goes off and
uses it. Boom! Access fault!

[diagram: heap blocks 1 through 8, with block 2 holding A* pointers and block 3
holding B* pointers, shown before and after the overrun; afterwards the
overwritten entries in block 3 point to objects of type A]

True story: around 1982, I was working with one group that not only had an
intractable memory damage bug, but one that did not appear in the debug version
of the application. (It was on a VAX machine, which did not have the
concept of separate debug and release heaps, but since I wrote the allocator I
had a lot of these techniques in place.) But the bug would just
slither away. Unfortunately, this meant that it hit our customers but not
our developers.

The debugging tools were not terribly good, either; we didn't have source
debuggers for our implementation language. I spent 17 straight hours
tracing assembly code, instruction-by-instruction, tracing the damage back.
It was seventh-level damage. That means that the original damage
caused something else to be damaged, which caused something else to be damaged,
which... until we got back to the original cause of the error. The
original cause was a piece of non-robust code that should never have been
written the way it was.

The error, by the way, was that someone had written

array[index++] = value;
if(index == number_of_items)
    ... report error

but never bothered to reset the index, so the next time through, the index
was not equal to the count, but greater than, so they went happily off,
clobbering some adjacent structure. And we didn't get a hardware fault (an
access fault) until the damage had propagated seven levels.

Hint: when testing bounds, always write

if(index >= number_of_items)

that way, when you do make a mistake, you get some reasonable response to it,
not just random damage!
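A robust version of the earlier fragment might look like this (StoreValue and its parameters are illustrative names, not from the original code):

```cpp
// Sketch of the bounded insert with the defensive test. Using >= means an
// index that has already run past the end is still caught, where == would
// sail right by once the index was not reset.
bool StoreValue(int array[], int& index, int number_of_items, int value)
{
    if (index >= number_of_items)
        return false;            // report the error; the caller can reset index
    array[index++] = value;
    return true;
}
```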

I received an official award for this; at some point I have to dig it out and
have it framed. It showed a Flounder spearing a bug.

But it worked in the debug version, but not in the release version!

Yes, there are very interesting cases where your code will work apparently
without error in the debug version and fail in the release version. Nearly
all of these failures are the result of program errors which the debug version
was incapable of detecting, but the release version is sensitive to. They
rarely involve heap damage, and they are addressed in a
separate essay.

So how do I avoid memory damage?

Lots of ways. The most important mechanism is to write your code
correctly. The second-most-important mechanism is to write it in such a
way that when you do screw up, the error is detected early.

Here are some techniques I've applied over the years.

Do not use malloc in C++ programs.

This is just silly. Unless you are really interfacing at the lowest level
(implementing operator new, for example), you have no excuse for using
malloc. So don't do it.

Avoid allocating arrays with new

Whenever possible, which is most of the time, you can avoid allocating raw
arrays entirely. For example, there is no real
advantage to recoding

LPDWORD b = (LPDWORD)malloc(n * sizeof(DWORD));

with

LPDWORD b = new DWORD[n];

if you continue to use non-robust algorithms to manage your data.
Instead, use CArray or std::vector for such purposes.

Avoid fixed char arrays

Generally, you should forget that the data type char exists at all!
Using char is based on the naive myth that you will never require Unicode characters.
Always program "Unicode-aware", even if you don't think you're going to need it.
But code like

char buffer[20];
sprintf(buffer, "%d", value);

belongs in the dustbin of history. Just forget these constructs exist. When
you're programming in MFC, use CString::Format. There's no
reason to use these antiquated relics of the PDP-11 implementation ever again.

Use CString for all string operations

At least in MFC, CString is even preferred to std::string, and
all strings should be stored as CString values, not as char *, or
even TCHAR *. You should never be allocating arrays of characters (char
or wchar_t) unless you have compelling reasons. And you usually
don't have compelling reasons. Most people who use malloc or new
of arrays of characters are simply using antiquated techniques they learned in
their first C course. These are not appropriate for modern programming
methodologies.

Use bound-checked arrays

For example, instead of doing

LPDWORD b = new DWORD[n];

I'm much more likely to use

CDWordArray b;
b.SetSize(n);

for two reasons:

It will automatically be deallocated when the variable leaves scope

It will be bounds-checked when I access it

People who are comfortable with std::vector will advocate it for the
same reasons, and they are absolutely right as well. There is no reason
not to use bounds-checked arrays.

Note that if you need to pass these values into some API or library function,
for example, one that requires an array of DWORDs, you can do it by

SomeAPI(b.GetData(), b.GetSize());

because GetData will return the address of the value array.
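For std::vector users, here is a minimal sketch of the same pattern; SumArray is a hypothetical stand-in for the API being called:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical API taking a raw pointer and an element count.
unsigned long SumArray(const unsigned long* data, std::size_t n)
{
    unsigned long total = 0;
    for (std::size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

unsigned long UseVector(std::size_t n)
{
    std::vector<unsigned long> b(n);      // deallocated when b leaves scope
    for (std::size_t i = 0; i < b.size(); i++)
        b.at(i) = i;                      // at() throws std::out_of_range on a bad index
    return SumArray(&b[0], b.size());     // the analog of GetData()/GetSize()
}
```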

Always provide bounds for all operations

Never assume that your buffer is going to be "big enough".
You must always pass in an explicit buffer length, and it must always
be checked. Fortunately, this is now easy. See below.

I once worked on an application that assumed they knew what
MAX_PATH should have been. They were sure it was 256, and throughout
their code, the magic number 256 was hardwired. It wasn't even a
#define, because there was no need to change it. Needless to say, one
client had a path that was longer than 256 characters (though still within
MAX_PATH). Since nobody bothered to check the array bounds, damage propagated
all over the place.

(For those who don't know, MAX_PATH is defined as 260 in the current
Microsoft Windows headers; the C runtime's equivalent, _MAX_PATH, is defined
in stdlib.h.)

Always avoid strcpy and strcat

These operations take no buffer length, so they can overrun. And since you
should always program Unicode-aware, and these operations aren't, note that you
must equally avoid their Unicode-aware equivalents _tcscpy and _tcscat, which
have the same flaw.

Use the safe-string functions. For example, read about
strsafe.h and look at operations such as
StringCchCopy and StringCchCat. These take explicit buffer
lengths and therefore will not commit a buffer overrun.
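A sketch of what this looks like in practice (Windows-only; CopyName and its argument are illustrative names):

```cpp
#include <tchar.h>
#include <strsafe.h>

// StringCchCopy takes the destination size in *characters* and returns a
// failure code instead of overrunning when the source does not fit.
void CopyName(LPCTSTR source)
{
    TCHAR buffer[20];
    if (FAILED(StringCchCopy(buffer, _countof(buffer), source)))
    {
        // source was too long for buffer; handle the error here
    }
}
```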

Initialize all pointers

In your constructors, always set pointer values to NULL.
For stack variables, always set pointers to NULL.

Sure, the debug version initializes them to bogus values, but if you don't
test the path that would use an uninitialized value at debug time, then you'll
get a random value in the release version which just might not access fault, but
point to something that shouldn't be damaged.
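A minimal sketch of the rule (Document and Widget are illustrative names):

```cpp
#include <cstddef>

class Widget;   // some class we point at

// Every pointer member is set to NULL in the constructor, so a path that
// never assigns it sees a clean NULL, not stack or heap garbage.
class Document {
    Widget* current;
    Widget* backup;
public:
    Document() : current(NULL), backup(NULL) {}
    bool HasCurrent() const { return current != NULL; }
};
```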

Declare pointers at the point of initialization

Far too many C programmers think C++ is C, and they will jam all their
declarations up into the top of the block, just like we had to do in all those
obsolete languages like Algol and Pascal. This is just plain silly.
In C++, you can declare a variable anywhere. So don't declare it until
it is needed. Instead of writing

Whatever * p = NULL;
... lots of lines here
p = new Whatever;

instead just do

...lots of lines here
Whatever * p = new Whatever;

that way, you don't have it sitting around as a candidate for error.
You don't declare it until it is needed! So the declaration and the
initialization are essentially concurrent, not separated by tens or even
hundreds of lines, and there is no danger you will accidentally use the
variable before it has been properly initialized, because syntactically that
would not be possible!

When possible, NULL pointers out after a delete operation

If you have a pointer to a value in a class, and you free the object pointed
to, set the pointer to NULL. That way, if you accidentally
use the value you'll take an access fault, rather than read from or write to the
recently-freed or possibly recently-reallocated memory.
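A small sketch of the pattern (Session and Connection are illustrative names):

```cpp
#include <cstddef>

struct Connection { int handle; };

class Session {
    Connection* conn;
public:
    Session() : conn(NULL) {}
    void Open() { conn = new Connection; }
    void Close()
    {
        delete conn;
        conn = NULL;   // any stale use now takes an access fault instead of
                       // scribbling on freed (or already-reallocated) memory
    }
    bool IsOpen() const { return conn != NULL; }
    ~Session() { Close(); }    // deleting NULL is harmless, so Close() is repeat-safe
};
```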

Use smart pointers

"Smart pointers" are essentially reference-counted pointers. When used
properly, they reduce (or eliminate) the possibility of stale pointers. If
you have complex lifetime issues and distributed lifetime issues, these are a
Really Good Idea.
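The reference-counted smart pointers of that era survive today as std::shared_ptr (boost::shared_ptr before it); a minimal sketch of the idea:

```cpp
#include <memory>

struct Node { int value; };

// Both owners share the Node; it is destroyed exactly once, when the last
// reference disappears, so neither owner can be left with a stale pointer.
long ShareCount()
{
    std::shared_ptr<Node> owner1 = std::make_shared<Node>();
    std::shared_ptr<Node> owner2 = owner1;    // reference count is now 2
    return owner1.use_count();
}
```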

I still have memory damage problems!

The Good News is that if you follow the above guidelines, you probably won't
have memory damage from buffer overruns. But you can still get it from any
of the other causes. So you followed all the rules, and you're still not
seeing it. You've got a 100,000 lines of code, and now it's crashing.
Where do you start looking?

The simplest mechanism I've used is the OnIdle test. In your
CWinApp-derived class, add an OnIdle handler.
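A reconstruction of such a handler might look like this (assuming a CWinApp-derived class named CMyApp; _heapchk and _HEAPOK come from <malloc.h>):

```cpp
BOOL CMyApp::OnIdle(LONG lCount)
{
    ASSERT(_heapchk() == _HEAPOK);   // verify the integrity of the CRT heap
    return CWinApp::OnIdle(lCount);  // let MFC do its normal idle processing
}
```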

Because this is an ASSERT it only gets compiled into the debug
version. You might feel free to add an additional surrounding #ifdef
MEMORY_CHECKING or some similar preprocessor symbol of your choice.
You would define this to turn on the memory checking.

What does this do? Well, the _heapchk function runs through the
heap, checking its integrity. If you made some error, then the next time
you go into message wait, the heap is verified and the ASSERT
fails. This is really nice
because it catches the damage relatively early. So, if I select the "Do
this" menu item, or click the "Do that" toolbar button, or whatever, and the
next thing that comes out is this assertion failure, I pretty much know that

It hadn't happened before I selected that operation or the ASSERT
would have failed earlier

It happened as a consequence of that selection

This narrows down where the damage can be.

If you still can't stop it, start throwing those ASSERT statements
liberally around the execution path that is followed by the path starting at the
OnDoThis handler for the "Do this" menu item. As long as you don't
have multiple threads out there bashing your memory, this will really narrow it
down.

And what about those threads?

Put the ASSERT in your thread loop. Screw the performance.
You're looking for errors! That's why we have conditional compilation!

I once turned on all the tracing, checking, and logging features of my
allocator. The program was running about 100 times slower than normal.
So I went down to the company cafeteria and had a nice pizza. When I got
back three hours later (I had to talk to people, didn't I?) it was still
running. Twenty minutes later, the allocator detected the error, and we
had it nailed. It took time, but no effort. Never confuse the two.
Search for zero-effort solutions.

What about HeapValidate?

The HeapValidate call provides for a system heap what _heapchk
does for the C/C++ runtime heap. The problem is that HeapValidate
isn't all that useful. For example, each major block of C/C++ runtime heap
is a single "Heap" in the HeapValidate sense, and while the innards of
the heap may be completely trashed, the HeapValidate API tells you
everything is just fine. So largely, this API doesn't do you much good.

Stack clobbers

These are actually fairly easy to find. You die with an access fault on
a return statement. You find yourself staring at a piece of
assembly code that makes no sense at all, often just repeats of some single
instruction, such as

int 3

which is the interpretation of the byte 0xCC, or

add byte ptr [eax],al

which is the interpretation of the byte sequence 0x00 0x00.

Sometimes you get an instruction fault, and frequently you get an access
fault. Often the stack backtrace is completely trashed. If you see
this, you can have a high confidence that you overwrote an array on the stack.
Go read the rules about not putting fixed-size arrays on the stack, or anywhere
else for that matter.

Remember, stacks grow downward in memory. Suppose I had a
function

void SomeFunction(int first, int middle, int last)
{
DWORD data[5];
...

Then our stack would look, after the call, like

(stack bottom, higher addresses, this way ↑)

last parameter
middle parameter
first parameter
return address
old frame pointer (EBP)
data[4]
data[3]
data[2]
data[1]
data[0]

Now suppose I had a loop

for(int i = 0; i < sizeof(data); i++)
data[i] = rand();

Note the error here. I used sizeof(data) which would be 20, but
then I used that as an index! I should have used sizeof(data)/sizeof(DWORD)
or sizeof(data)/sizeof(data[0]). So what does it do? When it goes
to write data[5], it clobbers the old EBP with random data (thus
destroying my ability to see a stack backtrace), data[6] overwrites the
return address with random data; data[7] clobbers the value of the first
parameter, data[8] clobbers the value of the second parameter, and
data[9] clobbers the value of the third parameter. After that, the
values of the local variables, and perhaps the EBP, return address, and
parameters of the calling function will all be destroyed, if the caller's stack
frame was small.
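For reference, a corrected version of that loop, sketched with a stand-in DWORD typedef:

```cpp
#include <cstddef>
#include <cstdlib>

typedef unsigned int DWORD;   // stand-in for the Windows type (assumed 32-bit)

std::size_t FillRandom(DWORD (&data)[5])
{
    // sizeof(data) is 20 (bytes); dividing by sizeof(data[0]) yields the
    // element count, 5, which is the correct loop bound
    const std::size_t n = sizeof(data) / sizeof(data[0]);
    for (std::size_t i = 0; i < n; i++)
        data[i] = (DWORD)std::rand();
    return n;
}
```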

This Is Not A Pretty Sight.

Now this damage will cause the old EBP, now a random value, to be placed in
the EBP register, and then it will attempt to return to the location specified
by the damaged return address. Who knows? There might even be
instructions there! (And if there are, you've just seen the buffer-overrun
exploit take control of your machine...)

Fortunately, once you've seen a couple of these, they become easier to find.
Unfortunately (there's always the Good News/Bad News problem) there's no simple
trick you can do to avoid them.

BUT--and this is one of the new features of VS.NET (which in spite of the
fact the IDE sucks, has a really great compiler)--in debug mode you can enable
stack-frame checking. Each stack frame gets a gratuitous padding of bytes
which are set to known values. Whenever a return occurs, a function is
called that checks this hidden space on the stack and verifies that it is still
the stack's equivalent of the No Man's Land values used in the heap. If it
has changed, you are likely to get a much more useful error notification than
"Instruction fault at..." with some unintelligible hex address.

Desperation Mode

The problem with some of the heap damage is that while your objects are
getting clobbered, the actual heap control blocks are quite safe.
Your data is being trashed, but all the _heapchk operations give you a
clean bill of health.

These are the times that try the souls of programmers.

At one point, I built up the following logic: each structure contained a
checksum field. The checksum had a value which was computed by (literally)
summing up all the DWORD values found in the structure using a suitable
checksum algorithm. The actual checksum you use
doesn't matter, as long as it is reasonably robust. Ideally it isn't
too hard to compute.

I placed the checksum value as the first DWORD in the structure.

I removed all public variables from the classes (not a bad idea, anyway) and
replaced them with accessor functions.

When I called a set method, I set the value then recomputed the
checksum.

I wrapped the checksum computation in conditional compilation, so if I were
checking, I'd actually compute the checksums, and if I weren't, no code was
generated.

The idea is that there is a checksum which is only valid if there has been a
deliberate access that sets a field of the class. If someone else reaches
in and clobbers something, the checksum will no longer be valid. Since all
legitimate accesses are controlled, all damage has to be from illegal accesses.
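A hedged sketch of the scheme (Guarded and its fields are illustrative; the checksum here is a plain sum, standing in for whatever robust algorithm you choose):

```cpp
class Guarded {
    unsigned checksum;   // first DWORD in the structure, as described above
    unsigned a;
    unsigned b;

    unsigned Compute() const { return a + b; }   // stand-in checksum algorithm
public:
    Guarded() : a(0), b(0) { checksum = Compute(); }

    // every set method stores the value, then recomputes the checksum
    void SetA(unsigned v) { a = v; checksum = Compute(); }
    void SetB(unsigned v) { b = v; checksum = Compute(); }
    unsigned GetA() const { return a; }

    // fails if anything other than a set method has touched the fields
    bool Valid() const { return checksum == Compute(); }
};
```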

This found the problem. It was a dangling pointer. All of my
classes were large enough, and the offset of the damaged field small enough, so
that it never damaged any heap control block information. But it trashed
the actual contents far too badly to manage.

Desperate times require desperate measures.

In one case in an old C program, I had replaced all calls on malloc
with calls on my GetSpace function. The GetSpace function
actually linked all my allocated blocks together. Periodically I would
just race down the list of allocated blocks and check all the checksums!
This was a very aggressive approach, and not easily accomplished in C++ because
of the difficulty in overriding new. The error in that case (this
was about 1986 or so) was both remarkably stupid (in terms of the coding error)
and remarkably subtle (in terms of the damage). When I had replaced
malloc/free with my GetSpace/FreeSpace, and run the checksums, the
bug popped out in under five minutes of runtime! On an 8088.
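A hedged reconstruction of the idea behind GetSpace (the names and the checksum are illustrative; the original was C, this sketch compiles as C++):

```cpp
#include <cstddef>
#include <cstdlib>

// Every allocation carries a hidden header that links it into a list and
// checksums the header fields themselves.
struct BlockHeader {
    BlockHeader* next;
    std::size_t size;
    unsigned long checksum;
};

static BlockHeader* allocated_list = 0;

static unsigned long HeaderChecksum(const BlockHeader* h)
{
    return (unsigned long)(std::size_t)h->next
         ^ (unsigned long)h->size
         ^ 0x5A5A5A5AUL;
}

void* GetSpace(std::size_t size)
{
    BlockHeader* h = (BlockHeader*)std::malloc(sizeof(BlockHeader) + size);
    if (h == 0)
        return 0;
    h->next = allocated_list;
    h->size = size;
    h->checksum = HeaderChecksum(h);
    allocated_list = h;
    return h + 1;                 // the caller sees only the payload
}

// Periodically race down the list and verify every header; a clobbered
// header trips here, close in time to the damage. (FreeSpace, omitted for
// brevity, would unlink the block and re-verify it first.)
bool CheckAllBlocks()
{
    for (BlockHeader* h = allocated_list; h != 0; h = h->next)
        if (h->checksum != HeaderChecksum(h))
            return false;
    return true;
}
```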

The App Verifier

Microsoft has a nice little tool, the "App Verifier", which you can download
from their site.

This has a number of heap tests. One of the powerful ones is the
PageHeap test. In this test, blocks are not allocated out of the generic
heap as I've shown above, but in fact every block is allocated at the end
of a page! Furthermore, the page before and the page after the page in
question are "guard pages" and do not exist. Therefore, if you write off
the end of a block, you typically write into a nonexistent page and get caught
right away. When a block is freed, its page is marked as inaccessible, and
any attempt to use a stale pointer takes an access fault. This is,
needless to say, a voracious use of memory; you are urged to have at least 256MB
of physical memory and a 1GB paging file to run anything using the PageHeap
test.

Summary

This essay attempts to explain, particularly to beginners, the nature of
memory damage, its causes, and possible cures. It also shows two of the
more powerful techniques I have used to detect memory damage as early as
possible, thus making it easier to identify.

The views expressed in these essays are those of the
author, and in no way represent, nor are they endorsed by, Microsoft.