Introduction

You may have met programmers who believe that memory corruption always produces an immediate, visible result, most likely a program crash. I have.
I wish they were right.

Background

Memory corruption changes the contents of memory at unintended locations, thus changing the values of the variables stored there. In real life, variables hold meaningful data,
and a change to that data has bad results. Among others: calculations returning wrong results, programs crashing, programmers losing jobs, and hackers gaining access
to sensitive information. The sample project shows a case of memory corruption that does nothing but change the value
of a few variables. No Hollywood-style explosions, no loud screeching noises.

Wikipedia says: "Memory corruption happens when the contents of a memory location are unintentionally modified due to programming errors"

Again from Wikipedia: "In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer,
overruns the buffer's boundary and overwrites adjacent memory." http://en.wikipedia.org/wiki/Buffer_overflow

So far I've proved to you that I can copy and paste. But what's the point? Well, the point of this article is that memory corruption, like other kinds of corruption,
can and should be prevented. We'll get to that eventually. Just bear with me for a while. I hope you'll have some fun along the way.

Let's corrupt some memory!

Let's have a look at some parts of the sample program (.sln and .dsw provided):

void OverflowMyBuffer(char *szTest)
{
    // Copy a 14-byte string (13 characters plus the
    // terminating zero) into a buffer of unknown size.
    // Its size may be just 3 bytes...
    strcpy(szTest, "Hello, world!");
}

This is the easiest way to overflow a buffer: strcpy (or memcpy) into its address (the address of its first byte) something bigger than its allocated size.
For instance, the string "Hello, world!" is 14 bytes long (counting the ending zero); if the buffer szTest is shorter than that, you'll get a buffer overflow.

In the sample project, the bytes that overflow szTest land in an adjacent integer array, arr[]. What if the contents of arr[] were not just a few idle integers, but important data? For instance: a corrupt pointer may make a program crash,
and a corrupt variable holding the distance between your ship and the iceberg may send you swimming in very cold water.

Beginner's aside

"Now, wait a minute!" someone may say. "If the function OverflowMyBuffer() has no access to the variable arr[], how can it mess with its contents?"

Well, that's exactly the problem with memory corruption. Using pointers, you can mess with memory anywhere. Consider this: *((int *) rand()) = 0xDeadBeef;
Pretty, right? This code tries to plant an arbitrary value at a random position in the memory of your process: if there is a variable there (for all I know, there might be),
its value will be modified; if the random pointer points to unallocated memory, or to memory occupied by code, stranger things will happen.

The point at last

Considering that there are Wikipedia articles on buffer overflow and memory corruption, what's the point of this article?

There are two:

A memory corruption will not necessarily generate an immediate, repeatable crash

Some time ago, a colleague hunting a bug searched a few thousand lines of code for every appearance of an integer variable whose value was mysteriously changing,
set breakpoints everywhere the variable appeared, and still saw the value change without any apparent reason. He asked me to take a look: how could that happen?
Since you're reading this article, you may have deduced that it was memory corruption due to a buffer overflow. At the time, it wasn't obvious: the variable sat in memory right next to a character buffer that another piece of code was overflowing.

We set breakpoints wherever str was touched, added a watch for x, and the bug was solved.

If you see production code failing randomly at customer sites, then succeeding a couple of minutes later with identical input; if you see simple calculations
sometimes returning weird results, and a minute later the right one; if you cannot reproduce the unwanted behavior under your debugger; if your teammates
are blessed with a healthy programmer's ego ("if there's a bug, it's not in my code"); those are the usual symptoms of memory issues: either corruption,
or a failed, unchecked memory allocation (a subject for a totally different article), or, in multithreaded code, a race condition (again, out of the scope of this article).

Buffer overflows can be avoided

With a few simple precautions, you can make buffer overflows a thing of the past:

Don't use C-style strings: use std::string, or WTL/ATL/MFC CString, or CComBSTR/_bstr_t. All of them manage their own memory.

Don't use C-style arrays: use STL containers. Again, they protect you against buffer overflows.

If you absolutely must write a function that receives a C-style array as a parameter and modifies its contents, also take a second parameter
of type size_t with the count of elements, and use it to avoid overflowing the buffer. Just like strncpy() does.

Another frequent cause of memory corruption is dangling pointers; I haven't bumped into any in the last few weeks. If you're interested in preventing them,
and a Google or Bing search for "how to prevent dangling pointers" didn't help, leave me a comment below.

I hope you enjoyed this article: in any case, thank you for taking the time to read it. Happy programming!

About the Author

To make his work easier, Pablo uses some C++ libraries: STL, ATL & WTL (to write Windows applications), and code generation.

Pablo was born in 1963, got married in 1998, and is the proud father of two wonderful girls.

Favorite quotes: "Accident: An inevitable occurrence due to the action of immutable natural laws." (Ambrose Bierce, "The Devil's Dictionary", published in several newspapers between 1881 and 1906). "You are to act in the light of experience as guided by intelligence." (Rex Stout, "In the Best Families", 1950).

Pablo, excellent examples of when memory corruption occurs. A long article does not make it better, quite the contrary. This short article is easy to read and should increase awareness among those who lack it.

Stefan, as you and I know, native C/C++ programmers always have to think about memory management, even if they are only using stack variables. This is basically different within managed programming. In this way the article is a good trigger to learn more about C variables, parameters, pointers and arrays - but it stops at that point and gives no deeper explanation.

If you say that this article isn't going deep enough, then that is a valid reason and I won't dispute it.

That is not what you gave as a reason though: On the one hand you say people programming C/C++ should know this, on the other hand you tell the same people to go switch to another language. How, then, are they supposed to learn? Are you saying that the only way to learn is to learn by your own mistakes?

My point is, even if the article could be more detailed, it is a good source to learn about the pitfalls of memory handling in C/C++, one that programmers can learn from without the pain of having to experience the associated problems themselves. It is sound advice from those who have seen the consequences, offered to those they wish to spare the same.

If a car manufacturer tells its customers there's a problem with the brakes on the newest model and that they should visit the closest garage, do you then tell people to get another car? Or take public transport? There's considerable effort involved in following that advice, and in many cases it may not be an option at all. You are effectively saying the warning is irrelevant, but the alternative you give is not feasible.

I had to track down a random application crash. The only clue I had was that if you opened / closed a document a random number of times, the application would eventually crash in a random location.

I eventually tracked it down to a piece of obsolete code that was given a pointer to a variable, held onto it, and on a certain event wrote through that pointer, leaving a sleeping crash ready and waiting to get us some time in the future. It all depended on who or what got that same memory location allocated to it at a later time.