Debugging Memory Corruption in Game Development

Stack overflow

The stack overflowing can cause memory corruption in a number of ways. A stack is of a fixed size, and immediately it overflows that size it has begun to corrupt memory. What happens next depends on the size of the stack frame, the position of the stack in memory and what lays beneath it in memory.

Not all platforms are equally vulnerable to corruption for stack overflow. Win32 will simply raise a stack overflow exception if the stack pointer writes beyond the bounds of the stack. On platforms that do this, debugging is a relatively simply matter of looking at the call stack, which should have one or two functions repeated over and over, pointing you directly at the culprit.

Other platforms are less fortunate. The PS2 has not special protection for the stack pointer. The stack is frequently placed at the top of the 32MB of memory, which means that if it grows downward past the area reserved for it, it can corrupt data and possibly even code.

Code Corruption

As already mentioned, if the stack is allowed to proceed apace through memory, it will eventually overwrite the code that is currently being executed, causing a crash.

Code corruption due to stack overflow can often be seen in the disassembly view of the debugger. If the code before the crash location looks reasonable, and the code after looks repetitive or contains illegal instructions, then overflowing stack corruption is the most likely cause.

Sparse Corruption

The common cause of stack overflow is runaway recursion. A less common cause is moderate recursion combined with a very large stack frame. This occurs when the programmer has a local variable in the recursive routine that takes up a large amount of space. Example: See listing 2

Here DigTree is a recursive function that has a local variable local_buffer. A new instance of local_buffer must be created on the stack. Since the CBuffer class takes 8K of memory, it takes relatively few recursions to overflow the stack, especially on consoles such as the Gamecube where the stack size is kept as low as possible, often down to 64K or less.

If the huge CBuffer object is not cleared every time the function is entered, then this can have the effect of corruption being evenly spaced every 8K through memory. This can be quite a red herring in a number of ways. Firstly, the first time you see the corruption it might seem to be just a single instance of corruption, making you not think of a stack overflow.

Secondly, it can overstep any tests you do to check for stack overflow. Often you would place some magic numbers in the bottom of the stack, and check to see if they are still there, as a way of detecting the stack has overflowed. If the game does not immediately crash during the recursion, the stack pointer will return to a normal address, and your code will run along merrily until the corruption causes some later problem.

Thirdly, if it is runaway recursion, then the large stack frame might overstep the code that is being executed, causing widespread code corruption, yet not actually crashing in the code that caused the corruption. While this should still point you to stack overflow, the culprit will be less obvious. A stack analysis on the corrupting stack frame will indicate the location of the corrupting code - providing you can detect the write that causes the corruption.