Memory Corruption

Today I had a memory overwrite in my code. It can happen to the best of us. I’ve known memory corruptions to take weeks to track down. They are the worst sort of bug: they can go undetected for a long time and cause all sorts of strange knock-on problems.

This one took me 5 minutes to find and fix. It’s all about having the right tools.

The corruption was caused by some code writing through a pointer to deleted memory: something was holding a pointer into an array that had since been resized. Yeah, I know.
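For anyone who hasn't been bitten by this before, here is a minimal sketch of the pattern, assuming the array behaves like a std::vector. The Entity type, WillReallocate and EntityHandle are made-up names for illustration, not the code from my app. A raw pointer into the array dangles as soon as a resize moves the storage; holding an index and resolving it on use is one common way to avoid that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, for illustration only.
struct Entity { float x, y, z; };

// True if adding `extra` elements would exceed the current capacity.
// In that case the vector reallocates, and every raw pointer or
// reference into the old storage is left dangling.
inline bool WillReallocate(const std::vector<Entity>& v, std::size_t extra) {
    return v.size() + extra > v.capacity();
}

// One possible alternative to holding a raw pointer: hold an index
// and look the element up each time it is needed.
struct EntityHandle {
    std::size_t index;

    Entity& Resolve(std::vector<Entity>& v) const {
        assert(index < v.size());
        return v[index];
    }
};
```

A handle survives a resize because it never remembers where the storage used to live. It does not help if elements are removed or reordered, so it is a sketch of the idea rather than a complete fix.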

The first part of combatting memory corruptions is early detection. Memory corruptions can often go unnoticed for a long time, only occasionally causing catastrophic failures (usually hours before you are due to ship). I’ve known games to have a memory corruption for their entire development cycle: it shows up every few months, no one can track it down, and then it mysteriously disappears. It sits there dormant, corrupting something that no one cares about, until something happens to shift memory just right and you’re now overwriting the animation controller of your main player character. That’s usually when I get called in.

One of the huge benefits of using the VMem allocator is its comprehensive memory integrity checking. VMem is constantly checking that the contents of memory are what it expects. If you write to deleted memory the chances are good that VMem will detect it.
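To give a feel for how this kind of checking can work, here is a minimal sketch of one common technique: stamp freed memory with a known byte and later verify that the stamp is still intact. This is an illustration under my own assumptions, not VMem's actual implementation; the marker value and the function names MarkFreed and FindCorruption are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical marker byte; a real allocator may use different patterns.
constexpr std::uint8_t kFreedByte = 0xDD;

// Called when a block is freed: stamp every byte with the marker.
inline void MarkFreed(void* block, std::size_t size) {
    assert(block != nullptr);
    std::memset(block, kFreedByte, size);
}

// Called during an integrity check. Returns the offset of the first byte
// that no longer matches the marker, or `size` if the block is intact.
inline std::size_t FindCorruption(const void* block, std::size_t size) {
    assert(block != nullptr);
    const auto* bytes = static_cast<const std::uint8_t*>(block);
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != kFreedByte) {
            return i;
        }
    }
    return size;
}
```

A check like this tells you that something wrote to freed memory, and at which offset, but only when the check next runs. It cannot tell you who did the writing.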

In this case VMem detected the problem a few milliseconds after it happened. However, the reason memory corruptions are so difficult to track down is that you usually only find out about them after the fact. Even a few milliseconds is too long to help track down who stomped the memory. It’s usually a detective process: looking at the nature of the overwrite, the size of the alloc, the written bytes (do they look like floats, for example?) and the offset from the start of the alloc. In this case it was much simpler.

First I found the size of the allocation that was corrupted. This is as simple as looking at the FSA in the callstack to see what size range it caters for, in this case 80 to 96 bytes. Then enable debug level 4 in VMem, which turns on the protected heap. The protected heap has a function that decides whether it should protect an allocation. Simply tell it to protect all allocations in that size range.
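The size filter can be sketched as a simple predicate. The function name, the constants and the signature here are hypothetical and are not VMem's actual hook; I've also assumed the range is inclusive at both ends, which at worst protects a few more allocations than strictly necessary.

```cpp
#include <cassert>
#include <cstddef>

// Size range served by the FSA that showed up in the callstack.
// Treated as inclusive at both ends (an assumption).
constexpr std::size_t kProtectMinSize = 80;
constexpr std::size_t kProtectMaxSize = 96;

// Hypothetical predicate: should this allocation go to the protected heap?
inline bool ShouldProtectAllocation(std::size_t size) {
    return size >= kProtectMinSize && size <= kProtectMaxSize;
}
```

Restricting protection to one size range matters because the protected heap is expensive, so you only want to pay for it on the allocations that could be the victim.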

The protected heap gives each allocation its own system page. When an allocation is freed, instead of being re-used, the entire page is decommitted. This means that if anything tries to write through that deleted pointer, the OS will throw an access violation immediately, bam!
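The idea can be sketched in a few lines. This is a rough illustration for POSIX systems using mmap and mprotect, not how VMem itself is written; ProtectedAlloc, ProtectedFree and RoundUpToPage are hypothetical names. Revoking all access with PROT_NONE stands in for decommitting the page: the address range stays reserved and any later access through a stale pointer faults.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Round a size up to a whole number of system pages.
inline std::size_t RoundUpToPage(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return (size + page - 1) / page * page;
}

// Give the allocation its own page(s), mapped straight from the OS.
// Returns nullptr on failure.
inline void* ProtectedAlloc(std::size_t size) {
    assert(size > 0);
    void* p = mmap(nullptr, RoundUpToPage(size), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Instead of recycling the memory, revoke all access to it. The address
// range is never handed out again, so any later read or write through a
// stale pointer faults at the offending instruction.
inline bool ProtectedFree(void* p, std::size_t size) {
    assert(p != nullptr);
    return mprotect(p, RoundUpToPage(size), PROT_NONE) == 0;
}
```

On Windows the rough equivalents would be VirtualAlloc and VirtualFree with MEM_DECOMMIT. Note that this sketch places the allocation at the start of its page, so it catches writes to freed memory but not overruns that stay within the page. It also spends a whole page on every small allocation, which is why you want it limited to the suspect size range.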

Running my app again, it caught the corruption as it happened, and I could immediately see what the problem was.

These techniques are not new or revolutionary. What makes the difference is that this isn’t some heavyweight integrity checker that you have to roll out and then wait for hours while your game boots up, only for it to crash because the integrity checker ran out of memory. VMem is something that is always enabled, always there, and totally reliable.