Stacking up the heap

In the last post, I promised to see how far we can go using only ‘the stack’ instead of ‘the heap’. I lied, slightly, as instead of using ‘the stack’, we’ll be creating our own stack allocator, separate from the automatic call-stack.

Custom stack allocators may be familiar to people who’ve written embedded software for constrained systems. This kind of allocator works by pre-acquiring a large block of memory (which you typically acquire from ‘the heap’!), and then treating it as a simple stack-of-bytes data structure. If the user requires some bytes, you increment the top of your stack by the requested amount — this is about as simple and as fast as it gets!

I’ve also seen these referred to as ‘mark and release’ allocators, because at arbitrary points the user can record the value of the ‘top’ of the stack (i.e. make a mark of how big the stack is), allocate some more data from it (which increases the ‘top’ value), and then release just this data by resetting the stack’s ‘top’ value back to the marked value. Alternatively, to release all data in the stack, you can set the ‘top’ value back to zero.
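To make this concrete, here’s a minimal sketch of such an allocator with mark and release (the name StackAllocator and its interface are my own invention for illustration; it’s not production-ready, with no thread safety and only max_align_t alignment):

```cpp
#include <cstddef>

// A minimal 'mark and release' stack allocator (illustrative sketch, not
// production code: not thread-safe, and only max_align_t alignment).
class StackAllocator {
public:
    explicit StackAllocator(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity), top_(0) {}
    ~StackAllocator() { delete[] buffer_; }

    // Allocate by bumping the 'top' of the stack: as simple as it gets.
    void* allocate(std::size_t bytes) {
        // Round up so subsequent allocations stay aligned.
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) & ~(align - 1);
        if (top_ + bytes > capacity_) return nullptr; // stack full: trouble!
        void* result = buffer_ + top_;
        top_ += bytes;
        return result;
    }

    std::size_t mark() const { return top_; }           // record the current 'top'
    void release(std::size_t marked) { top_ = marked; } // pop back to a mark
    void clear() { top_ = 0; }                          // release everything

private:
    std::byte* buffer_;
    std::size_t capacity_;
    std::size_t top_;
};
```

Note that `release` does no per-object work at all; it just moves the ‘top’ back, which is why whole contiguous blocks are freed at once, and why destructors never run.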

The obvious downsides to this allocator are:

Coarse lifetime management: only whole, contiguous blocks of data can be released, and only from the top of the stack.

It’s great for plain-old-data, but not great for true C++ objects, which require their destructors to be called when released.

If the stack fills up, you’re in trouble!

Dangling pointer bugs are hard to catch (if you ‘release’ an object while some other part of the program still has a pointer to that object, you’re gonna have a bad time).

The beginning

In basic terms, C’s memory management gives us ‘the stack’ and ‘the heap’; the former used automatically by variables with function-scope, and the latter provided by C’s malloc/free and C++’s new/delete.

Managing memory allocated from the heap turns out to be error prone, with the possibility of memory leaks, double deletions, dangling pointers, and so on. All of these problems are caused by the fact that it’s up to the programmer to correctly match every new with one delete, with nothing stopping said programmer from using unstructured/spaghetti code in their buggy attempts to achieve this goal.

On the other hand, memory allocated from the stack is extremely predictable — the point of construction and destruction of every object follows a strict structure that makes leaks/double-deletions impossible, and also lets us easily reason about our code to avoid dangling pointers. N.B. ‘avoid’ dangling pointers, not ‘eliminate’ them. Bad programming can still break the stack’s structured rules, such as when returning a reference to a local variable, which can then be erroneously accessed after it’s been popped.
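For the record, that rule-breaking case looks like this (a toy sketch; most compilers will at least warn about the bad version):

```cpp
// BAD: returns a reference to a local that is popped off the call-stack
// when the function returns -- reading it afterwards is undefined behaviour.
// int& broken() { int local = 42; return local; }

// GOOD: return by value; the copy is made before the frame is popped,
// so the stack's structured lifetime rules stay intact.
int fixed() {
    int local = 42;
    return local;
}
```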

In C++, the hero of this story is RAII, and much of its magic relies on making heap allocations follow the nice, predictable lifetime rules of stack allocations, which we can easily reason about. In essence, RAII takes unstructured heap lifetimes and associates them with particular scopes (which often reside in the stack).

The pitfalls of RAII

The uses of RAII range from the simple auto_ptr, through to reference-counted smart pointers such as shared_ptr. In the OOP paradigm, it’s productive to encapsulate/hide details such as reference counting behind the scenes. The problem is that when trying to write in a more functional style for the sake of parallelism, these encapsulated, hidden mutable states (such as reference counts) are a huge side effect. Unfortunately this makes reference counters like shared_ptr a bit of a shaky abstraction within some kinds of multi-threaded environments.

This wouldn’t be a big deal, but in a typical “thread-safe” implementation, this side effect manifests itself as an atomic counter shared between N threads, i.e. the internal reference counter is a synchronisation point between threads, and may become a bottleneck. In a worst-case scenario, you’ve got 100 cores running your app, but they’re all waiting in line to bump the same reference counter… and perhaps a reference counter wasn’t even a necessary detail in solving the original problem!