Mastering C# and Unity3D

Faster Memory Allocation with Memory Pools

One of the advantages we get when we use unmanaged memory is a huge increase in flexibility. For example, we can easily allocate a whole array of objects at once instead of one at a time, as happens when we new a class instance. We can also create a memory pool with one allocation and then divide it up ourselves. It turns out that this can really speed up memory allocation and, at the same time, reduce memory fragmentation even further than we already do by not creating any garbage for the GC. Today’s article shows you how memory pools work and provides an implementation of one you can use in your own projects!

A memory pool is a block of memory that we, not the OS or VM, decide to sub-divide into a fixed number of constant-size blocks. For example, we may want a memory pool for small allocations so we’d create a memory pool of 10,000 256-byte blocks. To do so, we just call Marshal.AllocHGlobal(256*10000) and all of a sudden we have enough memory for all the objects. For example, we could treat it like an array and make a simple allocator:
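As a minimal sketch of that array-style allocator, it might look something like this. The names (SimpleVectorAllocator, SetUp, Allocate, TearDown) are made up for illustration, and the Vector3 struct is a local stand-in so the code compiles outside Unity. It assumes unsafe code is enabled in the compiler settings.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity
public struct Vector3
{
    public float x, y, z;
}

// Hypothetical "bump" allocator: hands out array slots in order, never frees one
public static unsafe class SimpleVectorAllocator
{
    private const int Capacity = 10000;
    private static Vector3* vectors;
    private static int nextIndex;

    public static void SetUp()
    {
        // One allocation holds every Vector3 we will ever hand out
        IntPtr memory = Marshal.AllocHGlobal(sizeof(Vector3) * Capacity);
        vectors = (Vector3*)memory.ToPointer();
        nextIndex = 0;
    }

    public static Vector3* Allocate()
    {
        if (nextIndex >= Capacity)
        {
            throw new InvalidOperationException("Pool is full");
        }

        // Return the next unused slot and advance
        Vector3* ptr = vectors + nextIndex;
        nextIndex++;
        return ptr;
    }

    public static void TearDown()
    {
        Marshal.FreeHGlobal(new IntPtr(vectors));
        vectors = null;
        nextIndex = 0;
    }
}
```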

That’s a really simple allocator, but it illustrates the flexibility we have with unmanaged memory. We can decide to treat the memory we get back from AllocHGlobal however we want. We can treat it as though it was an array of Vector3 objects just by casting it to Vector3*. We don’t have to call AllocHGlobal every time we need room for another Vector3. We can write our own Allocate function!

There is, of course, a huge problem with this kind of pool: there’s no way to release the vectors we allocated when we’re done with them. We need a Free, but that’s impractical with this simple design. So instead we’ll switch to a new design.

In the new design, we’ll keep a linked list of free blocks in the pool. This linked list will be intrusive because we’re going to store it in the blocks themselves. This means we won’t have to allocate any more space to hold the linked list and we certainly won’t be allocating one node of the list at a time.

The next step of the design is that we want some safety for the memory that we allocate, at least in debug mode. So we’ll add a little extra data at the end of each block in the pool to store a sentinel value. It’s just a constant value, but we can check it to make sure that we didn’t accidentally write beyond the start or end of a block.
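To make the sentinel idea concrete, here’s a rough sketch of writing and checking one. The constant, the class name, and the two functions are hypothetical, and the layout (an 8-byte sentinel directly after the usable bytes of a block) is just one assumption about how the extra data could be arranged. It assumes the usable size is a multiple of 8 so the sentinel stays aligned.

```csharp
// Hypothetical helpers, not the article's actual implementation
public static unsafe class SentinelSketch
{
    // Any unlikely bit pattern works as a sentinel
    public const ulong SentinelValue = 0xDEADBEEFCAFEF00D;

    // Write the sentinel into the extra bytes just past the usable part of a block
    public static void WriteSentinel(byte* block, int usableSize)
    {
        *(ulong*)(block + usableSize) = SentinelValue;
    }

    // True if the bytes just past the block still hold the sentinel,
    // false if something wrote beyond the end of the block
    public static bool IsSentinelIntact(byte* block, int usableSize)
    {
        return *(ulong*)(block + usableSize) == SentinelValue;
    }
}
```

Checking the sentinel whenever a block is freed is one natural place to catch an overflow, since that’s when we’re done writing to the block.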

With those two pieces in place, here’s how the memory pool’s memory will look:

We’re keeping the “next” pointer in the list of free blocks at the start of each block. Initially, each block points to the next block and the last block points to null. We keep a Free pointer to the “head” of the list.
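Setting up that initial state could be sketched like this. FreeListPool, Create, and Destroy are illustrative names rather than the real implementation, sentinels are left out to keep it short, and the block size is assumed to be a multiple of the pointer size so each “next” pointer is aligned.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical pool that only shows how the intrusive free list is built
public unsafe struct FreeListPool
{
    public byte* Memory;   // start of the one big allocation
    public void* Free;     // head of the free list
    public int BlockSize;
    public int NumBlocks;

    public static FreeListPool Create(int blockSize, int numBlocks)
    {
        // Every block must have room for an aligned "next" pointer
        if (numBlocks < 1 || blockSize < sizeof(void*) || blockSize % sizeof(void*) != 0)
        {
            throw new ArgumentException("Invalid block size or block count");
        }

        FreeListPool pool = new FreeListPool();
        pool.Memory = (byte*)Marshal.AllocHGlobal(blockSize * numBlocks).ToPointer();
        pool.BlockSize = blockSize;
        pool.NumBlocks = numBlocks;

        // Point each block at the block that follows it
        byte* block = pool.Memory;
        for (int i = 0; i < numBlocks - 1; ++i)
        {
            byte* next = block + blockSize;
            *(void**)block = next;
            block = next;
        }

        // The last block ends the list
        *(void**)block = null;

        // The head of the list is the first block
        pool.Free = pool.Memory;
        return pool;
    }

    public void Destroy()
    {
        Marshal.FreeHGlobal(new IntPtr(Memory));
        Memory = null;
        Free = null;
    }
}
```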

So how do we allocate from the pool? Easy! All we have to do is chop the head off of the list. It’s just three lines of code and it’s super fast:

void* Allocate()
{
    void* ptr = Free;
    Free = *(void**)Free; // the "next" pointer is stored at the start of each free block
    return ptr;
}

Here’s how it looks after we allocate our first block:

And here’s how it looks after we allocate another block:

OK, so how do we free a block? Also easy! All we have to do is add the block to the head of the list with three more lines of code:
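Here’s a minimal sketch of what those lines could look like, shown next to a matching Allocate so the pair can be tried together. The class name, the freeHead field, and SetHead are assumptions for illustration. The null check in Allocate is an extra that isn’t part of the three-line version.

```csharp
// Hypothetical free list operations, not the article's actual implementation
public static unsafe class FreeListSketch
{
    // Head of the free list
    private static void* freeHead;

    public static void SetHead(void* head)
    {
        freeHead = head;
    }

    public static void* Allocate()
    {
        // An empty list means the pool is exhausted
        if (freeHead == null)
        {
            return null;
        }

        // Chop the head off of the list
        void* ptr = freeHead;
        freeHead = *(void**)freeHead;
        return ptr;
    }

    public static void Free(void* ptr)
    {
        // The freed block becomes the new head and points at the old head
        void** block = (void**)ptr;
        *block = freeHead;
        freeHead = ptr;
    }
}
```

Because blocks are pushed and popped at the head, the most recently freed block is the next one handed out.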

The “free list” now takes a different path through the blocks in the pool, but that’s OK as long as it gets to all of them.

The real code is a little more complex than the Allocate and Free functions above because it needs to check the sentinel values, do other error checking, and add some useful features. That’s the core of the design though. Feel free to peruse the code below for the full details if you’re curious.

Now let’s put this design to the test. To do so, we’ll use a little test script that allocates five different ways:

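Create a class instance with new

Allocate unmanaged memory big enough for a struct with Marshal.AllocHGlobal

Allocate unmanaged memory with Marshal.AllocHGlobal and clear it with zeroes

Allocate a block from a memory pool

Allocate a block from a memory pool and clear it with zeroes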

If you want to try out the test yourself, simply paste the above code into a TestScript.cs file in your Unity project’s Assets directory and attach it to the main camera game object in a new, empty project. Then build in non-development mode, ideally with IL2CPP. I ran it that way on this machine:

LG Nexus 5X

Android 7.1.1

Unity 5.5.1f1, IL2CPP

And here are the results I got:

| Allocation Type | Time |
|-----------------|------|
| New Class       | 1152 |
| Alloc           | 1144 |
| Calloc          | 1403 |
| Pool Alloc      | 60   |
| Pool Calloc     | 211  |

Note: “calloc” is shorthand for “allocate and clear with zeroes”

Using new to create a class instance is just about as fast as calling Marshal.AllocHGlobal. At 10 million iterations, the difference is negligible. There’s no CallocHGlobal, so my C# code has to clear the allocated memory with zeroes and that adds to the time for the “Calloc” category.
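For illustration, an “allocate and clear” helper along those lines could be sketched as follows. CallocSketch and its Calloc function are hypothetical names, and the plain byte-by-byte loop is just the simplest way to clear memory, not necessarily what the test uses.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not the article's actual implementation
public static unsafe class CallocSketch
{
    public static IntPtr Calloc(int numBytes)
    {
        // AllocHGlobal doesn't promise zeroed memory, so clear it ourselves
        IntPtr memory = Marshal.AllocHGlobal(numBytes);
        byte* bytes = (byte*)memory.ToPointer();
        for (int i = 0; i < numBytes; ++i)
        {
            bytes[i] = 0;
        }
        return memory;
    }
}
```

The loop touches every byte of the allocation, which is why this kind of clearing costs extra time on top of the allocation itself.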

Pools are in a completely different league. Those three simple lines of code absolutely crush the performance of non-pool allocations. It’s over 5x faster to “calloc” from a pool and roughly a 20x difference if you don’t need to clear with zeroes. That’s often the case since you’re just going to write all the fields of a structure anyway.

So the performance advantages are clear when using a memory pool, but how do they fare in other categories? Well, they create no garbage whatsoever, so that’s an obvious win over using new to get class instances. The GC will simply never track, let alone run to collect any of the memory allocated for or by a pool.

They use fixed-size blocks, so it’s impossible to fragment memory by splitting it into tinier and tinier chunks until reasonably-sized objects won’t fit anymore. That can be a big win compared to Unity’s GC which is notorious for causing fragmentation.

They have the safety of sentinels, so we are alerted when we accidentally underflow or overflow a block. That’s not as safe as individual allocations, where the OS may crash the app if we write outside the memory allocated for us, so this is definitely a riskier approach than individual AllocHGlobal calls or using new with classes in that respect.

Finally, here’s the code that implements the memory pool. It’s built on the UnmanagedMemory static class from the previous article. Here’s an example of how to use it:

try
{
    // 10k max allocations at a time
    UnmanagedMemory.SetUp(10000);

    // Allocate a pool of 1000 blocks, each 256 bytes long
    UnmanagedMemoryPool pool = UnmanagedMemory.AllocPool(256, 1000);

    // Allocate a block from the pool
    void* ptr = UnmanagedMemory.Alloc(&pool); // or Calloc

    // Free a block back into the pool
    UnmanagedMemory.Free(&pool, ptr);

    // Free all the blocks in the pool
    UnmanagedMemory.FreeAll(&pool);

    // Free the pool itself
    UnmanagedMemory.FreePool(&pool);
}
// In debug mode there are many exceptions to indicate problems: sentinel overwrites, out of memory, etc.
catch (UnmanagedMemoryException ex)
{
    Debug.LogErrorFormat("Something went wrong with unmanaged memory: {0}", ex);
}
finally
{
    // Done with unmanaged memory. OnApplicationQuit() is a good place to put this.
    UnmanagedMemory.TearDown();
}

And here’s the code itself. It’s about 500 lines, including all the comments and error handling: