Introduction

I started to write an article about garbage collection in C++, and one of my points was that truly garbage-collected languages may be faster than C++ in some situations because they allocate memory in blocks, which makes allocating many small objects extremely fast, and this doesn't happen in plain C++.

Well, actually we can do the same in C++, but it is not automatic, so it is up to us to use it. To do that, we need some kind of memory or object pooling.

I looked at some existing implementations and I actually didn't like them. Boost, for example, has a pool that is fast at allocating memory but slow at freeing it if we want to control when each object is destroyed. So, I decided to create my own implementation, which I am going to present here.

The Solution

This solution is made of a single template class, named ObjectPool. I must say that choosing between naming the class "object pool" and "memory pool" is a little problematic. As the solution doesn't keep a certain number of objects already initialized, we can say that it is only a memory pool, so the cost of invoking the constructor and destructor of individual objects still applies. Yet I didn't want to call it a "memory pool", as users will request objects, not raw memory, from the pool. Maybe I should find another name that doesn't cause confusion but, for now, it is called ObjectPool.

The implementation is like this:

An initial amount of memory is allocated (by default, a block capable of holding 32 objects, aligned to pointer size: 4 bytes on 32-bit computers and 8 bytes on 64-bit computers) and a "first deleted" object pointer is set to NULL;

Each time a new object is requested, the code first checks whether there is a pointer to a first deleted object. If there is, that address will be used, and the "first deleted" pointer will be updated to the "next" free object (whose address is stored at that location). If there is no deleted object to reuse, we check whether there is still room in the current memory block; if not, a new block of memory is requested (each new block doubles in size, up to a specific limit). In either case, we call the constructor of the object at the chosen address and then return it;

Deleting an object is pretty simple. We invoke the destructor and then treat the object's memory as a pointer. That pointer is set to point to the current "first deleted" object, and then the object's address becomes the new "first deleted" item;

When we delete the pool itself, it frees the first block and, as each block frees the next one, it ends up freeing all blocks.
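The steps above can be sketched as a minimal, self-contained pool. This is only an illustration of the free-list idea, not the article's full ObjectPool: block growth, the block-size limit and small-item rounding are simplified away, and SimplePool is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Minimal sketch of the free-list idea described above.
template <typename T>
class SimplePool
{
    // A slot either holds a live object or, once deleted, the free-list link.
    union Slot
    {
        Slot *next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    Slot *_block;         // single fixed block (the real pool grows new blocks)
    Slot *_firstDeleted;  // head of the free list, NULL when empty
    std::size_t _used;    // slots handed out from the block so far
    std::size_t _capacity;

public:
    explicit SimplePool(std::size_t capacity = 32)
        : _firstDeleted(NULL), _used(0), _capacity(capacity)
    {
        _block = static_cast<Slot *>(std::malloc(capacity * sizeof(Slot)));
    }

    // Frees the memory only; destructors of live objects are NOT called,
    // matching the behavior described in the article.
    ~SimplePool() { std::free(_block); }

    T *New()
    {
        Slot *slot;
        if (_firstDeleted)               // reuse the most recently deleted slot
        {
            slot = _firstDeleted;
            _firstDeleted = slot->next;
        }
        else
        {
            assert(_used < _capacity);   // the real pool would allocate a new block here
            slot = _block + _used++;
        }
        return new (slot->storage) T();  // placement new: run the constructor in place
    }

    void Delete(T *obj)
    {
        obj->~T();                       // run the destructor
        Slot *slot = reinterpret_cast<Slot *>(obj);
        slot->next = _firstDeleted;      // the dead object's memory stores the link
        _firstDeleted = slot;
    }
};
```

Note how Delete stores the free-list link inside the dead object's own memory, which is why no per-object header is needed.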

So, we can summarize things like this:

All object allocations are O(1). In some cases new memory must be allocated, which takes some time, but the cost of such an allocation is not directly affected by the number of already allocated items, because we put a limit on how big the blocks can become;

All object destructions are O(1), as we call the destructor and simply "swap pointers";

It doesn't matter how many objects we have deleted: the pool keeps all memory blocks allocated until the pool itself is deleted;

When we delete the pool, the memory blocks are freed but the destructors of the inner items are not called, so it is up to us to delete each item before destroying the pool (if we need to call the objects' destructors at all).

There are also alternative methods that don't initialize or destroy the objects. Those methods do all the work of "allocating" or "deallocating" an object from the pool but don't call the constructor or the destructor. This may be useful if we need to call a specific, non-default constructor, or if we know that the object doesn't have a destructor (or for some reason we don't want to call it).
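The mechanism behind those variants is just placement new plus an explicit destructor call. Here is a stand-alone sketch of what the caller does; Person, constructAt and destroyAt are hypothetical names standing in for a pool slot and for the pool's GetNextWithoutInitializing/DeleteWithoutDestroying pair, which ship with the article's download.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical object with only a non-default constructor.
struct Person
{
    std::string name;
    int age;
    Person(const std::string &n, int a) : name(n), age(a) {}
};

// What the caller does after "GetNextWithoutInitializing" hands out raw
// storage: run a specific, non-default constructor via placement new.
inline Person *constructAt(void *rawSlot, const std::string &n, int a)
{
    return new (rawSlot) Person(n, a); // no allocation happens here
}

// What the caller does before "DeleteWithoutDestroying" reclaims the slot:
// run the destructor explicitly, without releasing the storage.
inline void destroyAt(Person *p)
{
    p->~Person();
}
```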

When is this pool useful?

When we have a job that allocates/deallocates many objects and we know we can keep the memory allocated until the end of that job (most loops fall into this category);

When we know that we have a limited number of objects that will be in memory at the same time, yet we keep "allocating" and "deallocating" them.

Using the code

To use the code, we must initialize the pool, passing the initial capacity and the maximum size for subsequent blocks. If we don't pass any parameters, the defaults of 32 (for the initial capacity) and 1 million (for the maximum block size) are used.

So, a line like the following will initialize our pool in the stack (or statically) using those defaults:
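A sketch of what that looks like. MyClass is a hypothetical element type, and the stub below only mirrors the assumed constructor signature, ObjectPool(initialCapacity, maxBlockSize); the real ObjectPool implementation comes with the article's download.

```cpp
#include <cassert>
#include <cstddef>

struct MyClass { int value; };

// Stub mirroring the assumed constructor signature only.
template <typename T>
struct ObjectPool
{
    std::size_t initialCapacity, maxBlockSize;
    explicit ObjectPool(std::size_t initial = 32, std::size_t maxBlock = 1000000)
        : initialCapacity(initial), maxBlockSize(maxBlock) {}
};

// A pool on the stack (or statically) using the defaults of 32 and 1 million:
ObjectPool<MyClass> pool;

// Or with explicit initial capacity and maximum block size:
ObjectPool<MyClass> bigPool(1024, 65536);
```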

Thread-safety

The presented code is not thread-safe, but this is actually part of what makes it fast. So, if we store the pool in a static variable or pass it to other threads, it is up to us to add some kind of locking.

Personally, I think we should not have a single static pool; if needed, we should have one pool per thread. This avoids the performance degradation caused by locking.
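The pool-per-thread idea can be sketched with C++11 thread_local. CountingPool below is a stand-in that only counts allocations; a real per-thread ObjectPool would live in the same thread_local slot, and no locking is ever needed because each thread sees its own instance.

```cpp
#include <cassert>
#include <thread>

// Stand-in for a pool: it only counts allocations, which is enough to show
// that each thread gets a separate instance.
struct CountingPool
{
    int allocations;
    CountingPool() : allocations(0) {}
    void Allocate() { ++allocations; } // a real pool would hand out memory here
};

// One instance per thread; no locks needed.
thread_local CountingPool tlsPool;
```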

Sample

The sample is an application that creates and deletes 100 objects 1 million times, then shows how long the job takes, both using the pool and using normal new and delete calls.

Of course, this is not a real-world situation, but it shows how fast the pool can be compared to plain new/delete calls. It is up to you to apply it in better situations.
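The shape of that benchmark is roughly the following. The names (Payload, timeLoopMs) are illustrative only; the real sample ships with the article's download.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct Payload { int data[4]; }; // hypothetical small object

// Time `iterations` rounds of allocating and then freeing `count` objects,
// using whatever alloc/release pair is passed in (pool methods or new/delete).
template <typename AllocFn, typename FreeFn>
long long timeLoopMs(int iterations, int count, AllocFn alloc, FreeFn release)
{
    std::vector<Payload *> items(count);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        for (int j = 0; j < count; ++j) items[j] = alloc();
        for (int j = 0; j < count; ++j) release(items[j]);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

The article's sample effectively runs this with iterations = 1,000,000 and count = 100, once with the pool and once with plain new/delete, and prints both times.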

Version History

12 April 2014: Added a memory allocator parameter to the template (the default one uses the new/delete operators instead of malloc/free); declared the copy constructor and copy-assignment operator private to suppress the default implementations; made the constructor explicit; and changed the deletion of the memory blocks to use a while loop instead of recursion, to avoid excessive use of the stack.


About the Author

I started to program computers when I was 11 years old, as a hobbyist, programming in AMOS Basic and Blitz Basic for the Amiga.
At 12 I had my first try with assembler, but it was too difficult at the time. Then, in the same year, I learned C and, after learning C, I was finally able to learn assembler (for Motorola 680x0).
Not sure, but probably between 12 and 13, I started to learn C++. I always programmed "in an object oriented way", but using function pointers instead of virtual methods.

At 15 I started to learn Pascal at school and to use Delphi. At 16 I started my first internship (using Delphi). At 18 I started to work professionally using C++ and since then I've developed my programming skills as a professional developer in C++ and C#, generally creating libraries that help other developers do their work more easily, faster and with fewer errors.

Now I just started working as a Senior Software Engineer at Microsoft.

Comments and Discussions

First of all, thanks for your beautiful code. It's quite effective, but when I use a class or struct that I defined myself, I find the efficiency is reduced; even worse, it's worse than the system calls (new or delete). And when I use GetNextWithoutInitializing and DeleteWithoutDestroying, it's highly efficient again.

Are you running the code with all optimizations on and outside Visual Studio?
Without optimizations or when running inside Visual Studio (or with any debugger attached) it can really become quite slow.

In my tests, when fully optimized and outside any debugger, this code is always faster than normal new and delete. Certainly a slow constructor will reduce its performance, but it never gets slower than normal new/delete.
But that may depend on the compiler used too... maybe some compilers are quite optimized for new and delete, and so this pool becomes slower. I don't know.

I just changed jobs (and city, country etc.), so I don't really have much free time.
But I will keep this in mind for a future update.
In fact, I don't know. Does std::list even call allocate with a value other than 1?

I am not going to change this code (I don't like to change an article after it won a competition), but I must say that:
1) I agree with using a function to get the _ItemSize.
2) Actually, I prefer block-size as it is the size of the allocated block. The node is composed of a Next node pointer + a memory block. It is only the memory block that changes in size.

For example, in a 32-bit computer, where sizeof(void *) == 4, this means:
if sizeof(T) == 1, the result will be 4.
if sizeof(T) == 2, the result will be 4.
if sizeof(T) == 3, the result will be 4.
if sizeof(T) == 4, the result will be 4.
if sizeof(T) == 5, the result will be 8.

This works because when I divide, I lose precision; then when I multiply, I get the value rounded up.
But if I did only (sizeof(T) / sizeof(void *)) * sizeof(void *), values like 1, 2 and 3 would become 0, when I need them to be 4.
If I used sizeof(T) + sizeof(void *) (without the -1), values like 1, 2 and 3 would come out correctly, but the value 4 (for sizeof(T)) would become 8, when it could stay at 4. This is why I use the -1.
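The rounding being discussed, written as a small testable function (the name roundedItemSize is mine; the article's code computes the same expression inline):

```cpp
#include <cassert>
#include <cstddef>

// Round an item size up to the next multiple of sizeof(void *), so that a
// deleted slot is always big enough to hold the free-list pointer.
inline std::size_t roundedItemSize(std::size_t itemSize)
{
    return ((itemSize + sizeof(void *) - 1) / sizeof(void *)) * sizeof(void *);
}
```

On a 32-bit machine (sizeof(void *) == 4) this reproduces the table above; on a 64-bit machine it rounds up to multiples of 8 instead.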

Actually, I saw times like this many, many times about 20 years ago... now it is something extremely rare.

If I understand correctly, when you "delete" an object, you put a pointer to the next deleted object (or null) in its place. This "overwrites" the object's data (the first 4/8 bytes). So we have to call the constructor each time to ensure the object is not corrupted by that pointer at its beginning, and therefore we cannot "reuse" the object.

Have you thought about adding some margin to the object? When we allocate, we would allocate extra space for this margin, and when we return/construct the object, we would do it at the object_start + margin_size address (if you use "operator new" instead of "operator new[]" for allocation and calculate the address manually, after all)?

I am not sure if I understand your question. But I purposely reused the address of a deleted object instead of putting a "header", so I use less memory.
Considering that most objects are usually between 4 and 30 bytes in size, this means that I simply don't add any "size overhead" to the allocation.

Surely the object can't be reused after it is dead, but if you allocate a new object, its memory will be reused. The "first deleted" pointer is stored, then changed to point to the next deleted object (or null), and the stored address is returned.
This means that every time you delete an object, the next new will use the address of the one that was just deleted.

It really needs to call the constructor, but this is how any object that was deleted (had its destructor executed) must be reconstructed.

So, suppose you are allocating from the initial block (of size 32). You allocate:
A, B, C
Then you delete B and C.

The next new will take the place of C.
Another new will take the place of B.
Finally, another new has no deleted places to reuse and will continue taking items from the first allocated block.

If you want to change something, feel free.
But in this case, the fact that the objects must be reinitialized is on purpose. After all, the destructor is invoked.

In fact, another pool may use a different logic. Instead of allocating only memory and waiting to initialize it, it could initialize all objects. Then it could simply return already-initialized objects that are considered "in use", and the "free" would mark them as not in use, without ever invoking the destructor. Only when the pool itself is destroyed would all the destructors be invoked.

But doing it that way serves a different use case, in which the constructor or destructor is the slow part. This pool makes new and delete faster by not wasting time on excessive memory allocations; as explained in the article, C# (for example) actually allocates objects faster than C++ in normal situations, because C#/.NET uses a pool for allocations while C++ doesn't (this pool is here to solve that).
And I also commented that I was unsure if I should call it ObjectPool or MemoryPool, as the reused part is the Memory, yet the pool returns objects, not raw memory.

> After all, the destructor is invoked.
I would not presume to tell you what to do, but you do have "DeleteWithoutDestroying". :)

> But doing it that way is for a different use case, in which the constructor or destructor is the slow part.
They could be mixed into your implementation too, with template parameters bool construct_all (construct all objects on Node construction) and bool destruct_all (destruct all on Node destruction).

> And I also commented that I was unsure if I should call it ObjectPool or MemoryPool
The only thing you cannot do with your kind of pool is real memory deallocation.
For example, you create 2000 objects, then release 1500 (some heavy event occurred). Now you want to free that memory (you need it for something else), but you can deallocate only by chunks, and it is impossible to deallocate the whole pool because 500 objects are still in use. I think this is the only case when an "ObjectPool" is really needed: with it, you could actually free unused memory.

> If you want to change something, feel free.
Of course. That was only suggestion.

Unfortunately, this pool is for fixed-size objects only. In fact, this is what gives it its O(1) trait.
Imagine that even if I could allocate different-sized objects, it would be problematic to remove one from the middle.
Example: The first block is of size 32.
Allocate 10. Allocate 5. Allocate 10.
If you allocate 30 now, it is easy to allocate a new block... but then some bytes of the first block will be "lost"; or, if I don't want to lose them, I will probably need to search for them (which kills the O(1) guarantee, at least for any simple solution).

Then imagine that you delete the 5-byte allocation. A new allocation simply tries to reuse the deleted space... but if you want to allocate 6 bytes, you can't. This wouldn't happen if all allocations were the same size.
So, this pool will not solve your problem.

I have tested your pool on AIX/GCC 4.2.4. It's good. Some results:
1. My machine is slow, so I changed NUMBER_OF_ITERATIONS to 100000. The result is 542288 ms vs 707116 ms.
2. There were a few little things to change. I hope you can update the pool to a cross-platform one.

There is a study of pools of small objects in 'Modern C++ Design' by Andrei Alexandrescu (Ch. 4). It deals with objects of the same size and of different sizes. The Loki library implements the code.
How does your approach compare with that of the book?

I don't have the book, so I can't compare.
Yet, as I presented it here, it is O(1) for single-object allocations and deallocations (while some implementations are O(log n) or even O(n) for either allocations or deallocations).

So, I think that if you have something built based on that book, you can try it and see the performance difference. I actually can't see many performance improvements, but I can see many "control" improvements; for example, this pool doesn't call the destructors of its objects if the pool itself is deleted before each object is deleted.

No, I will not try to borrow it. I presented an O(1) solution that works pretty well. I can look at other solutions and take inspiration from them; that doesn't make my solution useless... I am pretty sure that no "variable size" solution is capable of being O(1) and, at the same time, never "losing" memory.
I don't need to read documents to know the obvious.

Additionally, you should consider using C++ naming guidelines. I don't want to be too pedantic here, but I find it important to follow the common guidelines (I know: since C++ is old and everyone has made up their own styling rules, it is a mess; however, there are official guidelines from the committee): e.g., put the asterisk directly after the type, use camelCase for your methods, use Egyptian brackets, and consider using the STL more.

In your code you reinvented the wheel. Why create a linked list when there is std::list<T>? The latter is a lot more flexible, better tested, and probably performs better. Additionally, you did not implement a move constructor.

Using malloc in C++ code is legitimate but might end in disaster. After all, the memory management is a little bit different (there is more behind new than just the additional call to a constructor)... Generally, it is good practice to avoid mixing C and C++ calls (though of course sometimes you have to).

Finally, I encourage you to use auto more often. It gives you the right type without guessing.

One last thing: for reference-counted memory management you could also rely on, e.g., shared_ptr. It is fast, efficient, and well tested. If you really want to see how to implement a more advanced GC, then I recommend https://www.cs.princeton.edu/~appel/modern/c/software/boehm/ - it is a classic that has already helped me a lot.

I actually agree with many things you said but:
* I still use C++ compilers that don't have nullptr. I know I could use a #define for it, yet every place I've worked that used C++ was still using NULL, so I don't have a problem with it.
Also, even if your foo(int)/foo(void *a) example is correct, I am not using NULL in any situation that generates such ambiguity.

About the naming guidelines... I completely agree with you. But I used Borland C++ Builder for a long time, and also C#... and C++ Builder naming conventions are pretty similar to the .NET ones, so I used a C++ convention, even if it is not the standard one.

About reinventing the wheel, I must completely disagree. Note that I use the very buffer used for object allocations as the linked list, so I don't use extra memory. Items smaller than a pointer will actually be allocated with the size of a pointer, because when I delete them they become nodes of the linked list. So reinventing the wheel was completely on purpose.

About the malloc, I am already considering using an extra template parameter for the allocation/deallocation, so users will be free to use malloc/free, new char[...]/delete[] or anything they like.

About auto... if I used it anywhere, that was actually an error. I didn't want to use auto because older compilers don't support it. I agree that it gives the right type, but I really want portable code.

And for the GC: shared_ptr covers only reference-counted situations, and the idea is to implement a garbage collector where A can have a strong reference to B and B a strong reference to A. shared_ptr, as far as I know, doesn't allow this.

Edit:
About the move constructor: I should actually make the constructor explicit and forbid the copy constructor and the copy-assignment operator, as I don't want a pool to be created when an int is given as a parameter, or to have implicit copies.

What compilers are you targeting? The features nullptr and auto have been available for a couple of years now in the industry's leading compilers (even before the std=c++11 flag; e.g., icc had them with std=c++0x, supported since 2010).

If you really want to target an older standard, that is okay, but let me just ask you: would you (with the same argument, portability) also leave out, e.g., async/await in C#? If your project's target does not support .NET 4, that is reasonable; otherwise there is no reason for such a constraint, as this is only a compiler feature.

What I'm getting at is: if you want real portability, then just go for C. You can even run it on microcontrollers, and you REALLY just get what you write. There are no hidden costs, e.g., for the vtbl, vptr, etc.

Sorry for my comment about reinventing the wheel - I did not realize you had such a strong goal in mind. Have you checked whether this really makes a big difference? It would be interesting to see. Also: you should consider benchmarking against std::vector<T>. Even though it will perform much worse in purely push/pop-like operations, it will totally outperform a linked list once many traversals/iterations are involved.

And a final remark: yes, shared_ptr is reference counted - which is why I mentioned it. But I love its simplicity, and it works great in situations where my code does not define a point at which an allocation can be deleted.