I use C++ as a hobby and .NET professionally, so I can code in C++ for my hobby game development, but I don't understand the ins and outs of memory 100%. I think I have a decent understanding, but I want to ask some questions and explain how I think things work (from reading forums and other material), and maybe someone could be kind enough to tell me where I'm mistaken.

1) Memory created on the stack is created faster and accessed faster (and therefore runs faster) than memory created on the heap.

2) There are only so many megabytes of stack memory per application, but the heap can grow and shrink.

3) STL containers actually store the objects they contain on the heap, whether or not the type you give them is a pointer, i.e. (list<Object> vs list<Object*>). If I'm filling this list myself, ALL objects will be created on the heap, meaning the speed penalty between the stack and heap applies.

4) Normal arrays that are not storing pointers, i.e. (Object obj[10]), are stored on the stack no matter what.

5) I thought I heard that if the object you are inside was created on the heap, then the member variables it contains are also stored on the heap, even if they aren't pointers, i.e.:

class Object
{
};

class Entity
{
private:
    Object o;
};

Entity* e = new Entity();
// e->o is actually created on the heap instead of the stack because its parent object (Entity) was created on the heap

6) So given that the above isn't too far off, to truly squeeze performance out of both loading times and runtime speed, one should avoid the heap where possible. The problems with that are that you only have so much space on the stack (maybe 64-bit eases this pain a lot?) and that you have to know how many objects you'll need. On the knowing part: very few games actually have unlimited amounts of any object. So you could store your objects in normal arrays of a predefined/maximum size and write code that reuses "old"/"deleted" objects. This means you'd most likely be wasting memory, but with the benefit of speed. Is that trade-off ever worth it in games? One upside is that you don't have to worry about memory management or leaks. I understand most parts of a game (audio/visual) will use the heap and have to, but I'm referring more to the gameplay code (the code I'm actually writing, since I use an engine for everything else) rather than resource management.

1) This is partially correct due to cache effects (the stack is small and accessed frequently, so it is more likely to be in cache, while heap-allocated items are scattered in memory). However, the cache effects aren't what you base the stack/heap decision on: if you CAN allocate on the stack, you allocate on the stack. What makes the heap slow is that finding a free segment of memory is an extremely expensive operation, which is why you only want to allocate large blocks and prefer more "uniform" allocation strategies (stack/pool allocators).

2) The stack is limited in most cases, which is why you usually allocate on the heap if you need a huge chunk of memory (especially when the cost of allocation is dwarfed by the processing of that huge chunk). You can usually change the size of the memory reserved for the stack, though.

3) Dynamic STL containers use the heap to store their objects because they are dynamic; you can't really resize memory allocated on the stack. A std::vector will allocate a contiguous chunk of memory from the heap where it stores your array of objects (and reallocates if needed). If the element type is Object, the array contains Objects; if the type is Object*, the array contains Object pointers (however, you need to make sure the pointers actually point somewhere!). The containers don't really care what the type is; they just blindly store it. If you were to separately allocate those Objects (to fill the Object* container) on the heap, they would be scattered all around memory instead of being stored in one contiguous chunk, which is very bad for the cache.

4) Normal arrays declared as local variables are stored on the stack, because their size is known at compile time and thus they can be. std::array also lives on the stack because its size is known at compile time, so not all std containers store their elements on the heap.

5) Classes and structures in memory are simply their members laid out in a single chunk of memory. So, in your example, Entity in memory is exactly the same as Object in memory. Thus, when you allocate an Entity on the heap, its Object must be on the heap too (because it is embedded as part of the memory representation of Entity).

E.g. if we had struct RGB { char r, g, b; }, in memory we would have

[ RGB ]

which is the same as

[char r][char g][char b]

which is the same as

[imagine 24 bits here]

A pointer is simply a number (which you may interpret as an Object*), so if Entity had an Object* instead, the bytes of the pointer would be on the heap, but the Object itself could be anywhere in memory (heap, stack, wherever you put it). That is, if Entity has an Object*, it doesn't mean the Object* points to the heap. You can point to an Object stored on the stack, too.

6) There's nothing wrong with the heap itself; it's the allocation that is the problem. If you use a fixed-size array (and reuse slots from old objects), you are basically implementing a custom allocator, which is most likely more efficient than the general allocator provided by the OS. You can combine the two: use the general allocator to get large blocks of memory, then use custom allocators to efficiently place your objects within those blocks. This way you get both fast allocation and unlimited space (if you allow allocating more of these big memory chunks from the general allocator as needed).

1) Memory created on the stack is created faster and accessed faster (and therefore runs faster) than memory created on the heap.

Stack space is allocated faster than a general heap allocation: it's just a pointer increment. Heap allocation and deallocation have other data structures to manage, and threading to worry about.

But all your remaining questions revolve around a misconception about how "fast" this memory is. There's NO speed difference in using any particular byte of stack or heap memory; it's all the same physical RAM in the machine. The speed differences come from how you use the memory. Stack memory is "fast" in that you usually just pushed the object you're playing with onto the stack. That means the compiler will either optimize it into a register, or, if it does spill into main memory, it's going to be in the cache. Heap space is only "slow" because you can technically go grab any old byte that you want, and if it isn't in the cache that takes time. Pick the correct data structures for your objects and you can optimize for memory locality, which is the biggest time saver. Keep all the data you need to access near related data, and push all the unrelated data somewhere else. There are really big performance benefits to organizing your data into contiguous runs of information that you fully access, instead of letting the "interesting" bytes get spaced out across memory.

That said, on your STL questions: it's generally not about how a container uses the RAM, it's about the "Big O" complexity of the algorithm that you want. Containers are all about defining insert, delete, find, and iterate complexity. Some data structures space out your objects in the heap more than others, but give you some complexity benefit for your algorithm. If you really care about how things are spaced out, you can, with some work, replace the allocator to get the behavior you're looking for. But you usually want to judge containers on the merits of "I need fast, sorted inserts" or "I need fast linear iteration", and store only what you need in each container so that you're only paying for the "interesting" bytes that you wanted to store.

But there is a speed difference in allocation, right? Meaning loading times would be faster if everything that could be were stored on the stack instead of the heap? Again, because most of the time we don't know how many objects we'll have, we would sacrifice memory by pre-allocating everything on the stack, but I would think loading would be greatly sped up. We all hate loading times, right?

I remember testing a 2D tile game where I created Tile objects dynamically for each element in the 2D array (trees, etc.). When I was using new Tile for each cell of this 2000x2000 array, it took something like 10 minutes to finish (I don't remember the exact numbers, but it was a long time). When I switched to stack memory it was a couple of seconds. So the difference between Tile* map[2000][2000] and Tile map[2000][2000] was massive. Of course, that changed how I was going to code things, because I was using polymorphism for the Tile objects and couldn't once I stopped using pointers for them.

10 minutes sounds like an outrageously large amount of time just to perform 4 million allocations, even on the heap. I just did a test doing four million 32-byte allocations and it only took about 2 seconds.

Anyway: don't put all your stuff on the stack. The stack is useful for objects that are relatively small and that are expected to live only as long as the scope they're in. It can be useful to put these on the stack rather than making a slower heap allocation. But for long-lived, large objects (like a massive array of tiles) you'll do best putting them on the heap.

And since heap allocation is slow, it's important to think about your memory allocations and not make tons of them wastefully. Making four million allocations for your tile map is wasteful, both in the time it takes to perform those allocations and in the per-allocation overhead. You'd be better off making one big allocation that contains the entire array of tiles, and using some other technique than virtual functions for polymorphism (i.e. switching behavior based on a type field in the tile, or something).

You'd be better off making one big allocation that contains the entire array of tiles, and using some other technique than virtual functions for polymorphism (i.e. switching behavior based on a type field in the tile, or something).

The stack pointer is in a register. If you allocate something on the stack, it's assigned an offset. There is no actual runtime allocation of memory that incurs a cost, just the address calculation done at compile time. So when you use map[x][y] from the stack, the compiler replaces that with something like stack register + offset to map + x * sizeof(Tile) + y * sizeof(Tile) * 2000. That numerical result is computed at compile time, so there is no runtime cost to using stack memory.

Even your "Tile* map[2000][2000]" line of code is not using the heap: that is placing 4 million Tile pointers on the stack. The first heap memory you use is when you call new to allocate space for one of the elements of the map 2D array. For performance, you want to minimize the number of times you call new (or malloc, etc.). The new operation is very slow, slow enough that many, many games reinvent the wheel and write their own memory allocation functions.

In your situation, I'd probably do:

const int numTilesX = 2000;

const int numTilesY = 2000;

Tile* map = new Tile[numTilesX * numTilesY];

and access any element as map[x + y * numTilesX]

I use single-dimensional arrays as often as possible. I just find them easier to debug, and I'm a bad enough programmer that I spend a lot more time debugging than writing new code =)

If it helps, think of it in terms of the struct keyword vs the class keyword in C#. Any local variable or class member in C++ automatically acts like a C# struct. If you explicitly allocate on the heap (it'll never happen automatically, except when some class you use does it internally, like most STL containers), then it's like a C# class. Likewise, the performance in C++ is not too far off from C#: don't allocate things you don't need to allocate. C# makes it harder to avoid allocations than C++ does, in exchange for being easier to deal with. You really need a clear understanding of the stack and heap for high-end C# programming too, so you might want to take the time to read some texts on the topic.

Hi there. I don't think it's that slow to allocate memory. In my last 2D RTS game I allocated all objects with new and deleted them, and I had no slowdowns or anything you could notice. And that could have 8 players with 800+ units each.

Hi there. I don't think it's that slow to allocate memory. In my last 2D RTS game I allocated all objects with new and deleted them, and I had no slowdowns or anything you could notice. And that could have 8 players with 800+ units each.

If you mean allocating each unit separately, it could make a huge difference to use pool allocation instead, both when creating and destroying units and when processing them (due to the cache being utilized better).

The creation and destruction of units probably isn't the problem here, though, because I doubt you would be doing thousands of allocations per second. It's more the processing part that would matter here.

So allocation isn't horribly slow in itself, but compared to the alternatives it is.

As an example, I had implemented my sparse quadtree using std::function (so I could use lambdas to do operations on it) and std::vector, and when I replaced std::function with a template parameter and applied some .reserve() calls to the vectors, the performance improved at least tenfold.

I think we need to stop saying "heap allocation" and "stack allocation". There is just "allocation"; it is always on the heap, and it is always preferable to avoid it for performance reasons.

The stack is nothing more than a piece of memory that was allocated (from the heap) by the operating system when it created the thread (and each thread has its own stack).

On a hardware level, unless you are working on a Nintendo DS, there is absolutely no difference whatsoever between stack memory and heap memory.
You can literally allocate from the heap, copy the thread's stack memory over to the heap address, change the EBP/ESP pointers to the corresponding addresses in the allocated RAM, and the performance will be exactly the same (accounting for cache coherency and every single other factor; literally, because this is all the operating system does when it creates a thread).

In a typical case the stack is accessed frequently enough to constantly be in cache, but accessing an array of 2,000 integers on the heap vs. on the stack will incur essentially the same number of cache misses and thus have the same performance.

In other words, the lesson is that any performance benefit the stack has can transfer over to heap RAM as well. You can use allocated RAM (AKA "the heap") just as efficiently as stack RAM, so moving data to the stack by itself isn't going to gain you any performance. What matters is how you avoid allocations in the first place, combined with how you take advantage of the cache and forward prediction.

So when you use map[x][y] from the stack, the compiler replaces that with something like stack register + offset to map + x * sizeof(Tile) + y * sizeof(Tile) * 2000. That numerical result is computed at compile time, so there is no runtime cost to using stack memory.

Wrong.
If you have a hard-coded array on the stack and you access it with hard-coded indices, such as iMyLife[2][3], then yes, the whole address can be folded into a compile-time constant offset from the stack pointer.

But when it comes to actual dynamic array access (iMyLife[Y][X]), the x86 instruction set is itself very dynamic.
The MOV instruction can be written as:

mov edx, dword ptr [ebp+eax*4+100]

A single instruction can contain the stack base offset (EBP), an array index (EAX), size multiplier (4) and even an offset (100).
But what you are proposing is far beyond the x86 instruction set. It would look something like (assuming an offset of 100 again, sizeof(Tile) = 4, 2,000 tiles in a row, eax = X, ebx = y):

mov edx, dword ptr [ebp+ebx*8000+eax*4+100]

This is obviously not a valid instruction.
#1: The size multiplier can be only 0, 1, 2, 4, or 8. Not 2,000, not 8,000, not even 20.
#2: No mnemonic allows 3 registers in a single operand; you can't have ebp, ebx, and eax all appear on the right side.

Whether on the stack or the heap, even when the compiler knows the stride of the array (2,000 × sizeof( Tile )), the output code will be very similar.

The basic bottom line is not to assume the stack is faster than the heap. Use the stack how it is supposed to be used (for small local variables) and use the heap how it is supposed to be used (for large data, data of changing or unknown sizes, etc.) and learn general proper performance practices.

The basic bottom line is not to assume the stack is faster than the heap. Use the stack how it is supposed to be used (for small local variables) and use the heap how it is supposed to be used (for large data, data of changing or unknown sizes, etc.) and learn general proper performance practices.

This alone sums up the point pretty well, even ignoring all the rest of the information in the post.

If anything, a more realistic point is that there is almost never a situation where you suffer performance-wise simply because you made an allocation on the heap instead of the stack; the real problem is usually that you're making heap allocations in an extremely inefficient way.