C-specific question

I think I am doing this right but I have a question about something in C specifically.

Does the malloc() function have a void* internally that it uses to point to the memory allocated along with an unsigned int (or whatever) to remember the length of the allocated memory?

I'm wondering, how does it know how much memory to free? The only thing I can figure is it finds the pointer in its list or whatever structure it uses and references it to know how much memory to deallocate.

The reason I'm doing it like that isn't important. My concern is that all the memory is being deallocated when I'm done. I'm unsure because although I had to tell it how much to allocate, I don't have to tell it how much to deallocate.

I believe malloc creates a struct in memory right before the first address of your newly allocated block. The struct contains the size of the allocation but no pointer, since malloc knows where the block is relative to the struct's memory location. So no, you don't need to specify a size to free(), and there's no need to dereference your pointer in the free() call.
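To make the "size header in front of the block" idea concrete, here is a minimal sketch layered on top of the real malloc(). The names toy_malloc, toy_size, toy_free and toy_header are made up for illustration, and this is only one possible layout, not how any particular C library is known to do it. It assumes a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed just before the caller's block.
   The union pads the header to the strictest alignment, so the
   memory that follows it is still suitably aligned for any type. */
typedef union {
    size_t size;          /* number of bytes the caller asked for */
    max_align_t align;    /* padding only, never read */
} toy_header;

void *toy_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(toy_header)) return NULL;  /* overflow guard */
    toy_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;         /* caller's memory starts right after the header */
}

/* Step back one header from the caller's pointer to read the size. */
size_t toy_size(const void *p) {
    assert(p != NULL);
    return ((const toy_header *)p - 1)->size;
}

/* No size argument needed: the header already records it. */
void toy_free(void *p) {
    if (p != NULL) free((toy_header *)p - 1);
}
```

The caller only ever sees the pointer returned by toy_malloc(), which is why the same pointer has to be handed back to toy_free().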

You can't do foo*** bar = malloc(x*y*sizeof(void*)) and then treat it like a two-dimensional array of foo*.

But I guess you already figured that out because it crashed. Rule of thumb is that every indirection (or dimension, or "*") needs its own allocation and initialization if you want to create a multidimensional array. So in this case, you'd also need to allocate an additional array of size x*sizeof(void*) or y*sizeof(void*), and initialize its entries to point to the rows/columns of the big malloc(), before you can do bar[i][j] = baz.
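Roughly, the two allocations described above could look like this. It is a sketch only: grid_create and grid_destroy are hypothetical names, and foo is left as an opaque placeholder type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct foo foo;   /* opaque placeholder for the element type */

/* Build a rows x cols table of foo* that can be indexed as grid[i][j]. */
foo ***grid_create(size_t rows, size_t cols) {
    /* Reject empty tables and sizes whose byte count would overflow. */
    if (rows == 0 || cols == 0 || cols > SIZE_MAX / sizeof(foo *) / rows)
        return NULL;

    foo **cells = malloc(rows * cols * sizeof *cells);  /* the big block */
    foo ***grid = malloc(rows * sizeof *grid);          /* one pointer per row */
    if (cells == NULL || grid == NULL) {
        free(cells);
        free(grid);
        return NULL;
    }

    for (size_t k = 0; k < rows * cols; k++)
        cells[k] = NULL;                 /* every cell starts out empty */
    for (size_t i = 0; i < rows; i++)
        grid[i] = cells + i * cols;      /* row i points into the big block */
    return grid;
}

/* Two allocations were made, so two frees are needed. */
void grid_destroy(foo ***grid) {
    if (grid != NULL) {
        free(grid[0]);   /* grid[0] is the start of the big block */
        free(grid);
    }
}
```

After grid_create(x, y) succeeds, grid[i][j] = baz is valid for i < x and j < y.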

malloc_size() isn't a standard function, so don't expect it to work on all systems. The actual implementation of malloc() and free() is implementation-defined, so there's no one right way to do it. There are a lot of rules it uses internally to make sure you get memory properly aligned for all major data types on your CPU, to work with virtual memory, etc. As a programmer, though, you shouldn't worry about that, since the C library and the OS already handle all that for you. All you have to do is make sure that each malloc() is accompanied by exactly one free(), and you must pass in the exact same pointer as was returned from malloc().
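A small sketch of the "one malloc(), one free(), same pointer" rule. The function name count_after_fill is made up for the example; the point is that you walk the buffer with a copy of the pointer and hand the original back to free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fills a temporary buffer with n 'x' characters, counts them, frees it. */
size_t count_after_fill(size_t n) {
    if (n == SIZE_MAX) return 0;          /* n + 1 would overflow */
    char *buf = malloc(n + 1);
    if (buf == NULL) return 0;
    memset(buf, 'x', n);
    buf[n] = '\0';

    size_t count = 0;
    for (const char *p = buf; *p != '\0'; p++)  /* advance a copy, not buf */
        count++;

    free(buf);   /* exactly the pointer malloc() returned, freed once */
    return count;
}
```

If the loop had advanced buf itself, the pointer passed to free() would no longer be the one malloc() returned, which is undefined behavior.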

Completely off topic, but I couldn't help noticing that you just hit post 1234, Skorche.

If each column has the same number of rows, for the most effective use of memory it would be even better to allocate a flat array of size numCols*numRows and index it by [i*numRows + j], since it only requires one allocation. However, that's an optimization and not strictly necessary.
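For reference, a minimal sketch of the flat layout, using the same i*numRows + j indexing (i is the column, j is the row). The helper names flat_create and flat_index are made up, and int is just a stand-in element type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Position of column i, row j in a flat array with numRows rows per column. */
size_t flat_index(size_t i, size_t j, size_t numRows) {
    assert(j < numRows);
    return i * numRows + j;
}

/* One allocation for the whole grid; calloc zero-fills the ints. */
int *flat_create(size_t numCols, size_t numRows) {
    if (numCols == 0 || numRows == 0 ||
        numCols > SIZE_MAX / sizeof(int) / numRows)
        return NULL;                      /* empty or too large */
    return calloc(numCols * numRows, sizeof(int));
}
```

Usage would be along the lines of grid[flat_index(i, j, numRows)] = value, followed by a single free(grid) when you are done.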

It's an optimization because you only do one allocation for the entire array instead of 1+numCols allocations (and later, a single free instead of 1+numCols) and also because with an array of arrays you need a second indirection (and a memory access) to get to your object whereas with a single array you skip that step.

Anyway, if you are happy with your structure and find it easier to use, go with it. The performance difference would likely be negligible, since any Obj-C call would likely overshadow whatever cost a double indirection may incur. Personally, I wouldn't use an array of arrays to represent a 2D array; I'd only use it if I needed each column/row to be able to contain a different number of rows/columns.

As PowerMacX said, two reasons are requiring fewer allocations and one less dereference. Having a contiguous array also helps with cache performance, since you will have fewer cache misses. Another issue is that malloc will allocate more memory than you request, since it needs to store information about the allocation and it may also allocate extra memory as padding.

None of these are critical issues, and you should make your code easier to understand before you make sacrifices to optimize for them. However, making a single array and indexing it by i*numRows + j is a relatively simple thing to do, and it has other advantages such as less effort to manage the memory (since you only need to manage 1 malloced pointer) and you only need 1 loop to visit all the elements.

To get back to the original question - how malloc() works internally is pretty implementation-dependent. Some compilers do it differently than others. However, I don't know any that use a struct adjacent to the allocated block, which is quite error-prone in the case of buffer overruns or underruns. Most implementations that I know of use something like a hash table of addresses against a small struct containing allocation size and so on, which is usually placed away from user addresses.
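As a rough illustration of the side-table idea (not a claim about how any real C library does it), here is a toy wrapper that records each block's size in a small hash table kept apart from the blocks themselves. Everything here is hypothetical: the tracked_* names, the 64-slot capacity, the hash and the tombstone trick for deleted entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SLOTS 64                   /* toy capacity, must be a power of two */

typedef struct { void *addr; size_t size; } alloc_record;

static alloc_record table[TABLE_SLOTS];  /* lives apart from user blocks */
static char tombstone;                   /* its address marks a deleted slot */
#define TOMBSTONE ((void *)&tombstone)

static size_t slot_for(const void *p) {
    return (size_t)(((uintptr_t)p >> 4) & (TABLE_SLOTS - 1));
}

/* Linear probe for p; a never-used slot ends the search. */
static alloc_record *find(const void *p) {
    size_t s = slot_for(p);
    for (size_t n = 0; n < TABLE_SLOTS; n++, s = (s + 1) & (TABLE_SLOTS - 1)) {
        if (table[s].addr == p) return &table[s];
        if (table[s].addr == NULL) return NULL;
    }
    return NULL;
}

void *tracked_malloc(size_t size) {
    void *p = malloc(size ? size : 1);
    if (p == NULL) return NULL;
    size_t s = slot_for(p);
    for (size_t n = 0; n < TABLE_SLOTS; n++, s = (s + 1) & (TABLE_SLOTS - 1)) {
        if (table[s].addr == NULL || table[s].addr == TOMBSTONE) {
            table[s].addr = p;           /* record address and size */
            table[s].size = size;
            return p;
        }
    }
    free(p);                             /* table full: toy limitation */
    return NULL;
}

/* Recorded size of p, or 0 if p is not a tracked block. */
size_t tracked_size(const void *p) {
    alloc_record *r = p ? find(p) : NULL;
    return r ? r->size : 0;
}

void tracked_free(void *p) {
    alloc_record *r = p ? find(p) : NULL;
    if (r == NULL) return;               /* unknown pointer: ignore it */
    r->addr = TOMBSTONE;                 /* keep probe chains intact */
    free(p);
}
```

A buffer overrun in a user block can't trample these records the way it could trample an adjacent header, which is the advantage being described.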

If you're interested, have a look for a code package called dlmalloc, which is a good fast malloc implementation sometimes used in games.