Aligned memory allocation

In some scenarios, you want to get memory that is aligned at an address that is a certain power of 2. Certain CPU architectures and certain operations require (or are faster) if their operands are located at an address that is a multiple of a certain power-of-2 number. For these reasons, you might see that many multi-platform libraries use an aligned memory allocator instead of malloc in their code. For example, OpenCV uses methods named fastMalloc and fastFree inside its code that do this type of allocation and freeing.

Most of these methods work like this:

They internally get memory from malloc. However, if you requested for N bytes, the wrapper will request for N+P+A bytes from malloc. Here, P is the size of a pointer on that CPU architecture and A is the alignment required, expressed in power-of-2 number of bytes. For example, if I request for 100 bytes on a 64-bit CPU and require the memory to be aligned to a multiple of 32, then the wrapper will request for 140 bytes.

After getting the memory from malloc, it aligns the pointer forward so that (1) the pointer is at an address that is aligned as per requirement and (2) there is space behind the pointer to store a memory address.

Then we sneak and store the address actually returned by malloc behind the pointer address and return the pointer to the user.

The user has to use our free wrapper to free this pointer. When she does that we sneak back to reveal the actual address returned by malloc and free using that.