Guaranteeing Alignment

Terminology

Review the concepts document if you are
not already familiar with it. Remember that a block is a contiguous
section of memory, which is partitioned (or segregated)
into fixed-size chunks. These chunks are what the user
allocates and deallocates.

Overview

Each Pool has a single free list that can
extend over a number of memory blocks. Thus, Pool
also has a linked list of allocated memory blocks. Each memory block, by
default, is allocated using new[], and all memory
blocks are freed on destruction. It is the use of new[] that allows us to guarantee alignment.

Proof of Concept: Guaranteeing Alignment

Each block of memory is allocated as a POD type (specifically, an array
of characters) through operator new[]. Let
POD_size be the number of characters allocated.

Predicate 1: Arrays may not have padding

This follows from the following quote:

[5.3.3/2] (Expressions::Unary expressions::Sizeof) "... When applied to
an array, the result is the total number of bytes in the array. This implies
that the size of an array of n elements is n
times the size of an element."

Therefore, arrays cannot contain padding, though the elements within the
arrays may contain padding.
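
Predicate 1 can be checked at compile time. The following is a minimal sketch, not part of the Pool implementation; the struct Padded and the helper array_has_no_padding are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical element type. Most ABIs add padding after `c`, but that
// padding lives inside each element, not between array elements.
struct Padded {
    int  i;
    char c;
};

// True when an array of N elements of T occupies exactly N * sizeof(T)
// bytes, i.e. the array itself contributes no padding.
template <typename T, std::size_t N>
constexpr bool array_has_no_padding() {
    return sizeof(T[N]) == N * sizeof(T);
}

static_assert(array_has_no_padding<char, 16>(), "char[16] is 16 bytes");
static_assert(array_has_no_padding<Padded, 8>(), "no padding between elements");
```

Any padding that Padded needs is already counted in sizeof(Padded), which is why the check holds for it as well.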

Predicate 2: Any block of memory allocated as an array of characters
through operator new[] (hereafter referred to as
the block) is properly aligned for any object of that size or
smaller

This follows from:

[3.7.3.1/2] (Basic concepts::Storage duration::Dynamic storage
duration::Allocation functions) "... The pointer returned shall be
suitably aligned so that it can be converted to a pointer of any complete
object type and then used to access the object or array in the storage
allocated ..."

[5.3.4/10] (Expressions::Unary expressions::New) "... For arrays of
char and unsigned char,
the difference between the result of the
new-expression and the address returned by the allocation
function shall be an integral multiple of the most stringent alignment
requirement (3.9) of any object type whose size is no greater than the
size of the array being created. [Note: Because allocation
functions are assumed to return pointers to storage that is appropriately
aligned for objects of any type, this constraint on array allocation
overhead permits the common idiom of allocating character arrays into
which objects of other types will later be placed. ]"
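
The idiom mentioned in the note can be sketched as follows. This is an illustration under stated assumptions, not the Pool code: is_aligned and round_trip_through_char_array are hypothetical helpers, and std::uintptr_t is assumed to be available on the target platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: true when address `p` is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a character array just large enough for one double, constructs
// a double in that storage with placement new, and reads the value back.
inline double round_trip_through_char_array(double value) {
    char* block = new char[sizeof(double)];
    // The array is as large as a double, so it must be aligned for one.
    assert(is_aligned(block, alignof(double)));
    double* d = new (block) double(value);
    const double result = *d;
    delete[] block;  // double is trivially destructible; no destructor call needed
    return result;
}
```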

Consider: imaginary object type Element of a size which is a
multiple of some actual object size; assume sizeof(Element) <= POD_size,
so that Predicate 2 applies to it

Note that an object of that size can exist. One object of that
size is an array of the "actual" objects.

Note that the block is properly aligned for an Element. This directly
follows from Predicate 2.

Corollary 1: The block is properly aligned for an array of Elements

This follows from Predicates 1 and 2, and the following quote:

[3.9/9] (Basic concepts::Types) "An object type is a (possibly
cv-qualified) type that is not a function type, not a reference type, and
not a void type." (Specifically, array types are object types.)

Corollary 2: For any pointer p and integer i, if p is
properly aligned for the type it points to, then p + i (when well-defined)
is properly aligned for that type; in other words, if an array is properly
aligned, then each element in that array is properly aligned

There are no quotes from the Standard to directly support this argument,
but it fits the common conception of the meaning of "alignment".

Note that the conditions for p + i being well-defined are outlined in
[5.7/5]. We do not quote that here, but only make note that it is
well-defined if p and p + i both point into or one past the same array.
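
Although the Standard text quoted here does not state Corollary 2 directly, it can be observed on a given implementation. The sketch below uses a hypothetical helper, all_elements_aligned, and assumes std::uintptr_t is available; it checks an observation, it does not prove the corollary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Walks the `count` elements starting at `first` and reports whether every
// element address is a multiple of alignof(T).
template <typename T>
bool all_elements_aligned(const T* first, std::size_t count) {
    assert(first != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        const T* p = first + i;  // well-defined: stays inside the array
        if (reinterpret_cast<std::uintptr_t>(p) % alignof(T) != 0) {
            return false;
        }
    }
    return true;
}
```

Called on an ordinary array such as double values[8], the function is expected to return true, since the array itself is aligned for double.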

Let: sizeof(Element) be the least common multiple of sizes of several
actual objects (T1, T2, T3, ...)

Let: block be a pointer to the memory block, pe be
(Element *) block, and pn be (Tn *) block

Corollary 3: For each integer i such that pe + i is
well-defined, and for each n, there exists some integer
jn such that pn + jn is
well-defined and refers to the same memory address as pe + i

This follows naturally, since the memory block is an array of Elements,
and for each n, sizeof(Element) % sizeof(Tn) == 0; thus, the
boundary of each element in the array of Elements is also a boundary of each
element in each array of Tn.
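
The integer jn in Corollary 3 can be written down explicitly: jn = i * (sizeof(Element) / sizeof(Tn)). A minimal sketch, with sizes passed as plain numbers; matching_index is a hypothetical name, not something from the implementation.

```cpp
#include <cassert>
#include <cstddef>

// Given element index i into an array of Elements, returns the index j into
// an array of Tn (overlaid on the same block) that starts at the same byte.
// Requires element_size to be a multiple of tn_size.
inline std::size_t matching_index(std::size_t i,
                                  std::size_t element_size,
                                  std::size_t tn_size) {
    assert(tn_size != 0 && element_size % tn_size == 0);
    return i * (element_size / tn_size);
}
```

With an element size of 105 and a Tn size of 7, index i = 2 maps to j = 30, and both lie 210 bytes from the start of the block.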

Theorem: For each integer i such that pe + i is well-defined,
that address (pe + i) is properly aligned for each type Tn

Since pe + i is well-defined, by Corollary 3, pn + jn
is well-defined and refers to the same address. That address is properly
aligned by Predicate 2 and Corollaries 1 and 2.

Use of the Theorem

The proof above covers alignment requirements for cutting chunks out of a
block. The implementation uses actual object sizes of:

The requested object size (requested_size); this is the size of chunks
requested by the user

void * (pointer to void); this is because we interleave our free list
through the chunks

size_type; this is because we store the size of the next block within
each memory block

Each block also contains a pointer to the next block; but that is stored
as a pointer to void and cast when necessary, to simplify alignment
requirements to the three types above.

Therefore, alloc_size is defined to be the lcm
of the sizes of the three types above.
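
As a sketch of that definition, the least common multiple can be computed at compile time. The helpers gcd_of, lcm_of and alloc_size_for are hypothetical names, and size_type is assumed here to be std::size_t; the real implementation may compute this differently.

```cpp
#include <cstddef>

// Euclid's algorithm, written as a single expression so that it is usable
// in constant expressions under C++11.
constexpr std::size_t gcd_of(std::size_t a, std::size_t b) {
    return b == 0 ? a : gcd_of(b, a % b);
}

// Least common multiple; dividing first keeps the intermediate value small.
// Both arguments must be non-zero.
constexpr std::size_t lcm_of(std::size_t a, std::size_t b) {
    return a / gcd_of(a, b) * b;
}

// Chunk size for a given requested_size, assuming size_type is std::size_t.
constexpr std::size_t alloc_size_for(std::size_t requested_size) {
    return lcm_of(requested_size,
                  lcm_of(sizeof(void*), sizeof(std::size_t)));
}

static_assert(lcm_of(7, lcm_of(3, 5)) == 105, "lcm(7, 3, 5) is 105");
```

Because alloc_size_for(n) is a multiple of all three sizes, every chunk boundary is also a boundary for a void * and for a size_type.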

A Look at the Memory Block

Each memory block consists of three main sections. The first section is
the part that chunks are cut out of, and contains the interleaved free list.
The second section is the pointer to the next block, and the third section
is the size of the next block.

Each of these sections may contain padding as necessary to guarantee
alignment for each of the next sections. The size of the first section is
number_of_chunks * lcm(requested_size, sizeof(void *), sizeof(size_type));
the size of the second section is lcm(sizeof(void *), sizeof(size_type)); and
the size of the third section is sizeof(size_type).

Finally, here is a convoluted example where the requested_size is 7,
sizeof(void *) == 3, and sizeof(size_type) == 5, showing how the least
common multiple guarantees alignment requirements even in the oddest of
circumstances:

Figure: Memory block containing 2 chunks, showing overlying array structures

Columns: Sections; size_type alignment (5-byte cells); void * alignment
(15-byte cells); requested_size alignment (105-byte cells)

Rows, from top to bottom:

Memory not belonging to process

Chunks section (210 bytes): Chunk 1 (105 bytes; 7 used), whose first
void * cell holds the interleaved free list pointer for Chunk 1 (15 bytes;
3 used), followed by Chunk 2 (105 bytes; 7 used), whose first void * cell
holds the interleaved free list pointer for Chunk 2 (15 bytes; 3 used)

Pointer to next Block (15 bytes; 3 used)

Size of next Block (5 bytes; 5 used)

Memory not belonging to process
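
The section sizes in this example can be recomputed from the formulas given earlier. The sketch below is illustrative only: BlockLayout, compute_layout and the layout_gcd / layout_lcm helpers are hypothetical names, and the pointer and size_type sizes are passed in as numbers so that the odd values 3 and 5 can be used.

```cpp
#include <cassert>
#include <cstddef>

inline std::size_t layout_gcd(std::size_t a, std::size_t b) {
    while (b != 0) {
        const std::size_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

inline std::size_t layout_lcm(std::size_t a, std::size_t b) {
    assert(a != 0 && b != 0);
    return a / layout_gcd(a, b) * b;
}

// Sizes, in bytes, of the three sections of one memory block.
struct BlockLayout {
    std::size_t chunks_section;        // number_of_chunks * alloc_size
    std::size_t next_pointer_section;  // lcm(pointer size, size_type size)
    std::size_t next_size_section;     // size_type size
    std::size_t total() const {
        return chunks_section + next_pointer_section + next_size_section;
    }
};

inline BlockLayout compute_layout(std::size_t number_of_chunks,
                                  std::size_t requested_size,
                                  std::size_t pointer_size,
                                  std::size_t size_type_size) {
    const std::size_t ptr_and_size = layout_lcm(pointer_size, size_type_size);
    const std::size_t alloc_size = layout_lcm(requested_size, ptr_and_size);
    BlockLayout layout;
    layout.chunks_section = number_of_chunks * alloc_size;
    layout.next_pointer_section = ptr_and_size;
    layout.next_size_section = size_type_size;
    return layout;
}
```

For compute_layout(2, 7, 3, 5) the three sections come out as 210, 15 and 5 bytes, for a block of 230 bytes in total.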

How Contiguous Chunks are Handled

The theorem above guarantees all alignment requirements for allocating
chunks and also implementation details such as the interleaved free list.
However, it does so by adding padding when necessary; therefore, we have to
treat allocations of contiguous chunks in a different way.

Using array arguments similar to the above, we can translate any request
for contiguous memory for n objects of requested_size into a
request for m contiguous chunks. m is simply ceil(n *
requested_size / alloc_size), where alloc_size is the actual size of the
chunks. To illustrate:
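
The ceiling in that formula can be taken with integer arithmetic alone. A minimal sketch; chunks_needed is a hypothetical helper name, and overflow of n * requested_size is not handled here.

```cpp
#include <cassert>
#include <cstddef>

// Number of alloc_size-byte chunks needed to hold n objects of
// requested_size bytes each: ceil(n * requested_size / alloc_size).
inline std::size_t chunks_needed(std::size_t n,
                                 std::size_t requested_size,
                                 std::size_t alloc_size) {
    assert(alloc_size != 0);
    const std::size_t total_bytes = n * requested_size;
    // Round up whenever the request does not fill the last chunk exactly.
    return total_bytes / alloc_size + (total_bytes % alloc_size != 0 ? 1 : 0);
}
```

With requested_size 7 and alloc_size 105, fifteen objects fit exactly into one chunk, while sixteen objects (112 bytes) require two.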

Then, when the user deallocates the contiguous memory, we can split it up
into chunks again.

Note that the implementation provided for allocating contiguous chunks
uses a linear rather than a quadratic algorithm. This means that it
may fail to find contiguous free chunks if the free list is not
ordered. Thus, it is recommended to always use an ordered free list when
dealing with contiguous allocation of chunks. (In the example above, if
Chunk 1 pointed to Chunk 3, which pointed to Chunk 2, which pointed to
Chunk 4, instead of being in order, the contiguous allocation algorithm
would have failed to find any of the contiguous chunks.)
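
The effect of free list ordering on a single-pass search can be sketched as follows. This is a simplified model, not the algorithm used by the implementation: the free list is represented as a vector of chunk numbers in list order, and find_contiguous_run is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single pass over the free list (chunk numbers in list order). Returns the
// list position at which a run of m consecutively numbered chunks begins,
// or -1 if no such run is met while walking the list once.
inline long find_contiguous_run(const std::vector<std::size_t>& free_list,
                                std::size_t m) {
    assert(m > 0);
    if (free_list.size() < m) {
        return -1;
    }
    std::size_t run_start = 0;
    for (std::size_t k = 1; k <= free_list.size(); ++k) {
        if (k - run_start == m) {
            return static_cast<long>(run_start);  // run is long enough
        }
        if (k == free_list.size()) {
            break;  // list exhausted
        }
        if (free_list[k] != free_list[k - 1] + 1) {
            run_start = k;  // neighbours in the list are not adjacent chunks
        }
    }
    return -1;
}
```

On the ordered list 1, 2, 3, 4 a request for two chunks succeeds at the first position. On the list 1, 3, 2, 4 the same request fails, because no two neighbours in list order are adjacent in memory.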