Loading files into memory

Originally Posted by C_ntua
To see if I understand this correctly. Say you have:
Code:
struct s {
int x;
...

Then the compiler will want everything to be a multiple of 4 bytes, and thus pad the memory with 2 bytes after the short?

Typically the compiler will strive to align on DWORD (4-byte) boundaries or on boundaries that match the biggest variable size (in this case, double) in the struct.
However, all padding is both compiler and platform dependent.
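To make the padding visible on your own compiler, here is a minimal sketch. The struct `Sample` and the helper names are hypothetical (the struct in the quoted post is elided, so this is not a reconstruction of it); it only assumes a struct holding an int, a short and a double.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct for illustration only; the struct in the quoted post is elided.
struct Sample {
    int    x;
    short  s;
    double d;
};

// Sum of the member sizes, i.e. the size the struct would have with no padding at all.
constexpr std::size_t kRawSize = sizeof(int) + sizeof(short) + sizeof(double);

// Total number of padding bytes the compiler added anywhere in Sample.
constexpr std::size_t kPaddingBytes = sizeof(Sample) - kRawSize;

// Padding inserted between the end of `s` and the start of `d`.
inline std::size_t gap_before_d() {
    return offsetof(Sample, d) - (offsetof(Sample, s) + sizeof(short));
}

// Properties that hold regardless of how much padding a given compiler chooses.
inline void check_layout() {
    assert(offsetof(Sample, d) % alignof(double) == 0);  // d sits on its own alignment
    assert(sizeof(Sample) % alignof(Sample) == 0);       // arrays of Sample stay aligned
}
```

On common x86-64 compilers `gap_before_d()` tends to be 2 and `sizeof(Sample)` 16, but as said above, none of that is guaranteed, so print the values rather than assume them.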

Can I assume that padding makes access faster because the struct can be treated in a more "array"-like way, so that, knowing the address of the struct, the value of each member can be found more quickly?

Not really. Since the offsets are known at compile time, there's no runtime overhead for locating each member, whether they're padded or not.
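A small sketch of what "known at compile time" means in practice. `Record` and `read_value_by_offset` are made-up names for illustration; the point is only that `offsetof` yields a constant, so a member access is the base address plus a fixed number.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

// Hypothetical struct for illustration.
struct Record {
    char tag;
    int  value;
};

// offsetof is a constant expression, so it can be checked by the compiler itself.
constexpr std::size_t kValueOffset = offsetof(Record, value);
static_assert(kValueOffset % alignof(int) == 0,
              "the compiler places `value` on an int-aligned offset");

// Reads `value` the long way round: base address + compile-time constant.
// This is roughly what `r.value` boils down to; nothing is searched at runtime.
inline int read_value_by_offset(const Record& r) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&r);
    int out;
    std::memcpy(&out, base + kValueOffset, sizeof out);
    return out;
}
```

Calling `read_value_by_offset(r)` should give the same result as `r.value`; the padding after `tag` only changes the constant, not the amount of work.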

On some architectures, trying to read unaligned memory can even trigger a hardware exception.
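This matters when you load a file into a byte buffer and pull values out at arbitrary offsets. A minimal sketch of one portable way to do that, assuming a raw `unsigned char` buffer (the function name `load_u32` is hypothetical): copy the bytes with `std::memcpy` instead of casting the pointer, and let the compiler pick instructions that are safe for the target.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy

// Reads a 32-bit value from any byte offset in a buffer, aligned or not.
// The bytes are interpreted in the host's byte order; no endianness
// conversion is done here.
inline std::uint32_t load_u32(const unsigned char* bytes, std::size_t offset) {
    assert(bytes != nullptr);
    std::uint32_t v;
    std::memcpy(&v, bytes + offset, sizeof v);  // no alignment requirement on the source
    return v;
}
```

By contrast, `*reinterpret_cast<const std::uint32_t*>(bytes + 1)` is undefined behaviour when the address is not suitably aligned, and that is exactly the kind of access that can trap on the architectures mentioned above.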

Originally Posted by Elysia

Even on the case of x86 and x86-64, it can cause processor penalties (speed hits!) to read unaligned memory...

Originally Posted by matsp

CornedBee is correct that on some machines, reading unaligned memory can cause crashes. Even if it doesn't cause a crash, it is slower to read memory from an unaligned address (because the processor has to collect data from two separate reads and put it together into one item).

Typically the compiler will strive to align on DWORD (4-byte) boundaries or on boundaries that match the biggest variable size (in this case, double) in the struct.

I would rephrase that and say that it strives to align all items to their natural alignment. That is 4 bytes for DWORD and int, 2 bytes for WORD or short, 1 byte for char, 2 or 4 bytes for a wchar_t (2 bytes in Windows, 4 bytes in Linux), 8 bytes for a long long (or long in Linux on a 64-bit OS), 8 bytes for double.
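If you want to see the natural alignments on your own platform rather than trust a list, something like the following sketch should do. `is_naturally_aligned` and `print_alignments` are hypothetical helper names; `alignof` itself is standard C++11.

```cpp
#include <cassert>
#include <cstdint>   // std::uintptr_t
#include <cstdio>    // std::printf

// True if p is a multiple of T's alignment requirement on this platform.
template <typename T>
bool is_naturally_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Prints what this particular compiler and platform use for each type.
inline void print_alignments() {
    std::printf("char      %zu\n", alignof(char));
    std::printf("short     %zu\n", alignof(short));
    std::printf("int       %zu\n", alignof(int));
    std::printf("wchar_t   %zu\n", alignof(wchar_t));
    std::printf("long      %zu\n", alignof(long));
    std::printf("long long %zu\n", alignof(long long));
    std::printf("double    %zu\n", alignof(double));
}
```

The numbers printed depend on the compiler and target, so no expected output is given here.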

This is because the data lines going into the processor are arranged so that the 32 bits of a 32-bit integer come in on certain lines. If the value is unaligned, the processor has to "double step" to get the data in: first read one portion, then move on to the next portion.

However, all padding is both compiler and platform dependent.

Indeed. And the penalty for getting the padding "wrong" varies from one extra clock cycle per access, through a few dozen or a few hundred extra cycles for an "unaligned access trap", all the way to "program crashes". I know of OSes that only allow unaligned access in user mode, so a kernel-mode unaligned access will lead to a kernel oops, a BSOD or whatever the equivalent is on the relevant OS.

And CornedBee makes another good point that some instructions are REQUIRED to be aligned even if generally data CAN be unaligned at some performance penalty.

I would rephrase that and say that it strives to align all items to their natural alignment. That is 4 bytes for DWORD and int, 2 bytes for WORD or short, 1 byte for char, 2 or 4 bytes for a wchar_t (2 bytes in Windows, 4 bytes in Linux), 8 bytes for a long long (or long in Linux on a 64-bit OS), 8 bytes for double.

Another example of instructions that require alignment: SSE. SSE loads (except for special "unaligned load" instructions) must be aligned to 16-byte boundaries.

Yes, very true. And with the above-mentioned exception in mind, there are only two options: make sure the data IS aligned, or read the data into a register using the special "unaligned" instructions (which ARE slower on many machines, older ones in particular, even if the data IS aligned). For such operations, alignment to the boundary of the basic data type is not enough; you need a bigger alignment, because the operand consists of multiple elements and the WHOLE block of, for example, 4 floats needs to be 16-byte aligned.
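A rough sketch of those two options, assuming an x86 compiler that provides `<xmmintrin.h>` (there is a plain scalar fallback otherwise). `Float4`, `is_16_aligned` and `add4` are hypothetical names; `_mm_load_ps`, `_mm_loadu_ps`, `_mm_add_ps` and `_mm_store_ps` are the standard SSE intrinsics.

```cpp
#include <cassert>
#include <cstdint>   // std::uintptr_t

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics
#define SKETCH_HAVE_SSE 1
#endif

// alignas(16) makes the WHOLE block of 4 floats 16-byte aligned,
// not just each float 4-byte aligned.
struct alignas(16) Float4 {
    float v[4];
};

inline bool is_16_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds an aligned block to 4 floats that may live at any float-aligned address.
inline Float4 add4(const Float4& a, const float* b_maybe_unaligned) {
    Float4 out;
#ifdef SKETCH_HAVE_SSE
    assert(is_16_aligned(a.v));                    // required by the aligned load
    __m128 va = _mm_load_ps(a.v);                  // option 1: aligned load
    __m128 vb = _mm_loadu_ps(b_maybe_unaligned);   // option 2: "unaligned" load
    _mm_store_ps(out.v, _mm_add_ps(va, vb));       // aligned store into out
#else
    for (int i = 0; i < 4; ++i) {                  // scalar fallback without SSE
        out.v[i] = a.v[i] + b_maybe_unaligned[i];
    }
#endif
    return out;
}
```

Passing a pointer such as `raw + 1` into a plain float array as the second argument is fine here, because only the first operand goes through the aligned load. Handing that same pointer to `_mm_load_ps` would typically fault whenever the address is not a multiple of 16.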