I use an aligned malloc to allocate an array of struct test_element, aligned on a 16 byte boundary.

When I run the programme and examine the first element of that array, I see that in the struct lfds700_freelist_element, next is 16 byte aligned and user_data is 16 byte aligned (it follows with no padding), but tail_padding then ALSO follows with no padding, i.e. it is 8 byte aligned. This means the next member of the following struct lfds700_stack_element is 8 byte aligned, and I crash due to memory misalignment when performing cmpxchg16b.

If I have 16 byte packing, how can ANY member of a struct be on an 8 byte boundary?

Now, I expected structs to be tail padded so that they could be array allocated, and I was crashing, so I added the tail_padding members. I think in fact these are unnecessary and tail padding probably does occur, but I have left them in, because they should guarantee the extension of the struct to the 16 byte boundary and they don't do any harm with regard to the question itself.

So, it seems to me the freelist user_data member is not being padded by eight bytes when it should be. What am I missing? Is there some behaviour due to the use of the structures inside a parent structure?

The MSDN documentation says, for a given value n, that "The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller."

The bit you've overlooked is "whichever is smaller".

Unless you're trying to tweak performance in some very specific way, changing packing of structs is usually pointless anyway.

The MSDN documentation says, for a given value n, that "The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller."

Yes. The void pointer member is eight bytes in length and has been aligned on what is an 8 and 16 byte compliant boundary.

The tail_padding array is 16 bytes in length, but has been aligned on an 8 byte boundary. I presume by multiple the author means positive integer multiple! I wonder if the alignment is treating the array not by its size, but by the size of one of its elements?
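On the array question, a minimal sketch may help. It assumes a 64-bit target, that next is a pointer-plus-counter pair, and that tail_padding is an array of unsigned char; the real lfds700 definitions are not shown in this thread, so the struct below is a hypothetical stand-in. In C an array has the alignment of its element type, not of its total size, so a 16 byte array of bytes can legally start at any offset:

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof */
#include <stddef.h>    /* offsetof */

/* Hypothetical stand-in for the freelist element; NOT the real liblfds struct. */
struct padded_element_sketch {
    void *next[2];                   /* assumed: pointer + counter pair      */
    void *user_data;                 /* follows next with no padding         */
    unsigned char tail_padding[16];  /* assumed element type: unsigned char  */
};

/* An array is aligned like one of its elements, whatever its total size. */
static_assert(alignof(unsigned char[16]) == alignof(unsigned char),
              "array alignment equals element alignment");

/* So tail_padding sits directly after user_data, on a pointer-sized boundary. */
static_assert(offsetof(struct padded_element_sketch, tail_padding)
                  == 3 * sizeof(void *),
              "no padding is inserted before the array");
```

If that assumption about the element type holds, the array never pushes anything onto a 16 byte boundary; it only makes the struct 16 bytes longer.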

I'm not sure I comprehend the documentation anyway, for it would mean that if you requested, say, 16 byte packing and had a 4 byte member, the compiler could put it anywhere on a 4 byte boundary, and your packing instruction would be useless. How would you in fact achieve packing at all then? This makes it seem the packing instruction specifies a *maximum* packing size.
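The "maximum" reading can be checked with a short sketch. The structs and the effective_alignment helper below are hypothetical, written only to illustrate the quoted rule, and "natural alignment" is used as a stand-in for the documentation's "size of the member":

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* The quoted rule: a member is aligned to min(n, its natural alignment). */
static size_t effective_alignment(size_t pack_n, size_t natural_alignment)
{
    return pack_n < natural_alignment ? pack_n : natural_alignment;
}

#pragma pack(push, 16)
struct pack16_sketch {      /* hypothetical struct */
    void    *p;
    uint32_t a;             /* 4 byte member */
    uint32_t b;             /* 4 byte member */
};
#pragma pack(pop)

#pragma pack(push, 1)
struct pack1_sketch {       /* hypothetical struct */
    uint8_t  tag;
    uint32_t value;         /* would normally sit at offset 4 */
};
#pragma pack(pop)

/* pack(16) does not raise alignment: b sits 4 bytes after a, not 16. */
static_assert(offsetof(struct pack16_sketch, b) == sizeof(void *) + 4,
              "pack(16) leaves 4 byte members on 4 byte boundaries");

/* pack(1) does lower it: value moves up against tag. */
static_assert(offsetof(struct pack1_sketch, value) == 1,
              "pack(1) removes the padding before value");
```

So a pack value can only squeeze members closer together; to push a member out to a stricter boundary you need an explicit alignment request instead.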

When tail_padding is not present, what I find then is that the next member of the stack_element struct is on an 8 byte boundary.

This also seems strange and wrong. The struct is 16 byte packed, the freelist struct is 16 byte packed, the parent struct is 16 byte packed and the aligned allocator is on 16 byte boundaries. It looks as if 8 bytes of tail padding are missing from the end of the freelist_element structure.
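For what it's worth, a struct is only tail padded up to its own alignment, which is the largest alignment among its members. With only pointer-sized members that is 8 bytes on a 64-bit target, so no extra 8 bytes are owed. Here is a minimal sketch of one way to get the layout described above, using C11 alignas on the first member. The struct names are hypothetical stand-ins, not the lfds700 definitions, and on older MSVC the compiler-specific __declspec(align(16)) would be the thing to try instead:

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stddef.h>    /* offsetof */

/* Hypothetical stand-ins; NOT the real liblfds structs. */
struct freelist_element_sketch {
    alignas(16) void *next[2];  /* target of the 16 byte compare-and-swap */
    void *user_data;
};

struct stack_element_sketch {
    alignas(16) void *next[2];
    void *user_data;
};

struct test_element_sketch {
    struct freelist_element_sketch fe;
    struct stack_element_sketch    se;
};

/* The alignas on a member raises the alignment of the whole struct... */
static_assert(alignof(struct freelist_element_sketch) == 16,
              "struct alignment follows its strictest member");

/* ...so the compiler now adds the tail padding itself... */
static_assert(sizeof(struct freelist_element_sketch) % 16 == 0,
              "size is rounded up to a multiple of 16");

/* ...and the nested stack element (and its next member) land on 16. */
static_assert(offsetof(struct test_element_sketch, se) % 16 == 0,
              "second element starts on a 16 byte boundary");
```

With this arrangement the manual tail_padding members should not be needed, provided the array itself still comes from the 16 byte aligned allocator.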