The workaround is so simple that it sounds like a good idea to always use it, preemptively.

No, no, no, no, no, no, no.

Never prematurely optimize. Sure, the compiler is stealing 8 bytes, but in most cases this is completely insignificant. Only when you are instantiating large numbers of these classes, or doing loop-intensive operations over them where cache lines might come into play and it's significantly affecting application performance, should you ever do this optimization.
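
To make the cost concrete, here is a minimal sketch (type names are made up, sizes assume a typical 64-bit compiler) of the kind of per-object overhead being discussed:

    #include <cstdio>

    struct Plain {                  // no virtual functions
        int a, b, c;                // 12 bytes, 4-byte alignment
    };

    struct WithVtable {             // same data plus a hidden vtable pointer
        virtual ~WithVtable() {}
        int a, b, c;
    };

    int main() {
        // On a typical 64-bit compiler: Plain is 12 bytes, WithVtable is 24
        // (8 for the vptr, 12 for the ints, 4 bytes of tail padding to keep
        // the whole object 8-byte aligned).
        std::printf("sizeof(Plain)      = %zu\n", sizeof(Plain));
        std::printf("sizeof(WithVtable) = %zu\n", sizeof(WithVtable));
    }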

In any other case it is just creating a more esoteric code base, optimizing around a specific compiler implementation that is likely to change. You're relying on non-standard behavior that could well change in the next release or service pack. If you need this to make your game work well, then fine, but in most other circumstances I think your time is better spent on different optimizations.

This buggy behaviour has been in all Microsoft compiler ABIs for 22 years. What makes you believe the implementation is “likely to change”? Do you honestly believe Microsoft can change its x86_64 C++ ABI in a… service pack?

But you are right in a sense, “always use it” is about as stubborn a thing to say as “never prematurely optimize”. I will find a better wording.

You have a decent point: if it's been that way for 22 years, it is unlikely to change.

On the other hand, the ABI is not a standard. That's why we have things like COM, to create a standard for binary interfaces, because Microsoft explicitly makes no guarantees about the stability of the C++ ABI and has repeatedly changed it in the past.

I think saying "never prematurely optimize" is a 100% correct thing to say. One should always strive for functional correctness first. The trick is knowing when the right time to optimize is, which is rarely apparent, and usually determined from benchmarking.

Microsoft does guarantee the stability of the C++ ABI and it has remained stable since Windows 95.

The context of this post is not general-purpose development; rather, it is a blog post specifically geared towards developers of video games, particularly console games such as those on the Xbox and PS3.

In those contexts optimizations such as these matter, especially considering those platforms have hard requirements and strict reviews that games have to adhere to in order to get approval from Microsoft or Sony.

Sure, if you're programming some plain accounting app, who cares about 8 bytes. But if you're programming a real-time 3D renderer with a crapload of objects that need to be rendered at least 29 frames a second, then yes, 8 bytes per object going to absolute waste matters on a system with only 512 MB of RAM shared between the CPU and GPU.
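
As a rough back-of-the-envelope illustration only (the object count below is invented, not a figure from the article):

    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t objects     = 2000000;       // hypothetical scene
        const std::size_t wastedBytes = objects * 8;   // 8 wasted bytes each
        std::printf("%zu objects waste about %.1f MB\n",
                    objects, wastedBytes / (1024.0 * 1024.0));
        // Roughly 15 MB gone from a 512 MB pool shared with the GPU.
    }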

The C language says absolutely nothing about crucial elements of a system ABI. For example, it does not define any alignment, and it does not define any calling convention. That alone makes it completely impossible to define a system call using C. It's not a small "part" that is missing from C; these are big, gaping fundamentals.
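
A small illustration of that point (the output varies by platform, which is precisely the issue):

    #include <cstddef>
    #include <cstdio>

    struct Example {
        char c;
        int  i;
    };

    int main() {
        // The language only guarantees that members appear in declaration
        // order; the padding between c and i, and the struct's alignment,
        // are left to the implementation and the platform ABI.
        std::printf("offsetof(Example, i) = %zu\n", offsetof(Example, i));
        std::printf("alignof(Example)     = %zu\n", alignof(Example));
    }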

With that in mind, saying that the system has a C ABI is, quite frankly, ridiculous: a conforming C implementation has no guarantee whatsoever of actually working on any given platform, because even if it adheres to the standard to the letter, it need not match what the platform does, making the implementation useless.

That is one thing. Second: a lot of languages (including using a hex editor to make a platform-conforming executable, i.e. no language whatsoever) are perfectly capable of calling into many systems that you would define as having a "C ABI", and that makes the system ABI language-agnostic.

Finally, I also know why people have this notion about a system ABI. It's because they don't know any better (or they do, but have tunnel vision): they think, "Gee, this system code is written in C (almost always the case, for a huge part), and I simply do #include <system_calls.h> to call it from my C code, therefore the system has a C interface" (or ABI, or whatever).

While reasonable on the face of it, this is... well, pretty silly. It's not the language that defines the platform; it's exactly the other way around. It's the platform that decides on alignment: e.g. some platforms can't address 2/4-byte datums at odd/unaligned addresses, so the C (and any other) compiler must align them accordingly. It's the platform that decides on the calling convention: e.g. some hardware platforms have hefty numbers of registers, and for those, things like __cdecl are bad for performance, so registers are used heavily instead. And so on.

I think our disagreement comes from you using "C ABI" in a non-standard way. The conventional meaning (see for example the ARM C++ ABI, the Itanium C++ ABI) is a set of rules C and C++ compilers must follow so that the binaries they produce are compatible with each other. These details, as we've both been saying, depend on the platform because the standard doesn't say anything about them (or, in many cases, even require them to exist).

You seem to be interpreting the phrase as applying to an entire platform whose ABI might be a "C ABI", or perhaps even a "Pascal ABI". I agree this idea is fallacious. Fortunately, it's not what we meant.

I think our disagreement comes from you using "C ABI" in a non-standard way.

I think that "conventional" is a better word than "standard" here.

The conventional meaning (see for example ARM C++ ABI, Itanium C++ ABI) is a set of rules C and C++ compilers must follow so that the binaries they produce are compatible with each other.

Yes, this meaning works, but exclusively on a given platform (see how a platform name is the first word? This C++ ABI, that C++ ABI). And even so, there's pretty much nothing preventing an implementation of another language from using the same ABI and being compatible. So what's the point in calling them "C++" at all? For example, the COM binary interface (mentioned here) is a binary ABI. It's clearly based around the concept of C++ vtables (as commonly done to support virtual calls). And yet a host of languages work with COM, including C, which in reality was a base for the whole thing.
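
A rough sketch of that idea (a hypothetical interface, not a real COM API) showing why the layout is language-agnostic:

    // The "interface" is just a pointer to a table of function pointers.
    struct IExample;
    struct IExampleVtbl {
        long          (*QueryInterface)(IExample* self, const void* iid, void** out);
        unsigned long (*AddRef)(IExample* self);
        unsigned long (*Release)(IExample* self);
    };
    struct IExample {
        const IExampleVtbl* vtbl;
    };

    // From C you call through the table explicitly:
    //     obj->vtbl->AddRef(obj);
    // In C++ an equivalent layout falls out of declaring virtual functions,
    // which is why both languages (and many others) can produce and consume it.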

Further, the ABIs you mention are made for the purpose of interoperability between C++ compilers (and to be frank, there are good reasons to have that on a given platform). But this discussion started around the operating system interface (to the so-called "userland"). Not quite the same thing.

It seems to me that the whole point of ABI stability is to ensure that binaries are compatible across compilers and in particular compiler versions.

There is no such thing as "C++ ABI". Heck, there's even no such thing as a "C ABI". C (and by extension, C++) language makes no provision whatsoever for making an ABI (and rightly so, IMNSHO).

OK, so there's no general "C++ ABI" but doesn't Microsoft guarantee a "Windows x86 C++ ABI"? In other words, I can take a header file and a binary produced with Visual C++ 5 in 1998 and link to it now from Visual C++ 2012. That seems like an ABI stability guarantee that is really valuable, and the type of guarantee that @jangray is talking about in his comment on the article.

The problem with changing the default alignment mechanism isn't that system calls will suddenly fail, it's that it will break everyone's implicit ABIs.

OK, so there's no general "C++ ABI" but doesn't Microsoft guarantee a "Windows x86 C++ ABI"? In other words, I can take a header file and a binary produced with Visual C++ 5 in 1998 and link to it now from Visual C++ 2012.

MS does not guarantee such a thing, not to my knowledge. Barring sheer luck, there's no reason you will manage to make a binary with MSVC and link to it from e.g. GCC, ICC or something else. The same goes for different MSVC versions. You might possibly get away with extremely basic things, but if your binary exposes anything from the CRT (or CppRT) from VC98, you can kiss any compatibility goodbye.

Did you actually try that, or...? Compiler and library incompatibilities abound even between versions of a single compiler. I ran into dozens of problems on the MS compiler alone. I really don't see where you are coming from.

Or, for a less fragile and hackish solution, you could probably just specify explicit packing for the class and then use static_assert to ensure that anything that actually requires alignment is aligned properly.
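
Something along these lines, as a sketch of that suggestion (the type and field names are made up):

    #include <cstddef>
    #include <cstdint>

    #pragma pack(push, 1)
    struct PackedThing {
        std::uint8_t  flags;
        std::uint64_t payload;   // would normally be 8-byte aligned
    };
    #pragma pack(pop)

    // Catch any padding the compiler sneaks in:
    static_assert(sizeof(PackedThing) == 9, "unexpected padding");

    // If a member genuinely needs its natural alignment (atomics, SIMD
    // loads, ...), assert on its offset instead of packing it away, e.g.:
    //     static_assert(offsetof(PackedThing, payload) % alignof(std::uint64_t) == 0,
    //                   "payload is misaligned");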

Interesting! Pragma pack(1) isn't ideal in the general case because it produces less optimal code (such as with vector intrinsics). I heard one group used clang to identify packing issues. A lot of our classes were generated from a script compiler (Unreal 3), so we just changed the script compiler to rearrange properties for better packing.
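
For what it's worth, a tiny illustration of the reordering approach (typical 64-bit layout assumed):

    struct Wasteful {      // usually 24 bytes
        char   tag;        // 1 byte + 7 bytes of padding
        double value;      // 8 bytes
        int    count;      // 4 bytes + 4 bytes of tail padding
    };

    struct Compact {       // usually 16 bytes: same fields, largest first
        double value;
        int    count;
        char   tag;        // 1 byte + 3 bytes of tail padding
    };

    static_assert(sizeof(Compact) < sizeof(Wasteful), "reordering saved space");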

Before posting the above, I put together an example to see if there was something really wrong with the compiler that would break alignment directives. I used the big hammer (pack(1)) and went from there.
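
The exact snippet isn't reproduced here, but what is being described is something along these lines (the array size and loop bound are my guesses):

    int main() {
        int arr[5];
        int i;
        for (i = 0; i <= 5; ++i)   // off-by-one: the last pass writes arr[5]
            arr[i] = 0;            // if i sits right after arr on the stack,
                                   // this zeroes i and the loop never ends
    }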

This creates an infinite loop in GCC, while in VS2010 you have to iterate as i <= 6 to get the infinite loop. It's as if the array and the int are not packed as tightly as they could be. I thought they were supposed to be, unless the data structure is less than 8 bytes (depending on the system), in which case they don't have to be, as a speed optimization so the CPU can access memory in 8-byte chunks (depending on the system).

I remembered that when I read the OP's post. Can someone explain to me what's going on here? What is stored between the int-array and the int?

Variables with automatic storage class (what we think of as "stack variables") are usually stored on the stack, but there's no requirement for this. VS2010 is most likely putting i in a register. If these variables are stored on the stack, there's still no requirement for them to be stored in any particular order, or packed together in any particular way -- only that their individual alignment conditions are obeyed. (E.g. a compiler is welcome to offer a debug capability in which "test regions" filled with some unusual value are added around stack variables to help in detecting past-the-end writes.)

It's also possible that VS2010 may be partially or completely unrolling this loop (more likely if you're using /O2).

Actually, we don't know the exact use case of the user. A vector has good properties over an array; not so much for int, but being able to populate by push_back is beneficial, and it is also a lot more efficient to move. It is for these reasons that std::vector is recommended (by Herb Sutter and co) as the "default" sequence container choice.

You can shoot yourself in the foot by using something like a std::array and then realising you need to move it, or grow it, or whatever else.
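
A quick sketch of the "cheap to move" point (the element counts are arbitrary):

    #include <array>
    #include <utility>
    #include <vector>

    void demo() {
        std::vector<int> v(1000000, 42);
        std::vector<int> v2 = std::move(v);      // O(1): steals the heap buffer

        std::array<int, 1024> a{};
        std::array<int, 1024> a2 = std::move(a); // element-wise: touches all 1024 ints
        (void)v2;
        (void)a2;
    }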

I think the stack layout is what's causing the behaviour, because depending on where i is laid out on the stack relative to the array, the code either causes an infinite loop or it doesn't. So the stack layout is most certainly the cause. I understand that it's undefined behaviour, and that's kind of the reason I wanted to understand why it compiled differently depending on the compiler. From the replies it seems that the C++ standard simply does not specify stack layout.

The whole meaning of the words "undefined behavior" is that what happens when you hit it is not defined (i.e. unknown). You could just as easily have had one compiler produce an infinite loop and the other blow up your PC (or at least deliberately crash the program). There's no reason at all to expect the same behavior from different compilers in this situation.