I think OCaml should try to make sure the stack is aligned when calling C code.

I can think of at least two ways to do that:

(1) As on macosx, ocamlopt keeps the stack aligned, so that C functions are called with an aligned stack
(2) We add the force_align_arg_pointer attribute to all C primitives (by using the CAMLprim macro), so that the C compiler knows that it has to generate code to realign the stack. There are a few other C functions that also need this attribute.

(1) uses more stack space, but I do not know whether the attribute proposed in (2) is available on all the C compilers we need.
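
To make option (2) concrete, here is a minimal sketch of what such a macro could look like. STUB_PRIM and stub_add are hypothetical names used only for illustration; they stand in for CAMLprim and for a real primitive, and this is not the runtime's actual definition. The guard assumes a GCC-compatible compiler targeting x86-32.

```c
#include <stdint.h>

/* Hypothetical stand-in for CAMLprim (assumption: not the real runtime macro).
   On GCC-compatible compilers targeting x86-32, ask the compiler to realign
   the stack in the function prologue; elsewhere, expand to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define STUB_PRIM __attribute__((force_align_arg_pointer))
#else
#define STUB_PRIM
#endif

/* Hypothetical primitive: a real stub would take and return OCaml values. */
STUB_PRIM intptr_t stub_add(intptr_t a, intptr_t b)
{
    return a + b;
}
```

The attribute only changes the prologue and epilogue of the annotated function, so callers and the function body are written exactly as before.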

However, all this is my own opinion, and it does not count much, as I am not an ocamlopt expert.

Linux/x86-32 is supposed to comply with the SVR4/ELF x86-32 ABI, which states that the stack is always 4-aligned, but does not require 16-byte alignment.

AFAIK, the only x86-32 ABI that requires 16-alignment for the stack is MacOS X.

ocamlopt generates code that conforms to the corresponding ABIs. All the support for 16-alignment of the stack is there, but it is activated only for MacOS X.

Finally, keep in mind that bigger stack alignment has a cost, in terms of wasted stack space. For deeply recursive functions, this wasted space can cause a stack overflow. So, I'm not in favor of increasing alignment if there is no good reason.
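
As a rough illustration of that cost, the padding added per frame is plain rounding arithmetic. The sketch below is only that arithmetic: round_up_to and padding_for are hypothetical helpers, and the sizes in the usage note are made-up examples, not measurements of frames that ocamlopt actually emits.

```c
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of 2). */
size_t round_up_to(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Bytes wasted when a frame of the given size is padded to align. */
size_t padding_for(size_t size, size_t align)
{
    return round_up_to(size, align) - size;
}
```

For instance, a 20-byte frame needs no padding under 4-alignment but grows to 32 bytes under 16-alignment, so 12 bytes are lost at every level of recursion.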

Bottom line: I don't think an x86-32 compiler for Linux is ever allowed to assume 16-alignment on the stack pointer. If "clang -O2" does assume this, I'd say it is a clang/llvm problem.

> Linux/x86-32 is supposed to comply with the SVR4/ELF x86-32 ABI, which states that the stack is always 4-aligned, but does not require 16-byte alignment.

Indeed. However, there seems to be a consensus among C compiler writers that the stack should be 16-aligned. This has been debated on the GCC bugtracker (see the link above), and the status quo is that both GCC and LLVM rely on it. glibc has been updated for this purpose.
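
To illustrate what "relying on it" can mean, here is a small sketch with a 16-aligned local. The function name is hypothetical, and the explanation is my understanding rather than a statement about any specific compiler version: a compiler that assumes a 16-aligned incoming stack may place such a local at a fixed offset without realigning, so the check only holds if the caller kept the stack aligned.

```c
#include <stdint.h>

/* Returns 1 if a local declared with 16-byte alignment really is 16-aligned.
   When the caller respects the alignment the compiler assumes, this holds.
   When called with a stack that is only 4-aligned, a compiler that skips
   realignment can place buf at a misaligned address, which is where
   aligned SSE accesses would fault. */
int local_is_16_aligned(void)
{
    _Alignas(16) unsigned char buf[16];
    buf[0] = 0; /* touch the buffer so it is actually used */
    return ((uintptr_t)buf % 16) == 0;
}
```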

> Finally, keep in mind that bigger stack alignment has a cost, in terms of wasted stack space. For deeply recursive functions, this wasted space can cause a stack overflow. So, I'm not in favor of increasing alignment if there is no good reason.

I agree. Moreover, 16-aligning the OCaml stack might not be worth it, because OCaml mainly does 4-byte memory accesses. However, we do not need to 16-align the OCaml stack: we could add the force_align_arg_pointer attribute in the CAMLprim macro. It forces the C compiler not to assume that the stack is 16-aligned. It is available in not-too-old versions of LLVM and GCC.

> we could add the force_align_arg_pointer attribute in the CAMLprim macro

This would force all external stub libraries to use CAMLprim. It seems that the manual indeed says this is required, but since CAMLprim does nothing, I'm not sure all stub libraries actually follow this convention (admittedly, they should be fixed).

> For deeply recursive functions, this wasted space can cause a stack overflow.

Is it conceivable (and cheap) to fix the stack alignment only around calls to C functions? I don't think the case of deeply recursive functions going through C matters in practice.

> This would force all external stub libraries to use CAMLprim. It seems that the manual indeed says this is required, but since CAMLprim does nothing, I'm not sure all stub libraries actually follow this convention (admittedly, they should be fixed).

If such a stub does not contain CAMLprim, it is bogus (what's the point of CAMLprim if it is not used?). Note that, in any case, those stubs would be no more likely to crash than they are now, because nothing is done currently.

> Is it conceivable (and cheap) to fix the stack alignment only around calls to C functions? I don't think the case of deeply recursive functions going through C matters in practice.

While possible, I am not sure it is that easy: you will have to realign the stack before pushing arguments, and restore the stack pointer after the call. However, the compiled code for evaluating the arguments can depend on the current offset between the current frame and the stack pointer, which is unknown statically...

> If such a stub does not contain CAMLprim, it is bogus (what's the point of CAMLprim if it is not used?). Note that, in any case, those stubs would be no more likely to crash than they are now, because nothing is done currently.

This is true, but since Linux x86-32 is an exotic architecture these days, it is unlikely that authors of stubs will notice the problem. If this solution is picked (instead of having ocamlopt enforce the alignment), it will be important to make it widely known that CAMLprim is now required.

> Is it conceivable (and cheap) to fix the stack alignment only around calls to C functions?

No, as far as I can see. This would require extensive changes in the i386 code generator, plus an extra register must be reserved (in the code that pushes the arguments to the call) to act as an alternate stack pointer.
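
The reason an extra register is needed can be seen from a small model of "realign, push arguments, call, restore", written with plain integers instead of real registers. The struct and function names are hypothetical, and this is not how the i386 code generator is written; it only shows that the adjustment depends on the run-time value of the stack pointer, so the original value has to be saved somewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a realigned call site (illustrative only). */
typedef struct {
    uintptr_t saved_sp; /* original stack pointer, to restore after the call */
    uintptr_t call_sp;  /* stack pointer once the arguments are pushed */
} realigned_call;

/* The stack grows downward. Choose the position after pushing arg_bytes of
   arguments so that it is 16-aligned at the call instruction. */
realigned_call realign_for_call(uintptr_t sp, size_t arg_bytes)
{
    realigned_call r;
    r.saved_sp = sp;
    r.call_sp = (sp - arg_bytes) & ~(uintptr_t)15;
    return r;
}
```

With sp = 1000 and 12 bytes of arguments, call_sp is 976, so 12 bytes of padding are skipped before the pushes. With sp = 1004, call_sp is 992 and no padding is needed. Since the gap differs from one dynamic call to the next, anything addressed relative to the stack pointer while the arguments are evaluated can no longer use a fixed offset.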

The possible workarounds seem to be:
1- Compile the stub code for the C libraries in question with GCC flag -mstackrealign
2- Put attribute force_align_arg_pointer in CAMLprim as suggested by Jacques-Henri and encourage authors of stub code to use CAMLprim
3- Impose 16-alignment of the stack throughout all OCaml-generated code, like we currently do for i386-macosx.

#2 looks like the least intrusive solution, but I wonder how much of a speed penalty this would be for the OCaml runtime itself.

Looks like the best solution is Xavier's option (2): add force_align_arg_pointer to CAMLprim, with all the necessary incantations in the configure script to make sure it's only used with C compilers that know it.
Who wants to provide a patch?

Out of curiosity, there's still one unanswered question: why does this affect clang and not gcc if both are supposed to assume 16-byte alignment of the stack?

I used a patched version of GCC that produces executables that crash when entering a function with an unaligned stack. It turned out that some C callbacks were not annotated with CAMLprim (even in the OCaml runtime). I have corrected many of them, but some may remain.

I do not have a reliable way to test in the configure script whether the force_align_arg_pointer attribute is actually taken into account by the system compiler: I just test that it does not produce an error. With gcc, unknown attributes just produce warnings.
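
For compilers that provide the __has_attribute preprocessor operator (which may not cover every compiler the configure script has to support), a probe along these lines can at least tell whether the attribute is recognized. The macro and function names are hypothetical, and this is only a sketch, not the test used in the patch. It still does not prove that the compiler really emits realignment code; it only avoids depending on warnings.

```c
/* Treat the attribute as unknown on compilers without __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(force_align_arg_pointer)
#define HAVE_FORCE_ALIGN_ARG_POINTER 1
#define PROBE_ATTR __attribute__((force_align_arg_pointer))
#else
#define HAVE_FORCE_ALIGN_ARG_POINTER 0
#define PROBE_ATTR
#endif

/* Hypothetical probe function: compiles with or without the attribute. */
PROBE_ATTR int probe(int x)
{
    return x + 1;
}
```

Another option, on compilers that only warn, is to compile the probe with warnings promoted to errors (for example -Werror) so that an unknown attribute fails the configure test.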

I think it could be a good idea to use this method for the macosx architecture as well: it would remove the performance penalty of keeping the OCaml stack aligned. What do you think?

Fixed in version/4.02 (commit 14943) and on trunk (commit 14944) by forcing 16-alignment of stack for all x86-32 platforms except Win32/MSVC. It's not like the GCC and Clang developers left us much choice. Enjoy the new stack consumption. For the record, I still morally object to compiler writers violating ABIs, instead of evolving ABI standard documents.