I'm guessing I'm just going to have to declare all these things to take const-references instead of __m128 by value (a bit of Googling has suggested that __m128 by value isn't considered quite safe in general, although that seems a bit silly to me). Hopefully the references will get optimized away and not *actually* forced into memory all the time...

Kenneth_Gorking
—
2011-02-14T06:23:44Z —
#2

Well, like MSDN says: "The align __declspec modifier is not permitted on function parameters", which kind of makes sense, when you think about it.

oisyn
—
2011-02-14T07:11:57Z —
#3

No, it does not make sense at all, when you think about it. Also, it makes even less sense that it's allowed for the first three parameters (well, for sse datatypes at least), but not for the rest.

@Reedbeta: in my experience, the compiler will not force them to memory. If the function can be inlined and the contents are already in a register at the call site, it will keep them in that register.

Kenneth_Gorking
—
2011-02-14T19:40:27Z —
#4

It makes perfect sense. If you were allowed to align formal parameters, calling conventions would go out the window.

The reason it might be able to handle 3, and not 4, might be because it is able to keep the first 3 __m128s in registers, but when introducing more, it runs out of space, which forces it to use the stack, and hence the alignment error.

You can probably circumvent this, by passing them as references, instead of by value.
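To make the by-reference workaround concrete, here is a minimal sketch. It assumes a hypothetical `Vec4` struct as a portable stand-in for `__m128` (with the real type you would include `<xmmintrin.h>` and use intrinsics instead of the loop). The names `Vec4`, `isAligned16` and `sum4` are made up for illustration. Because only addresses are passed, the callee never needs an aligned copy in the argument area, no matter how many operands there are.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for __m128: four floats with 16-byte alignment.
struct alignas(16) Vec4 {
    float v[4];
};

// True if the object sits on a 16-byte boundary.
inline bool isAligned16(const Vec4& x) {
    return reinterpret_cast<std::uintptr_t>(&x) % 16 == 0;
}

// Four aligned operands passed by const reference: the callee receives
// the addresses of objects that the caller has already aligned.
inline Vec4 sum4(const Vec4& a, const Vec4& b, const Vec4& c, const Vec4& d) {
    assert(isAligned16(a) && isAligned16(b) && isAligned16(c) && isAligned16(d));
    Vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i] + c.v[i] + d.v[i];
    }
    return r;
}
```

Whether the references are actually optimized away depends on inlining, so treat this as an illustration of the interface rather than a statement about the generated code.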

It makes perfect sense. If you were allowed to align formal parameters, calling conventions would go out the window.

Your argument is invalid. Fact is that current calling conventions don't deal with explicitly aligned parameters. It would be perfectly compatible by adding rules for aligned types. Since you weren't allowed to declare such functions before, no code is going to break by allowing them now.

The reason it might be able to handle 3, and not 4, might be because it is able to keep the first 3 __m128s in registers

Ok, wait, first you argue about calling conventions, and then you talk about having them passed in registers? How is that compatible with cdecl or stdcall?

Kenneth_Gorking
—
2011-02-15T09:28:39Z —
#6

@.oisyn

Fact is that current calling conventions don't deal with explicitly aligned parameters

That's what I was getting at. Any padding added by alignment would result in the function accessing bad data.

@.oisyn

It would be perfectly compatible by adding rules for aligned types. Since you weren't allowed to declare such functions before, no code is going to break by allowing them now.

Trying to patch up x86 at this time seems futile. It would probably also be a nightmare-inducing endeavor. Simply switching to x64 would make all this go away.

@.oisyn

Ok, wait, first you argue about calling conventions, and then you talk about having them passed in registers? How is that compatible with cdecl or stdcall?

First, I was speaking generally, then I was addressing the problem at hand. Also, __m128 variables are mapped directly to XMM registers, so your point is moot.

Anyways, after some digging around, I found that the first three __m128 are indeed passed in registers, and the rest go on the stack, hence the compiler error.

That's what I was getting at. Any padding added by alignment would result in the function accessing bad data. Trying to patch up x86 at this time seems futile. It would probably also be a nightmare-inducing endeavor.

You don't have to patch anything. As said, you currently can't use aligned parameters altogether. This means that current function declarations don't contain any aligned types, which implies that you will *NEVER* access bad data by introducing padding. Simply because padding will not be required for those functions. We're only talking about functions with __declspec(align(x)) type parameters, and they don't exist yet.
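As a rough illustration of what "adding rules for aligned types" could look like, the sketch below lays out an argument area by rounding each offset up to the parameter's own alignment. This is a hypothetical rule, not what any existing x86 convention does, and it assumes the base of the argument area is itself aligned to the largest alignment used. `Param`, `alignUp` and `layoutArgs` are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and required alignment of one parameter, in bytes.
struct Param {
    std::size_t size;
    std::size_t align;
};

// Round offset up to the next multiple of align (a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical layout rule: every argument starts at the next offset
// that satisfies its own alignment requirement.
inline std::vector<std::size_t> layoutArgs(const std::vector<Param>& params) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (const Param& p : params) {
        offset = alignUp(offset, p.align);
        offsets.push_back(offset);
        offset += p.size;
    }
    return offsets;
}
```

With only 4-byte parameters the rule inserts no padding at all, which is the point made above: signatures that exist today would be laid out exactly as before.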

First, I was speaking generally, then I was addressing the problem at hand. Also, __m128 variables are mapped directly to XMM registers, so your point is moot.

An __m128 variable is just as mapped to a register as an int is.

Nick
—
2011-02-15T13:44:16Z —
#8

@.oisyn

An __m128 variable is just as mapped to a register as an int is.

I could be very mistaken, but I was under the impression that the Visual C++ compiler treats them as special. It's a proprietary data format and it's not well defined. If something works, great, if it doesn't, oh well.

oisyn
—
2011-02-15T13:55:09Z —
#9

It's only special in the sense that it's essentially a user defined type, yet the compiler understands that it can put them in SSE registers as if it were a built-in type. Aside from the fact that it's a struct, it isn't much different from an int.

Nick
—
2011-02-15T14:20:27Z —
#10

@.oisyn

Aside from the fact that it's a struct, it isn't much different from an int.

Even as a struct it's very special. I believe the debugger is capable of showing the symbolic values even when it's really stored in a register.

This should fix your problems. Function parameters can't be aligned by the compiler, and there are reasons for that. You must understand how a call is translated into machine code. First, registers are limited, and there are two ways to pass parameters to a function: __stdcall (standard calling convention) passes everything on the stack; __fastcall (fast calling convention) passes the first n values in registers and the rest on the stack. Now, Microsoft and GCC use __fastcall by default, whereas Embarcadero C++ Builder uses __stdcall for C++ code. If you compile for a 32-bit platform, only the first 3 are passed in registers; if you compile for 64 bits, only the first 6 on Windows and 8 on Linux. All the parameters that are passed on the stack will not be aligned. Can you guess why?

Stack pass:

push value1
push value2
call function

function:

push ebp
mov ebp, esp
sub esp, 8

...

mov esp, ebp
pop ebp
ret 0

So a function is compiled to save the stack pointer and allocate its local variables on the stack. If you request alignment for a local variable, it will be padded and the stack becomes a mess, but it is restored at function exit. If you want to pass aligned data on the stack, the compiler must add padding before calling the function and remove it on exit by correcting the stack pointer. That means it must create n variations of the compiled function, depending on the stack padding combination: for 16-byte alignment, 16 combinations. The compiler won't do this for you. Passing by reference is the only solution that does not rely on specific compiler or platform behaviour, because only the address is passed on the stack, and the function then fetches the data, which was previously aligned either globally or in the stack frame of the calling function.
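The padding for an aligned local depends on the value of the stack pointer on entry, which is only known at run time. The arithmetic can be sketched as below, assuming 32-bit addresses and a downward-growing stack; `alignDown` and `paddingBelow` are hypothetical helper names, and the sample addresses used with them are made up.

```cpp
#include <cassert>
#include <cstdint>

// Round a stack address down to a multiple of align (a power of two).
// On a downward-growing stack this is the nearest aligned slot that is
// still free.
inline std::uint32_t alignDown(std::uint32_t sp, std::uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

// Number of bytes skipped (wasted as padding) to reach that slot.
inline std::uint32_t paddingBelow(std::uint32_t sp, std::uint32_t align) {
    return sp - alignDown(sp, align);
}
```

For an incoming stack pointer of 0x0012FF7C and 16-byte alignment this gives 0x0012FF70 with 12 bytes of padding, while an already aligned pointer needs none. The callee can undo this by restoring the saved stack pointer on exit, as the epilogue above does.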

if you compile for 64 bits only the first 6 on windows 8 on linux. All the parms that are passed by stack will be not aligned.

The first 4 parameters are passed on the stack, for both systems, and everything on the stack is aligned on a 16-byte boundary. That is also why this problem won't be an issue on x64 systems.

Sandevil
—
2011-02-15T20:30:44Z —
#14

__cdecl comes from the former Borland __fastcall, in which the first 3 parameters are passed in registers. By the way, nowadays they produce the same code. Windows and Linux do not follow the same calling convention on 64-bit systems, but I must correct myself: it is 4 for Windows and 6 for Linux.

Exactly, the __m128 type has a set of rules in calling conventions just like any other built-in type.

@Sandevil

__cdecl comes from the former Borland __fastcall, in which the first 3 parameters are passed in registers. By the way, nowadays they produce the same code.

Actually, __cdecl is the x86 way of passing arguments by pushing them on the stack from right to left. Values are returned in their respective register(s), and the caller is responsible for stack cleanup.

Indeed, look there. You will read that all the cdecl parameters are passed on the stack. Perhaps you are confused by the fact that eax, ecx and edx are free for the function to use (i.e., their state need not be restored). Here's another source: http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl

The cdecl calling convention is used by many C systems for the x86 architecture[1]. In cdecl, function parameters are pushed on the stack in a right-to-left order. Function return values are returned in the EAX register (except for floating point values, which are returned in the x87 register ST0). Registers EAX, ECX, and EDX are available for use in the function.
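The push order in that description can be modelled with a toy stack. This is only a sketch of the ordering, assuming every argument occupies one 32-bit slot; `ToyStack` and `cdeclPush` are hypothetical names and nothing here touches the real machine stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the stack: the back of the vector is the top of the stack.
struct ToyStack {
    std::vector<std::int32_t> slots;

    void push(std::int32_t value) { slots.push_back(value); }

    // Slot i counted from the top (0 = most recently pushed value).
    std::int32_t fromTop(std::size_t i) const {
        assert(i < slots.size());
        return slots[slots.size() - 1 - i];
    }
};

// cdecl-style call setup: push the arguments right to left, then the
// return address (which the call instruction pushes on real hardware).
inline ToyStack cdeclPush(const std::vector<std::int32_t>& args,
                          std::int32_t returnAddress) {
    ToyStack stack;
    for (auto it = args.rbegin(); it != args.rend(); ++it) {
        stack.push(*it);
    }
    stack.push(returnAddress);
    return stack;
}
```

Because the rightmost argument is pushed first, the leftmost argument ends up directly above the return address, so the callee finds its first parameter at a fixed position regardless of how many arguments follow.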

Sandevil
—
2011-02-15T22:50:08Z —
#17

No, I'm not confused. Start from the fact that the first 3 do not give errors because they are passed in registers. __cdecl and __fastcall are nowadays exactly the same thing. In fact, __fastcall was introduced by Borland, which passes the first 3 in registers and all the rest on the stack from left to right. Microsoft, in its war against the Borland compilers, used a calling convention with 2 in registers and all the rest on the stack, but right to left (the last parameter is pushed first). Borland was faster, so Microsoft went to 3 in registers and the rest from right to left on the stack. Microsoft won, GCC adopted the same, and it is now referred to as __cdecl, but Borland continues to call it __fastcall, and today Microsoft compilers treat __fastcall the same as __cdecl. So practically they are now different names for the same thing. I know because if you use __fastcall in C++ Builder today and then use a DLL compiled with Visual C++ with __cdecl, everything is OK. If they were different, a crash would occur. You probably don't remember the war between Microsoft, Borland and Sybase. I'm not here to start a war, but trust me, I know what I'm saying: __fastcall and __cdecl are today the same thing, like __property (C++ Builder) and __declspec(property) (Microsoft). In the link you posted, see Microsoft fastcall and Borland fastcall, and you will start to understand the compiler war of the past. As my history teacher used to say, "the winner of a war writes the history".

Sandevil
—
2011-02-16T00:40:00Z —
#18

By the way, this confusion comes from the fact that different compilers use the same identifier with different meanings, or different identifiers for the same meaning, and most of them are not standard. A standard for C++ code does not exist, but the de facto standard is to pass the first 'n' params in some registers, which obviously vary from CPU to CPU (x86 is not the same as POWER7, PowerPC or an ARM CPU). The standard calling convention for C code (C89, C99) forces compilers to pass all the parameters on the stack. So, given the standard, if you want to use C code in C++ you must declare it this way: extern "C" { result functionName(params); };

With the Microsoft compiler, for a non-member function, you can declare a C function as WINAPI or __cdecl (C declaration) or __stdcall (C standard call), avoiding the extern "C" { ... }. WINAPI is __cdecl because the OS is written in C and not in C++, and all calls to a Windows API must pass all the parameters on the stack, in keeping with the standard. Other compilers use the standard, so for the MinGW compiler WINAPI is typically declared as an empty macro, and all the OS APIs are included in a header with extern "C" { } for a C++ compiler, or the compiler receives the switch to compile in C mode. For the OS API, just include the header and the compiler will do the rest.

Usually I try not to use non-standard specifiers if possible, so I generally use only 'inline'. For all the rest I write macros like:

#define FORCE_INLINE __forceinline
#define DLL_IMPORT __declspec(dllimport)

and so on. In this way, porting to another platform will be easy.
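A portability header along those lines might look like the sketch below. The macro names come from the post; the compiler checks and the non-Microsoft definitions are my assumptions about a reasonable mapping, and `clampIndex` is just a hypothetical function to show the macro in use.

```cpp
#include <cassert>

// Map vendor-specific specifiers onto neutral macro names.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
  #define DLL_IMPORT   __declspec(dllimport)
#elif defined(__GNUC__)  // GCC and Clang
  #define FORCE_INLINE inline __attribute__((always_inline))
  #define DLL_IMPORT
#else
  // Unknown compiler: fall back to the standard keyword only.
  #define FORCE_INLINE inline
  #define DLL_IMPORT
#endif

// Example use: the code itself never mentions a vendor keyword.
FORCE_INLINE int clampIndex(int i, int n) {
    assert(n > 0);
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}
```

Leaving `DLL_IMPORT` empty outside MSVC is a simplification. A real project targeting Windows with GCC would probably want a definition there as well.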

What do you think about Intel forcing Microsoft to adopt a C++ calling convention that will benefit only their processors, and about us being excluded from using inline assembly on 64-bit Windows? Knowing that this will penalize AMD, and that Linux has chosen the AMD-proposed calling convention. So AMD will have a chance to win for free on Linux web servers, and Intel wins (by paying Microsoft) on desktop/notebook platforms.

Microsoft fastcall

Microsoft or GCC __fastcall convention (aka __msfastcall) passes the first two arguments (evaluated left to right) that fit into ECX and EDX. Remaining arguments are pushed onto the stack from right to left.

Borland fastcall

Evaluating arguments from left to right, it passes three arguments via EAX, EDX, ECX. Remaining arguments are pushed onto the stack, also left to right.

It is the default calling convention of Borland Delphi, where it is known as register.

I know because if you use __fastcall in C++ Builder today and then use a DLL compiled with Visual C++ with __cdecl, everything is OK. If they were different, a crash would occur.

You are wrong. This will never just work. The arguments are passed on the stack in the wrong order, and a few of them are passed in registers. And no, a crash would not by definition occur: the arguments would simply have wrong values. Of course, if one of them happens to be a pointer which you're dereferencing, then it might crash. I urge you to try it, and post your original code and your results here. Perhaps you're remembering it wrong? Or there was a "#define __fastcall __cdecl" somewhere in the code or project settings? Or your functions only used zero parameters, or just one float or something.
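To see why the conventions can't be mixed, here is a toy model built only from the two fastcall descriptions quoted above. It assumes every argument is a 32-bit integer that fits in one register, and the function names are hypothetical. An entry "pushK" means the K-th value pushed onto the stack, with push0 pushed first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Location of each argument (index 0 = leftmost) under the Microsoft
// description: ECX, EDX, then the rest pushed right to left.
inline std::vector<std::string> msFastcall(std::size_t argc) {
    static const char* const regs[] = {"ECX", "EDX"};
    std::vector<std::string> where(argc);
    for (std::size_t i = 0; i < argc; ++i) {
        where[i] = (i < 2) ? std::string(regs[i])
                           : "push" + std::to_string(argc - 1 - i);
    }
    return where;
}

// Same, under the Borland description: EAX, EDX, ECX, then the rest
// pushed left to right.
inline std::vector<std::string> borlandFastcall(std::size_t argc) {
    static const char* const regs[] = {"EAX", "EDX", "ECX"};
    std::vector<std::string> where(argc);
    for (std::size_t i = 0; i < argc; ++i) {
        where[i] = (i < 3) ? std::string(regs[i])
                           : "push" + std::to_string(i - 3);
    }
    return where;
}

// How many arguments the two models place in different locations.
inline std::size_t countMismatches(std::size_t argc) {
    const std::vector<std::string> ms = msFastcall(argc);
    const std::vector<std::string> borland = borlandFastcall(argc);
    assert(ms.size() == borland.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < argc; ++i) {
        if (ms[i] != borland[i]) ++mismatches;
    }
    return mismatches;
}
```

In this model only a zero-argument function has no mismatches, and in a five-argument function only the second argument happens to land in the same place (EDX) under both.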