Calling conventions

I'm playing around with a Z80 and the compiler I'm using pushes parameters onto the stack from left to right. This made me curious, since calling conventions usually push from right to left, or so is my understanding. However, this doesn't seem to alter the order of evaluation, which brings forth two questions...

1. Is order of evaluation independent of the calling convention?

2. A discussion on this issue revealed SPARC compilers seem to prefer fastcall. They load registers with the first 6 parameters and any remaining are pushed down into the stack. What's the order they are pushed in?

Originally Posted by brewbuck:Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

The order of evaluation of arguments is not clearly defined by the C standard in the first place, so

Code:

foo(bar(), baz());

does not define whether bar or baz is called first.

Order of evaluation is (possible to implement in a way that is) independent of the parameter passing order. For example, Windows uses left-to-right as well as right-to-left pushing of arguments - cdecl uses right-to-left, whilst stdcall uses left-to-right.

Of course, right-to-left works much better for variable argument functions, like printf(), where you want to be able to locate the argument that holds the "number of arguments" information (e.g. the format string for printf/scanf).

Regarding fast-call (or whatever you call it), it is obviously the fastest method when you have sufficient registers to pass the arguments and still be able to do at least some small amount of argument calculation. x86-32 is pitifully low on general purpose registers, which makes it very hard to benefit MUCH from a register passing calling convention. x86-64, which has 16 registers, defaults to passing arguments in registers.

--
Mats

Last edited by matsp; 01-15-2009 at 09:15 AM.

Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.

Hmm... I was under the impression both use right-to-left and that it is how the stack is cleared that differentiates them. cdecl has the calling code clean the stack and stdcall has the function itself cleaning the stack.

Of course, right-to-left works much better for variable argument functions

Yes. But the reason stdcall can't use variadic functions is because of how the stack is cleared.

x86-64, which has 16 registers, defaults to passing arguments in registers.

Any idea what happens with any remaining arguments?

In any case, it's still not clear to me why stack pushing order doesn't seem to affect evaluation order...

Last edited by Mario F.; 01-15-2009 at 10:03 AM.


Hmm... I was under the impression both use right-to-left and that it is how the stack is cleared that differentiates them. cdecl has the calling code clean the stack and stdcall has the function itself cleaning the stack.

I'm FAIRLY sure that they switch order - at least they used to when it was called "PASCAL" rather than "STDCALL". It may have changed.

Yes. But the reason stdcall can't use variadic functions is because of how the stack is cleared.

Yes, indeed - but it makes life for the called function MUCH easier if you know where the "this is how many you have" argument is first.

Any idea what happens with any remaining arguments?

In any case, it's still not clear to me why stack pushing order doesn't seem to affect evaluation order...

Any register calling method will pass REMAINING values on the stack (and also struct variables larger than one or two registers worth).

--
Mats


We aren't on the same page.
I mean: in what order does fastcall push onto the stack any remaining arguments that can't fit into the available registers?

Whatever order the ABI describes - the standard doesn't actually define the order in which arguments are stored on the stack, or how many register vs. stack arguments you get for a given number & type of arguments. But you can probably figure out what it does in gcc x86-64 for Linux at least. Not sure if there is a 64-bit gcc for Windows.

--
Mats


I have a question that is related to this one, actually. As far as I know, the order of evaluation for other operators - such as "&&" - isn't defined either. Or is it?
I mean, I have seen loads and loads of projects that use code like this (and I use it as well):

Code:

if(ptr && *ptr == ' ')

Which would in essence be wrong if it is undefined. So, when *is* it defined?


That is definitely well defined (in the C and C++ languages - other languages may or may not do this). C is guaranteed to short-circuit && and ||: the left operand is evaluated first, and the right operand is not evaluated at all once the result is known. So the above is fine - it will never try to dereference a NULL pointer.

--
Mats


As far as I know, the order of evaluating other operators - such as "&&" - isn't defined either. Or are they?

&& and || introduce sequence points, so the order of evaluation would be defined with respect to them. (After all, this is needed for lazy/short-circuit evaluation to work, as in your example)

EDIT:
Ah yes, we are in the Tech Board, so the caveat is that it depends on the programming language.

Originally Posted by Bjarne Stroustrup (2000-10-14)

I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.

Win32:
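cdecl is right-to-left, caller-clears, all on stack
stdcall is right-to-left, callee-clears, all on stack
fastcall is right-to-left, callee-clears, first 2 in registers (ECX and EDX) I believe
The left-to-right one is the old pascal convention (callee-clears, all on stack)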

Win64:
everything is right-to-left, caller-clears, 4 in registers (GP for integral and pointers, XMM for floating), remaining on stack
Space for the 4 register arguments is always reserved on the stack by the caller, but not filled. If the function is a vararg function, it first dumps all argument registers to this reserved space. (That's also the reason why floating point arguments to vararg functions are in both the XMM and GP registers.)

All the buzzt! CornedBee

"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law

Whatever order the ABI describes - there is actually no defined order in the standard for which order arguments are stored on the stack, or how many registers vs. stack arguments you get for a given number & type of arguments.

Look, I'm not explaining myself well I guess. My bad.

I know the standard doesn't define it. Heck, the standard doesn't even care if I have a stack or registers. However, while not standard, they still are *conventions*. And as such my two initial questions still remain...

I was actually able to get an answer to the second. I found the SPARC ABI and, if I'm reading it right, the first 6 arguments are passed to the In registers left to right and any remaining arguments are pushed to the stack from right to left. However, it's not so clear to me what the x86 convention is for fastcall, since I have already read it being described both ways. In one place it was telling me they get pushed LtR, in another place it was saying RtL. But the x86 wasn't my question.

So we are left with the first only. It's not clear to me why stack push order doesn't have an effect on argument evaluation. I have limited knowledge on these things and so I would like to know more about how arguments are evaluated.


Stack push order is just that, stack push order. While it may be desirable for the compiler to evaluate the arguments in push order (evaluate, push, evaluate, push, ...), that's not always the case. A compiler can easily allocate space for all arguments at once and then just write each one to the correct location, so it is really free to do the evaluation in whatever order it wants.

When there's a register-passing convention, the difference will be even greater. Reason being, every argument already completed increases the register pressure, unless it gets spilled, in which case the point of passing it in a register is lost. So the compiler will order operations by the number of registers they need. Simple variable loads will be last.



Excellent! Thank you.
