Options Related To Function Calling

gcc offers several ways to control how a function is called. Let's look at inlining first. Inlining removes the cost of a function call because the body of the function is substituted directly into the caller. Note that gcc does not do this by default; it happens only with -O3 or when you explicitly pass -finline-functions. Recent gcc releases also turn on -finline-functions at -O2.

How does the finished binary look when gcc does inlining? Observe Listing 2:

Both test() and test2() are indeed inlined, but you also see a copy of test() that stays outside main(). This is where the static keyword plays a role. By declaring a function static, you tell gcc that no other object file will call it, so once every call has been inlined there is no need to emit a standalone copy of its code. Marking functions as static whenever possible is therefore a space saver. On the other hand, be wise when deciding which functions should be inlined. Increasing code size for a small speedup isn't always worthwhile.

gcc uses heuristics to decide whether a function should be inlined. One of the considerations is the function's size in terms of pseudo-instructions. By default, the limit is 600. You can change this limit via -finline-limit. Experiment to find better inline limits for your own case. It is also possible to override the heuristics so that gcc always inlines a function. To do so, declare the function with the always_inline attribute.

Now, on to parameter passing. On the x86 architecture, parameters are pushed onto the stack by the caller and later read from the stack inside the function. But gcc gives you a chance to change this behavior and use registers instead. Up to three integer parameters can be passed this way with -mregparm=<n>, where <n> is the number of registers we want to use. If we apply this parameter (n=3) to Listing 2, take out the inline attribute, and use no optimization, we get this:

Instead of the stack, the code uses EAX, EDX, and ECX to hold the first, second, and third parameters. Because register access is faster than RAM access, this is one way to reduce runtime. However, you must pay attention to these issues:

You MUST compile all your code with the same -mregparm register number. Otherwise, you will have trouble calling functions in another object file, since caller and callee assume different calling conventions.

By using -mregparm, you break the Intel x86-compatible Application Binary Interface (ABI). Therefore, you should mention it when you distribute your software in binary-only form.

You have probably noticed this kind of sequence at the beginning of every function:

push %ebp
mov %esp,%ebp
sub $0x28,%esp

This sequence, also known as the function prologue, sets up the frame pointer (EBP). The frame pointer helps the debugger produce a stack trace. The structure below helps you visualize this [6]:

Address     Contents
[ebp-01]    Last byte of the last local variable
[ebp+00]    Old ebp value
[ebp+04]    Return address
[ebp+08]    First argument

Can we omit the frame pointer setup? Yes, with -fomit-frame-pointer, the prologue is shortened so that the function begins with just a stack reservation (if there are local variables):

sub $0x28,%esp

If the function gets called very frequently, cutting out the prologue saves your program several CPU cycles. But be careful: by doing this, you also make it hard for the debugger to investigate the stack. For example, let's add test(7,7,7) at the end of test2() and recompile with -fomit-frame-pointer and no optimization. Now fire up gdb to inspect the binary:

On the second call of test, the program is stopped and gdb prints the stack trace. Normally, main() should come up in Frame #2, but we only see question marks. Recall what I said about the stack layout: the absence of a frame pointer prevents gdb from finding the location of the saved return address in Frame #2.