Optimization in GCC

Here's what the -O options mean in GCC, why some optimizations aren't optimal after all, and how you can make specialized optimization choices for your application.

Testing for Improvements

Earlier we used the time command to identify how much time was spent
in a given command. This can be useful, but when we're profiling our
application, we need deeper insight into where that time goes within
the image. The gprof utility provided by GNU and the GCC compiler meets
this need. Full coverage of gprof is outside the scope of this article,
but Listing 3 illustrates its use.

The image is compiled with the -pg option to include profiling
instructions in the image. Upon execution of the image, a gmon.out
file results that can be used with the gprof utility to produce
human-readable profiling data. In this use of gprof, we
specify the -b and --no-graph options. For brief output (excluding the
verbose field explanations), we specify -b. The --no-graph option
disables emission of the function call graph, which otherwise shows
which functions call which others and the time spent in each.
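
Pulling these pieces together, a typical session along the lines of
Listing 3 might look like this (the sort.c and sort file names here
are illustrative assumptions):

gcc -pg -o sort sort.c
./sort
gprof -b --no-graph sort gmon.out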

Reading the example from Listing 3, we can see that bubbleSort was called
once and took 790ms. The init_list function also was called, but it took
less than 10ms to complete (the resolution of the profile sampling),
so its value was zero.

If we're more interested in changes in the size of the object than in
its speed, we can use the size command. For more detailed information, we can use the
objdump utility. To see a list of the functions in our object, we can
search for the .text sections, as in:

objdump -x sort | grep .text

From this short list, we can identify the particular function we're
interested in understanding better.
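
The size command mentioned above also lends itself to quick
before-and-after comparisons; as a rough sketch (the file names and
flags here are only examples):

gcc -O2 -o sort sort.c
size sort
gcc -Os -o sort_small sort.c
size sort_small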

Examining Optimizations

The GCC optimizer is essentially a black box. Options and optimization
flags are specified, and the resulting code may or may not improve.
When it does improve, what exactly happened within the generated code?
The only way to answer that question is to look at the code the
compiler emits.

To emit target instructions from the compiler, the -S option can be
specified, such as:

gcc -c -S test.c

which tells gcc to stop after the compilation stage and emit the assembly
code for the source (-S); the -c flag is redundant here, because -S
already halts the process before assembling and linking. The resulting
assembly output will be contained in the file test.s.

The disadvantage of the previous approach is that you see only the
assembly code; nothing about the size or encoding of the actual
instructions is shown. For this, we can use objdump to emit both
assembly and native instructions, like so:

gcc -c -g test.c
objdump -d test.o

For gcc, we specify compile-only (-c), but we also want to include debug
information in the object (-g). Using objdump, we specify the -d option
to disassemble the instructions in the object.
Finally, we can get assembly-interspersed source listings with:

gcc -c -g -Wa,-ahl,-L test.c

This command uses the GNU assembler to emit the listing. The -Wa option
passes the -ahl and -L options to the assembler, producing a listing on
standard output that contains the high-level source interspersed with
the generated assembly. The -L option retains the local symbols in the
symbol table.
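
A related alternative, assuming the object already contains debug
information (built with -g, as earlier), is to let objdump interleave
the source with its disassembly by adding the -S option:

objdump -d -S test.o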

Conclusion

All applications are different, so there's no magic configuration of
optimization and option switches that yields the best result. The
simplest way to achieve good performance is to rely on the -O2
optimization level; if you're not interested in portability, specify
the target architecture using -march=. For space-constrained
applications, the -Os optimization level should be considered first.
If you're interested in squeezing the most performance out of your
application, your best bet is to try the different levels and then use
the various utilities to inspect the resulting code. Selectively
enabling or disabling particular optimizations also may help you get
the best results from the optimizer.
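
As an illustrative sketch of these suggestions (the app.c file name and
the Pentium 4 target are assumptions, not recommendations):

gcc -O2 -march=pentium4 -o app app.c
gcc -Os -o app app.c

The first command trades portability for speed on a known target; the
second favors a small image.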

M. Tim Jones (mtj@mtjones.com) is a senior principal engineer with Emulex Corp. in
Longmont, Colorado. In addition to being an embedded firmware engineer, Tim
recently finished writing the book BSD Sockets Programming from a
Multilanguage Perspective. He has written kernels for communications
and research satellites and now develops embedded firmware for
networking products.

Apparently, MSVC uses a few insecure optimizations, counting on the developer having written secure code. Probably that's why its debug build is slower.

I've seen lots of situations where gcc gives an error right away, promptly showing me the bug, while MSVC happily executes the code until it finally stumbles upon a non-static field of a class and only then gives an error. For me, this is simply misleading, and that's why I prefer gcc.

Someone should write some "C" code and a few scripts that will enable / disable every compiler option and then print out which options worked best for _your_ particular system.

A benchmark that would specifically test each option (as opposed to using a single, huge benchmark) could be written.

E.g., there's no point in benchmarking whether we should use:
gcc -O2 -O3 code.c -- one disables the other

gcc -fno-gcse SSE2_code.c

Benchmarks need to be chosen so that the option being switched has a 'large' effect on them.

This could be run overnight (or on multiple machines, each doing part of the testing) and the results provided on a web page somewhere.
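
A minimal sketch of what such a script might look like (the bench.c
benchmark and the short option list here are placeholders):

#!/bin/bash
# Build one benchmark at each optimization level and report wall-clock time.
for opt in -O0 -O1 -O2 -O3 -Os; do
    gcc $opt -o bench bench.c
    printf "%s: " "$opt"
    ( time -p ./bench > /dev/null ) 2>&1 | grep real
done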

Experts could put in their two cents, and a wiki of snippets could
be fed into a code collator (not a compiler, just a bunch of scripts) that would combine all the snippets and produce a final program to be compiled on many different machines.

This way we could figure out, for such-and-such a system, how often (what % of the time) we would simply be better off
using a particular option, and when that is more likely based on the TYPE of program we are running (word processor vs. multimedia app).

E.g., if you have a Pentium it is ALWAYS (or should be, if gcc is correct) best to use the -march=pentium option - BUT - it is NOT always best to use "-fcrossjumping" (though it _could_ be for certain applications).

The output of all this could simply be a half dozen command line choices for each processor - including a "general purpose 'best'" setting and a "quick compile with great optimization" setting (for intermediate builds).

This is something that a few dozen people need to work on to get the ball rolling, and then the rest of us need to pitch in and compile the resulting test scripts to check for errors. With everyone's help we should have the so-called answer(s) to "which compilation options should I use for machine X when compiling application category Y?"
