2012.02.26 - overhead, footprint and stuff

we often tend to put too little focus on keeping our knowledge up to date. moreover, programmers tend to make optimizations based solely on a “strong feeling” rather than on experiments and measurements. plus we all know that premature optimization is the root of all evil, right? ;)

there is an excellent publication on the topic of C++ performance, namely the Technical Report on C++ Performance. it is a 2006 publication, written by many authors, including Bjarne Stroustrup and Dave Abrahams. i strongly encourage you to read it. this whole entry is based on that text and on simple experiments i've performed to see the theory in action.

so let's move to the actual cases.

exceptions

during one of our recent meetings an issue regarding exceptions was raised. some legacy code appeared not to work correctly when an exception was thrown. these things happen – we all love legacy code, after all, right? anyway, the topic quickly drifted away from the particular code-related problem to a more general one – the overhead of exceptions in C++. i argued that, as far as i know, there is no extra overhead when using exceptions, compared to the alternative of error codes and pointers to error codes passed as function arguments. other people argued that using pointers introduces extra runtime overhead, making the stack grow bigger, that extra memory locations are occupied by exception-related structures, etc. as i hadn't heard about this, i decided to read some more on the topic.

this is where i discovered the Technical Report on C++ Performance. from it i learned that classically there are 2 basic approaches to implementing exception handling code.

“dynamic” – the older approach, based on registering clean-up information at runtime.

“table” – published in 1994 and commonly used since then (with some improvements).

so what has changed in exception handling over the last 20 years? i strongly encourage you to read the original report, but for the sake of this entry i'll explain it shortly. the “dynamic” approach is based on putting information on the stack about which destructors are to be called upon stack unwinding, so that in case an exception is thrown, the proper objects can be destroyed, as required by the standard. it introduces runtime overhead both in memory usage (the stack grows faster) and in execution time (extra code to push/pop elements every time a point of exception is reached).

because of that, it was soon replaced with the “table” approach, where the compiler generates tables with code entries responsible for stack unwinding in case an exception occurs. what does this mean? statically generated code for handling exceptions that is never executed unless an error is thrown. no extra stack overhead. no extra execution time in the non-exceptional flow.
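whichever implementation the compiler picks, it must deliver the same semantics: destructors of automatic objects run during stack unwinding. a minimal sketch of that behaviour (the Guard struct and the trace vector are my own illustration, not code from the report):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> trace;  // records the order of events

// RAII guard: its destructor must run when the stack unwinds,
// which is exactly what the unwind tables describe statically
struct Guard {
    std::string name;
    explicit Guard(std::string n) : name(std::move(n)) {}
    ~Guard() { trace.push_back(name); }
};

void inner() {
    Guard g("inner");
    throw std::runtime_error("boom");  // unwinding starts here; ~Guard must run
}

void run() {
    Guard g("outer");
    try {
        inner();
    } catch (const std::exception&) {
        trace.push_back("caught");
    }
}   // "outer" is destroyed normally, after the handler
```

the “table” approach makes this clean-up free in the non-exceptional path: the information about which guards to destroy sits in static tables, not on the runtime stack.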

experimental results

this is a good place to start experimenting. i wrote a simple Fibonacci sequence computation, using recursion. the program checks stack size and execution times with and without exceptions.
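i don't reproduce the full benchmark here, but the two variants compared look roughly like this (the function names and the exact error-code convention are my own sketch, not the original test code):

```cpp
#include <cassert>
#include <stdexcept>

// error-code variant: the extra out-parameter travels down the whole
// recursion, and every level pays for an explicit check
int fib_ec(int n, int* err) {
    if (n < 0) { *err = 1; return 0; }
    if (n < 2) return n;
    int a = fib_ec(n - 1, err);
    if (*err) return 0;
    int b = fib_ec(n - 2, err);
    if (*err) return 0;
    return a + b;
}

// exception variant: the happy path carries no error plumbing at all
int fib_ex(int n) {
    if (n < 0) throw std::invalid_argument("n must be non-negative");
    if (n < 2) return n;
    return fib_ex(n - 1) + fib_ex(n - 2);
}
```

note how the error-code variant needs a wider stack frame (the extra pointer argument) and extra branches at every recursion level, while with table-based exceptions the error handling lives entirely outside the hot path.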

conclusions

and so, ironically, using error codes makes the footprint larger than exceptions (!), and it grows with the stack depth. conclusion: let the compiler do the micro-optimizations while you focus on the architecture and the design. as Richard Hamming said: “Machines should work. People should think.”.

templates vs. switches: the enums case

at more or less the same time as the “exceptions discussion” took place, i found a smelly piece of code that, due to a lack of flexible architecture, translated one enumeration type to another. there were a few problems with that. first of all, it was done with a switch(){} statement, which i personally do not like. second, it was used in a few places. third, the enumeration translation had to be done in both directions, so with switch(){} you need to maintain two pieces of code that are highly coupled.

what i decided to do was to write a simple, template-based wrapper that, given a single structure explaining which value from enum1 corresponds to which in enum2, can be used to do bidirectional mapping. this means keeping only one translation table that works both ways. the main charges i faced for this solution were that it is complicated and less efficient. frankly, i think this piece of code is fairly simple, but even if i'm wrong, it still does not matter much, since, as opposed to the 2-functions-with-switch()es solution, this code has to be written just once. the next time bidirectional mapping is needed, feeding a new transition table to the generic template is enough – it is that simple! when it comes to efficiency, GCC 4.6 on an amd64 machine produced binaries of the same size in both cases.
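the original wrapper is not reproduced here, so the following is only a rough sketch of the idea, with types and names of my own invention – a single table plus two translate() overloads, one per direction:

```cpp
#include <cassert>
#include <cstddef>

// two unrelated enums that must be translated both ways
enum class Color { red, green, blue };
enum class Paint { azure, emerald, crimson };

// one entry of the translation table
template <typename E1, typename E2>
struct Mapping { E1 from; E2 to; };

// forward direction: E1 -> E2
template <typename E1, typename E2, std::size_t N>
E2 translate(const Mapping<E1, E2> (&table)[N], E1 v) {
    for (const auto& m : table)
        if (m.from == v) return m.to;
    assert(false && "unmapped value");
    return E2{};
}

// reverse direction: E2 -> E1, reusing the very same table
template <typename E1, typename E2, std::size_t N>
E1 translate(const Mapping<E1, E2> (&table)[N], E2 v) {
    for (const auto& m : table)
        if (m.to == v) return m.from;
    assert(false && "unmapped value");
    return E1{};
}

// the single source of truth for both directions
constexpr Mapping<Color, Paint> colorMap[] = {
    {Color::red,   Paint::crimson},
    {Color::green, Paint::emerald},
    {Color::blue,  Paint::azure},
};
```

overload resolution picks the right direction from the argument type, so adding a new bidirectional mapping is just a matter of writing another table.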

experimental results

here are my results, tested on two machines with different systems and architectures. in case of doubt, compile the enum test examples and check for yourself.

executable size on amd64 with gcc-4.6, optimized for size:

mpl:    6984 bytes
manual: 6984 bytes

executable size on amd64 with gcc-4.6, optimized for speed:

mpl:    6984 bytes
manual: 6984 bytes

executable size on ia32 with gcc-4.4, optimized for size:

mpl:    4312 bytes
manual: 4312 bytes

executable size on ia32 with gcc-4.4, optimized for speed:

mpl:    4976 bytes
manual: 5144 bytes

conclusions

as one can see, gcc-4.6 on amd64 gave the same results for the template-based code as for the hand-written code. the optimizations work fine, so why bother repeatedly writing the code manually? on the other hand, the older compiler (gcc-4.4) produced noticeably bigger code when optimizing for speed. this is usually not a good sign, since, due to processor caches, the same machine code often runs faster when it is compact. notice, however, that this does not apply to the extra code added for exception handling, since it is never executed under regular conditions and (depending on the compiler's code layout) probably won't even be prefetched.