Was it ever necessary to think about them? I certainly haven't.
–
anonDec 30 '09 at 20:05

5

"Well enough" for what? If you can't test whether the compiler has applied an optimisation you're interested in, then you don't need to know whether that optimisation has been applied -- at risk of existentialism, if you can't tell the difference then there is no difference ;-) If you can explain what situations you expect hoisting, then (a) someone might be able to answer the question, and/or (b) a simple test would demonstrate what versions of gcc, with what compiler options, perform that hoist.
–
Steve JessopDec 30 '09 at 20:08

Sorry, I am not up to date with advanced tech-speak like 'loop hoisting' that you are talking about. However, a quick google search revealed that it means "removing loop-invariant logic from the body of the loop". Now, being a simple-minded person, I can't help but wonder: if you are doing unnecessary things in the body of the loop (thereby slowing down the execution), isn't it just called "bad code"?
–
shylentDec 31 '09 at 11:18

2

@shylent: No, sometimes the invariant logic might be a simple arithmetic expression that's passed as a function parameter or similar. It's easier and cleaner to compute it on the spot at the call-site, rather than computing it, storing it into a temp variable, and then entering the loop where you reference that temp variable. If we can trust the compiler to automatically factor out this expression and place it outside the loop, then we've just managed to simplify our code and make it more readable.
–
jalfDec 31 '09 at 11:35

I agree with jalf (on this, anyway). An example would be loops where you can either increment an integer or a pointer. You might have an instinct that the pointer will be faster. That instinct is not always correct, but the pointer is very unlikely to be slower. So you might think "always use pointers". But that's wrong, because the difference in performance is usually small, so it's not worth writing extra code. You should use whichever of an index or pointer is most natural for your function's inputs etc.
–
Steve JessopDec 31 '09 at 17:44

8 Answers
8

If your profiler tells you there is a problem with a loop, and only then, a thing to watch out for is a memory reference in the loop which you know is invariant across the loop but the compiler does not. Here's a contrived example, bubbling an element out to the end of an array:

for ( ; i < a->length - 1; i++)
    swap_elements(a, i, i+1);

You may know that the call to swap_elements does not change the value of a->length, but if the definition of swap_elements is in another source file, it is quite likely that the compiler does not. Hence it can be worthwhile hoisting the computation of a->length out of the loop:

int n = a->length;
for ( ; i < n - 1; i++)
    swap_elements(a, i, i+1);

On performance-critical inner loops, my students get measurable speedups with transformations like this one.

Note that there's no need to hoist the computation of n-1; any optimizing compiler is perfectly capable of discovering loop-invariant computations among local variables. It's memory references and function calls that may be more difficult. And the code with n-1 is more manifestly correct.

As others have noted, you have no business doing any of this until you've profiled and have discovered that the loop is a performance bottleneck that actually matters.

+1 for answering the question, actually teaching the OP something in the process, and not beating the asker over the head with regurgitated advice. I wish I could upvote this more.
–
tgamblinDec 31 '09 at 1:24

Write the code, profile it, and only think about optimising it when you have found something that is not fast enough, and you can't think of an alternative algorithm that will reduce/avoid the bottleneck in the first place.

With modern compilers, this advice is even more important - if you write simple clean code, the compiler's optimiser can often do a better job of optimising the code than it can if you try to give it snazzy "pre-optimised" code.

Also, don't forget to profile after optimization as well, to make sure that your optimizations are indeed optimizations.
–
JesperEDec 30 '09 at 20:26

4

I would like to think the "pre-optimization is the root of all evil" and "profile first" mantras have been beaten to death enough on this forum that it's a given; perhaps the person has already considered this, and maybe the person simply wants his question about compiler features and quality answered rather than debated about its necessity.
–
Will HartungDec 30 '09 at 22:38

4

-1 for another bad-faith answer to an optimization question. Once again the premature optimization parrots come out and get voted up because no one on StackOverflow bothers thinking about performance.
–
tgamblinDec 31 '09 at 1:04

5

@Steve Jessop: That doesn't negate the fact that this reflexive non-answer has been repeated on here a million times. I have no problem with telling people to profile because it's a great rule of thumb. But look at, say, Norman's answer below. It mentions the need to profile but actually teaches the OP something. It's also the only answer with a concrete explanation of when you should care and when you shouldn't. This answer is just regurgitation and vagaries, and it gets upvoted by all the people who prefer their best practices spoon-fed.
–
tgamblinDec 31 '09 at 1:31

Check the generated assembly and see for yourself. See if the computation for the loop-invariant code is being done inside the loop or outside the loop in the assembly code that your compiler generates. If it's failing to do the loop hoisting, do the hoisting yourself.

But as others have said, you should always profile first to find your bottlenecks. Once you've determined that this is in fact a bottleneck, only then should you check to see if the compiler's performing loop hoisting (aka loop-invariant code motion) in the hot spots. If it's not, help it out.
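For instance (a contrived sketch, not code from the question), you could compile both of the following with `g++ -O2 -S` and diff the generated assembly to see whether the `strlen` call was hoisted; if it wasn't, the second version does the hoist by hand:

```cpp
#include <cctype>
#include <cstring>

// Contrived sketch: strlen() in the loop condition may or may not be
// hoisted, depending on whether the compiler can prove the string's
// length is invariant across the loop.
void upcase_plain(char* s) {
    for (std::size_t i = 0; i < std::strlen(s); ++i)  // possibly O(n) per iteration
        s[i] = std::toupper((unsigned char)s[i]);
}

// Manual hoist: the length is computed exactly once, before the loop.
void upcase_hoisted(char* s) {
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n; ++i)
        s[i] = std::toupper((unsigned char)s[i]);
}
```

Both versions compute the same result; the only question is how many times the length gets recomputed.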

Compilers generally do an excellent job with this type of optimization, but they do miss some cases. Generally, my advice is: write your code to be as readable as possible (which may mean that you hoist loop invariants -- I prefer to read code written that way), and if the compiler misses optimizations, file bugs to help fix the compiler. Only put the optimization into your source if you have a hard performance requirement that can't wait on a compiler fix, or the compiler writers tell you that they're not going to be able to address the issue.

+1 for the focus on readability. I totally agree, and it seems very reasonable to assume that code that is easier for a human to understand has a better chance at being well optimized.
–
Peeter JootDec 31 '09 at 1:07

Early optimizations are bad only if other aspects - like readability, clarity of intent, or structure - are negatively affected.

If you have to declare it anyway, loop hoisting can even improve clarity, and it explicitly documents your assumption "this value doesn't change".

As a rule of thumb I wouldn't hoist the count/end iterator for a std::vector, because it's a common scenario easily optimized. I wouldn't hoist anything that I can trust my optimizer to hoist, and I wouldn't hoist anything known to be not critical - e.g. when running through a list of a dozen windows to respond to a button click. Even if it takes 50ms, it will still appear "instantaneous" to the user. (But even that is a dangerous assumption: if a new feature requires looping 20 times over this same code, it suddenly is slow.) You should still hoist operations such as opening a file handle to append, etc.
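For the std::vector case, the two forms look like this (a generic sketch, not code from the thread); the hoisted form mainly buys documentation of the "length won't change" assumption:

```cpp
#include <vector>

// Common case: size() in the condition is trivially inlined and cheap,
// so compilers routinely optimize this without help.
int sum_plain(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Hand-hoisted version: rarely faster here, but the const local
// documents the assumption that the vector's length doesn't change.
int sum_hoisted(const std::vector<int>& v) {
    int total = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```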

In many cases - loop hoisting is a good example - it helps a lot to consider relative cost: what is the cost of the hoisted calculation compared to the cost of running through the body?

As for optimizations in general, there are quite a few cases where the profiler doesn't help. Code may have very different behavior depending on the call path. Library writers often don't know their call paths or frequencies. Isolating a piece of code to make things comparable can already alter the behavior significantly. The profiler may tell you "Loop X is slow", but it won't tell you "Loop X is slow because call Y is thrashing the cache for everyone else". A profiler couldn't tell you "this code is fast because of your snappy CPU, but it will be slow on Steve's computer".

Generally common-sense, and I (reluctantly) can see where some people might actually like the compiler to hoist code for them. Regarding "quite some cases where the profiler doesn't help", I attribute that to the abysmal state of profilers, based as they mostly still are on the concepts enthroned in gprof. I've been doing my level best to explain this, such as here: stackoverflow.com/questions/1777556/alternatives-to-gprof/…
–
Mike DunlaveyDec 31 '09 at 16:50

... One profiler that is breaking out of the gprof mold (partly) is this one (www.rotateright.com). The reason is that it samples call stacks, on wall-clock time, and it captures statement/instruction-level percents, and has a butterfly view optionally focussed on statements/instructions.
–
Mike DunlaveyDec 31 '09 at 17:35

Where they are likely to be important to performance, you still have to think about them.

Loop hoisting is most beneficial when the value being hoisted takes a lot of work to calculate. If it takes a lot of work to calculate, it's probably a call out of line. If it's a call out of line, the latest version of gcc is much less likely than you are to figure out that it will return the same value every time.

Sometimes people tell you to profile first. They don't really mean it, they just think that if you're smart enough to figure out when it's worth worrying about performance, then you're smart enough to ignore their rule of thumb. Obviously, the following code might as well be "prematurely optimized", whether you have profiled or not:
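A sketch of the kind of loop under discussion, with a hypothetical, deliberately naive (non-memoizing) countPrimesLessThan standing in for the expensive out-of-line call - this is an illustration of the pattern, not the answer's original code:

```cpp
// Hypothetical, deliberately naive prime counter: expensive, no
// memoization, standing in for an out-of-line call the compiler may
// not know is invariant.
int countPrimesLessThan(int n) {
    int count = 0;
    for (int i = 2; i < n; ++i) {
        bool prime = true;
        for (int j = 2; j * j <= i; ++j)
            if (i % j == 0) { prime = false; break; }
        if (prime) ++count;
    }
    return count;
}

// The call in the loop condition re-runs the whole computation every
// iteration unless the compiler hoists it out.
int loop_unhoisted(int limit) {
    int total = 0;
    for (int i = 0; i < countPrimesLessThan(limit); ++i)
        total += i;
    return total;
}

// Manually hoisted version: the expensive call runs exactly once.
int loop_hoisted(int limit) {
    int total = 0;
    const int bound = countPrimesLessThan(limit);
    for (int i = 0; i < bound; ++i)
        total += i;
    return total;
}
```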

I don't think it's "special" at all. It is very simple: "The code becomes simpler, and uses fewer lines, if I just call the function where it is needed. And I trust the compiler to hoist out the call for me". I don't see the point in obfuscating your code if you don't know that it's necessary. Of course, it does depend on how and where countPrimesLessThan is defined. If the definition is not visible to the compiler then it can't assume anything about the function, and it won't be able to safely move it outside the loop. But if the definition is visible, trust the compiler.
–
jalfDec 31 '09 at 1:31

1

Well this is not 100% correct. My countPrimesLessThan() might use memoization, and thus only incur the call cost once. So it's really only valid if you know what you're doing is expensive. And you might have to undo it later when refactoring.
–
Michael AndersonDec 31 '09 at 1:40

1

@OP: On the contrary - we believe that people should profile first, before microoptimizing. You can always go through a piece of code and microoptimize (and then test to see if it helps); what you won't know is where to do it before profiling. If the performance is inadequate, and you're working in a hot spot, knock yourself out.
–
David ThornleyDec 31 '09 at 1:43

2

Well, at risk of spoiling an excellent theoretical discussion with actual facts, g++ 4.3.2 does not hoist countPrimesLessThan with -O3 and everything defined in a single translation unit. So if you trust your compiler, your code is 10 times slower than mine. And the problem with this kind of "special" programming is that if you deliberately write everything to be absurdly slow, nothing will stand out as being particularly hot. This code will profile as spending all its time in countPrimesLessThan either way, whether it's called 1 time or 10.
–
Steve JessopDec 31 '09 at 1:44

@Michael: true, it might memoize. But unfortunately, the same crew who instructed me not to hoist the call out of the loop, also bullied me into never memoizing anything, because that's premature optimization too. So as it happens, it doesn't memoize, and I've edited to show my implementation. So unless I'm very lucky, and countPrimesLessThan was written by someone smart enough to ignore the wisdom of the crowds, my code is still 10 times slower than it should be.
–
Steve JessopDec 31 '09 at 2:05

A good rule of thumb is usually that the compiler performs the optimizations it is able to.
Does the optimization require any knowledge about your code that isn't immediately obvious to the compiler? Then it is hard for the compiler to apply the optimization automatically, and you may want to do it yourself.

In most cases, loop hoisting is a fully automatic process requiring no high-level knowledge of the code -- just a lot of lifetime and dependency analysis, which is what the compiler excels at in the first place.

It is possible to write code where the compiler is unable to determine whether something can be hoisted out safely though -- and in those cases, you may want to do it yourself, as it is a very efficient optimization.

Is it safe to hoist out the call to countPrimesLessThan? That depends on how and where the function is defined. What if it has side effects? It may make an important difference whether it is called once or ten times, as well as when it is called. If we don't know how the function is defined, we can't move it outside the loop. And the same is true if the compiler is to perform the optimization.

Is the function definition visible to the compiler? And is the function short enough that we can trust the compiler to inline it, or at least analyze the function for side effects? If so, then yes, it will hoist it outside the loop.

If the definition is not visible, or if the function is very big and complicated, then the compiler will probably assume that the function call can not be moved safely, and then it won't automatically hoist it out.
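One way to help the compiler in the out-of-line case - a GCC/Clang extension, not something this answer mentions - is to declare the function pure, which promises it has no side effects and lets the compiler hoist repeated calls even without seeing the definition:

```cpp
// GCC/Clang extension (a sketch, assuming one of those compilers):
// __attribute__((pure)) promises the function has no side effects and
// depends only on its arguments and readable memory, so the compiler
// may fold repeated calls with the same argument into one.
#if defined(__GNUC__)
#define PURE_FN __attribute__((pure))
#else
#define PURE_FN
#endif

PURE_FN int expensive(int n);   // declaration; definition may live elsewhere

int expensive(int n) {          // defined here so the sketch is self-contained
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += i;
    return acc;
}

int sum_calls(int n, int reps) {
    int total = 0;
    for (int i = 0; i < reps; ++i)
        total += expensive(n);  // hoistable thanks to the pure attribute
    return total;
}
```

Of course, the attribute is a promise the compiler does not verify: if the function does have side effects, declaring it pure invites miscompilation.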

I've added to my answer the implementation I used to test whether my compiler is capable of this simple hoist (it isn't). It's pretty obvious to me that my functions have no side-effects, but apparently not to my compiler. Perhaps it doesn't want to be accused of cheating on benchmarks, and thinks my loop is so stupid it can only be a benchmark ;-) Or perhaps it's not as aggressive as your description suggests.
–
Steve JessopDec 31 '09 at 2:02

@jalf, @Steve: I know this example is contrived in order to make a point, but I can't think of a realistic case where I want to write something inside a loop and expect the compiler to move it out. If I wrote it inside the loop by accident then I'm being stupid, and the idea that a compiler is going to be smarter and wipe up after me is disturbing. It's a mantra that compilers optimize better than people do, but it gets repeated without being justified.
–
Mike DunlaveyDec 31 '09 at 2:54

@Mike: I agree, compilers don't optimize better than people do in general. In certain cases they do, but certainly not every kind of optimization. But I'd argue that if you rely on a function call like this to be hoisted out of the loop, and the compiler doesn't do it, then it should be pretty easy to find out and then do it manually. It is such a simple and effective optimization either way that if the compiler doesn't do it for you, it's easy enough to manually remedy it. It's not one of those optimizations that has to be designed into the app from day 1.
–
jalfDec 31 '09 at 11:31

@jalf: I guess the reason I get so exercised over it is that it's symptomatic of a broader lack of understanding of performance in general. I've heard it from compiler writers for decades that such optimizations accumulate and generally make the code "faster" in a way that's hard to argue with. But my 30 years of performance tuning says it has 100% - epsilon to do with what the programmer does and almost nothing to do with the compiler, except when it comes to optimizing expressions and assignment statements, register allocation, etc.
–
Mike DunlaveyDec 31 '09 at 16:27

I think much the same as Mike about broader understanding - if you ignore performance while writing code, there's a steady drip-drip: "this decision saves me a second of thought or typing, and probably halves the speed". "Same again". "This one divides it by four". OK, so you can fix them later, like any code defect, but the ideal is "works first time". So to me it's a good trade-off to anticipate the predictable defects, as long as it's not a great contortion. I don't bother speculatively hoisting vector::size(), because I know that's fast. countPrimesLessThan is expected to be slow.
–
Steve JessopDec 31 '09 at 17:34

Remember the 80-20 rule: 80% of execution time is spent in 20% of the program's code.
There is no point in optimizing code that has no significant effect on the program's overall efficiency.

One should not bother with this kind of local optimization up front. The best approach is to profile the code to figure out the critical parts of the program that consume heavy CPU cycles, and try to optimize those. That kind of optimization really makes sense and will result in improved program efficiency.

80/20? My experience is that it is 98% and 2%. (Also, that's not my downvote.)
–
wallykDec 31 '09 at 2:17

1

No, the 80-20 rule is that 80% of the time that someone has to invent a percentage with no basis whatsoever in observation or statistics, they will choose 80%. The other 20% of the time, they will choose 20%.
–
Steve JessopDec 31 '09 at 2:34

The 80/20 rule is a common name for a Pareto distribution.
–
peterchenDec 31 '09 at 10:23

++ I got clobbered for saying what I thought was the same thing :-), maybe less nicely. No doubt I'm old-fashioned, but I spent years writing assembly language, and what I want a compiler to do is write good assembly language for me. I don't want it to try to outsmart me and second-guess the structure of my program. If it thinks such changes will make my code faster, almost 100% of the time it won't, and anyway I would like such decisions to be mine.
–
Mike DunlaveyJan 13 '10 at 22:05