I see the mantra of "profiling before optimization" repeated again and again here, on SO, and elsewhere. Although I certainly use profiling tools, I'm only occasionally surprised by the results. It seems like, as often as not, the profiler is just giving the same information that you could reasonably deduce by knowing the likely execution path of your program, understanding how your architecture works, and having a good idea of what optimization techniques the compiler can employ for you.

Because of this, I generally find that when I'm developing, I see areas of the code that I can sense are going to be bottlenecks (oftentimes while writing the code I will think to myself, "hey, this is going to be a critical part of the code and needs to be fast, but I'll use a slower implementation first to prove the concept, then optimize it later"), and I just go ahead and optimize these areas before I bother doing much profiling.

Is this really such a bad practice, or is it just the natural result of gaining experience doing optimization?

Edit

I just wanted to point out that I do use profilers; I'm just wondering how bad it's considered to be to pick off the low-hanging fruit first, rather than waiting for the profiler to prove something is slow.

Edit 2

Thanks for the insights so far everyone. People have posted a lot of good comments. I have to say that I still have mixed feelings about some of the advice here, but it's given me a lot of insight into why people make the statement.

I don't think that it's a bad practice, but it could certainly be a waste of time if the performance is sufficient without the optimizations. Bear in mind that profiling is useful anyway, since it can discover things that you didn't expect or think to be a bottleneck.
– faif, Mar 31 '11 at 14:46

About the only optimization you should do up front is algorithmic optimization: this function is O(n³), and I bet I can reduce that to O(n log n). But before you do even that, you must also weigh the cost/benefit ratio. If the cost is a week's worth of work on a routine that runs once a week on a data set of 10 items (thus taking 10 minutes), is it really worth a week of development to reduce that time to 14 seconds? Note: your cost is $3000 per week; the computer's cost (a non-critical thing) is a vanishingly small fraction of a cent per second.
– Loki Astari, Mar 31 '11 at 15:59

@Martin: Please post that as an answer. "Optimization" is often irrelevant, where proper algorithm design is the real issue.
– S.Lott, Mar 31 '11 at 18:09

@Martin: I don't know anybody who intentionally writes an O(too-big) algorithm, but Stack Overflow is full of questions from people who want to know why their program seems too slow. I try to share what experience I have in performance tuning. Occasionally (very occasionally) the problem is big-O, but much more common are over-designed, over-abstracted data structures generating call trees that are far bushier, and doing far more things, than are really necessary.
– Mike Dunlavey, Mar 31 '11 at 23:50

Note that most posts on optimization on SO are about micro-optimizations. The infamous ++i vs i++ comes up a lot, for example, as does shifting instead of dividing... People feel SO clever for thinking of these, not realizing that they are so simplistic that compilers already do them :/ There's nothing wrong with optimizing an algorithm, but micro-optimizations are generally better left to compilers; peephole optimization is that good. Even worse, code obfuscation may fool the compiler and prevent optimization :/
– Matthieu M., Apr 1 '11 at 6:31

10 Answers

If you are fairly certain a piece of code is going to prove to be a bottleneck, it's not a horrible practice to do some initial optimization. At the very least, it's a good idea to take steps to make sure it will be easy to refactor later.

What you want to avoid is going overboard by sacrificing time and readability in the name of micro-optimizations before you've seen actual data to justify such an effort.

+1: There has been so much talk about i++ vs ++i it makes me want to punch babies.
– unholysampler, Mar 31 '11 at 14:49

@unholysampler, doesn't that really depend on the processor and what the compiler produces? Those types of optimizations will work on some machines and not others. Kind of silly if you ask me.
– Berin Loritsch, Mar 31 '11 at 15:00

@unholysampler: There is the fact that ++i will always be at least as good as i++, but the reverse isn't true.
– David Thornley, Mar 31 '11 at 15:05

@Berin most (all?) fine-grained optimizations depend on the processor and what the compiler produces. That's why it's not bad to make some obvious, high-level optimizations early on, and to wait until after you've done some profiling to do any sort of micro-tuning.
– dbyrne, Mar 31 '11 at 15:06

@BerinLoritsch: There are never perfect generalizations, but many compilers can optimize this, especially when the result is not being stored anywhere. I am more focused on the fact that there are likely other things you will be able to change that have a much bigger impact than changing how you increment a number.
– unholysampler, Mar 31 '11 at 15:11

The point of profiling before optimizing is that you need a baseline to determine how much improvement the optimization gave you. The "surprising" information I get from my profiler will be along these lines:

I knew that method was called a lot, but I didn't realize it was called that much.

Thread monitor contention almost never slows things down where I expect it.

I'm generating that many instances of XYZ? It should only be about n...

That said, many times the profiler merely confirms my suspicions. Good scientific method involves healthy doses of monitoring and experimentation. It's a bit difficult, but I do try to figure out more systemic problems with the profiler. I could do the obvious change and get a 5% improvement, or, if I approached the problem differently, I might be able to get a 25% improvement. (Most optimizations don't yield such a large improvement, but on occasion they do.) Of course, I wouldn't know how much my optimization improved things without a baseline measurement.

Measuring the performance improvement is a good point. There are certainly cases where I know a given optimization will speed things up, but I make the change without gathering before-and-after data to see how much of an improvement it was.
– Cercerilla, Mar 31 '11 at 15:46

@CodeninjaTim: Not to mention that, if you're a contractor, presenting the speedup results is a nice way of showing those who have hired you that you haven't been wasting time fiddling with that code.
– David Thornley, Apr 1 '11 at 15:56

The line between "optimizing" and just "sensible design" is sometimes fairly fine, but other times pretty obvious. Just for example, you don't need a profiler to be pretty sure that if you're sorting a few million items, it's worth using an O(N log N) algorithm rather than an O(N²) one. IMO, that just falls under being reasonably sensible, not optimization, though.

There are also some things you might as well do, simply because they might provide a benefit and the cost is minimal to nonexistent. To use @unholysampler's example, writing ++i instead of i++ may have some minuscule cost if you're accustomed to typing the post-increment form, but (at most) it's temporary and trivial. I wouldn't spend any time rewriting working code for the sake of possibly saving a nanosecond, unless the profiler had shown that I really needed that time and stood a reasonable chance of saving it there. At the same time, when I'm typing in new code, I'll work at habitually using the form that's likely to be better, because once you do it habitually, it's free. It frequently won't gain anything, and even when it makes a difference it often won't be large enough to notice or care about, but it's still free, so there's no reason not to do it.

Cases like those are fairly obvious, and in my mind really fall under sensible design rather than what I'd think of as true optimization. Most other cases of "optimization without representation" would be considerably harder to justify, though. If you're going to spend any significant time or effort on an optimization, you should have something much more solid than a gut feeling to justify it.

I should add that part of the reason I say that is that I think profiling code is extremely useful, even when your goal isn't optimization. A profiler gives a high-level overview of a piece of code that can be extremely useful even when you don't particularly care about optimization at all. Just for example, if you see 10x as many calls to allocate a resource as to free that type of resource, it's a pretty good clue that there's a real problem, even if the code currently seems to run just fine. When you get down to it, a lot of code has a lot of things that should match up (not necessarily 1:1, but somehow or other) and a profiler can show mismatches like that much more quickly than most other tools.

I guess the question is where the line is between "sensible practices" and true optimization. What about changing floating-point operations to fixed-point when the target machine doesn't have an FPU, or modifying the way you access elements of an array to exploit locality, or using SSE when you're doing millions of additions inside a tight loop?
– Cercerilla, Mar 31 '11 at 16:11

@CodeninjaTim: Like I said, sometimes it's a fine line -- I don't have a solid answer for all those. Some will depend on your level of experience with the target environment, and the level of performance you expect -- unless you know enough to see an obvious discrepancy, you're probably better off profiling first.
– Jerry Coffin, Mar 31 '11 at 16:27

@CodeninjaTim: "where the line is" is pretty simple. If you're designing the right algorithm, that's a good thing to do. If you're just "tweaking and tuning", stop. Design the right algorithm first. Optimization is very, very rarely needed if you just picked the right algorithm to start with.
– S.Lott, Mar 31 '11 at 18:10

@CodeninjaTim: "Changing floating-point operations to fixed-point when the target machine doesn't have an FPU" isn't "optimization"; that's architecture. "Modifying the way you access elements of an array to exploit locality": rarely helpful, and then only after you provably have the optimal algorithm and can prove that it's the central performance problem. "Using SSE when you're doing millions of additions inside a tight loop": again, only after you can prove the loop is already the optimal algorithm.
– S.Lott, Mar 31 '11 at 18:13

@S.Lott: Don't get me wrong, I'm all for using the right algorithms as the first step, but it does seem a bit overboard to me to say you need to prove that every algorithm is optimal before you do any optimizations.
– Cercerilla, Mar 31 '11 at 21:09

The problem with this statement is observer error. Just because you think it may be bad doesn't mean it IS bad. The profiler will give empirical evidence and keep you from spending time in an area that may give no improvement. That's why you should start with the profiler before optimizing. Let the program and profiler PROVE that it is a slow section first.

There are known performance killers and known best practices which avoid them. I don't think you need to profile to determine a cursor is slower in SQL Server than a set-based operation. If you start out knowing this, you will write code that performs better from the start without the need to profile the two options every time you write code.

If you are adjusting existing code, it is better to profile, not only so you can be sure that the code you know is inefficient in general is actually behaving badly, but because profiling can also show up problems you didn't know you had. I remember profiling one thing where I suspected the stored proc could be optimized (it could) and found out that it was being called hundreds of times by the application when it only needed to be called once.

Additionally, you have the benefit of being able to prove that you did in fact improve performance when you profile. I personally save these before and after figures and use them in my performance appraisal write-ups and in discussing my achievements when job hunting. It's nice to have actual figures to show how good you are at tuning.

I completely agree that you need to go with the profiler-first approach for code you didn't write (or wrote a while ago and don't still have fresh in your mind). I'm thinking of cases like the cursor example: e.g., you write code you know will be slow to prove the concept, then go back and write a more efficient version without profiling.
– Cercerilla, Mar 31 '11 at 15:51

The reason people say that is because you won't know what needs to be optimized before your profiling is finished, so you could end up wasting a lot of time for no benefits.

Still, if something leaps out at you as a bad idea (e.g. your coworker chooses to use deep recursion instead of a simple iterative loop), you can fix it. But you should be focused on moving forward, not navel-gazing at old code. There is a stage in the process where that is appropriate.

I think everyone tries to write good code from the beginning. That sounds like what you're doing.

What I think should be done in the beginning is to just keep the design simple, especially the data structures. Often people start off assuming they need more sophisticated data structures, redundant data, and detailed notification techniques because they are worried about performance. In my experience, those things cause the very problems they are supposed to avoid.

In spite of good coding practice and good design, performance problems creep in, and you need to remove them periodically. These are almost never things you could have guessed, and most profilers are not very good at finding them either. What's more, the compiler's optimization level seldom has any effect on them, because they are mostly not tight compute-bound loops. Mostly they present as innocent-looking (or even invisible) function calls which, if you randomly snapshot the stack, are in the middle of it, consuming far more wall-clock time than you would ever have imagined, as shown by how often they appear there.

There are two kinds of optimisation that can (relatively) safely be done before profiling.

Algorithmic optimisation: choosing an algorithm with better average (or worst-case) complexity. This can, and perhaps should, be done before you even begin coding. You'll still have to check that the selected algorithm is the correct one for your real data set, but it is a good idea to start with an algorithm that is expected to fare better, isn't it?

Data structure optimisation: using the correct way to lay out your data, or a data structure with better locality, can increase your performance, but it will have an impact on the algorithms that can be used, so it is easier to do such an optimisation before coding (and you cannot use a profiler if there is no code yet). For example, when programming a video game, it is generally better to use a struct-of-arrays (SoA) layout instead of an array-of-structs (AoS) to store data, as it benefits from data locality, cache coherency, etc.

There is the 90/10 law, which states that 90% of the time is spent in 10% of the code and only 10% of the time is spent in the other 90% of the code. A programmer cannot always anticipate which 10% of the code will have the most impact if optimised, so until you use a profiler to figure this out, any optimisation you do is premature. Better to figure out where the problem is before attempting to solve it, no?

Just because you can't always figure out where the time is being spent doesn't mean you never can.
– Cercerilla, Apr 1 '11 at 13:59

That 90/10 stuff comes from a model of programming that has no subroutines. Suppose main calls foo 10 times, but only 1 call is necessary, and nearly all the time is spent in foo. The way to speed up the program is to get rid of the 9 unnecessary calls, not to try to speed up foo, even though foo is the 10% of the code taking 90% of the time.
– Mike Dunlavey, Apr 3 '11 at 21:48

Good programming (programming that makes code simpler and more straightforward) doesn't require profiling. Good programming, like moving unneeded initializations out of loops, does not need any more justification than the fact that you are improving the quality of the code.

IMHO the only times you need to profile is when you are specifically looking to improve performance. In order to show the improvement you must baseline first, and then show the delta.

Any time you add complexity under the guise of optimization, without proof that it was a bottleneck and that the change actually improves performance, you are just doing bad programming.