10 reasons my O(n²) algorithm is better than your O(n) algorithm

OK, so it’s a controversial title, but the point I want to make is that
programmers often confuse the time complexity of an algorithm, i.e. its
O() order, with its performance. They will happily declare that their
O(n) algorithm is so much better than someone else’s O(n²) algorithm as
they rip out simple, stable, working code and replace it with vastly
more complex, untested, but ‘optimized’ code, often without even
measuring whether there is any performance gain at all. The fact is
that O() notation describes the asymptotic behavior of an algorithm,
and in a typical environment with typical data you aren’t going to be
anywhere near that asymptote. Performance is, in fact, influenced by
many other factors besides the time complexity of an algorithm.

So what factors can make an O(n²) algorithm better than an O(n)
algorithm?

1. It might take a very large n before the O(n²) algorithm becomes
slower than the O(n) algorithm. And we aren’t talking n=10 or n=100
here; the cross-over in performance may not happen until n reaches the
millions or billions.
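To make this concrete, here is a small sketch with hypothetical cost models: an O(n) algorithm with a large constant factor (heavy per-item setup, say) against a tight O(n²) algorithm with a constant factor of 1. The constant of 1,000,000 is an assumption chosen for illustration, not a measurement.

```python
# Hypothetical cost models, in abstract 'operations', not real timings.
LINEAR_CONSTANT = 1_000_000  # assumed large per-item overhead of the O(n) algorithm

def linear_cost(n):
    return LINEAR_CONSTANT * n

def quadratic_cost(n):
    return n * n  # tight O(n^2) loop, constant factor 1

# Find the first n at which the O(n^2) algorithm actually costs more.
crossover = next(n for n in range(1, 10**8) if quadratic_cost(n) > linear_cost(n))
print(crossover)  # 1000001
```

With these (made-up) constants the O(n²) code wins for every input below a million items; with a bigger constant the cross-over moves further out still.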

2. O() refers to time complexity, not to how long the algorithm
actually takes to execute on a particular CPU. For that you need to
look at the mix of instructions used and how long each of those
instructions takes. But even that is a gross simplification, as we
shall see next …

Even if the two algorithms have a very similar number of operations
using similar CPU instructions, they still are not going to run at the
same speed:

​3. The compiler might be able to apply optimizations to one algorithm
that it cannot apply to the other, dramatically boosting performance.

4. The CPU itself might be able to optimize the instruction execution
sequence through pipelining or other techniques to dramatically boost
performance. A dumb scan of an array, for example, runs very quickly on
most CPUs because they have been optimized to handle such frequent,
repetitive access patterns.
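The access-pattern point can be sketched like this: the same values summed from a contiguous list and from a pointer-chased linked chain. Both sums are O(n), but the contiguous scan is the predictable, repetitive pattern that caches and prefetchers reward. Note this is a sketch, not a benchmark: CPython adds enough interpreter overhead to blunt the hardware effect, so the timing print is illustrative only and the code only asserts correctness.

```python
import time

# Same values stored two ways: a contiguous list and a linked chain.
N = 200_000
data = list(range(N))

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

# Build the chain so it holds the same values in the same order.
head = None
for v in reversed(data):
    head = Node(v, head)

def sum_list(xs):
    # Linear scan over contiguous storage.
    total = 0
    for x in xs:
        total += x
    return total

def sum_chain(node):
    # Linear scan that follows a pointer per element.
    total = 0
    while node is not None:
        total += node.value
        node = node.next
    return total

for label, fn, arg in [("list scan", sum_list, data), ("pointer chase", sum_chain, head)]:
    start = time.perf_counter()
    result = fn(arg)
    print(label, result, f"{time.perf_counter() - start:.4f}s")
```

In a compiled language the gap between the two is typically far larger than anything CPython will show.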

5. A simpler O(n²) algorithm might have tighter loops than the
more complex O(n) algorithm, allowing it to run entirely out of on-chip
memory and dramatically boosting performance.

Even if two algorithms have similar operations and similar raw CPU
performance, they may have different space requirements. The O(n)
algorithm might require 10x as much data space to run, or even a
different order of space complexity, e.g. O(n²) space vs O(n) space.

6. The algorithm that requires the least data space in its inner loop
may find its data held in registers or on-chip RAM during execution,
and thus receive a significant performance boost.

7. The more space-hungry algorithm might have its data paged to disk
during operation, incurring a huge additional time penalty.
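A classic illustration of this time/space trade is duplicate detection: the nested-loop version does O(n²) comparisons but needs no extra storage at all, while the O(n) version pays for its speed with an auxiliary set that grows with the input. A minimal sketch:

```python
import sys

def has_duplicates_quadratic(xs):
    # O(n^2) comparisons, but O(1) extra space: nothing beyond two indices.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicates_linear(xs):
    # O(n) expected time, but O(n) extra space for the 'seen' set.
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False

xs = list(range(2_000))
print(has_duplicates_quadratic(xs), has_duplicates_linear(xs))  # False False
print(has_duplicates_linear(xs + [0]))  # True

# The linear version's working set grows with the input:
print(sys.getsizeof(set(xs)), "bytes of auxiliary set storage")
```

Whether that extra set stays in cache, spills to main memory, or (for huge inputs) pushes the process into swap is exactly the effect points 6 and 7 describe.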

Some algorithms are inherently more parallelizable than others:

8. As we move to a world of multi-core processors, it’s going to be
ever more important to parallelize code to get performance gains of 4x,
8x, … PLINQ and other parallel extensions in modern languages can
easily be applied to ‘dumb’ nested for-loops for significant
performance improvements. Applying those same speedups to your
‘optimized’ tree-based code might be much harder, or even impossible.
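PLINQ is a .NET feature, but the decomposition it relies on can be sketched in any language: each iteration of the outer loop of a ‘dumb’ pairwise computation is independent, so the outer index range can simply be split across workers. This Python sketch uses `concurrent.futures`; note that CPython’s GIL means threads won’t actually speed up CPU-bound work here — the point is only how trivially the loop partitions (a process pool, or PLINQ itself, would realize the gain).

```python
from concurrent.futures import ThreadPoolExecutor

# A 'dumb' O(n^2) pairwise computation: total Manhattan distance over
# all pairs of points. The data here is arbitrary illustrative input.
points = [((i * 7) % 101, (i * 13) % 97) for i in range(300)]

def pair_distance_sum(pts, lo=0, hi=None):
    # Sums over outer indices lo..hi. Each outer index is independent,
    # which is exactly what makes the nested loop easy to parallelize.
    hi = len(pts) if hi is None else hi
    total = 0
    for i in range(lo, hi):
        ax, ay = pts[i]
        for j in range(i + 1, len(pts)):
            bx, by = pts[j]
            total += abs(ax - bx) + abs(ay - by)
    return total

def pair_distance_sum_parallel(pts, workers=4):
    # Split the outer loop's index range into one chunk per worker.
    n = len(pts)
    bounds = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(pair_distance_sum, pts, lo, hi) for lo, hi in bounds]
        return sum(f.result() for f in futures)

print(pair_distance_sum(points) == pair_distance_sum_parallel(points))  # True
```

Try writing the same four-line split for a hand-rolled tree traversal with shared mutable state — that is the contrast the point above is making.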

‘Better’ is a very subjective word; better performance is just one of
many concerns:

9. The better algorithm is probably the one that’s easiest to write,
easiest for others to read, easiest to test and easiest to maintain. A
simple nested for-loop may be 1000x better than some O(n) algorithm
when ‘better’ is defined this way.

​10. Even if your algorithm beats mine with all of these factors taken
into account that’s no guarantee it will do so in the future. Compilers
evolve, CPUs get smarter and much of that effort is devoted to making
dumb code faster.

Conclusion

Premature optimization is always a bad idea.

Many ‘optimizations’ can actually lead to worse not better performance.

Always measure, then optimize, then measure again.

It’s often cheaper to buy another server than it is to have someone
‘optimize’ your code.

O() notation is a much misunderstood topic.

On modern servers, space complexity is nearly always more important
than time complexity. If your working set fits in on-chip memory you
have a huge advantage.