In a standard algorithms course we are taught that quicksort is O(n log n) on average and O(n²) in the worst case. At the same time, other sorting algorithms are studied which are O(n log n) even in the worst case (like mergesort and heapsort), or even linear in the best case (like bubblesort), albeit sometimes at the cost of additional memory.

After a quick glance at these running times, it is natural to conclude that quicksort should not be as efficient as the others.

Also, students learn in basic programming courses that recursion is not really good in general because it can use too much memory, and so on. Therefore (even though this is not a real argument), this gives the impression that quicksort might not be really good, because it is a recursive algorithm.

Why, then, does quicksort outperform other sorting algorithms in practice? Does it have to do with the structure of real-world data? Does it have to do with the way memory works in computers? I know that some kinds of memory are way faster than others, but I don't know whether that's the real reason for this counter-intuitive performance (when compared to theoretical estimates).

Quicksort's reputation dates from a time when caches didn't exist.
– AProgrammer, May 29 '12 at 9:20


"why does quicksort outperform other sorting algorithms in practice?" Sure that's true? Show us the real implementation you are refererring to with this statement, and the community will tell you why that specific implementation behaves the way it does. Everything else will lead to wild guessing about non-existent programs.
–
Doc BrownMay 29 '12 at 9:42


@DocBrown: Many Quicksort (or variants of it) implementations are chosen in many libraries, arguably because they perform best (I would hope so, that is). So there might just be something about the algorithm that makes Quicksort fast, independently of the implementation.
– Raphael, May 29 '12 at 10:15


Someone has to say this for completeness, so I will: Quicksort is not (usually) stable. For this reason, you may not want to use it. Also, for this reason, your default sort may not be a Quicksort even when that is what you want.
– RalphChapin, May 29 '12 at 14:18


@Raphael: Often what is called quicksort is actually a variation like introsort (used, afaik, in the C++ standard library), not pure quicksort.
– Giorgio, May 29 '12 at 17:20

6 Answers

I wouldn't agree that quicksort is better than other sorting algorithms in practice.

For most purposes, there is Timsort: a hybrid of mergesort and insertion sort that exploits the fact that the data you sort often starts out nearly sorted or reverse sorted.

The simplest quicksort (no random pivot) treats this potentially common case as O(N²) (reducing to O(N lg N) with random pivots), while TimSort can handle these cases in O(N).
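
To see why, here is a minimal illustrative sketch in Python (not any library's actual implementation): a textbook quicksort that always picks the first element as pivot. On already-sorted input every partition is maximally unbalanced, which is exactly the quadratic case described above.

    import sys

    def naive_quicksort(a):
        """Textbook quicksort with the first element as pivot (sketch only)."""
        if len(a) <= 1:
            return a
        pivot = a[0]
        smaller = [x for x in a[1:] if x < pivot]
        larger = [x for x in a[1:] if x >= pivot]
        return naive_quicksort(smaller) + [pivot] + naive_quicksort(larger)

    # On sorted input the "smaller" side is always empty, so the recursion
    # goes N levels deep and the total work is quadratic. Needing to raise
    # the recursion limit for a modest N is itself a symptom of that.
    sys.setrecursionlimit(10_000)
    assert naive_quicksort(list(range(2_000))) == list(range(2_000))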

According to these benchmarks in C# comparing the built-in quicksort to TimSort, TimSort is significantly faster on mostly-sorted data, slightly faster on random data, and pulls further ahead when the comparison function is particularly slow. I haven't repeated these benchmarks, and I would not be surprised if quicksort slightly beat TimSort for some combination of random data, or if there is something quirky in C#'s built-in sort (based on quicksort) that is slowing it down. However, TimSort has distinct advantages when data may be partially sorted, and is roughly equal to quicksort in speed when it is not.
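
If you want to reproduce the flavour of such a benchmark without the C# setup, here is a rough sketch using CPython's built-in sort, which is a Timsort; the exact numbers will of course vary by machine and interpreter version.

    import random
    import timeit

    n = 100_000
    nearly_sorted = list(range(n))
    for _ in range(10):  # perturb a few positions: "mostly sorted" data
        i, j = random.randrange(n), random.randrange(n)
        nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]
    random_data = random.sample(range(n), n)

    # sorted() copies its input, so every timed run sees identical data.
    t_nearly = timeit.timeit(lambda: sorted(nearly_sorted), number=20)
    t_random = timeit.timeit(lambda: sorted(random_data), number=20)
    print(f"nearly sorted: {t_nearly:.3f}s   random: {t_random:.3f}s")

The nearly-sorted case should come out far faster, consistent with Timsort's O(N) handling of pre-existing runs.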

TimSort also has the added bonus of being a stable sort, unlike quicksort. Its only disadvantage is that it uses O(N) memory, versus the O(lg N) of the usual (fast) quicksort implementation.

Quicksort is considered to be quicker because its coefficient is smaller than that of any other known algorithm. There is no reason or proof for that; simply, no algorithm with a smaller coefficient has been found. It is true that other algorithms also have O(n log n) time, but in the real world the coefficient is also important.

Note that for small data, insertion sort (the one that is considered O(n²)) is quicker, because of the nature of the mathematical functions involved. This depends on the specific coefficients, which vary from machine to machine. (In the end, only assembly is really running.) So a hybrid of quicksort and insertion sort is sometimes the quickest in practice, I think.
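
Here is a sketch of that hybrid idea in Python; the cutoff of 16 is an arbitrary illustrative value, and real libraries tune it empirically.

    CUTOFF = 16  # illustrative threshold; real libraries tune this empirically

    def insertion_sort(a, lo, hi):
        """Sort a[lo..hi] in place; fast on tiny ranges despite being O(n^2)."""
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def hybrid_quicksort(a, lo=0, hi=None):
        """Quicksort that hands small partitions over to insertion sort."""
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            insertion_sort(a, lo, hi)
            return
        # Hoare-style partition around the middle element.
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        hybrid_quicksort(a, lo, j)
        hybrid_quicksort(a, i, hi)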

+ Right. Teachers need to be more aware (and I was a teacher) of the fact that constant factors can vary by orders of magnitude. So the skill of performance tuning really matters, regardless of big-O. The problem is, they keep teaching gprof, only because they have to get past that bullet point in the curriculum, which is 180 degrees the wrong approach.
– Mike Dunlavey, May 29 '12 at 14:50


“There is no reason or proof for that”: sure there is. If you dig deep enough, you'll find a reason.
– Gilles, May 29 '12 at 18:53

@B Seven: to simplify a lot… for an O(n log n) sort algorithm, there are (n log n) iterations of the sorting loop in order to sort n items. The coefficient is how long each cycle of the loop takes. When n is really big (at least thousands), coefficient doesn't matter as much as O() even if the coefficient is huge. But when n is small, coefficient matters – and can be the most important thing if you're only sorting 10 items.
– Matt Gallagher, May 30 '12 at 15:25
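
To put toy numbers on that point: with made-up per-iteration costs (say 8 units per iteration of an n log n sort's loop and 1 unit per iteration of insertion sort's tighter loop; both constants are invented purely for illustration), the crossover is easy to compute.

    import math

    K_NLOGN, K_INSERT = 8.0, 1.0  # invented per-iteration costs, illustration only

    for n in (4, 8, 16, 64, 1024, 1_000_000):
        nlogn_cost = K_NLOGN * n * math.log2(n)
        insert_cost = K_INSERT * n * n
        winner = "insertion sort" if insert_cost < nlogn_cost else "n log n sort"
        print(f"n = {n:>9}: {winner} is cheaper in this toy model")

In this model insertion sort wins below a few dozen elements and loses badly after that, which is the shape of the argument for hybrid sorts.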


@MikeDunlavey - a good example is that building the pyramids is O(n) while sorting your photos of them is O(n ln n) but which is quicker!
– Martin Beckett, May 30 '12 at 15:52

Quicksort does not outperform all other sorting algorithms. For example, bottom-up heapsort (Wegener 2002) outperforms quicksort for reasonable amounts of data and is also an in-place algorithm. It is also easy to implement (at least, not harder than some optimized quicksort variants).

It is just not as well known, and you don't find it in many textbooks; that may explain why it is not as popular as quicksort.
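
For readers who want to see the idea, here is a Python sketch in the spirit of Wegener's variant (an illustration, not his code): the sift-down first races down to a leaf along the larger-child path, costing one comparison per level, and only then climbs back up to find the insertion point, saving comparisons over the classic two-comparisons-per-level sift-down.

    def _leaf_search(a, i, end):
        """Descend from i along the larger-child path; return the leaf index."""
        j = i
        while 2 * j + 2 < end:  # both children exist
            j = 2 * j + 2 if a[2 * j + 2] > a[2 * j + 1] else 2 * j + 1
        if 2 * j + 1 < end:     # only a left child
            j = 2 * j + 1
        return j

    def _sift_down(a, i, end):
        j = _leaf_search(a, i, end)
        while a[i] > a[j]:      # climb back up to where a[i] belongs
            j = (j - 1) // 2
        # Place a[i] at j and shift the displaced path elements up one level.
        x, a[j] = a[j], a[i]
        while j > i:
            j = (j - 1) // 2
            a[j], x = x, a[j]

    def bottom_up_heapsort(a):
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):  # build a max-heap
            _sift_down(a, i, n)
        for end in range(n - 1, 0, -1):
            a[0], a[end] = a[end], a[0]      # move the current max into place
            _sift_down(a, 0, end)

    import random
    data = random.sample(range(1_000), 1_000)
    bottom_up_heapsort(data)
    assert data == sorted(data)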

+1: I have run some tests and indeed merge sort was definitely better than quick sort for large arrays (> 100000 elements). Heap sort was slightly worse than merge sort (but merge sort needs more memory). I think what people call quick sort is often a variation called intro sort: quick sort that falls back to heap sort when the recursion depth goes beyond a certain limit.
– Giorgio, May 29 '12 at 17:13

Interesting, can you leave a reference to a book/site to read more about it? (Preferably a book.)
– Kahil, May 29 '12 at 22:10

@Martin: you mean about bottom-up heapsort? Well, I gave a reference above. If you want a free resource, the German Wikipedia has an article about it (de.wikipedia.org/wiki/BottomUp-Heapsort). Even if you don't speak German, I guess you can still read the C99 example.
– Doc Brown, May 30 '12 at 5:49

You shouldn't focus only on the worst case, nor only on time complexity. It's about the average case more than the worst case, and about space as well as time.

Quicksort:

has an average time complexity of Θ(n log n);

can be implemented with space complexity of Θ(log n).
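
The Θ(log n) space bound comes from a standard trick: recurse only into the smaller partition and loop on the larger one, so at most about log2(n) stack frames are ever live at once. A sketch (illustrative, not any particular library's code):

    def quicksort_small_stack(a, lo=0, hi=None):
        """In-place quicksort with O(log n) stack depth: recurse into the
        smaller partition, iterate on the larger one."""
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            # Hoare-style partition around the middle element.
            pivot = a[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            # The recursive call always gets the smaller side, whose size is
            # at most half of the current range; the loop handles the rest.
            if j - lo < hi - i:
                quicksort_small_stack(a, lo, j)
                lo = i
            else:
                quicksort_small_stack(a, i, hi)
                hi = j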

Also take into account that big-O notation ignores all constants, but in practice it does make a difference if an algorithm is a few times faster. Θ(n log n) means that the algorithm executes in about K·n·log(n) steps, where K is a constant. Quicksort is the comparison-sort algorithm with the lowest K.

The question was mainly about the average case. It's clear that the asker understands the meaning of up-to-a-multiplicative-constant. The question is, why does quicksort have such a low K in practice?
– Gilles, May 29 '12 at 18:52


@Gilles: it has low K, because it's a simple algorithm.
– vartec, May 30 '12 at 9:10


WTF? This doesn't make any sense. The simplicity of an algorithm has no relation with its running speed. Selection sort is simpler than quicksort, that doesn't make it faster.
– Gilles, May 30 '12 at 10:15


@Gilles: selection sort is O(n^2) for any case (worst, average and best). So it doesn't matter how simple it is. Quicksort is O(n log n) for average case, and among all algos with O(n log n) avg it's the simplest one.
– vartec, May 30 '12 at 10:22


@Gilles: other things being equal, simplicity does aid performance. Say you're comparing two algorithms that each take (K n log n) iterations of their respective inner loops: the algorithm that needs to do less stuff per loop has a performance advantage.
– comingstorm, May 31 '12 at 16:35

Quicksort is often a good choice, as it is reasonably fast and also reasonably quick and easy to implement.

If you are serious about sorting large amounts of data very quickly, then you are probably better off with some variation on mergesort. Mergesort can be made to take advantage of external storage and can make use of multiple threads or even processes, but such variants are not trivial to code.
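
A toy sketch of why mergesort extends so naturally to external storage and parallelism: sort independent runs (each run could just as well live on disk or be handled by a separate worker process), then k-way merge them. The chunk size is an arbitrary illustrative value.

    import heapq
    import random

    def chunked_mergesort(data, chunk_size=4096):
        """Sort by splitting into runs, sorting each run independently,
        then k-way merging the sorted runs with a heap."""
        runs = [sorted(data[i:i + chunk_size])
                for i in range(0, len(data), chunk_size)]
        # Only this final merge needs to see all runs together, which is
        # what makes the approach friendly to disks and worker pools.
        return list(heapq.merge(*runs))

    data = random.sample(range(100_000), 100_000)
    assert chunked_mergesort(data) == sorted(data)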

The actual performance of algorithms depends on the platform, as well as the language, the compiler, programmer attention to implementation detail, specific optimization effort, et cetera. So, the "constant factor advantage" of quicksort isn't very well-defined -- it's a subjective judgement based on currently-available tools, and a rough estimation of "equivalent implementation effort" by whoever actually does the comparative performance study...

That said, I believe quicksort performs well (for randomized input) because it is simple, and because its recursive structure is relatively cache-friendly. On the other hand, because its worst case is easy to trigger, any practical use of a quicksort will need to be more complex than its textbook description would indicate: thus, modified versions such as introsort.
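
For concreteness, here is a sketch of the introsort idea (depth-limited quicksort that bails out to heapsort; an illustration, not the C++ standard library's implementation). The depth cap defuses quicksort's O(n²) worst case while keeping its fast average behaviour.

    import heapq
    import math
    import random

    def introsort(a):
        """Quicksort with a 2*log2(n) depth cap and a heapsort fallback."""
        if len(a) > 1:
            _introsort(a, 0, len(a) - 1, 2 * int(math.log2(len(a))))

    def _heapsort_slice(a, lo, hi):
        # Sketch only: heapify a copy of the slice and pop it back sorted.
        chunk = a[lo:hi + 1]
        heapq.heapify(chunk)
        for k in range(lo, hi + 1):
            a[k] = heapq.heappop(chunk)

    def _introsort(a, lo, hi, depth):
        if lo >= hi:
            return
        if depth == 0:            # recursion got suspiciously deep: bail out
            _heapsort_slice(a, lo, hi)
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        _introsort(a, lo, j, depth - 1)
        _introsort(a, i, hi, depth - 1)

    data = random.sample(range(10_000), 10_000)
    introsort(data)
    assert data == sorted(data)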

Over time, as the dominant platform changes, different algorithms may gain or lose their (ill-defined) relative advantage. Conventional wisdom on relative performance may well lag behind this shift, so if you're really unsure which algorithm is best for your application, you should implement both, and test them.

I guess the "smaller constant" that others relate this to is the one from formal analysis, that is, the number of comparisons or swaps. This is very well defined, but it is unclear how it translates to runtime. A colleague is actually doing some research on exactly that.
– Raphael, May 29 '12 at 17:14

My impression was that it was about generalized performance, but I wouldn't count on either. You're right, though: if your comparison is particularly expensive, you can look up the number of expected comparisons...
– comingstorm, May 29 '12 at 17:56


For the reason you state, talking about overall performance (time-wise) is not meaningful in the general case, as too many details factor in. The reason for counting only select operations is not that they are expensive, but that they occur "most often" in the Landau-notation (Big-Oh) sense, so counting those gives you the rough asymptotics. As soon as you consider constants and/or runtime, this strategy is much less interesting.
– Raphael, May 29 '12 at 18:04

A good implementation of QuickSort will compile such that your pivot values remain in a CPU register for as long as they are needed. This is often enough to beat a theoretically faster sort with comparable Big-O times.
– Dan Lyons, May 29 '12 at 18:28

Different sorting algorithms have different characteristics with respect to the number of comparisons and the number of interchanges they do. And @DanLyons: note that a typical sort in a library performs its comparisons via user-supplied functions, and keeping values in registers across lots of function calls is pretty tricky.
– Pointy, May 30 '12 at 21:36