Two Myths of Sorting Complexity

Let’s begin with an easy question: you have a list of n random floating point numbers; how quickly can you sort them? O(n log(n))? Wrong, you can do better. How about a list of n random strings? O(n log(n))? Actually, you’ve probably missed something important there too. How can I say this? Didn’t you learn this in first year computer science? Well, let me explain.

When the sorting problem is introduced, you typically first look at selection or insertion sort. You soon learn that these algorithms are O(n^2), and very soon you learn that better algorithms, like merge sort and quick sort, can sort in O(n log(n)). But you’ve already glossed over an important assumption. These algorithms are called comparison sorts, and their bounds count comparisons: O(n log(n)) really means O(n log(n)) comparisons, each of which takes its own time. For numbers like integers and floating point numbers, comparisons are constant time (for a fixed number of bits), but for other data types, like strings, this isn’t the case: comparing two strings takes time proportional to the length of their common prefix. So while you can often say that you can sort a list of objects in O(n log(n)), I feel it’s very important to know that this is not always the case.
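To make the hidden cost concrete, here is a small sketch that counts character-level work while Python’s built-in sort counts element-level comparisons. The `CountingStr` wrapper class and the 100-character shared prefix are my own illustrative choices, not anything from a real library:

```python
import random

char_comparisons = 0  # total character-by-character comparison steps

class CountingStr:
    """Wraps a string and counts the character steps each comparison takes."""
    def __init__(self, s):
        self.s = s
    def __lt__(self, other):
        global char_comparisons
        # Compare character by character, like a real string comparison does.
        for a, b in zip(self.s, other.s):
            char_comparisons += 1
            if a != b:
                return a < b
        return len(self.s) < len(other.s)

# Strings sharing a long common prefix make every single comparison expensive:
# each one must walk past the 100 identical characters before deciding.
prefix = "x" * 100
words = [CountingStr(prefix + format(random.randrange(10**6), "06d"))
         for _ in range(1000)]
words.sort()
print(char_comparisons)  # far larger than the ~n log n comparison count
```

Each of the roughly n log(n) comparisons costs at least 101 character steps here, so the real running time is closer to O(k · n log(n)) for strings of length k.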

And this brings me to my other point: you can sort a list of random floating point numbers in better time than O(n log(n)), and the method is simple: radix sort. Now, before you object that radix sort only works for integers, you should realise that this just isn’t so; it is actually quite easy to extend the simple radix sort algorithm to floating point numbers. In fact, if you follow a similar process to the linked article, you can sort anything with fixed-size keys that can be compared in constant time, in O(n) time overall. So while it is true that, because of radix sort’s large constant factor, comparison sort algorithms can give better results in practice (especially once you start parallelising things), radix sort will always have a better order of complexity for objects that can be compared in constant time.
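A minimal sketch of the extension to floats, assuming IEEE 754 doubles: reinterpret each float’s bits as an unsigned integer, flip the sign bit for positives and all bits for negatives so that unsigned integer order matches float order, then run a plain LSD radix sort on the bytes. The function names here are my own; this is one common way to do it, not necessarily the exact method from the linked article:

```python
import random
import struct

def float_key(x):
    """Map a double to a uint64 whose unsigned order matches float order.
    Positives: flip the sign bit. Negatives: flip all bits."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits < 0x8000000000000000:            # sign bit clear: positive
        return bits ^ 0x8000000000000000
    return bits ^ 0xFFFFFFFFFFFFFFFF         # sign bit set: negative

def radix_sort_floats(xs):
    """LSD radix sort over the 8 bytes of each key: 8 counting-sort
    passes, each O(n), so O(n) overall for fixed-width keys."""
    pairs = [(float_key(x), x) for x in xs]
    for shift in range(0, 64, 8):            # one pass per byte, least significant first
        buckets = [[] for _ in range(256)]
        for key, x in pairs:
            buckets[(key >> shift) & 0xFF].append((key, x))
        pairs = [p for bucket in buckets for p in bucket]
    return [x for _, x in pairs]

nums = [random.uniform(-1e6, 1e6) for _ in range(10000)]
assert radix_sort_floats(nums) == sorted(nums)
```

The bit trick works because IEEE 754 doubles, viewed as raw bits, are already ordered correctly among positives; negatives are stored in sign-magnitude form, so flipping all their bits reverses them into place. (NaNs have no defined order and would need separate handling.)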