A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, BFS and -ck.

Tuesday, 11 October 2011

cksort and O(n^2) tipping point

I've spent some time modelling the behaviour of the last cksort I posted here to investigate where and why it becomes O(n^2). It turns out the prescan which creates the nodes ends up scanning a large amount of the data repeatedly when the data is purely random. As mentioned here previously, when there are up to ~1500 unique random entries it is faster than qsort, and beyond that it gets slower. When there is already some sorted data or repetition, it remains faster even when the number of entries becomes very large. So I tried to model where the tipping point was to determine when to change the sort mode, and it turns out that if it is unable to generate a node which is a decent proportion of the original data, it is likely to only generate yet smaller nodes on each pass. With the help of a bit of trial and error and weak modelling, I found that it would stay ahead of a classic O(n log n) sort when it was able to reliably sort the data into 32 nodes or fewer. There are occasions where it is faster with more nodes, but it is always faster with fewer. So I integrated a classic merge sort into the sort algorithm and made it switch to classic merge sort should the node count exceed this threshold. This does not significantly increase the total number of compares/swaps, and it means the algorithm is always O(n log n) bound.
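The tipping-point check can be sketched roughly like this. This is a toy analogue in C, not cksort's actual code: I'm counting maximal ascending runs as a stand-in for cksort's nodes, and all the names here are mine.

```c
#include <stddef.h>

/* Empirical tipping point described above: 32 nodes or fewer and the
 * node-merging path reliably stays ahead of a classic O(n log n) sort. */
#define NODE_THRESHOLD 32

/* Rough analogue of the prescan: count maximal ascending runs.
 * Each descending step starts a new run. */
static size_t count_runs(const int *a, size_t n)
{
	size_t runs, i;

	if (n == 0)
		return 0;
	runs = 1;
	for (i = 1; i < n; i++)
		if (a[i] < a[i - 1])
			runs++;
	return runs;
}

/* Dispatch: if the data collapses into few enough runs, take the
 * node-merging path; otherwise fall back to a classic merge sort. */
static int use_node_merge(const int *a, size_t n)
{
	return count_runs(a, n) <= NODE_THRESHOLD;
}
```

On already-sorted data this finds a single run (the O(n) ideal case), while purely random data produces roughly n/2 runs and triggers the fallback.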

So the latest version of cksort now has some nice properties: it is O(n) in certain ideal scenarios, very adaptive to patterns in either direction or to lots of repeated data, yet always bound by O(n log n) in complexity.

Just about any O(n log n) algorithm could be used beyond the tipping point, but given that cksort intrinsically uses a merge, it seemed intuitive to extend that to a classic merge sort as well. I experimented with using a heap sort once the data was pre-sorted, since it almost became a heap structure naturally after the pre-sort, but two things became clear: first, heap sort is actually slower than merge sort on average; and second, more importantly, the pre-sort phase was actually the O(n^2) component in the original sort, so it didn't make sense to complete this phase.
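For reference, the classic merge sort used beyond the tipping point looks something like this generic bottom-up version. This is a textbook sketch, not the actual cksort code.

```c
#include <stdlib.h>
#include <string.h>

/* Merge two adjacent sorted runs a[0..na) and b[0..nb) into dst.
 * Taking from a on equal keys keeps the merge stable. */
static void merge_runs(int *dst, const int *a, size_t na,
		       const int *b, size_t nb)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		dst[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
	while (i < na)
		dst[k++] = a[i++];
	while (j < nb)
		dst[k++] = b[j++];
}

/* Classic bottom-up merge sort: merge runs of width 1, 2, 4, ...
 * Always O(n log n), at the cost of O(n) auxiliary space. */
static void merge_sort(int *a, size_t n)
{
	int *buf = malloc(n * sizeof(int));
	size_t width;

	if (!buf)
		return;
	for (width = 1; width < n; width *= 2) {
		size_t lo;

		for (lo = 0; lo < n; lo += 2 * width) {
			size_t mid = (lo + width < n) ? lo + width : n;
			size_t hi  = (lo + 2 * width < n) ? lo + 2 * width : n;

			merge_runs(buf + lo, a + lo, mid - lo,
				   a + mid, hi - mid);
		}
		memcpy(a, buf, n * sizeof(int));
	}
	free(buf);
}
```

Note the O(n) auxiliary buffer, which is the same space cost discussed further down.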

10k samples compared to library qsort (I checked the actual code and it's a highly optimised combination of quicksort and insertion sort). Again, bear in mind there is little to no optimisation in the cksort code. Times are in microseconds:

fewunique:      qsort: 452    cksort: 114
random:         qsort: 573    cksort: 883
sorted:         qsort: 449    cksort:  34
reversed:       qsort: 451    cksort:  41
mostly sorted:  qsort: 459    cksort:  58

As you can see above, the random case is still slower with cksort (but a heck of a lot faster than earlier implementations), and it seems to stay at about half the speed of qsort no matter how many samples are used, which is good because it means it is bound by O(n log n) complexity. 10,000 samples was chosen because it is a size at which the polynomial nature of an algorithm would be obvious. The reason it's half the speed of qsort is that an unoptimised merge sort does on average twice as much work as an optimised quicksort. I could reasonably just switch to a quicksort at the tipping point, but there are still some corner cases where quicksort becomes O(n^2), which mergesort never suffers from, and mergesort is very parallelisable if desired. I was tempted to thread the mergesort component to improve the benchmark numbers, but it would have skewed the results inappropriately.

As a generic sort there are still two issues with cksort. One is the extra space required; at O(n) auxiliary space it is about as bad as it gets. An in-place algorithm could certainly be developed, but it would be tricky and require a lot more swaps during the presort phase. Alternatively, the fallback at the tipping point could be switched to a heap sort, which works in place. The second disadvantage is that it is not stable (i.e. it does not keep the original order of same-value entries), and I'm not sure that's easy to improve on.

Again, I don't expect this will dramatically change the way anyone sorts anything but it's certainly been interesting to play with. I'll keep playing some more :)

EDIT: In response to requests for a range of numbers, here's a set of them for different types of data and numbers of samples (10-100 million samples): random (2^32), fewunique1 (4 different values), fewunique2 (25 different values), fewunique3 (100 different values, above the tipping point), mostly sorted (every 5th and 10th value swapped), already sorted, and reverse sorted.
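For anyone wanting to reproduce these data patterns, generators along these lines should do. This is my interpretation, not the actual test code; in particular, the "mostly sorted" swap pattern is just my reading of the description above.

```c
#include <stdlib.h>
#include <stddef.h>

/* fewunique: only 'values' distinct keys, e.g. 4, 25 or 100 as above. */
static void gen_fewunique(int *a, size_t n, int values)
{
	for (size_t i = 0; i < n; i++)
		a[i] = rand() % values;
}

/* Already sorted: ascending 0..n-1. */
static void gen_sorted(int *a, size_t n)
{
	for (size_t i = 0; i < n; i++)
		a[i] = (int)i;
}

/* Reverse sorted: descending n-1..0. */
static void gen_reversed(int *a, size_t n)
{
	for (size_t i = 0; i < n; i++)
		a[i] = (int)(n - 1 - i);
}

/* Mostly sorted (my guess at the pattern): start sorted, then swap
 * every 5th element with the one ten places ahead of it. */
static void gen_mostly_sorted(int *a, size_t n)
{
	gen_sorted(a, n);
	for (size_t i = 0; i + 10 < n; i += 5) {
		int tmp = a[i];

		a[i] = a[i + 10];
		a[i + 10] = tmp;
	}
}
```

Each of these exercises a different strength of the sort: fewunique hits the repetition handling, sorted and reversed hit the directional adaptivity, and mostly sorted sits between sorted and random.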