Nibble Sort Denouement

Back in January my nibble sort contest resulted in entries that dramatically exceeded my expectations. Since then I’ve been trying to write up a post explaining the various strategies that people used and since you don’t care about my excuses I won’t tell you them, but I never got it written. However! I want to point everyone to three great blog entries:

The thing that I liked most about this contest is that the fastest entries ended up being way faster than I predicted — mostly because I didn’t understand how excellent the Haswell vector unit is. I’ve sent my standard programming contest prize (Amazon gift certificate covering the cost of a paper copy of Hacker’s Delight) to Alexander and Jeff (Jerome’s non-SIMD entry, alas, arrived a bit after the contest expired). Thanks again everyone!

Final note: as a compiler researcher I think these contest entries could be very useful as a case study in how variance among algorithms and among implementations of algorithms can lead to extreme performance differences. In the long run we need compilers that understand how to automatically switch between different versions such as these. Regular compiler technology won’t cut it, nor will (at present) program synthesis. But it’s a good goal.

One reply on “Nibble Sort Denouement”

My hat remains off to Jerome, who figured out a neat trick I overlooked. (and of course to the SIMD entries!)

I recently got my hands on a 64-bit ARM machine (dragonboard 410c) and ran the nibble sort benchmark there. Interestingly, alexander2 and vetter4 pull ahead of jepler and jerome; jepler is faster than jerome after tweaking its main lookup table to operate 2 nibbles at a time instead of three. I didn’t look whether the same change can be easily done to jerome’s code. My results: https://gist.github.com/jepler/333668bb8a3a4e92580b