Sort Like a Human Boss

This post will discuss the ruby gem
sort_authority and the problem it
aims to solve, which is how to quickly sort mixed number and letter strings in
Ruby in a "human" way.

The problem is easiest to explain with images.

Which would you rather see in an application?

This?

or this?

As Jeff Atwood noted in this blog post
from way back in 2007, this is the difference between "ASCIIbetical" and
alphabetical sorting. When creating an interface for humans with listings
containing a mix of letters and numbers you almost always want the latter and
not the former. Yet real alphabetization (also refered to as "natural" or
"human" sorting) is typically not included in standard libraries, Ruby's
included.

As the developer involved in implementing natural sorting for our app at
WegoWise I first turned to the solution presented
here, which we'll call
sensible_sort. This algorithm works fine1, in that it does in fact sort
strings alphabetically. There are also plenty of gems we could have used as
well, including naturalsort,
naturally2, and
natcmp.

Any of these approaches would have gotten the job done, but my coworker
Nathan took one look at the pull request and asked
me how well the algorithm performed. I shrugged3; I hadn't worried about
performance because the lists I was sorting were so small. Then I signed off
for the day.

By the next morning Nathan had thrown together some benchmarks that showed that
sensible sort could be fairly slow given a long enough list. The other Rubygems
didn't perform very well either, compared to Ruby's native ASCIIbetical sort
and a C algorithm by Martin Pool called
natsort. In fact,
Nathan had gone even further and made a wrapper for natsort using FFI,
labeled strnatcmp.c below. Behold the damning numbers:

Sorting ['x1'] * 100_000:

user system total real

sort 0.000000 0.000000 0.000000 ( 0.001758)

strnatcmp.c 0.030000 0.000000 0.030000 ( 0.029083)

naturalsort gem 0.060000 0.000000 0.060000 ( 0.061879)

naturally gem 0.990000 0.020000 1.010000 ( 1.009634)

natcmp gem 1.140000 0.010000 1.150000 ( 1.141950)

sensible_sort 2.280000 0.000000 2.280000 ( 2.286274)

Considering that we're talking about 100,000 items maybe 2 seconds isn't that
bad. We don't have user-facing lists that have 100,000 elements... yet. Also,
I had developed a sentimental attachment to sensible_sort and decided to
write a more realistic benchmark involving randomized strings (gist here),
resulting in the following numbers:

Sorting 100,000 strings like this: "9 3q1q05_xsfw87i6wf7"

user system total real

sort 0.110000 0.000000 0.110000 (0.112114)

strnatcmp.c 0.560000 0.000000 0.560000 (0.560879)

naturally gem 7.940000 0.020000 7.960000 (7.965125)

sensible_sort 21.230000 0.160000 21.390000 (21.406591)

natcmp gem 51.150000 0.190000 51.340000 (51.364688)

naturalsort gem 161.590000 1.200000 162.790000 (162.883611)

(This benchmark was made a while ago on Ruby 1.9.2 but results hold for later

Rubies)

I was pleased to note that sensible_sort is not the worst approach in
this benchmark. But it's also not very good. 21 seconds is pretty much
unacceptable for an operation you might want to do inside a page request.
Again, we don't often have to sort 100,000 items for humans these days, but
this is an extension on the core Enumerable module that might get used in
various contexts throughout the application. If we can avoid an inherently slow
approach in favor of a painless alternative, why wouldn't we?

And that's how the
sort_authority4
gem came about. sort_authority is just a simple wrapper around strnatcmp.c,
plus an extension on Enumberable to add #natural_sort and #natural_sort_by
methods on arrays and such. If you care about sorting lists in a sane order
(and you should) and want it to be almost as fast as regular sort (why not?)
then you should give it a try.

Although some changes involving type coercion were required to update the code for Ruby 1.9+, available in this gist).↩

Which has the distinction of being one of the least googleable Rubygems.↩

Well, our team is mostly remote so this communication took place in the form of emoticons and animated gifs in our chat room, but same idea.↩