C++ std::sort predicate with templates

C++ offers several ways to sort data, from a plain call to std::sort to custom predicates written as lambdas, each suited to different use cases. As a programmer, you may or may not want to delve into the depths of sorting algorithms; that domain can be left to the experts. Luckily, it’s quite easy to get started with sorting data.

Sorting methods in C++

C++ has a standardized sort function readily available. It’s called std::sort and is called with the range it should sort, given as a begin iterator and an end iterator. Typical use cases are sorting a vector or an array, but any container that provides random-access iterators can be sorted. Containers like std::list can’t be sorted with std::sort. std::sort is declared in the <algorithm> header.
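As a minimal sketch of the call described above, the following sorts a std::vector and a std::array of int with the default operator <. The helper names sorted_copy and sort_in_place are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts a copy of the vector in ascending order
// using the default operator< for int.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Hypothetical helper: sorts a fixed-size array in place.
// std::begin/std::end work on std::array and on built-in arrays alike.
void sort_in_place(std::array<int, 5>& values) {
    std::sort(std::begin(values), std::end(values));
}
```

Both calls pass a begin iterator and an end iterator; the end iterator points one past the last element.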

If you try to sort std::list with std::sort, you’ll get a compile error. The error will be a variation of not being able to use operator - on the std::list iterator type. If you want to sort a list, you’ll have to use std::list::sort.

Using qsort to sort structs or classes will not be covered.

Which algorithm std::sort uses is left to the implementation, but since C++11 the complexity must not exceed O(N log N) comparisons, where N is the number of elements to sort. Implementations usually build std::sort by combining different sort algorithms with different strengths and weaknesses. One such algorithm is IntroSort.

IntroSort is a hybrid sort algorithm, invented in 1997 by David Musser. Its goal is to provide fast average performance together with an optimal worst-case performance. It starts out as Quicksort and switches to Heapsort when the recursion becomes too deep, combining the good parts of both algorithms. As a result, IntroSort has both an average and a worst-case complexity of O(N log N), where N is the number of elements.
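To make the idea concrete, here is a rough sketch of the Quicksort-plus-Heapsort combination. It is not the code of any particular standard library: the names intro_sort_sketch and intro_sort_sketch_impl, the middle-element pivot and the depth budget of roughly 2 * log2(N) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Illustrative only: quicksort with a recursion budget, falling back to heapsort.
template <typename It>
void intro_sort_sketch_impl(It first, It last, int depth_budget) {
    using value_type = typename std::iterator_traits<It>::value_type;
    while (last - first > 1) {
        if (depth_budget == 0) {
            // Quicksort is degenerating: finish this range with heapsort,
            // which is O(n log n) even in the worst case.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;
        const value_type pivot = *(first + (last - first) / 2);
        // Three-way partition:
        // [first, lower) < pivot, [lower, upper) == pivot, [upper, last) > pivot.
        It lower = std::partition(first, last,
            [&pivot](const value_type& x) { return x < pivot; });
        It upper = std::partition(lower, last,
            [&pivot](const value_type& x) { return !(pivot < x); });
        intro_sort_sketch_impl(upper, last, depth_budget);  // recurse into the right part
        last = lower;                                        // keep looping on the left part
    }
}

template <typename It>
void intro_sort_sketch(It first, It last) {
    int depth_budget = 0;
    for (auto n = last - first; n > 1; n >>= 1) depth_budget += 2;  // about 2 * log2(n)
    intro_sort_sketch_impl(first, last, depth_budget);
}
```

Real implementations typically add further refinements, such as a better pivot choice and a simple sort for very small ranges.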

In theory, sorting numbers is no different from sorting any other data. Both use the same algorithm, the same range, and the same function call. The only real difference is how two elements are compared: numbers come with a built-in operator <, while for your own types you have to supply the comparison yourself.

To sort a collection of user-defined objects (for example simple structs, often called POD, Plain Old Data) with the STL, we need to tell it how to compare these objects. This chapter goes through several ways of doing that.

Sorting "Hello World!"

Most tutorials about sorting start with sorting some numbers. But any container with random-access iterators can be sorted with std::sort, and as it happens, std::string provides random-access iterators.
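A minimal sketch of sorting the characters of a string; the helper name sorted_characters is made up for this example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns the characters of text sorted by their char value.
std::string sorted_characters(std::string text) {
    std::sort(text.begin(), text.end());
    return text;
}
```

With an ASCII character set, sorted_characters("Hello World!") yields " !HWdellloor": the space and the exclamation mark come first, then the uppercase letters, then the lowercase ones.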

Creating a container of a user-defined type, here a struct called some_data, is straightforward with an initializer_list.

std::vector<some_data> elements = { 0,9,2,7,3,5,7,3 };

When you build and run this code, you’ll get a std::vector with 8 elements of type some_data. When you try to sort it using std::sort, you’ll get an error stating that the compiler could not find a suitable operator < for comparing two instances of some_data.

There are three ways of solving this:

In-class definition, commonly known as member overload.

Out of class definition, commonly known as non-member overload.

By a predicate, which can be any function or function object matching the signature of the predicate.

Member operator < overload

The member operator < overload is the simplest and easiest option if you’re able to change the class/struct itself. It will not work with classes or structs you have no control over (for example classes from other libraries), or when sharing structs between C and C++.

Defining your own operators is known as operator overloading. In this case, it’s relational operator overloading.

Some may argue one must use friend to overload operators, but that is not true. A friend declaration is only necessary if the comparison is a non-member function that needs access to members declared private in the struct or class.
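The difference can be sketched with two hypothetical types, open_data and sealed_data, which are not part of the original example. With a public member, a plain non-member operator < is enough; with a private member, the non-member operator has to be declared a friend.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical struct with a public member: no friend needed.
struct open_data {
    int value;
};

inline bool operator<(const open_data& lhs, const open_data& rhs) {
    return lhs.value < rhs.value;
}

// Hypothetical class with a private member: the non-member operator<
// is declared a friend so that it may read value_.
class sealed_data {
public:
    explicit sealed_data(int v) : value_(v) {}
    int value() const { return value_; }

    friend bool operator<(const sealed_data& lhs, const sealed_data& rhs) {
        return lhs.value_ < rhs.value_;
    }

private:
    int value_;
};
```

Both types can then be sorted with a plain std::sort(v.begin(), v.end()) call.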

Using some_data from above, add a member function with the following signature to the struct: bool operator<(const some_data & rhs) const. The abbreviation rhs means right hand side, while lhs is left hand side.

The call to std::sort has the exact same syntax as before, and the results are identical.

std::sort(elements.begin(), elements.end());
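Put together, this could look like the sketch below. The layout of some_data (a single int member named value and a converting constructor) is an assumption made for illustration, as is the helper name sorted_elements.

```cpp
#include <algorithm>
#include <vector>

// Assumed definition of some_data: one int member, plus a converting
// constructor so that { 0, 9, 2, ... } can initialize the elements.
struct some_data {
    some_data(int v) : value(v) {}

    // Member overload: *this is the left hand side, rhs the right hand side.
    bool operator<(const some_data& rhs) const { return value < rhs.value; }

    int value;
};

// Hypothetical helper: builds the vector from the text and sorts it.
std::vector<some_data> sorted_elements() {
    std::vector<some_data> elements = { 0,9,2,7,3,5,7,3 };
    std::sort(elements.begin(), elements.end());  // picks up some_data::operator<
    return elements;
}
```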

Sort predicate examples

In a written or spoken language, a predicate is the part of a sentence which "states something" or "tells something" about a subject (or a thing). In computer science and C++, a predicate is much the same thing. Given n inputs, it tells something about the object or objects being compared.

With sorting, a predicate states whether object A comes before object B, so a sort predicate must take two arguments of the same type.

Continuing with some_data from above, a sort predicate could be the operator < overload (either member or non-member). However, when a matching operator < overload exists in the same namespace as the struct itself, all sort operations without an explicit predicate will use that overload. If you want to customize the sort order in certain cases, predicates let you define the sort order case by case.
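As a small sketch of that case-by-case choice, assume some_data has a single int member and an ascending operator <. The predicate largest_first and the two helper functions are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Assumed definition of some_data with a default (ascending) order.
struct some_data {
    int value;
    bool operator<(const some_data& rhs) const { return value < rhs.value; }
};

// Hypothetical predicate for the cases that need the opposite order:
// returns true when lhs must come before rhs.
inline bool largest_first(const some_data& lhs, const some_data& rhs) {
    return rhs.value < lhs.value;
}

// Uses operator< because no predicate is given.
void sort_default(std::vector<some_data>& v) {
    std::sort(v.begin(), v.end());
}

// Uses the predicate and ignores operator<.
void sort_largest_first(std::vector<some_data>& v) {
    std::sort(v.begin(), v.end(), largest_first);
}
```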

Pretend you’re writing a system for a car dealership. A car has certain properties like make, model, year and price.

Comparing one car with another as a whole does not make any sense, but sorting cars by make and model does, as does sorting them by year or by price.

There are a couple of ways of sorting the car struct. Here we use a lambda or a method.

Lambda as a sort predicate

Using a lambda as a sort predicate is probably the easiest and most efficient way of implementing predicates. The reason is that a lambda is an anonymous function object which is implemented at the call site. As a bonus, it’s easier for the compiler to inline and optimize a lambda than a method defined elsewhere. It’s usually both easier to read and more efficient than the alternatives.

Using the car struct, a C++ lambda predicate for sorting by make and model may be implemented like this.
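One possible shape of such lambdas is sketched here. The field types of car and the bodies of the three lambdas are assumptions; only the names and the sort-by-make-then-model behavior come from the text.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Assumed layout of the car struct.
struct car {
    std::string make;
    std::string model;
    int year;
    double price;
};

// Variant 1: compare make first, fall back to model when the makes are equal.
const auto sortByMakeAndModel = [](const car& lhs, const car& rhs) -> bool {
    if (lhs.make != rhs.make) return lhs.make < rhs.make;
    return lhs.model < rhs.model;
};

// Variant 2: the same logic written as a single boolean expression.
const auto sortByMakeAndModel_v2 = [](const car& lhs, const car& rhs) -> bool {
    return lhs.make < rhs.make || (lhs.make == rhs.make && lhs.model < rhs.model);
};

// Variant 3: std::tie builds tuples of references, which compare lexicographically.
const auto sortByMakeAndModel_v3 = [](const car& lhs, const car& rhs) -> bool {
    return std::tie(lhs.make, lhs.model) < std::tie(rhs.make, rhs.model);
};
```

Each of them is passed as the third argument, for example std::sort(cars.begin(), cars.end(), sortByMakeAndModel).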

All three lambdas generate the same sort order and have identical semantics. The difference is how they are compiled to machine code. During my testing in Release builds, the lambdas sortByMakeAndModel and sortByMakeAndModel_v2 got merged into one lambda, while the third, sortByMakeAndModel_v3 got completely inlined.

Inlining, in computer science, means that the body of a method is taken out and inserted at the call site. It’s as if there is no function call at all; the contents of the method are copied into every place the inlined method is used.

Free method (non-member method) sort predicate

A free method (non-member method) is any method not defined in a struct or a class. It behaves much like a lambda, but can be declared in a header and used from multiple compilation units. For performance reasons, a free method used as a predicate should preferably be inlined. Marking it inline gives the compiler a hint, and it also allows the definition itself to live in a header that is included from several compilation units.
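A sketch of a free-function predicate for the car struct. The field types of car are assumptions, and so is the choice to make sortYear a free function that orders cars from oldest to newest.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed layout of the car struct.
struct car {
    std::string make;
    std::string model;
    int year;
    double price;
};

// Free-function predicate: true when lhs is older than rhs.
// inline lets this definition live in a header shared by several
// compilation units; whether calls get inlined is up to the compiler.
inline bool sortYear(const car& lhs, const car& rhs) {
    return lhs.year < rhs.year;
}

// Hypothetical helper showing the call.
void sort_cars_by_year(std::vector<car>& cars) {
    std::sort(cars.begin(), cars.end(), sortYear);
}
```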

Member method sort predicate

Using a member method requires an instance of the class or struct. C++11 introduced std::bind (previously available as boost::bind). It makes it simpler to bind to a method in a class, but care must be taken when constructing the std::bind object.

We want to use a member function of a class as the sort predicate.

While 1, 2 and 3 are the same as before, 4-8 are new and require some explanation. Item 4 tells the compiler the predicate is an object produced by std::bind, and the predicate, when called, will use method 5 with instance 6, putting the first argument into placeholder _1 (7) and the second argument into placeholder _2 (8).
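The sketch below shows one way such a call could look, with comments marking items 4 to 8 as described above. The class car_sorter, its by_price member and the helper sort_by_price are hypothetical, as are the field types of car.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Assumed layout of the car struct.
struct car {
    std::string make;
    std::string model;
    int year;
    double price;
};

// Hypothetical class whose member function serves as the predicate.
class car_sorter {
public:
    explicit car_sorter(bool ascending) : ascending_(ascending) {}

    bool by_price(const car& lhs, const car& rhs) const {
        return ascending_ ? lhs.price < rhs.price : rhs.price < lhs.price;
    }

private:
    bool ascending_;
};

void sort_by_price(std::vector<car>& cars, const car_sorter& sorter) {
    using namespace std::placeholders;
    auto predicate = std::bind(      // (4) object produced by std::bind
        &car_sorter::by_price,       // (5) member function to call
        &sorter,                     // (6) instance to call it on (pointer, so no copy)
        _1,                          // (7) first argument of the predicate
        _2);                         // (8) second argument of the predicate
    std::sort(cars.begin(), cars.end(), predicate);
}
```

Because the instance is bound by pointer here, it has to outlive the bound object; binding the instance by value would copy it instead.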

Using a free method, std::bind or std::function is more likely to incur a runtime cost than a lambda, because the predicate is called through what is essentially a function pointer. As with any pointer, its target may change at runtime, so the compiler can only optimize the indirect call away if it can prove that the pointer will not and cannot change. This is a prime example of dynamic binding.

With lambdas, you’re more likely to get static binding. Every lambda has its own unique type, so the function to be called can be resolved at compile time, and the binding can’t be changed at runtime.

Sorting with template predicates

We simply supply a template predicate to std::sort. It’s just a method template taking two arguments of the same type and applying our logic to tell whether the first argument is less than the second. With templates it becomes a little more complex, but it’s still simple once you know how it works.

The method template_predicate_sort is a test method showing how to use the templated predicates. The key is to explicitly tell the compiler what type the predicate should have, in this case one of the types A, B or C.
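A sketch of how this could fit together. The definitions of A, B and C, the predicate name template_predicate and the signature of template_predicate_sort are assumptions; the only requirement on the types is a member named value that supports operator <.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types that all expose a member named value.
struct A { int value; };
struct B { double value; };
struct C { long value; };

// Template predicate: works for any T with a comparable member named value.
template <typename T>
bool template_predicate(const T& lhs, const T& rhs) {
    return lhs.value < rhs.value;
}

// The predicate type is spelled out explicitly at each call site,
// because std::sort cannot deduce T from the bare template name.
void template_predicate_sort(std::vector<A>& as, std::vector<B>& bs, std::vector<C>& cs) {
    std::sort(as.begin(), as.end(), template_predicate<A>);
    std::sort(bs.begin(), bs.end(), template_predicate<B>);
    std::sort(cs.begin(), cs.end(), template_predicate<C>);
}
```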

Sorting std::list

Sorting a std::list is almost the same as sorting a vector or a string, except that with std::list the sort method is a member function. Apart from that, std::list::sort doesn’t accept a range, and it comes in two overloads: one without a predicate, and one with a predicate. The usage is identical to what was covered earlier.

Given a list of numbers, just call sort() with no arguments to sort by the default operator <.

std::list<int> numbers = { 0,9,1,2,8,4,6 };
numbers.sort();

The output is:

Numbers sorted: 0 1 2 4 6 8 9
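The overload taking a predicate works the same way. As a small sketch, the hypothetical helper sorted_descending passes a lambda that puts the largest number first.

```cpp
#include <list>

// Hypothetical helper: sorts a copy of the list from largest to smallest
// using the std::list::sort overload that takes a predicate.
std::list<int> sorted_descending(std::list<int> numbers) {
    numbers.sort([](int lhs, int rhs) { return lhs > rhs; });
    return numbers;
}
```

Called with the list above, it returns 9 8 6 4 2 1 0.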

The same applies to sorting a struct. For simplicity, we’ll use the car struct from before and the sortYear predicate from before. Using lambdas would be