C++ std::sort predicate with templates

By , last updated May 22, 2017

Sorting structs and classes

If something can be compared by the less-than operator <, it can be sorted when the data is in a container supporting random access.

Creating a container with your data is simple enough. C++ is all about ‘pay for what you use’.

Given the following data,

struct some_data
{
    some_data() = default;
    some_data(int val) : a(val) {}

    int a;
};

Creating a container like so is straight forward with an initializer_list.

std::vector<some_data> elements = { 0,9,2,7,3,5,7,3 };

When you build and run this code, you’ll get a std::vector with 8 elements of some_data. When you try to sort this using std::sort, you’ll get an error stating the compiler could not find a suitable operator < for comparing two instances of some_data.

There are three ways of solving this,

  1. In-class definition, commonly known as member overload.
  2. Out of class definition, commonly known as non-member overload.
  3. By a predicate, and that predicate can be any method matching the signature of the predicate.

Member operator < overload

The member operator < overload is the simplest and easiest if you’re able to change the class/struct itself. This will not work with other classes/structures, whose you have no control over (be classes from other libraries) or when sharing structs between C and C++.

Defining your own operators, is also known as operator overloading. And in this case, it’s relational operator overloading.

Some may argue one must use friend to overload operators, but that is not true. It’s only necessary to use friend method if and only if the comparison should have access to members declared private in the struct or class.

Using some_data from above, make a method with the following signature in the struct: bool operator<(const some_data & rhs) const. The abbreviation rhs means right hand side, while lhs is left hand side.

The struct will then look like:

struct some_data
{
    some_data() = default;
    some_data(int val) : a(val) {}

    int a;

    bool operator<(const some_data & rhs) const
    {
        return a < rhs.a;
    }
};

A call to std::sort will then compile and call the inline and overloaded less-than operator.

std::sort(elements.begin(), elements.end());

Non-member operator < overload

When you’re not able to change the class or struct itself, a non-member operator < overload can be used.

Outside of the class, define this method:

bool operator<(const some_data & lhs, const some_data & rhs)
{
    return lhs.a < rhs.a;
}

Using std::sort have the exact same syntax, and the results are identical.

std::sort(elements.begin(), elements.end());

Sort predicate examples

In a written or spoken language, a predicate is a part of a sentence, which "states something" or "tells something" about a subject (or a thing). In computer science and C++, a predicate is much of the same thing. Given n inputs, it can tell something about the object or objects being compared .

With sorting, a predicate states if object A comes before object B, thus a search predicate must have two arguments of same type.

Continuing the use of some_data above, a search predicate could be the operator <-overload (both member and non-member). However, when using a matching operator < overload in the same namespace as the struct itself, all sort operations will use that overload. If you want to customize the sort order in certain cases, predicates will let you define a case-by-case sort orders.

Pretend you’re writing a system for a car dealership. A car has certain properties like make, model, year and price.

struct car
{
    std::string make;
    std::string model;
    int year;
    double price;
};

Sorting a car against another car does not make any sense, but sorting cars based on what make and model makes sense, also by year and how much the car costs.

There are a couple ways of sorting the car struct, here are using a lambda or a method.

Lambda as a sort predicate

Using a lambda as a sort predicate is probably the easiest and most efficient way of implementing predicates. The reason behind that logic, is that a lambda is an anonymous method which is implemented at the call site. As a bonus, it’s easier for the compiler to inline and optimize the lambda than a method defined elsewhere. It’s both easier to read and more efficient than the alternate methods.

Using the the car-struct, a c++ predicate lambda for sorting by make and model may be implemented like this.

auto sortByMakeAndModel = 
    [](const car & a, const car & b) -> bool
{
    if (a.make < b.make) return true;
    if (a.make > b.make) return false;
    if (a.model < b.model) return true;
    if (a.model > b.model) return false;

    return false;
};

There are two more ways of creating a predicate with multiple elements to sort with, one where operator < is always used and the third where everything is a big boolean expression.

Using only operator <.

auto sortByMakeAndModel_v2 = 
    [](const car & a, const car & b) -> bool
{
    if (a.make < b.make) return true;
    if (b.make < a.make) return false;
    if (a.model < b.model) return true;
    if (b.model < a.model) return false;

    return false;
};

Big boolean expression.

auto sortByMakeAndModel_v3 = 
    [](const car & a, const car & b) -> bool
{
    return 
        (
            (a.make < b.make) ||
            (a.make == b.make && a.model < b.model)
        );
};

All three lambdas generate the same sort order and have identical semantics. The difference is how they are compiled to machine code. During my testing in Release builds, the lambdas sortByMakeAndModel and sortByMakeAndModel_v2 got merged into one lambda, while the third, sortByMakeAndModel_v3 got completely inlined.

Inlining in computer science, means that the contents of the method is taken out and inserted at the call site. It’s as if there is no function, only the contents of the method is copied every place the inlined method is used.

std::sort(first, last, sortByMakeAndModel);
std::sort(first, last, sortByMakeAndModel_v2);
std::sort(first, last, sortByMakeAndModel_v3);

The output in any case is:

Sort by make and model
1. Ford Mondeo 2010 (25000)
2. Porsche 911 1999 (65000)
3. Volvo V70 2007 (20000)
4. Volvo V90 2016 (55000)

Sorting by the other variables, year and price are straight forward comparison. And during my testing, both methods were completely inlined.

Sort by year.

auto sortByYear = [](const car & a, const car & b) -> bool
{
    return a.year < b.year;
};

Used like:

std::sort(first, last, sortByYear);

After sorting by year, the list looks like this.

Sort by year
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Sort by price.

auto sortByPrice = [](const car & a, const car & b) -> bool
{
    return a.price < b.price;
};

Used like:

std::sort(first, last, sortByPrice);

After sorting by price, the vector looks like this:

Sort by price
1. Volvo V70 2007 (20000)
2. Ford Mondeo 2010 (25000)
3. Volvo V90 2016 (55000)
4. Porsche 911 1999 (65000)

Free method (non-member method) sort predicate

A free method (non-member method) is any method not defined in a struct or a class. It behaves much like a lambda, but can also be present in multiple compilation units. For performance reasons, using a non-member method should be inline. We can give the compiler a hint to inline the method by using inline.

Sorting the car structure by year is much like the sortByYear-lambda.

inline bool sortYear(const car & a, const car & b)
{
    return a.year < b.year;
}

Using the non-member method is also almost identical to the lambda.

std::sort(first, last, &sortYear);

And as expected, the cars are sorted by year.

Sort by year, non-member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Member method sort predicate

Using a member method requires an instance of the class or struct. C++11 introduced std::bind (previously boost::bind). It makes it simpler to bind to a method in a class, but care must be taken when constructing the std::bind object.

We want to use a C++ sort predicate member function in the following class.

struct car_sort
{
    bool sortYear(const car & a, const car & b) const
    {
        return a.year < b.year;
    }
};

There must be an instance of the class available, car_sort sort_instance;, and the sortYear method must be public.

Then is it possible to bind to the method.

std::sort(first, last, std::bind(&car_sort::sortYear, 
  sort_instance, std::placeholders::_1, 
  std::placeholders::_2));

The syntax may look frightening, but there is a reason for each part. The descriptions are on the following source listing, where each element is on it’s own line.

std::sort(                 // (1) Call to std::sort
  first,                   // (2) First iterator
  last,                    // (3) End iterator
  std::bind(               // (4) Predicate, which is std::bind
    &car_sort::sortYear,   // (5) Pointer to member method
    sort_instance,         // (6) Instance of struct
    std::placeholders::_1, // (7) Placeholder for argument 1
    std::placeholders::_2  // (8) Placeholder for argument 2
  )
);

While 1, 2 and 3 are the same as before, 4-8 are new and requires some explanation. Item 4 tells the compiler the predicate is an object produced by std::bind, and the predicate when called will use method 5, with instance 6, and put the first argument into placeholder _1 (7) and the second argument into placeholder _2 (8).

The output is still valid.

Sort by year, member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)

Using a free method, std::bind or std::function is more likely to incur a runtime cost, because the sort predicate method to be called is essentially a function pointer. And as with any pointer, it may be changed during runtime. The compiler may be smart and optimize away the pointer, if and only if it can see the pointer will not be changed and can not be changed during runtime. This is a prime example for dynamic binding.

With lambdas, you’re more inclined to get static binding. That means the function to be called may and will be resolved at compile time. It’s not possible to change the static binding during runtime.

Sorting with template predicates

To the std::sort method we simply supply a template predicate! It’s just a method taking 2 arguments of that type and apply our logic to tell if the first argument is lesser than the last argument. With templates it becomes a little more complex, but still it’s very simple when you know how it works.

#include <iostream>     // Provides cout
#include <vector>       // Provides our basic container
#include <algorithm>    // Provides std::sort

// Structure to be sorted

template<typename T>
struct generic_data
{
    generic_data(size_t i, size_t t, T data) 
        : id(i)
        , time(t), 
        other_data(data)
    {}
    size_t  id;
    size_t  time;
    T       other_data;
};

using A = generic_data<bool>;
using B = generic_data<std::string>;
using C = generic_data<char*>;

// Our predicate
//
template<typename CLASS>
bool timeSortPredicate(const CLASS &a, const CLASS &b)
{
    return a.time < b.time;
}

template<typename CLASS>
bool idSortPredicate(const CLASS &a, const CLASS &b)
{
    return a.id < b.id;
}

// Our collection printer
//
template<typename CLASS>
void print(const CLASS &c)
{
    for (const auto & item : c)
    {
        std::cout << "Id: " << item.id << 
         ", time: " << item.time << 
         ", data: " << item.other_data << "\n";
    }
}

The method template_predicate_sort is a test method of some sort, showing how to use the templated predicates. The clue is to explicitly tell the compiler what type the predicate should have. In this case, either of type A, B or C.

void template_predicate_sort()
{
    // Construct collections by initalizer lists
    std::vector<A> alist = {
        {1, 15, false},
        {2,20, true},
        {3,14, false}
    };

    std::vector<B> blist = {
        {3, 4, "first"},
        {2, 10, "second"},
        {1, 12, "third"}
    };

    std::vector<C> clist = {
        {11, 0, "char literal 1"},
        {12, 0, "char literal 2"},
        {13, 0, "char literal 3"}
    };

    // Print stuff
    std::cout << "Original collections.\n";
    std::cout << "Collection A\n";
    print(alist);

    std::cout << "Collection B\n";
    print(blist);

    std::cout << "Collection C\n";
    print(clist);

    std::cout << "Sorting collections by time\n";

    sort(alist.begin(), alist.end(), timeSortPredicate<A>);
    sort(blist.begin(), blist.end(), timeSortPredicate<B>);
    sort(clist.begin(), clist.end(), timeSortPredicate<C>);

    // Print stuff
    std::cout << "Collection A\n";
    print(alist);

    std::cout << "Collection B\n";
    print(blist);

    std::cout << "Collection C\n";
    print(clist);

    std::cout << "Sorting collections by id\n";

    sort(alist.begin(), alist.end(), idSortPredicate<A>);
    sort(blist.begin(), blist.end(), idSortPredicate<B>);
    sort(clist.begin(), clist.end(), idSortPredicate<C>);

    // Print stuff
    std::cout << "Collection A\n";
    print(alist);

    std::cout << "Collection B\n";
    print(blist);

    std::cout << "Collection C\n";
    print(clist);

}

Compile with:

g++ sortpredicate.cpp -o sortpredicate

The output should be:

Original collections.
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by time
Collection A
Id: 3, time: 14, data: 0
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by id
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 1, time: 12, data: third
Id: 2, time: 10, data: second
Id: 3, time: 4, data: first
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3

Professional Software Developer, doing mostly C++. Connect with Kent on Twitter.