There are a couple of sorting algorithms in C++ from std::sort to lamdbas, each tailored to different use cases. As a programmer, you may or may not want to delve into the depths of sort algorithms. That’s a domain left for experts. Luckily, it’s quite easy to get started with sorting data.
Contents
With C++, there is a standardized sort method readily available. It’s called std::sort
and works by calling it with a range it should sort, usually a begin
-iterator and an end
-iterator. Normal use cases are sorting a vector or an array, but any container which implements the RandomAccess
-iterator can be sorted. Containers like std::list
can’t be sorted with std::sort
. std::sort
is available in the <algorithm>
header.
If you try to sort std::list
with std::sort
, you’ll get a compile error. The error will be a variation of not being able to use operator -
on the std::list
iterator type. If you want to sort a list, you’ll have to use std::list::sort
.
Using qsort
to sort structs or classes will not be covered.
What algorithm std::sort
will use, is left to the implementation, but the complexity should not exceed O(N log N)
where N
is number of elements to sort. Implementations are usually implementing std::sort
by combining different sort algorithms with different strengths and weaknesses. One such algorithm is IntroSort.
IntroSort is a hybrid sort algorithm, invented in 1997 by David Musser. The goal of IntroSort is to provide a sorting algorithm, which has a fast average performance and an optimal worst case performance. It combines Quicksort and Heapsort, and combines the good parts from both algorithms. Due to this, IntroSort has an average performance and a worst case performance of O(n log n)
, where N
is number of elements.
In theory, sorting numbers and sorting any data is not different. They use the same algorithm, the same range, and the same method. The only difference is the comparison between elements. But that’s standardized too. The real difference is the method comparing the elements.
To sort a collection of any data (also known as POD, Plain Old Data) with STL we need to tell it how to sort these objects. This chapter will go through many of those methods.
Most tutorials about sorting starts with sorting some numbers. But any container with RandomIterator
can be sorted with std::sort
. And as it happens to be, std::string
implements RandomIterator
.
#include <algorithm>
#include <string>
#include <iostream>
std::string hello{ "Hello World!" };
std::sort(hello.begin(), hello.end());
std::cout << hello << "\n";
Output will be !HWdellloor
. Notice the space in front, which is the first character.
If something can be compared by the less-than operator <
, it can be sorted when the data is in a container supporting random access.
Creating a container with your data is simple enough. C++ is all about ‘pay for what you use’.
Given the following data,
struct some_data
{
some_data() = default;
some_data(int val) : a(val) {}
int a;
};
Creating a container like so is straight forward with an initializer_list
.
std::vector<some_data> elements = { 0,9,2,7,3,5,7,3 };
When you build and run this code, you’ll get a std::vector
with 8 elements of some_data
. When you try to sort this using std::sort
, you’ll get an error stating the compiler could not find a suitable operator <
for comparing two instances of some_data
.
There are three ways of solving this,
The member operator <
overload is the simplest and easiest if you’re able to change the class/struct itself. This will not work with other classes/structures, whose you have no control over (be classes from other libraries) or when sharing structs between C and C++.
Defining your own operators, is also known as operator overloading. And in this case, it’s relational operator overloading.
Some may argue one must use friend
to overload operators, but that is not true. It’s only necessary to use friend
method if and only if the comparison should have access to members declared private
in the struct or class.
Using some_data
from above, make a method with the following signature in the struct: bool operator<(const some_data & rhs) const
. The abbreviation rhs
means right hand side
, while lhs
is left hand side
.
The struct will then look like:
struct some_data
{
some_data() = default;
some_data(int val) : a(val) {}
int a;
bool operator<(const some_data & rhs) const
{
return a < rhs.a;
}
};
A call to std::sort
will then compile and call the inline and overloaded less-than operator.
std::sort(elements.begin(), elements.end());
When you’re not able to change the class or struct itself, a non-member operator <
overload can be used.
Outside of the class, define this method:
bool operator<(const some_data & lhs, const some_data & rhs)
{
return lhs.a < rhs.a;
}
Using std::sort
have the exact same syntax, and the results are identical.
std::sort(elements.begin(), elements.end());
In a written or spoken language, a predicate is a part of a sentence, which "states something" or "tells something" about a subject (or a thing). In computer science and C++, a predicate is much of the same thing. Given n inputs, it can tell something about the object or objects being compared .
With sorting, a predicate states if object A comes before object B, thus a search predicate must have two arguments of same type.
Continuing the use of some_data
above, a search predicate could be the operator <
-overload (both member and non-member). However, when using a matching operator <
overload in the same namespace as the struct itself, all sort operations will use that overload. If you want to customize the sort order in certain cases, predicates will let you define a case-by-case sort orders.
Pretend you’re writing a system for a car dealership. A car has certain properties like make, model, year and price.
struct car
{
std::string make;
std::string model;
int year;
double price;
};
Sorting a car against another car does not make any sense, but sorting cars based on what make and model makes sense, also by year and how much the car costs.
There are a couple ways of sorting the car
struct, here are using a lambda or a method.
Using a lambda as a sort predicate is probably the easiest and most efficient way of implementing predicates. The reason behind that logic, is that a lambda is an anonymous method which is implemented at the call site. As a bonus, it’s easier for the compiler to inline and optimize the lambda than a method defined elsewhere. It’s both easier to read and more efficient than the alternate methods.
Using the the car
-struct, a c++ predicate lambda for sorting by make and model may be implemented like this.
auto sortByMakeAndModel =
[](const car & a, const car & b) -> bool
{
if (a.make < b.make) return true;
if (a.make > b.make) return false;
if (a.model < b.model) return true;
if (a.model > b.model) return false;
return false;
};
There are two more ways of creating a predicate with multiple elements to sort with, one where operator <
is always used and the third where everything is a big boolean expression.
Using only operator <
.
auto sortByMakeAndModel_v2 =
[](const car & a, const car & b) -> bool
{
if (a.make < b.make) return true;
if (b.make < a.make) return false;
if (a.model < b.model) return true;
if (b.model < a.model) return false;
return false;
};
Big boolean expression.
auto sortByMakeAndModel_v3 =
[](const car & a, const car & b) -> bool
{
return
(
(a.make < b.make) ||
(a.make == b.make && a.model < b.model)
);
};
All three lambdas generate the same sort order and have identical semantics. The difference is how they are compiled to machine code. During my testing in Release builds, the lambdas sortByMakeAndModel
and sortByMakeAndModel_v2
got merged into one lambda, while the third, sortByMakeAndModel_v3
got completely inlined.
Inlining in computer science, means that the contents of the method is taken out and inserted at the call site. It’s as if there is no function, only the contents of the method is copied every place the inlined method is used.
std::sort(first, last, sortByMakeAndModel);
std::sort(first, last, sortByMakeAndModel_v2);
std::sort(first, last, sortByMakeAndModel_v3);
The output in any case is:
Sort by make and model
1. Ford Mondeo 2010 (25000)
2. Porsche 911 1999 (65000)
3. Volvo V70 2007 (20000)
4. Volvo V90 2016 (55000)
Sorting by the other variables, year
and price
are straight forward comparison. And during my testing, both methods were completely inlined.
Sort by year.
auto sortByYear = [](const car & a, const car & b) -> bool
{
return a.year < b.year;
};
Used like:
std::sort(first, last, sortByYear);
After sorting by year, the list looks like this.
Sort by year
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)
Sort by price.
auto sortByPrice = [](const car & a, const car & b) -> bool
{
return a.price < b.price;
};
Used like:
std::sort(first, last, sortByPrice);
After sorting by price, the vector looks like this:
Sort by price
1. Volvo V70 2007 (20000)
2. Ford Mondeo 2010 (25000)
3. Volvo V90 2016 (55000)
4. Porsche 911 1999 (65000)
A free method (non-member method) is any method not defined in a struct or a class. It behaves much like a lambda, but can also be present in multiple compilation units. For performance reasons, using a non-member method should be inline. We can give the compiler a hint to inline the method by using inline
.
Sorting the car
structure by year is much like the sortByYear
-lambda.
inline bool sortYear(const car & a, const car & b)
{
return a.year < b.year;
}
Using the non-member method is also almost identical to the lambda.
std::sort(first, last, &sortYear);
And as expected, the cars are sorted by year.
Sort by year, non-member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)
Using a member method requires an instance of the class or struct. C++11 introduced std::bind
(previously boost::bind
). It makes it simpler to bind to a method in a class, but care must be taken when constructing the std::bind
object.
We want to use a C++ sort predicate member function in the following class.
struct car_sort
{
bool sortYear(const car & a, const car & b) const
{
return a.year < b.year;
}
};
There must be an instance of the class available, car_sort sort_instance;
, and the sortYear
method must be public.
Then is it possible to bind to the method.
std::sort(first, last, std::bind(&car_sort::sortYear,
sort_instance, std::placeholders::_1,
std::placeholders::_2));
The syntax may look frightening, but there is a reason for each part. The descriptions are on the following source listing, where each element is on it’s own line.
std::sort( // (1) Call to std::sort
first, // (2) First iterator
last, // (3) End iterator
std::bind( // (4) Predicate, which is std::bind
&car_sort::sortYear, // (5) Pointer to member method
sort_instance, // (6) Instance of struct
std::placeholders::_1, // (7) Placeholder for argument 1
std::placeholders::_2 // (8) Placeholder for argument 2
)
);
While 1, 2 and 3 are the same as before, 4-8 are new and requires some explanation. Item 4 tells the compiler the predicate is an object produced by std::bind, and the predicate when called will use method 5, with instance 6, and put the first argument into placeholder _1
(7) and the second argument into placeholder _2
(8).
The output is still valid.
Sort by year, member method
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)
Using a free method, std::bind
or std::function
is more likely to incur a runtime cost, because the sort predicate method to be called is essentially a function pointer. And as with any pointer, it may be changed during runtime. The compiler may be smart and optimize away the pointer, if and only if it can see the pointer will not be changed and can not be changed during runtime. This is a prime example for dynamic binding.
With lambdas, you’re more inclined to get static binding. That means the function to be called may and will be resolved at compile time. It’s not possible to change the static binding during runtime.
To the std::sort method we simply supply a template predicate! It’s just a method taking 2 arguments of that type and apply our logic to tell if the first argument is lesser than the last argument. With templates it becomes a little more complex, but still it’s very simple when you know how it works.
#include <iostream> // Provides cout
#include <vector> // Provides our basic container
#include <algorithm> // Provides std::sort
// Structure to be sorted
template<typename T>
struct generic_data
{
generic_data(size_t i, size_t t, T data)
: id(i)
, time(t),
other_data(data)
{}
size_t id;
size_t time;
T other_data;
};
using A = generic_data<bool>;
using B = generic_data<std::string>;
using C = generic_data<char*>;
// Our predicate
//
template<typename CLASS>
bool timeSortPredicate(const CLASS &a, const CLASS &b)
{
return a.time < b.time;
}
template<typename CLASS>
bool idSortPredicate(const CLASS &a, const CLASS &b)
{
return a.id < b.id;
}
// Our collection printer
//
template<typename CLASS>
void print(const CLASS &c)
{
for (const auto & item : c)
{
std::cout << "Id: " << item.id <<
", time: " << item.time <<
", data: " << item.other_data << "\n";
}
}
The method template_predicate_sort
is a test method of some sort, showing how to use the templated predicates. The clue is to explicitly tell the compiler what type the predicate should have. In this case, either of type A
, B
or C
.
void template_predicate_sort()
{
// Construct collections by initalizer lists
std::vector<A> alist = {
{1, 15, false},
{2,20, true},
{3,14, false}
};
std::vector<B> blist = {
{3, 4, "first"},
{2, 10, "second"},
{1, 12, "third"}
};
std::vector<C> clist = {
{11, 0, "char literal 1"},
{12, 0, "char literal 2"},
{13, 0, "char literal 3"}
};
// Print stuff
std::cout << "Original collections.\n";
std::cout << "Collection A\n";
print(alist);
std::cout << "Collection B\n";
print(blist);
std::cout << "Collection C\n";
print(clist);
std::cout << "Sorting collections by time\n";
sort(alist.begin(), alist.end(), timeSortPredicate<A>);
sort(blist.begin(), blist.end(), timeSortPredicate<B>);
sort(clist.begin(), clist.end(), timeSortPredicate<C>);
// Print stuff
std::cout << "Collection A\n";
print(alist);
std::cout << "Collection B\n";
print(blist);
std::cout << "Collection C\n";
print(clist);
std::cout << "Sorting collections by id\n";
sort(alist.begin(), alist.end(), idSortPredicate<A>);
sort(blist.begin(), blist.end(), idSortPredicate<B>);
sort(clist.begin(), clist.end(), idSortPredicate<C>);
// Print stuff
std::cout << "Collection A\n";
print(alist);
std::cout << "Collection B\n";
print(blist);
std::cout << "Collection C\n";
print(clist);
}
Compile with:
g++ sortpredicate.cpp -o sortpredicate
The output should be:
Original collections.
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by time
Collection A
Id: 3, time: 14, data: 0
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Collection B
Id: 3, time: 4, data: first
Id: 2, time: 10, data: second
Id: 1, time: 12, data: third
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting collections by id
Collection A
Id: 1, time: 15, data: 0
Id: 2, time: 20, data: 1
Id: 3, time: 14, data: 0
Collection B
Id: 1, time: 12, data: third
Id: 2, time: 10, data: second
Id: 3, time: 4, data: first
Collection C
Id: 11, time: 0, data: char literal 1
Id: 12, time: 0, data: char literal 2
Id: 13, time: 0, data: char literal 3
Sorting a std::list
is almost the same as sorting a vector or a string, except with std::list
the sort method is a member function. Except this difference, the std::list::sort
method doesn’t accept a range, and it comes with two overloads. One without any predicate, and one with a predicate. The usage is identical to what was covered earlier.
Given a list of numbers, just call sort()
with no arguments to sort by the default operator <
.
std::list<int> numbers = { 0,9,1,2,8,4,6 };
numbers.sort();
The output is:
Numbers sorted: 0 1 2 4 6 8 9
The same applies to sorting a struct. For simplicity, we’ll use the car
struct from before and the sortYear
predicate from before. Using lambdas would be
std::list<car> cars = {
{ "Volvo", "V70", 2007, 20000.0 },
{ "Porsche", "911", 1999, 65000.0 },
{ "Ford", "Mondeo", 2010, 25000.0 },
{ "Volvo", "V90", 2016, 55000.0 }
};
// Sort by free method predicate
cars.sort(&sortYear);
// Sorting a list with a lambda as predicate
auto sortYearPredicate = [](const car & a, const car & b) -> bool
{
return a.year < b.year;
};
cars.sort(sortYearPredicate);
The output is:
Cars sorted in a list
1. Porsche 911 1999 (65000)
2. Volvo V70 2007 (20000)
3. Ford Mondeo 2010 (25000)
4. Volvo V90 2016 (55000)
Professional Software Developer, doing mostly C++. Connect with Kent on Twitter.