Getting rid of those pesky C-macros in C++-code (QSORT)

By , last updated July 9, 2019

My day to day job is being a software developer, and my primary tasks is to seek out and eliminate performance issues with the C++-based software. Early on I spotted some methods taking their sweet time when sorting, thanks to Intel Parallel Studio and their Amplifier module w/Advanced Hotspots. Those methods, which got highlighted where sort-methods using a horrible QSORT macro and a predicate doing one too many things, but that’s another story.

Because the horrible QSORT was embedded in methods, those methods lit up like a Christmas Tree. After spending some time debugging and understanding the predicate, it was straight forward to replace it with std::sort and its typesafe predicate (which also almost always gets inlined, which itself can account for massive speedups).

Then, there were instances where this horrible QSORT macro was used deep within a method counting 200+ lines, and other places?

A quick primer on quicksort

Quicksort is a very efficient partition sort algorithm, and it uses a predicate / comparator to determine if an element should go before an other element. If you mess up the predicate, this algorithm will use a very long time to converge, if it converges at all.

As you can see from the animation, the sort algorithm compares two elements many times. On average, it will compare O(n log n) elements. For each compare, a predicate is called. This predicate must be very efficient and not do anything other than being a predicate (like writing information to the console or disk).

Here is a table of O(n log n) vs O(n^2).

n O(n log n) O(n^2)
1 0 1
10 10 100
100 200 10 000
1 000 3 000 1 000 000
10 000 40 000 100 000 000
100 000 500 000 10 000 000 000
1 000 000 6 000 000 1 000 000 000 000
10 000 000 70 000 000 100 000 000 000 000
100 000 000 800 000 000 10 000 000 000 000 000

With 10 million elements, this comparator / predicate can be called from 70 million times to 100 000 000 million times. That’s a lot. This is not unique to quicksort, but all algorithms taking predicates. When you can use C++ and templates, there is no need to use C-style sort algorithms.

Two major advantages C++ have over C when it comes to sort algorithms;

  1. Is that with C++ you have a generic interface (templates) and,
  2. The predicate itself can take the type of the sorted type, and not void * like most C predicates requires (and casts unboxing trolls everywhere).

How can I then find those QSORT places, which actually cost the most?

With the proper motivation, how can we replace with a more up to date implementation of quicksort?

A simple search through the code will only give me all places where the QSORT macro has been used. It is dangerous to just start hacking away on code and start replacing top to bottom. Maybe the predicate wasn’t that simple and you made a mistake in converting it?

By wrapping the macro in a method, the profiler will light up like a Christmas Tree.

The macro is defined like this:

// QSORT_TYPE - Type of the array
// QSORT_BASE - Pointer to the beginning of array
// QSORT_NELT - Array length
// QSORT_LT   - Comparator / Predicate
#define QSORT(QSORT_TYPE,QSORT_BASE,QSORT_NELT,QSORT_LT) \
// about 200-300 lines of code

A typical usage is like this:

struct someDataType
{
    int id;
    std::string name;
    double value;
};

typedef std::vector<someDataType> someDataTypeVector;
someDataTypeVector someData;
#define SOMEDATACMP(a,b) ((a->value) < (b->value))

QSORT(someDataType, &someData[0], someData.size(), SOMEDATACMP);

How can we wrap this in a method without changing every call site?

With clever use of an other macro and the actual wrapped method!

// Rename the QSORT to QSORT_impl
#define QSORT_impl(QSORT_TYPE,QSORT_BASE,QSORT_NELT,QSORT_LT)

// The actual wrapper, wrapping the original QSORT in a nice function.
template<typename A, typename B, typename D>
void QSORT_wrapper(B * b, const size_t c, const D & d)
{
    // A - type
    // B - array
    // c - length
    // D - predicate / comparator / lambda

    QSORT_impl(A,b,c,d);
}

// Wrap the horrendus QSORT macro in a method, to preserve
// stack space and make it show up in the debugger / profiler.
// Notice how we wrap the predicate in a lambda.
#define QSORT(A,B,C,D)                         \
QSORT_wrapper<A>(B,C,[&](A * a, A * b) -> bool \
{                                              \
    return D(a,b);                             \
})

What happens here is that we replace the original QSORT macro with our own version, which calls the wrapper around the original QSORT macro. This way the method QSORT_wrapper will light up the profiler like a Christmas Tree, showing the hotspots where it is necessary to focus your efforts.

There are no changes in the call sites for this macro. What we do is to simply put the arguments in the correct places, and making a lambda for the D parameter. Using a lambda expression allows both comparators / predicates as a #define and as a method (or lambda expressions).