
Algorithm Improvement through Performance Measurement: Part 1

By Victor J. Duvanenko, September 15, 2009

Optimizing sort algorithms for today's CPUs

Correctness

One method of verifying correctness is to compare each algorithm implementation against STL sort() and confirm they produce equivalent results, but that assumes STL sort() is itself correct. To avoid relying on STL sort(), a stand-alone correctness test is needed: a sorted result requires that array[i] ≤ array[i+1] for all elements of the array, which is simple to check. Comparison against STL sort() still serves as a nice affirmation of correctness. Both tests (the stand-alone check and the comparison to STL sort()) were used to validate all implemented routines, and input arrays of size 0 and 1 were tested as well.
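The two tests can be sketched as follows; this is a minimal sketch, assuming the routine under test sorts a vector of float in place (the names isSorted, matchesStlSort, and mySort are illustrative, not from the original code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-alone correctness test: verify a[i] <= a[i+1] for every adjacent
// pair. Arrays of size 0 and 1 are trivially sorted and pass the loop.
bool isSorted(const std::vector<float>& a)
{
    for (size_t i = 0; i + 1 < a.size(); ++i)
        if (a[i] > a[i + 1])
            return false;
    return true;
}

// Cross-check: the candidate sort must produce the same result as STL
// sort(). mySort is any in-place sorting routine under test.
bool matchesStlSort(std::vector<float> a,
                    void (*mySort)(std::vector<float>&))
{
    std::vector<float> reference(a);
    std::sort(reference.begin(), reference.end());
    mySort(a);
    return a == reference;
}
```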

Table 1 and Figure 1 compare the four sorting algorithms discussed above with varied sizes of arrays filled with random numbers. Each element in the arrays is of type float — 32-bit floating-point.

Table 1

Figure 1

Since the standard C runtime library random number generator rand() creates only about 32K unique values, several random numbers were generated for each array element and multiplied together to produce a single value, as shown below:
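The original listing is not reproduced here; the following is a sketch consistent with the description (the function name and the exact number of factors are assumptions):

```cpp
#include <cstdlib>

// Generate one float in [0.0, 1.0] by multiplying several rand()-based
// values together. Each factor is itself in [0.0, 1.0], so the product
// stays in that range, while the number of distinct results grows well
// beyond RAND_MAX.
float randomFloat()
{
    float product = 1.0f;
    for (int i = 0; i < 3; ++i)                // "several" factors; 3 assumed
        product *= (float)rand() / RAND_MAX;   // division maps rand() into [0,1]
    return product;
}
```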

The reason for the division in the code above is to produce a floating-point number between 0.0 and 1.0; multiplying several of these numbers still produces a value between 0.0 and 1.0. For each array size the number of unique values was determined, and it was always within 0.4% of the number of elements in the array. Without this procedure the number of unique values maxed out at RAND_MAX (about 32K), and the benchmark results were severely tainted, since some of the algorithms run significantly faster on inputs containing only a few unique values. This is not an optimal method of generating random numbers, but it is sufficient — better methods will be explored in future articles.

Table 2 and Figure 2 compare the four sorting algorithms using pre-sorted data. For some algorithms this presents the best case for performance.

Table 2

Figure 2

Table 3 and Figure 3 use reverse-sorted data (sorted in descending order within the array), as this presents the worst-case input for some algorithms.

Table 3

Figure 3

The sorting algorithms being tested are all in-place, meaning they operate on the original array provided to them. Running the sort on the same array multiple times would therefore be an errant measurement technique: the array is sorted after the first run, and all subsequent runs would operate on an already sorted array. To measure performance, multiple arrays must be used, each filled with fresh data, and then sorted. The technique used started with 100K arrays of 10 elements each, followed by 10K arrays of 100 elements, then 1K arrays of 1K elements, and so on, up to a single array of 1M elements.
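The measurement technique above can be sketched as follows; this is an illustrative harness, not the original code (the function name, timer choice, and use of std::sort as the algorithm under test are assumptions):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <vector>

// Time a sort over many arrays so that no array is ever sorted twice.
// In the article's scheme, numArrays * arraySize is held at ~1M elements
// for every size step: (100K, 10), (10K, 100), ..., (1, 1M).
double timeSortRuns(size_t numArrays, size_t arraySize)
{
    std::vector<std::vector<float>> data(numArrays,
                                         std::vector<float>(arraySize));
    for (auto& a : data)                       // fill each array with fresh
        for (auto& x : a)                      // random values before timing
            x = (float)rand() / RAND_MAX;

    auto start = std::chrono::steady_clock::now();
    for (auto& a : data)
        std::sort(a.begin(), a.end());         // algorithm under test
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run would then call timeSortRuns(100000, 10), timeSortRuns(10000, 100), and so on up to timeSortRuns(1, 1000000), keeping the total element count comparable across sizes.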

The plots may seem a little strange at first, since they are logarithmic. The beauty of logarithmic plots, however, is that very large ranges of data can be covered (this is one of the reasons human senses are logarithmic). Both X and Y axes are logarithmic (log10). The slope in the plots relates to the exponent by the laws of logarithms: if run time t = c * n^k, then log10(t) = log10(c) + k * log10(n), a straight line of slope k.

Thus, on log-log plots the exponent appears directly as the slope: a slope of 1 corresponds to an exponent of 1, a slope of 2 to an exponent of 2, and so on. Log plots are a great way to compare polynomial behaviors of different orders across large ranges of data — in the data sets above the run times span from about 10^-8 to 10^3 seconds, which is 11 orders of magnitude (a factor of 100 billion).
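The slope-to-exponent relationship can be computed from any two (size, time) measurements; a small helper sketch (the function name is illustrative):

```cpp
#include <cmath>

// Estimate the complexity exponent from two (size, time) measurements on
// a log-log plot: slope = delta(log10 time) / delta(log10 size).
double logLogSlope(double n1, double t1, double n2, double t2)
{
    return (std::log10(t2) - std::log10(t1)) /
           (std::log10(n2) - std::log10(n1));
}

// For an O(n^2) algorithm, 10X the input size gives 100X the run time,
// so logLogSlope(1000, 0.01, 10000, 1.0) is approximately 2.
```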

From these plots it's clear that two of the four algorithms (STL sort() and qsort()) have nearly linear performance: when the array size goes up by 10X, the run time goes up by roughly 10X as well. This behavior is consistent across all three data sets. Selection Sort has double the slope: when the array size goes up by 10X, the run time goes up by 100X — clearly showing the O(n²) order of this algorithm.

STL sort() consistently outperforms qsort() across all array sizes and all input data sets. The only cases where STL sort() loses are to Insertion Sort on presorted input (at all array sizes) and to Selection Sort on 10-element arrays. STL sort() and qsort() perform roughly the same on presorted and reverse inputs, but several times worse on random input. These measurements imply that the presorted and reverse data sets do not bound performance as the best-case and worst-case inputs; instead, particular random input patterns are what significantly degrade these algorithms' performance.

Insertion Sort shows O(n²) behavior for the random and reverse input data sets, but O(n) for the presorted set. When the array is presorted, Insertion Sort keeps its performance lead no matter the array size. The reason is that it never enters its inner loop, and thus never moves any array elements; in this case it performs only (n-1) comparisons in total, making its linear O(n) behavior evident. The reverse input set is worse for Insertion Sort than the random or presorted sets.
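The best-case behavior is visible directly in the code; a standard textbook formulation of Insertion Sort (not necessarily the article's exact listing):

```cpp
#include <cstddef>

// Insertion Sort. On already sorted input the inner-loop condition
// a[j-1] > key fails immediately for every i, so no elements move and
// only (n-1) comparisons are made in total -- the O(n) best case.
// On reverse-sorted input the inner loop runs to completion every
// iteration, giving the O(n^2) worst case.
void insertionSort(float a[], size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        float key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {   // never entered on presorted input
            a[j] = a[j - 1];                // shift larger element right
            --j;
        }
        a[j] = key;                         // drop key into its final slot
    }
}
```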

