Counting Sort is a simple and very fast linear time O(n) sorting algorithm, with practical limitations which bound its applicability. When it is applicable, such as sorting arrays of 8-bit or 16-bit integers, it has been shown to significantly outperform other sorting algorithms, such as STL sort() by 22-30X for arrays of 8-bit numbers, and 22-25X for large arrays of 16-bit numbers, as described in Part 3 of this series. (STL sort() is a generic hybrid algorithm of quicksort, Heap Sort, and Insertion Sort.) Counting Sort also outperforms Binary-Radix Sort, N-bit-Radix Sort, Insertion Sort, and Selection Sort for 8-bit and 16-bit numbers. It was used in previous installments of this series to create a high-performance hybrid sorting algorithm combinations that adapted to the data type being sorted, performing better than sorting methods based on a single algorithm.

In this article, I examine Counting Sort in detail, extended to support signed 8-bit and 16-bit numbers, followed by performance optimizations, resulting in superior sorting performance for unsigned and signed 8-bit and 16-bit arrays.

Algorithm

When sorting an array of numbers the Counting Sort algorithm consist of two steps: counting and writing. To sort an array of 8-bit numbers, where each array element can have a value between 0 and 255, for example, Counting Sort uses 256 counters. Each counter stores the number of times that a particular value has been encountered within the array, as the array is scanned from the first element to the last. Thus, counter[0] contains the count of how many zeros are in the array, count[1] holds the count of how many ones are in the array, count[2] has the count of how many twos are in the array, and so on, until count[255] which holds the count of how many 255's are in the array. Once the counts have been collected, by scanning through the array in a single pass, the second stage of the algorithm can proceed. This writing stage starts with the count[0], takes its count value, and stores that many zeros back into the array, starting at the first element of the array. Then it looks at the count[1] value, and stores that many ones into the array, and so on. The result is a sorted array of 8-bit numbers.

// Copyright(c), Victor J. Duvanenko, 2010
// Listing 1
inline void CountSortInPlace( unsigned char* a, unsigned long a_size )
{
if ( a_size < 2 ) return;
const unsigned long numberOfCounts = 256;
unsigned long count[ numberOfCounts ];
for( unsigned long i = 0; i < numberOfCounts; i++ ) // initialized all counts to zero, since the array may not contain all values
count[ i ] = 0; // !!! It should be possible to use array initialization and remove this overhead!!!
// Scan the array and count the number of times each value appears
for( unsigned long i = 0; i < a_size; i++ )
count[ a[ i ] ]++;
// Fill the array with the number of 0's that were counted, followed by the number of 1's, and then 2's and so on
unsigned long n = 0;
for( unsigned long i = 0; i < numberOfCounts; i++ )
for( unsigned long j = 0; j < count[ i ]; j++ )
a[ n++ ] = (unsigned char)i;
}
inline void CountSortInPlace_2( unsigned char* a, unsigned long a_size )
{
if ( a_size < 2 ) return;
const unsigned long numberOfCounts = 256;
// one count for each possible value of an 8-bit element (0-255)
unsigned long count[ numberOfCounts ] = { 0 }; // count array is initialized to zero by the compiler
// Scan the array and count the number of times each value appears
for( unsigned long i = 0; i < a_size; i++ )
count[ a[ i ] ]++;
// Fill the array with the number of 0's that were counted, followed by the number of 1's, and then 2's and so on
unsigned long n = 0;
for( unsigned long i = 0; i < numberOfCounts; i++ )
for( unsigned long j = 0; j < count[ i ]; j++ )
a[ n++ ] = (unsigned char)i;
}

Listing 1

The first portion of the algorithm creates an array of 256 counters and initializes them to zero, using the first for loop. The second for loop scans the array from the first element to the last, taking each array element and using its value as an index into the count[] array, incrementing that particular count. For example, an element of value 3 will increment count[3]. Once the counts have been gathered, the writing step begins. In this step, the outer loop walks through every element of the count array, starting with count[0] . It uses the value of count[0], and writes that many zeros to the array. Then it looks at the value of count[1], and writes that many ones to the array, and so on.

As a concrete example, the following array of unsigned 8-bit values is to be sorted by using the Counting Sort:

0, 1, 1, 3, 1, 3, 3, 0, 0, 0, 1, 3, 3, 1, 0, 3, 1

The algorithm scans the array from the first element on the left to the last element on the right. In this example, counters 0 through 3 are used, and all counters are initialized to zeroes. The first element is a 0, and thus count[0] is incremented. The second element is a 2, and thus count[2] is incremented. This method is continued until all 17 array elements have been processed. At the end of this counting step (the second for loop in Listing 1) the counts would be:

The third for loop (the writing step) starts at count[0] and writes the value of zero five times -- into a[0], a[1], a[2], a[3], and a[4]. Then writes the value of one six times (into a[5] through a[10]), followed by no values of 2, followed by writing the value of three six times (into a[11] through a[16]). The resultant array is:

0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3

which is exactly what was counted: 5 zeros, 6 ones, no twos, and 6 threes. The result is a sorted numerical array. The input array was scanned two times from beginning to end: reading the array during the counting step, and writing the array during the writing step. Thus, the algorithm is O(n), with the amount of processing equal to 2n.

Table 1 and Figure 1 show performance measurements for the Counting Sort along with several other sorting algorithms, such as STL sort(), Insertion Sort, Binary-Radix Sort, and 256-Radix Stable Sort from Part 3. The Counting Sort #2 is the second implementation in Listing 1, which initializes the array to zero (where the compiler inserts code to initialize the local count array).

[Click image to view at full size]

Table 1: Random 8-bit Unsigned Elements .

[Click image to view at full size]

Figure 1

The above measurements show that the Count Sort is very efficient algorithm for 8-bit unsigned numbers and outperforms other algorithms by a wide margin. It outperforms STL sort() for all but the smallest array size, by over 20X in performance for 100K array size and above. At 10K array sizes and above it outperforms Intel's IPP Radix Sort by 20-30%, when count array initialization is used, but lags in performance when the for loop is used for clearing the count array. Insertion Sort is about 4X faster for 10 element arrays, but substantially lags in performance for larger array sizes (by up to 139K times), since it is O(n2) algorithm and the Counting Sort is O(n).

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!