Parallel

Algorithm Performance

By Victor J. Duvanenko, December 03, 2010

A Reflection on Parts 1-9

Victor Duvanenko is a long-time contributor to Dr. Dobb's. He can be contacted at vduvanenko1@gmail.com

We’ve reached the tenth part of this series on algorithm performance. It seems like a nice arbitrary spot to take inventory and reflect, before continuing on the journey of the next nine. Algorithms that were developed and studied fit into two categories: sequential and parallel.

Sequential Algorithms

We started out by implementing and studying performance of sorting algorithms. Sequential algorithms were first, such as Insertion Sort and Selection Sort, followed by development of In-Place Binary-Radix Sort, In-Place MSD N-bit-Radix Sort, Stable MSD N-bit-Radix Sort and Counting Sort, along with development of Parallel In-Place Radix Sort and Parallel Counting Sort.

To visualize and compare performance of various sorting algorithms, we used 2D logarithmic plots, which enabled comparison across 8 orders of magnitude — from arrays of 10 elements to arrays of 100 Million elements. Exponents in performance characteristics became clearly visible as differing slopes &mdash e.g. O(n2) has a slope of two, whereas O(n) has a slope of one, and O(nlgn) has varying slope that is much closer to one than to two.

The hybrid algorithm approach (similar to introsort and STL sort) was used to improve performance of several algorithms. This approach combined Insertion Sort, which performs well for very small array sizes, with the target sorting algorithm. The combination yielded higher performance than using a pure sorting algorithm. Insertion Sort is especially useful when combined with a recursive divide-and-conquer sorting algorithm to handle small termination cases. Insertion Sort is also in-place, which matches well to in-place algorithms.

In-place MSD N-bit-Radix Sort was developed — a potentially new algorithm. It's a linear-time algorithm of O(n) . The hybrid implementation outperforms STL sort by several times for arrays of 8, 16, 32 or 64-bit integers. It is of comparable performance to Intel's IPP Radix Sort, which is not in-place, requiring an external array of same size as the input array. However, the In-Place MSD N-bit-Radix Sort is not stable (which is not an issue when sorting arrays of numbers, but may be an issue for certain use cases). This algorithm was extended to sort arrays of signed integers. A stable MSD N-bit-Radix Sort was also developed, which sadly traded in-place for stability.

The Counting Sort algorithm was developed. This high performance linear-time O(n) sorting algorithm, with low constant factors and favorable memory access patterns, is limited to cases where the range of values is small (which made it a perfect match for use inside Radix Sort). It was shown to perform well for arrays of 8, or 16-bit numbers, outperforming STL sort by over 20X. Data type adaptive sorting could be developed to detect arrays of 8 or 16-bit elements (or small ranges) and select Counting Sort to harness this level of performance. It was extended to handle arrays of signed or unsigned numbers without performance degradation.

Various input data patterns were used to test algorithm performance, such as incrementing, decrementing, constant, and random values using several algorithms. The quality of these random number generation algorithms was examined visually. Using analysis tools provided by NIST exposed weaknesses in one of the random number generators under certain usage conditions, which resulted in only 4 thousand unique values within 100 million 32-bit values generated, as well as failing most of NIST tests. Algorithm performance sensitivity to input data variation was also explored, showing some algorithms to be oblivious to input data variations, while others (such as STL sort) took full advantage of less variation, accelerating by over two orders of magnitude.

When comparing performance of algorithms, using the worst-case input distribution specific to each algorithm is a powerful evaluation method. Measurements showed that In-Place Radix Sort is sensitive to input data that is constant, but STL sort performs worse with random inputs. Under these worst-case to worst-case comparisons, In-Place MSD N-bit-Radix Sort outperformed STL sort by several times. Yet, it is unknown whether these truly are worst-case inputs for each of the algorithms.

Sorting algorithms from Intel's IPP library (a suite of Radix and other sorts for numeric arrays) as well as STL sort (stable and not) were used for comparison.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!