
The Need to Rewrite Established Algorithms

Parallel architectures, like other hardware advances before them, require us to rewrite algorithms and data structures, especially the old standbys that have served us well.

A central tenet of developer wisdom is to reuse code, especially data structures and collections. A few decades ago, it was common for C programmers to write innumerable implementations of linked lists from scratch. The code became almost muscle memory as you banged it out. Today, such an exercise is more the result of ignoring established and well-tested options than a display of coding prowess. Except in exigent circumstances, writing your own collections has the whiff of cowboy programming.

It's safe to say that, for the most part, you should not be writing your own data structures or basic algorithms (sorts, checksums, encryption, calendars, etc.). However, this principle has a recurring exception that needs to be acknowledged; namely, that advances in hardware must find their way promptly into the implementations of common algorithms.

In a 1996 article on hashing efficiency that I wrote for Dr. Dobb's, I discussed the effects of the then-significant problem of memory latency on the choice of hash table design. Basically, the concern was that every fetch of a hash bucket that was not in cache created a significant performance obstacle as the processor waited for the long memory-fetch cycle. This suggested that on closed hash tables, nonlinear rehashing until an open slot was found was a costly operation. Linear rehashing (finding the closest empty slot) worked better. The problem of memory latency and small caches, in those days, made algorithm and data structure selection a task best completed with care.
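The linear-rehashing idea can be sketched as a closed (open-addressing) hash table. This is a minimal illustration, not code from the 1996 article; the class and method names are my own. The point is in the probe loop: on a collision it scans adjacent slots, which likely share a cache line with the original bucket, instead of jumping to a distant bucket via a secondary hash.

```java
// Minimal closed hash table with linear probing (illustrative sketch).
// Assumes the table is never completely full, so a probe always terminates.
public class LinearProbeTable {
    private final String[] keys;
    private final int[] values;
    private final boolean[] used;

    public LinearProbeTable(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    private int slot(String key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    public void put(String key, int value) {
        int i = slot(key);
        // Linear rehashing: step to the *next* slot on collision. The
        // neighboring slots are likely already in cache, avoiding the
        // long memory-fetch cycle a far-away rehash would trigger.
        while (used[i] && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        values[i] = value;
        used[i] = true;
    }

    public Integer get(String key) {
        int i = slot(key);
        while (used[i]) {
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null; // key absent
    }

    public static void main(String[] args) {
        LinearProbeTable t = new LinearProbeTable(16);
        t.put("alpha", 1);
        t.put("beta", 2);
        System.out.println(t.get("alpha"));
        System.out.println(t.get("beta"));
        System.out.println(t.get("gamma"));
    }
}
```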

The expansion of processor caches changed the issue, especially insofar as algorithms were concerned. Unless you come from a comp-sci background, the terms "cache-aware" and "cache-oblivious" algorithms might be new to you. Implementations of the former tend to discover the size of the cache on the runtime hardware and then size the data structures and algorithms to minimize memory fetches. Success here can yield significant performance gains, at the cost of some portability. Some libraries, frequently those provided by processor vendors (Intel and AMD, in particular) or specialized development houses, provide these implementations. Intel's Integrated Performance Primitives library, for example, checks the runtime platform characteristics and brings in the right binaries for optimal performance.
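A common cache-aware technique is tiling: processing data in blocks sized to fit in cache. The sketch below applies it to a matrix transpose. The BLOCK constant of 32 is an assumed tuning value for illustration; a real cache-aware library would derive it from the detected cache size at runtime.

```java
// Cache-aware (tiled) matrix transpose sketch. Processing the matrix in
// BLOCK x BLOCK tiles keeps both the source rows and destination columns
// of a tile resident in cache, instead of striding across all of memory.
public class BlockedTranspose {
    static final int BLOCK = 32; // assumed tile size; real code would tune this

    static double[][] transpose(double[][] a) {
        int n = a.length;
        double[][] out = new double[n][n];
        for (int ii = 0; ii < n; ii += BLOCK) {
            for (int jj = 0; jj < n; jj += BLOCK) {
                // Transpose one tile entirely before moving on.
                for (int i = ii; i < Math.min(ii + BLOCK, n); i++) {
                    for (int j = jj; j < Math.min(jj + BLOCK, n); j++) {
                        out[j][i] = a[i][j];
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] t = transpose(a);
        System.out.println(t[0][1] + " " + t[1][0]);
    }
}
```

A cache-oblivious version would instead recurse on quadrants until the subproblem fits in cache at every level, with no tuning constant at all.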

For most applications, however, we're dependent on the standard libraries provided with the language. (Intel's IPP library, for example, is available only for native code; Java and .NET are supported only through wrappers.) Language providers eventually do deliver library updates, but the progress can be frustratingly slow and the work uneven. The delivery of Java's support for multithread-friendly collections is a case in point. Scala's multithreaded collections were a major draw precisely because they arrived at a time when Java's collections did not yet serve that need well.
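The Java library update in question eventually arrived as the java.util.concurrent package. A minimal sketch (assuming Java 8 or later for Map.merge): two threads hammer the same map, yet no external locking is needed, because ConcurrentHashMap makes the read-modify-write atomic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a multithread-friendly collection from java.util.concurrent:
// two threads increment a shared counter 1000 times each, with no
// explicit synchronization, and no updates are lost.
public class ConcurrentCounter {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                hits.merge("page", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(hits.get("page"));
    }
}
```

With a plain HashMap, the same code would race and routinely print a total below 2000.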

Not only are better libraries needed, but even within standard libraries, the choice of data structures is becoming more complex. In this excellent article explaining why linked lists are passé, Dr. Dobb's blogger Clay Breshears discusses why trees make a better and more parallel-friendly data structure than the ever-sequential linked list. This is exactly the kind of nuance that should keep us vigilant against lazily accepting a static view of which algorithms and data structures to choose. Everyone knows, don't they, that linked lists are faster than trees? And yet, even this mainstay of obvious logic is now changing beneath our feet.
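Breshears' point can be made concrete with a small sketch (the Node type and task names below are illustrative, not from his article): summing a linked list forces you to follow each next pointer in order, while a tree lets you hand each subtree to a separate fork/join task.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch: a binary tree can be summed in parallel because
// its subtrees are independent units of work. A linked list offers no
// such split point -- each node is reachable only through the previous one.
public class TreeSum {
    static class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static class SumTask extends RecursiveTask<Long> {
        final Node node;
        SumTask(Node node) { this.node = node; }

        @Override
        protected Long compute() {
            if (node == null) return 0L;
            SumTask left = new SumTask(node.left);
            left.fork();                                  // left subtree runs in parallel
            long right = new SumTask(node.right).compute(); // right subtree runs here
            return node.value + right + left.join();
        }
    }

    public static void main(String[] args) {
        Node tree = new Node(1,
                new Node(2, null, null),
                new Node(3, null, null));
        long total = new ForkJoinPool().invoke(new SumTask(tree));
        System.out.println(total);
    }
}
```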

The imminent era of manycore processors is likely to bring other changes to the fore. I especially expect that sort routines will be dramatically affected. Quicksort will no longer be the default sorting algorithm; the choice of sort will be more carefully matched to the needs of the data and the capabilities of the platform. We already see this on a macro level in the new world of big data. MapReduce at scale depends on sorts being done in smaller increments and reassembled through a merge function. And even there, the basic sorting has to be capable of handling billions of data items. In that setting, grabbing an early item and making it the pivot element for millions of other entries, as Quicksort does, can have unfortunate performance consequences.
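This shift is already visible in the Java standard library: Java 8's Arrays.parallelSort splits large arrays into chunks sorted on the common fork/join pool and merges the sorted runs, rather than running a single sequential sort. A quick sanity check that the parallel, merge-based strategy produces the same result as the sequential sort:

```java
import java.util.Arrays;
import java.util.Random;

// Java 8's Arrays.parallelSort: sort-in-increments-then-merge, in the
// standard library. The fixed seed makes the run reproducible.
public class ParallelSortDemo {
    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000_000).toArray();
        int[] copy = data.clone();

        Arrays.parallelSort(data); // parallel chunked sort + merge
        Arrays.sort(copy);         // sequential sort

        // Same ordering, different strategy.
        System.out.println(Arrays.equals(data, copy));
    }
}
```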

Between the proliferation of cores, the rapid expansion of caches, the faster performance of RAM, and the huge increase in memory, the traditional choices of algorithms and data structures are no longer inherently safe or appropriate. Once again, selection must be made with considerable forethought.
