Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page.
As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management.
In his spare time, he enjoys running, football, contemporary fiction and real ale.

Concurrent Affairs

Published 16 March 2012 11:15 am

I once wrote an editorial, multi-core mania, on the conundrum of having ever-increasing numbers of processor cores without the concurrent programming techniques to get anywhere near exploiting their performance potential. I came to the controversial conclusion that, while the problem loomed for all procedural languages, it was not a big issue for the vast majority of programmers. Two years later, I still think most programmers don’t concern themselves overly with this issue, but I do think that’s a bigger problem than I originally implied.

Firstly, is the performance boost from writing code that can fully exploit all available cores worth the cost of the additional programming complexity? Right now, with quad-core processors that, at best, can make our programs four times faster, the answer is still no for many applications. But what happens in a few years, as the number of cores grows to 100 or even 1000? At that point, it becomes very hard to ignore the potential gains from exploiting concurrency. Possibly, I was optimistic to assume that, by the time we have 100-core processors, and most applications really need to exploit them, some technology would be around to allow us to do so with relative ease.

The ideal solution would be one that allows programmers to forget about the problem, in much the same way that garbage collection removed the need to worry too much about memory allocation. From all I can find on the topic, though, there is only a remote likelihood that we’ll ever have a compiler that takes a program written in a single-threaded style and "auto-magically" converts it into an efficient, correct, multi-threaded program.

At the same time, it seems clear that what is currently the most common solution, multi-threaded programming with shared memory, is unsustainable. As soon as a piece of state can be changed by a different thread of execution, the potential number of execution paths through your program grows exponentially with the number of threads. If you have two threads, each executing n instructions, then there are (2n)!/(n!)^2 possible "interleavings" of those instructions, a number that grows roughly as 4^n. Of course, many of those interleavings will have identical behavior, but some won’t. Not only does this make understanding how a program works an order of magnitude harder, but it will also result in irreproducible, non-deterministic bugs. And of course, the problem will be many times worse when you have a hundred or a thousand threads.
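
To make the interleaving argument concrete, here is a minimal Python sketch; it is not part of the original editorial. It assumes two hypothetical threads, A and B, that each perform a non-atomic increment as two steps (a read, then a write), and it enumerates every possible ordering of those steps. For two threads of n steps each, the number of orderings is the binomial coefficient C(2n, n). The names `interleavings` and `run` are illustrative only.

```python
from itertools import combinations
from math import comb


def interleavings(n):
    """Yield every ordering of two threads' n steps as a string of 'A'/'B'."""
    for a_slots in combinations(range(2 * n), n):
        slots = set(a_slots)  # positions in the schedule taken by thread A
        yield "".join("A" if i in slots else "B" for i in range(2 * n))


def run(schedule):
    """Simulate a shared counter under one schedule.

    Assumption: each thread has exactly two steps. Its first appearance in
    the schedule is a read of the counter, its second is a write of
    (value read) + 1.
    """
    counter = 0
    local = {}  # each thread's private copy of the value it read
    for thread in schedule:
        if thread not in local:
            local[thread] = counter       # step 1: read
        else:
            counter = local[thread] + 1   # step 2: write back
    return counter


schedules = list(interleavings(2))
results = {s: run(s) for s in schedules}

# The brute-force count matches the closed form C(2n, n).
assert len(schedules) == comb(4, 2)
```

With two steps per thread there are six orderings, and only the two fully serial ones ("AABB" and "BBAA") leave the counter at 2; the other four silently lose an update. Which ordering occurs on a real machine depends on scheduling, which is why such bugs are hard to reproduce.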

So what is the answer? All of the possible alternatives require a change in the way we write programs and, currently, seem to be plagued by performance issues. Software transactional memory (STM) applies the ideas of database transactions, and optimistic concurrency control, to memory. However, working out how to break down your program into sufficiently small transactions, so as to avoid contention issues, isn’t easy. Another approach is concurrency with actors, where instead of having threads share memory, each thread runs in complete isolation, and communicates with others by passing messages. This simplifies concurrent programs but still has performance issues if the threads need to operate on the same large piece of data.
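
As a rough illustration of the actor idea, the following Python sketch uses only the standard library (`threading` and `queue.Queue`). It is a toy under stated assumptions, not a real actor framework: the class `CounterActor`, its message names ("increment", "get", "stop") and the `demo` helper are all hypothetical. The point it tries to show is that the counter is only ever touched by the actor's own thread, so no lock around the state is needed.

```python
import queue
import threading


class CounterActor:
    """Owns its state; other threads can only send it messages."""

    def __init__(self):
        self._inbox = queue.Queue()   # thread-safe mailbox
        self._count = 0               # touched only by the actor's thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # Process messages one at a time, in arrival order.
        while True:
            msg, reply = self._inbox.get()
            if msg == "increment":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)
            elif msg == "stop":
                break

    def send(self, msg):
        """Fire-and-forget message."""
        self._inbox.put((msg, None))

    def ask(self, msg):
        """Send a message and block until the actor replies."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply))
        return reply.get()

    def stop(self):
        self.send("stop")
        self._thread.join()


def demo(senders=4, per_sender=1000):
    """Several threads send increments; return the actor's final count."""
    actor = CounterActor()

    def worker():
        for _ in range(per_sender):
            actor.send("increment")

    threads = [threading.Thread(target=worker) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = actor.ask("get")  # queued after every increment (FIFO)
    actor.stop()
    return total
```

Calling `demo()` returns 4000 regardless of how the sender threads are scheduled, because every update is serialized through the mailbox. The cost hinted at in the text is also visible: any thread that wants the data has to ask for it by message rather than read it directly.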

There are doubtless other possible solutions that I haven’t mentioned, and I would love to know to what extent you, as a developer, are considering the problem of multi-core concurrency, what solution you currently favor, and why.

Cheers,

Tony.

4 Responses to “Concurrent Affairs”

Munging compilers have been built. From the Wikipedia piece (http://en.wikipedia.org/wiki/Thinking_Machines_Corporation):
“The Connection Machine was programmed in a variety of specialized languages, including *Lisp and CM Lisp (derived from Common Lisp), C* (derived from C), and CM FORTRAN. These languages used proprietary compilers to translate code into the parallel instruction set of the Connection Machine.”

Although these may only have been SIMD conversions. Can’t find anything more detailed. I can’t find a cite, either, for discussions I recall reading about fully generalized conversion compilers needing to do detailed semantic analysis; well, yeah. Did find this academic exercise: http://www.jefftk.com/cs25/lab5/ Claims to be able to compile serial code to parallel.

I remain convinced that databases (on varying O/S) connected to dumb-ish “terminal” devices is the way to go. Engine writers, and systems programmers generally, have dealt with the issue for decades, and appear to be rather good at it. Let client side code do the simple stuff of painting and reading the screen, and let the (fully connected) database take care of the rest.

The one place multi-core can seemingly be used without problem is in web apps and other programs that use multiple callbacks to load data asynchronously. Here, as long as each call goes to its own thread on its own processor, there would not be a problem using multiple cores.

In general this can be expanded to having multiple cores safely handle any problem dealing with discrete processing tasks. It seems to me it would only be when trying to use multiple cores to do a single processing task that the real problems would arise.
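
The "discrete tasks" case described above can be sketched in Python with the standard `concurrent.futures` module. This is only an illustration: `word_count` and `count_all` are hypothetical names, and the task itself is a stand-in for any unit of work that reads its own input and shares no state with the others. One caveat for CPython specifically: threads generally do not run CPU-bound Python code on several cores at once, so for that kind of work `ProcessPoolExecutor`, which offers the same `map` interface, is the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """A self-contained task: reads only its argument, shares nothing."""
    return len(text.split())


def count_all(documents, workers=4):
    """Run one independent task per document on a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each document to a worker and returns the results
        # in the same order as the inputs.
        return list(pool.map(word_count, documents))
```

For example, `count_all(["a b c", "", "hello world"])` returns `[3, 0, 2]`. Because no task can see another task's data, there are no interleavings to reason about.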

The other problem seems to be that we are still thinking about using these to get results for mathematical “there is only one right answer” type of problems. Multi-core will really shine when using it with genetic algorithms, where having multiple cores change the same memory space is not a problem because the new process will only update the memory space if it has a “better” solution to put there than the current one.
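
A small Python sketch of the "only update if better" idea follows. It makes several assumptions that are not in the comment: it uses plain random search rather than a real genetic algorithm, the `fitness` function is an arbitrary toy with its maximum at x = 3, and the names `BestSoFar`, `search` and `parallel_search` are hypothetical. Note that the compare-and-update step itself still has to be atomic, otherwise a worse candidate could overwrite a better one between the check and the write; the sketch uses a lock for that.

```python
import random
import threading


class BestSoFar:
    """Shared slot that only ever moves to a strictly better candidate."""

    def __init__(self):
        self._lock = threading.Lock()
        self.score = float("-inf")
        self.candidate = None

    def offer(self, candidate, score):
        """Store the candidate if it beats the current best; report success."""
        with self._lock:  # makes check-then-write a single atomic step
            if score > self.score:
                self.score = score
                self.candidate = candidate
                return True
            return False


def fitness(x):
    """Toy objective (an assumption for illustration): best at x = 3."""
    return -(x - 3.0) ** 2


def search(best, seed, steps=2000):
    """One worker: try random candidates and offer each to the shared slot."""
    rng = random.Random(seed)  # per-worker generator, no shared RNG state
    for _ in range(steps):
        x = rng.uniform(-10.0, 10.0)
        best.offer(x, fitness(x))


def parallel_search(workers=4):
    best = BestSoFar()
    threads = [
        threading.Thread(target=search, args=(best, seed))
        for seed in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best
```

Whatever order the workers run in, the slot ends up holding the best candidate any of them found, which is the property the comment relies on.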