I've been developing concurrent systems for several years now, and I have a pretty good grasp on the subject despite my lack of formal training (i.e. no degree). There are a few new languages designed to make concurrency easier, such as Erlang and Go, that have become popular to at least talk about lately. It appears that their approach to concurrency echoes my own experience of how to make systems scalable and take advantage of multiple cores/processors/machines.

However, I find that there are very few tools to help visualize what you intend to do, and verify that you are at least close to your original vision. Debugging concurrent code can be a nightmare with languages that are not designed for concurrency (like C/C++, C#, Java, etc.). In particular, it can be near impossible to recreate conditions that happen readily on one system in your development environment.

So, what are your approaches to designing a system to deal with concurrency and parallel processing? Examples:

How do you figure out what can be made concurrent vs. what has to be sequential?

How do you reproduce error conditions and view what is happening as the application executes?

How do you visualize the interactions between the different concurrent parts of the application?

I have my own answers for some of these, but I'd also like to learn a bit more.

Edit

So far we have a lot of good input. Many of the articles linked to are very good, and I've already read some of them.

My personal experience with concurrent programming leads me to believe you need a different mindset than you do with sequential programming. The mental divide is probably as wide as the difference between object-oriented programming and procedural programming. I'd like this set of questions to focus more on the thought processes necessary (i.e. theory) to systematically approach the answers. When providing more concrete answers, it helps to include an example, something you went through personally.

Goal for the Bounty

Don't tell me what I should do. I already have that under control. Tell me what you do. Tell me how you solve these problems.

Maybe logging could help you with the last two questions.
– Amir Rezaei, Dec 16 '10 at 22:34

What I'm looking for are people's processes. These are areas where the tools I've been using are inadequate, but can get the job done. I'm less concerned about quoting someone else's article and more concerned about methodology here.
– Berin Loritsch, Dec 17 '10 at 0:08

10 Answers

I've been developing concurrent systems for several years now, and I have a pretty good grasp on the subject despite my lack of formal training (i.e. no degree).

Many of the best programmers I know didn't finish university.
As for me, I studied philosophy.

C/C++, C#, Java, etc.). In particular, it can be near impossible to recreate conditions that happen readily on one system in your development environment.

yes

* How do you figure out what can be made concurrent vs. what has to be sequential?

We usually start with a 1,000-mile-high metaphor to clarify our architecture to ourselves (first) and to others (second).

When we faced that problem, we always found a way to limit the visibility of concurrent objects to non-concurrent ones.

Recently I discovered actors in Scala and saw that my old solutions were a kind of "mini-actor", much less powerful than Scala's. So my suggestion is to start from there.

Another suggestion is to sidestep as many problems as possible: for example, we use a centralized cache (Terracotta) instead of keeping maps in memory, use inner-class callbacks instead of synchronized methods, and send messages instead of writing to shared memory.

With Scala it's all much easier anyway.

* How do you reproduce error conditions and view what is happening as the application executes?

No real answer here. We have some unit tests for concurrency, and we have a load-test suite to stress the application as much as we can.

* How do you visualize the interactions between the different concurrent parts of the application?

Again, no real answer: we design our metaphor on the whiteboard and try to make sure there are no conflicts on the architectural side.

By "architecture" here I mean Neal Ford's definition: software architecture is everything that will be very hard to change later.

programming leads me to believe you need a different mindset than you do with sequential programming.

Maybe, but for me it's simply impossible to think in a parallel way, so it's better to design our software in a way that doesn't require parallel thinking, with clear guardrails to avoid collisions between concurrency lanes.

To me it's all about the data. Partition your data right, and parallel processing is easy. All the problems with contention, deadlocks, and so on go away.

I do know that this is not the only way to parallelize, but for me it's by far the most useful.

To illustrate, a (not-so-quick) story:

I worked on a big financial (stock-market control) system from 2007 through 2009, and the data processing volume was very large. To illustrate, all the calculations for a single client account took about 1 to 3 seconds on an average workstation, and there were more than 30k accounts. Every night, closing the system was a big pain for the users (usually more than 6 hours of processing, without any margin for error).

Studying the problem further revealed that we could parallelize the calculations among several computers, but we would still have a huge bottleneck on the old database server (an SQL 2000 server emulating SQL 6.5).

It was pretty clear that our minimum processing packet was the calculation of a single account, and the major bottleneck was contention on the database server (we could see in the "sp_who" output several connections waiting to do the same processing). So the parallel process went like this:

1) A single producer, responsible for both reading from the database and writing to it, sequentially. No concurrency allowed here. The producer prepared a queue of jobs for the consumers. The database belonged solely to this producer.

2) Several consumers, on several machines. Each consumer received a whole packet of data from the queue, ready to calculate. Each dequeue operation was synchronized.

3) After the calculation, each consumer sent the data back to the producer through an in-memory synchronized queue, so it could be persisted.

There were several checkpoints and several mechanisms to ensure the transactions were saved correctly (none was left behind), but the whole effort was worth it. In the end, the calculations, spread among 10 computers (plus the producer/queue computer), brought the closing time of the whole system down to 15 minutes.

Just taking away the contention problems caused by SQL 6.5's poor concurrency management gave us a big advantage. The rest was pretty much linear: each new computer added to the "grid" brought the processing time down, until we reached the "maximum efficiency" of the sequential read/write operations on the database.

How do you figure out what can be made concurrent vs. what has to be sequential?

That's going to take domain knowledge, there's not going to be any blanket statement here to cover it.

How do you reproduce error conditions and view what is happening as the application executes?

Lots of logging, and being able to turn logging on/off/up in production applications in order to catch problems where they happen. VS2010 IntelliTrace is supposed to help with this, but I haven't used it yet.

How do you visualize the interactions between the different concurrent parts of the application?

Working in a multithreading environment is tough and demands coding discipline. You need to follow proper guidelines for taking locks, releasing locks, accessing global variables, and so on.

Let me try to answer your questions one by one:

* How do you figure out what can be made concurrent vs. what has to be sequential?

Use concurrency for

1) Polling: you need a thread to continuously poll something or send updates on a regular basis (concepts like heartbeats, where some data is sent at a regular interval to a central server to say "I am alive").

2) Operations with heavy I/O can be made parallel. The best example is logging; the logger can run on a separate thread.

3) Similar tasks on different data. If some task operates on different data but is very similar in nature, different threads can handle it. The best example is server requests.

And of course many others like these, depending on the application.

* How do you reproduce error conditions and view what is happening as the application executes?

Using logs and debug prints in the logs. Also try to log the thread ID so you can see what is happening in each thread.
One way to reproduce an error condition is to put a deliberate delay (in debug code) in the places where you think the issue is happening, forcing that thread to stall. Similar things can be done in debuggers too, but I haven't done that so far.

* How do you visualize the interactions between the different concurrent parts of the application?

Put logging in your locks, so that you know who is locking what and when, and who has tried to acquire a lock. As I said earlier, try to put the thread ID in the log to understand what is going on in each thread.

This is just my advice from around 3 years of working on multithreaded applications; I hope it helps.

I disagree with your statement that C is not designed for concurrency. C is designed for general systems programming and enjoys a tenacity for pointing out critical decisions to be made, and will continue to do so for years to come. This is true even when the best decision might be not to use C. Additionally, concurrency in C is only as difficult as your design is complex.

I try, to the best of my ability, to implement locks with the idea that eventually, truly practical lock free programming might become a reality for me. By locking, I don't mean mutual exclusion, I simply mean a process that implements safe concurrency without the need for arbitration. By practical, I mean something that is easier to port than it was to implement. I have very little formal CS training as well, but I suppose that I'm permitted to wish :)

Following that, most bugs that I encounter become relatively shallow, or so completely mind boggling that I retreat to a pub. The pub becomes an attractive option only when profiling a program slows it down sufficiently to expose additional races that aren't related to what I'm trying to find.

As others have pointed out, the problem that you describe is extremely domain specific. I just try, to the best of my ability, to avoid any case that might require arbitration (outside of my process) whenever possible. If that looks like it might be a royal pain, I re-evaluate the option of giving multiple threads or processes concurrent and unserialized access to something.

Then again, throw 'distributed' in there and arbitration becomes a must. Do you have a specific example?

To clarify my statement, C was not designed specifically for and around concurrency. This is in contrast to languages like Go, Erlang, and Scala which were designed explicitly with concurrency in mind. I was not intending to say you can't do concurrency with C.
– Berin Loritsch, Dec 21 '10 at 15:07

Well, for the verification process, when designing a large concurrent system, I tend to test the model using LTSA (Labelled Transition System Analyser). It was developed by my old tutor, who is something of a veteran in the concurrency field and is now Head of Computing at Imperial.

As far as working out what can and cannot be concurrent, I believe there are static analysers that can reveal it, though I tend to just draw scheduling diagrams for critical sections, the same as you would for project management, and then identify sections that perform the same operation repetitively. A quick route is just to find loops, as they tend to be the areas that benefit most from parallel processing.

By running time-consuming tasks on a parallel "worker" thread, the main UI thread is free to continue processing keyboard and mouse events.

Making efficient use of an otherwise blocked CPU

Multithreading is useful when a thread is awaiting a response from another computer or piece of hardware. While one thread is blocked while performing the task, other threads can take advantage of the otherwise unburdened computer.

Parallel programming

Code that performs intensive calculations can execute faster on multicore or multiprocessor computers if the workload is shared among multiple threads in a "divide-and-conquer" strategy (see Part 5).

Speculative execution

On multicore machines, you can sometimes improve performance by predicting something that might need to be done, and then doing it ahead of time. LINQPad uses this technique to speed up the creation of new queries. A variation is to run a number of different algorithms in parallel that all solve the same task. Whichever one finishes first "wins"; this is effective when you can't know ahead of time which algorithm will execute fastest.

Allowing requests to be processed simultaneously

On a server, client requests can arrive concurrently and so need to be handled in parallel (the .NET Framework creates threads for this automatically if you use ASP.NET, WCF, Web Services, or Remoting). This can also be useful on a client (e.g., handling peer-to-peer networking, or even multiple requests from the user).

If you're not trying to do one of the above you'd probably better think real hard about it.

How do you reproduce error conditions and view what is happening as the application executes?

If you're using .NET and have written use cases, you can use CHESS, which can recreate specific thread interleavings, enabling you to test your fix.

How do you visualize the interactions between the different concurrent parts of the application?

It depends on the context. For worker scenarios I think of a manager and subordinates: the manager tells a subordinate to do something and waits for status updates.

For concurrent unrelated tasks I think of elevators or cars in separate lanes of traffic.

For synchronization I sometimes think of traffic lights or turnstiles.

How do you figure out what can be made concurrent vs. what has to be sequential?

I would first question whether the application (or component) will actually benefit from concurrent processing; in layman's terms, where is the bottleneck? Concurrency will obviously not always repay the investment it takes to make it work. If it looks like a candidate, then I work bottom-up, trying to find the largest operation or set of operations that can do its work effectively in isolation. I don't want to spin up threads for insignificant, cost-ineffective operations; I'm looking for actors.

Working with Erlang I've come to absolutely love the concept of using asynchronous message passing and the actor model for concurrency -- it's intuitive, effective, and clean.

An actor is a process that executes a function. Here a process is a lightweight user-space thread (not to be confused with a typical heavyweight operating-system process). Actors never share state and thus never need to compete for locks for access to shared data. Instead, actors share data by sending messages that are immutable. Immutable data cannot be modified, so reads do not require a lock.

The Erlang concurrency model is easier to understand and debug than locking and shared data. The way your logic is isolated makes it easy to test components by passing them messages.

Working with concurrent systems, this is pretty much how my designs have worked in any language: a queue that multiple threads pull data from, perform a simple operation on, and repeat, or push back onto the queue. Erlang just enforces immutable data structures to prevent side effects and reduces the cost and complexity of creating new threads.

How do you reproduce error conditions and view what is happening as the application executes?

In my experience, the only thing that has worked is a commitment to tracing/logging everything. Every process/thread needs an identifier, and each new unit of work needs a correlation ID. You need to be able to look through your logs and trace exactly what was being processed and when; there's no magic I've seen that eliminates this.

How do you visualize the interactions between the different concurrent parts of the application?

See above; it's ugly, but it works. The only other thing I do is use UML sequence diagrams (during design time, of course), but you can use them to verify that your components are speaking the way that you want them to.

How do you figure out what can be made concurrent vs. what has to be sequential?

First I need to know why I should use concurrency, because I have found that people get excited about the idea behind concurrency but don't always think about the problem they are trying to solve.

If you have to simulate a real-life situation like queues, workflows, etc., you will most likely need a concurrent approach.

Now that I know I should use it, it's time to analyze the trade-offs: if you have lots of processes, you have to think about communication overhead, but if you have too few, you may end up with no concurrent solution (reanalyze the problem if so).

How do you reproduce error conditions and view what is happening as the application executes?

I'm no expert on this matter, but I think that for concurrent systems this is not the correct approach. A theoretical approach should be chosen, checking critical areas for the four deadlock conditions:

No preemption

Hold and wait

Mutual exclusion

Circular wait

How do you visualize the interactions between the different concurrent parts of the application?

I try to first identify the participants in the interactions, then how they communicate and with whom. Finally, graphs and interaction diagrams help me visualize. My good old whiteboard cannot be beaten by any other medium.

I'll be blunt. I adore tools. I use lots of tools.
My first step is to lay out the intended paths for the flow of state. My next step is to try to figure out whether it's worth it, or whether the required flow of information will render the code serial too often. Then I'll try to draft some simple models. These can range from a stack of crude toothpick sculptures to some simple analogous examples in Python. Next, I look through a couple of my favorite books, like The Little Book of Semaphores, and see if someone's already come up with a better solution to my problem.

Then I start coding.
Just kidding. A bit more research first. I like to sit down with a fellow hacker, and walk through an expected execution of the program at a high level. If questions come up, we step to a lower level. It's important to find out if someone else can understand your solution well enough to maintain it.

Finally, I start coding. I try to keep it very simple first. Just the code path, nothing fancy. Move as little state as possible. Avoid writes. Avoid reads that may conflict with writes. Avoid, above all else, writes that may conflict with writes. It's very easy to find that you have a positively toxic number of these, and that your beautiful solution is suddenly little more than a cache-thrashing serial approach.

A good rule is to use frameworks wherever you can. If you're writing basic threading components yourself, like good synchronized data structures or, God forbid, actual synchronization primitives, you are almost certainly going to blow your whole leg off.

Finally, tools. Debugging is very hard. I use Valgrind/Callgrind on Linux in conjunction with PIN, and Parallel Studio on Windows. Do not try to debug this stuff by hand. You probably can, but you'll probably wish you hadn't. Ten hours mastering some powerful tools and some good models will save you hundreds of hours later.

Above all else, work incrementally. Work carefully. Do not write concurrent code when tired. Do not write it while hungry. In fact, if you can avoid it, simply do not write it. Concurrency is hard, and I have found that many apps that list it as a feature often ship with it as their only feature.