Because we’re not looking for it, we often don’t notice the potential parallelisms in our programs. I’m not talking about obvious parallel problems like matrix multiplication and graphics rendering pipelines. I mean everyday problems. For example, we’re always weaving independent patterns together to construct our procedures, like this.
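Something like this pseudocode, where `calc_2` and `calc_3` are placeholder steps I've added to illustrate the dependencies (only `calc_1` and `calc_4` are referenced below):

```
C1 <- calc_1(N)
Z  <- calc_4(M)
C2 <- calc_2(C1)
R  <- calc_3(C2, Z)
```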

We usually don’t think much about the fact that the first two lines C1<-calc_1(N) and Z<-calc_4(M) can probably be switched around or performed in parallel.

And why should we? There’s no annotation in most programming languages to capture the parallelism. The compiler doesn’t want to know.

And what about common algorithms like quicksort? It’s usually taught and implemented as linear, but it’s really a bifurcating parallel algorithm. The merge-sort algorithm involves a massive parallel split at the beginning followed by a hierarchical parallel reduction for the merge. Bubble-sort is a great parallel algorithm. It’s a linear-time sort, provided you have a very long chain of processors, each of which can communicate with its two neighbors.
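To make the bifurcation concrete, here's a sketch (my own illustration, not part of tic.erl) of a parallel quicksort in Erlang: each recursive call hands its two partitions to two fresh processes, then gathers the sorted halves back.

```erlang
-module(psort).
-export([qsort/1]).

%% Parallel quicksort: the two partitions are sorted in separate
%% processes (the bifurcating structure), then gathered back.
qsort([]) -> [];
qsort([Pivot | Rest]) ->
    Parent  = self(),
    Smaller = [X || X <- Rest, X =< Pivot],
    Larger  = [X || X <- Rest, X >  Pivot],
    %% Scatter: one child process per partition.
    spawn(fun() -> Parent ! {left,  qsort(Smaller)} end),
    spawn(fun() -> Parent ! {right, qsort(Larger)} end),
    %% Gather: selective receive collects each half.
    L = receive {left,  Ls} -> Ls end,
    R = receive {right, Rs} -> Rs end,
    L ++ [Pivot] ++ R.
```

Every level of the recursion doubles the number of live processes, which is exactly the tree-shaped parallelism the linear textbook version hides.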

Linearism is often not inherent. There are parallelisms in many common algorithms. But we don’t notice them much because our programming languages encourage linear thought, and don’t give us a way to capture or annotate parallelisms.

Why is parallel programming so hard? At least it’s supposed to be hard — that’s the myth. Of course there’s also a myth (among non-programmers) that all programming is hard, for everyone except 12 year-olds in movies.

One reason parallel programming is hard is we don’t get any practice. Programmers aren’t used to thinking parallel. Introductory programming is linear programming, and most real-world bread-and-butter programming tasks don’t require a parallel approach. And frankly, it’s easier to understand and express a linear program, and it’s certainly easier to debug.

And besides, parallel hardware has always been exotic, at least up until now. So there hasn’t been much demand for parallelism, and the tools reflect that. Programming languages are (mostly) linear, described as linear text that becomes a linear token stream. Years of parser and compiler research have focused on grammars to translate that linear token stream into linear machine code.

Programs were linear even before text files. Fortran programs used to be stored on stacks of punch cards.

So our tools and habits encourage linear thinking. We introduce unnecessary constraints and dependencies that break parallelism without even thinking about it.

This is part of an Erlang tutorial built around a tic-tac-toe program. The program is stuffed into one file, called tic.erl and available here. The source file and this tutorial are organized as follows:

Parallel processing

One of the great things about Erlang is how easy it is to experiment with multi-processing. Or multi-threading. In Erlang there’s not much difference because you don’t work with global variables and mutable data structures. So you don’t have to worry about locking. Most of the time.

In Erlang it’s easy and cheap to start a new “process”. Remember an Erlang process is not an operating system process. An Erlang process is much lighter weight. You can have tens of thousands of them running in parallel on your underpowered laptop and not even notice.
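As a quick illustration (not from tic.erl), here's how little ceremony it takes to start N processes and hear back from all of them:

```erlang
-module(swarm).
-export([run/1]).

%% Spawn N lightweight Erlang processes and gather one reply
%% from each. Tens of thousands of these start up without
%% noticeable strain on ordinary hardware.
run(N) ->
    Parent = self(),
    [spawn(fun() -> Parent ! {done, I} end) || I <- lists:seq(1, N)],
    %% Selective receive pulls the replies back in index order.
    [receive {done, I} -> I end || I <- lists:seq(1, N)].
```

Try `swarm:run(10000)` in the shell; it returns almost immediately.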

Parallel implementation

We can implement this as a function that starts up 9 processes, and then waits for them all to finish and send back results. This is a pretty standard transformative pattern: re-interpreting a map as a scatter to start the processes, and then a gather to bring all the results back home.
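A first attempt might look like the sketch below. The names `predict_all/1` and `predict/1` are hypothetical stand-ins for the real tic.erl functions, and the stub body is a placeholder for the actual per-move prediction work.

```erlang
-module(tic_seq).
-export([predict_all/1]).

%% Placeholder for the real per-move prediction work in tic.erl
%% (which recurses over the rest of the game tree).
predict(Move) -> Move * Move.

%% Spawns one process per move, but the receive sits inside the
%% loop: each child's result is awaited before the next spawn,
%% so the work is effectively serialized.
predict_all(Moves) ->
    Parent = self(),
    [begin
         Pid = spawn(fun() -> Parent ! {self(), predict(M)} end),
         receive {Pid, Result} -> Result end
     end || M <- Moves].
```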

This spawns a process, and then waits for its results before spawning the next one. This actually works fine (and is barely slower than the original non-parallel implementation), but it never has more than 9 processes alive (and one running) at a time.
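A true scatter/gather separates the two phases: spawn all the processes first, then collect. Again, `predict_all/1` and the `predict/1` stub are hypothetical stand-ins for the tic.erl functions, just to show the shape:

```erlang
-module(tic_par).
-export([predict_all/1]).

%% Placeholder for the real per-move prediction work in tic.erl.
predict(Move) -> Move * Move.

%% Scatter: start every child process before receiving anything.
predict_all(Moves) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), predict(M)} end)
            || M <- Moves],
    %% Gather: collect the results in spawn order.
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

If the per-move work itself recurses through this same scatter, every node of the game tree gets its own simultaneously live process, which is what makes the process counts in the analysis so dramatic.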

Analysis

So, how did this work? Wellllll, in game space search I mention that given_move_predict_outcome(..) is evaluated 2,653,002 times if you fully predict all outcomes for an empty board. Which means the parallel code above will spawn 2,653,002 Erlang processes, all of which can potentially be running at the same time. Which is a little breathtaking.

And it's a little too much. Erlang on my system can't do it. (It has no problem with the "wrong" implementation given above though. That also spawns 2,653,002 processes, but never more than 9 at a time.)

But the code in tic.erl never predicts an empty board. It only predicts after at least one mark is down. Which means it would never try to spawn more than 308,817 processes.

And, not surprisingly, that's still too many. Erlang chokes.

But you CAN run the above code with two marks (an X and an O) on the tic-tac-toe board. In that case Erlang only has to spawn 30 or 40 thousand processes, all running at the same time. That's still pretty impressive. On my setup I max out at about 80,000 simultaneous processes.

Final thoughts

Working with something like Erlang, where it's easy to spawn thousands of processes, influences how you think and how you experiment. It lets you ask how you can solve a problem using a bunch of communicating agents, or using a fabric of processes interconnected in a specific hardware architecture. It lets you test ideas and gain a clearer understanding of how a parallel solution might work. If I were designing a parallel system that I was eventually going to implement in C and MPI/OpenMP/CUDA, I'd consider starting with prototyping and concept checking on an Erlang testbed. It'd help catch mistakes and highlight problems early on.