Threads Don’t Scale, Processes Do

With the multicore explosion in the making, are we going to be running hundreds of thousands of threads in a single program? Erlang programmers would emphatically say, yes! C++ and Java programmers would probably say no.

Why this discrepancy?

The thread model based on heavy-duty OS threads and mutexes has its limitations. Ask any server writer, or Google "thread per connection," to convince yourself. Servers use thread pools for exactly this reason.

Thread pools are an admission of defeat for the thread model. Having to pool threads tells you that:

Thread creation is not fast enough

Threads consume substantial resources, so it makes sense to keep their numbers down

Granted, these are technical problems that might be overcome in future by improvements in operating systems.

The more fundamental problem with threads has its roots in memory sharing. Sharing seems to offer a great performance advantage, but sharing requires locking, and it's a well-known fact that locking doesn't scale (or compose). Between races and deadlocks, it's also extremely hard to get right.

Here’s what Erlang does

Erlang gives up on sharing!

Threads that don’t share memory are called processes. We tend to think of processes as heavyweight beasts implemented by operating systems. That’s because one needs the operating system to strictly enforce the no-sharing policy (the isolation of address spaces). Only the OS can manage separate address spaces and the passing of data between them.

But that’s not the only way. The isolation might instead be enforced by the language. Erlang is a functional language with strict copy semantics and with no pointers or references. Erlang processes communicate by message passing. Of course, behind the scenes, messages are passed through shared memory, thus avoiding a large performance hit of inter-process communication. But that’s invisible to the client.

And there is a bonus: Erlang code that uses lightweight processes will also work with heavyweight processes and in a distributed environment.

Why don’t we all switch to Erlang?

As far as I know there are two main reasons:

It's a weird language. Don't get me wrong, I love functional programming for its mathematical beauty, terseness, and elegance. But I had to rewire my brain to be able to write pure functional programs. The functional paradigm is as alien to our everyday experience as quantum mechanics. (CS grad students: you're not typical programmers.)

Messages have to be copied. You can't deep-copy a large data structure without some performance degradation, and not all copying can be optimized away (doing so requires behind-the-scenes alias analysis). This is why mainstream languages (and I will even include Scala in this category) don't abandon sharing. Instead they rely on the programmer's discipline or try to control aliasing.

Conclusions

Native threads are expensive. Interpreted languages, like Erlang, can afford to implement their own lightweight threads and schedulers. I don't see that happening in compiled languages. The hope is that operating systems will improve their implementations of threads; I hear Linux's NPTL is already a big improvement in this area. In the meantime, we have to rely on thread pools.

The shared-memory concurrency model is the reason why multithreaded programming is so difficult and error-prone. Separating address spaces simplifies programming, but at a performance cost. I believe some kind of compromise is possible. A message-passing or actor model can work with sharing, as long as aliasing is under control.

Inspiration for this post

After my last post on thin-lock implementation, I was asked why I used such a small number, 1024, for the maximum number of threads. It was actually a number I found in the D compiler runtime. A thin lock could easily work with a much larger number of threads; in fact, I'm going to substantially increase the number of bits for the thread ID at the expense of the recursion count. But this question got me thinking about the scalability of threading in general.

Fibers

Fibers (as well as Java’s green threads) are not an alternative to heavyweight threads. Fibers can’t run in parallel, so they have no way to use multiple processors. They are more of a flow-of-control construct, often used interchangeably with coroutines.


18 Responses to “Threads Don’t Scale, Processes Do”

I've got the impression that, since all data in Erlang is immutable, the interpreter doesn't need to actually copy messages. It can safely use a shared copy, unless the message is passed between different nodes, i.e., to another system process or over the network.

But I have also got the impression that Erlang doesn't integrate well. There's not much sense in writing tools in Erlang, and you can't gradually port an existing system to Erlang to see if it helps. You must build a new system, which is a major decision and hard to prove feasible. I think this is the main reason Erlang is still exotic.

It’s true that data in Erlang is immutable. But every functional language worth its salt tries to optimize the copying behavior. For instance, it might not copy a whole list if you only change one element. So, behind the scenes, it has to keep track of aliasing. And you can’t carry the aliases between threads!

As with most interpreted languages, there are native-code compilers available for Erlang, so that aspect of performance isn't an issue. But I think talk of performance may be a red herring anyway, certainly insofar as the upper-level control structures of a program are concerned. It's been well established that the performance gains we'll see in the future will come from parallelism rather than from optimizing a single thread or process, so a massively parallel language like Erlang should be ideally suited to scale over time, and to scale automatically with no changes to the program.

Also, I don’t see much of a problem with copying vs. shared data. If a system must maintain a large amount of data that isn’t feasible to copy then simply make one or more agents a broker for that data. It’s no different than interacting with a database. Of course, this has the potential to create a choke-point for program flow, but it does in a shared-memory program as well. The only real advantage with shared-memory programs in this case comes from fast in-process access to the data, but this is at the expense of program integrity (shared memory means that one badly behaved thread can bring down others as well).

As for syntax issues… any programmer worth her salt should be able to pick up a new language without much fuss. In fact, some universities teach functional languages first, and they are successful with this approach. So I also do not think that functional languages are inherently confusing. As for Erlang specifically, it is far more imperative in its syntax than, say, Lisp, and so should be fairly easy to pick up anyway.

Sean, as far as I know, Erlang native compilers map Erlang processes into Posix threads or Unix processes. That would defeat the advantages of lightweight threading.

As far as performance gains from parallelism go, this is a highly disputed topic. It was a different story when we could speed up any program by increasing processor clock speed. Now you have to rewrite programs to take advantage of parallelism, which is far from trivial (see Amdahl's law). As for automatic parallelization, it's an active research topic, with positive results few and far between.

I agree that copying is much safer than sharing. In most applications sharing is a premature optimization. I remember when C++ COW strings were all the rage. Then people got used to using them and, when the implementation switched to copying, they didn’t notice the difference.

Having said that, there is a subset of applications where copying is not an option. Multimedia programmers simply cannot afford such an overhead. Being a systems language, D must support sharing. Of course, the safer we can make it the better.

I completely agree that D must support sharing because it is a systems language, and also agree that no language is ideally suited for every problem. However, I'm less clear on whether multiple threads of a multimedia app would actually need to share access to a large chunk of raw data. Most data processing of the kind that I'm aware of uses the scatter/gather method, for which message passing is an excellent solution. But then I'm not much of a multimedia programmer, so I'm mostly speculating. Is there some multimedia problem that doesn't decompose well to process-based multiprogramming?

Have you looked at the SharC paper I mentioned a few times? It gives an example of a multithreaded multimedia application. Essentially, data is acquired from a device (either audio or video) and then goes through several stages of processing. Each stage runs in a separate thread. Large amounts of data are pipelined from stage to stage, and copying would be infeasible (and unnecessary). This setup is characteristic of many (almost) real-time audio or graphics applications, including games.

Thanks, I'd missed that paper. From what I can see, the program flow it describes is logically a message-passing design. And as I believe you mention above, in-process message passing can actually be implemented via sharing (in fact, I think the Erlang message-passing mechanism does this implicitly). However, I'd forgotten about a related fly in the ointment for functional languages: the fact that data is immutable. Even if Erlang could avoid copying when transferring data between stages, each stage would have to create a new "post-processed" copy of the buffer to pass on to the next stage, and this is probably not ideal given the performance requirements of multimedia apps. I suppose the same issue would apply to D in that pure functions couldn't be used for multimedia work because they would impose the same requirement.

[…] perfectly feasible to have a concurrent language that disallows sharing–Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional […]

I’m in charge of designing multithreaded support in the D programming language, so I have to study the ways other languages deal with it. The same ideas come back over and over again in various contexts, so there’s bound to be some redundancy.