I'm looking for a new programming language to learn (I'm currently working through some C++ and know some C and Python), specifically one that has built-in concurrency support. I want to try to build a large graph library that can do processing across clusters or multiple cores. I know C++'s Boost library has support for concurrency, but I also want to learn a new language, and I'm guessing a language that was designed with concurrency in mind would be more pleasant to do concurrent programming in. Overall I see this as a chance to learn a new language, learn about concurrent programming, and tackle a big project.

From looking around, Clojure and Scala seem to be the two popular candidates for concurrent programming support, though I'm not sure how the two compare in terms of:

Speed (specifically for concurrent large graph processing)

Community (thinking about pushing this project onto somewhere like GitHub)

Ease of programming concurrently

Or are there other languages I should consider aside from Clojure or Scala?

I have never programmed in a functional language before, but I'm open to learning one. I've seen one of my friends program in Haskell and Clojure and it looks daunting, but I've heard good things about functional programming, especially for data processing.

Clojure has support for STM. I would imagine that's a useful ingredient.
– Robert Harvey, Dec 16 '11 at 0:58

5 Answers

Scala has an excellent collections library, featuring immutable collections (a good choice for concurrency) and parallel collections (not the same thing as concurrency, but it might help anyway). Mind you, it does have mutable collections as well if you need them.

On the tools and libraries wiki I see a mention of Menthor, which is supposed to do parallel and concurrent graph processing. It's hosted on Heather Miller's EPFL page, and Heather is one of Scala's committers.

There's also Graph for Scala, a graph collection library that seems to be fairly widely used. You might collaborate on that project, perhaps extending it to cover your needs.

And, of course, the JVM has Neo4J, a graph database. If you have to solve distributed graph problems, perhaps you just need Neo4J. Then there's Twitter's FlockDB, which targets pretty much the same space. Both of them have nice libraries for Scala.

And, finally, we get to Akka. Akka provides support for the actor concurrency model, which is a pretty good one. But it doesn't stop there! It has Clojure-like Agents (which are not like DAI agents), Software Transactional Memory, and many more features.
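To make the actor idea concrete, here is a minimal sketch in Python rather than Scala, since the asker already knows Python. It is not Akka's API: `CounterActor` and `count_with_actor` are hypothetical names, and the sketch uses only the standard `queue` and `threading` modules. It assumes the essence of an actor is private state plus a mailbox that is drained by a single thread.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: private state, one mailbox, one worker thread."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever modified by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, amount):
        """Post a message; callers never touch _count directly."""
        self._mailbox.put(amount)

    def stop(self):
        """Let the actor drain its mailbox, then return the final total."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._count

    def _run(self):
        # Messages are handled one at a time, so no lock is needed.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._count += message


def count_with_actor(senders=4, messages_per_sender=1000):
    """Have several threads send increments to one actor concurrently."""
    actor = CounterActor()

    def send_many():
        for _ in range(messages_per_sender):
            actor.send(1)

    workers = [threading.Thread(target=send_many) for _ in range(senders)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return actor.stop()
```

Calling `count_with_actor()` sends 4000 increments from four threads. No update is lost even though there is no explicit lock, because only the actor's own thread ever modifies the counter.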

Personally, I'd recommend looking into Erlang. I don't know too much about it, but when researching concurrent programming, it seemed to pop up quite a bit, so I suspect there's a decent community for it.

Also, it seems that there are a couple of Python modules (greenlet and gevent) that could help you out, if you wanted to go that direction.

When it comes to concurrency, avoiding shared state is key. A few paradigms and languages naturally lend themselves to concurrent programming because of this:

The message-passing paradigm, a core concept in Smalltalk and (AFAIK) Objective-C: the building blocks of your program communicate through messages, which naturally manages shared state in a concurrency-friendly way.

Functional programming, which in general avoids state, especially when purity is enforceable (after all, a pure function cannot have any shared state at all). Languages to look into in this context are plentiful; popular ones include several Lisp dialects (Common Lisp, Clojure, Scheme), the ML family (ML, F#, Haskell), and Erlang. Haskell is especially interesting because it defaults to lazy evaluation and enforces complete purity (that is, you cannot write non-pure functions in Haskell). Haskell, being designed as a compiled language, can also be very performant, as close to C as one can probably get with such a high-level language.
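As a rough illustration of why pure functions are easy to run concurrently, here is a small Python sketch (Python because the asker already knows it). The toy `GRAPH` and the names `out_degree` and `out_degrees` are made up for this example. Because `out_degree` reads only its argument and mutates nothing, the calls can be handed to a pool of workers in any order without locks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: node -> tuple of neighbours (tuples are immutable).
GRAPH = {
    "a": ("b", "c"),
    "b": ("c",),
    "c": ("a",),
    "d": (),
}


def out_degree(item):
    """Pure: the result depends only on the argument; nothing is mutated."""
    node, neighbours = item
    return node, len(neighbours)


def out_degrees(graph, max_workers=4):
    """Map the pure function over all nodes using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whichever worker ran them.
        return dict(pool.map(out_degree, graph.items()))
```

In CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock. `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface if real multi-core execution is the goal. The point here is only that the pure function needs no changes to be run by either.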

F# is the closest to a functional programming language of all the languages from Microsoft. Functional programming languages (like Haskell and ML) by their nature encourage a programming style (idiomatic expressions) that the compiler can easily translate into code that can use multiple cores/CPUs.

D's flagship concurrency model is based on message passing with extremely limited, explicit sharing of mutable data across threads. This model can be used via std.concurrency. What makes this interesting is that it's done in the context of a C-style systems language that allows traditional, imperative programming in the parts of your code that don't deal with concurrency.

Since D is a systems language, the isolation guarantees that std.concurrency provides are enforced partly by the std.concurrency module itself rather than entirely by the language. The language provides the mechanisms for enforcing these guarantees, such as support for transitive immutability and the shared qualifier. (Immutable data may be freely passed between std.concurrency threads.) The std.concurrency module provides the policies for enforcing isolation, though. Other threading-related modules such as std.parallelism and core.thread don't enforce isolation.

If you're interested primarily in multicore parallelism as a means of increasing performance, you might want to use the std.parallelism module. If you're more interested in concurrency as an inherent part of the problem (concurrency and parallelism aren't the same thing, even though they are related), then std.concurrency may better suit your needs. Additionally, std.concurrency will eventually be extended to work across networks, though it currently only works with threads on a single machine.