Thursday, October 22, 2009

Today I was at The Strange Loop conference. The first talk I attended was by Dean Wampler on functional programming in Ruby. Dean brought up the Actor concurrency model and how it could be done in Ruby. Of course I am quite familiar with this model in Scala, though it is copied from Erlang (which copied it from some other source I'm sure.) The next talk I went to was on software transactional memory and how it is implemented in Clojure. I had read about Clojure's STM in Stuart Halloway's book, but I must admit that it didn't completely sink in at the time. I only understood the basic idea (it's like a database transaction, but with retries instead of rollbacks!) and the syntax. As I've become more comfortable with Clojure in general, the idea has made more and more sense to me.

Now of course, no one size fits all. There are advantages and drawbacks to any concurrency model. A picture of these pros and cons kind of formed in my head today during the STM talk. I turned them into pictures. Here is the first one:

This is meant to be a measure of how easy it is to write correct concurrent code in Java, Scala, and Clojure. It is also meant to measure how easy it is to undersand somebody else's code. I think these two things are highly correlated. I chose to use these various languages, though obviously this is somewhat unfair. It wold be more accurate to say "locking/semaphores" instead of Java, the Actor model of Scala, and software transactional memory instead of Clojure -- but you get the point. So what does this graph mean?
Well obviously, I think Java is the most difficult language to write correct code. What may be surprising to some people is that I think Clojure is only a little simpler. To write correct code in Clojure, you have to figure out what things need to be protected by a dosync macro, and make sure those things are declared as refs. I think that would be an easy thing to screw up. It's still easier than Java, where you have to basically figure out the same things, but you must also worry about multiple lock objects, lock sequencing, etc. In Clojure you have to figure out what has to be protected, but you don't have to figure out how to protect it -- the language features take care of that.
So Clojure and Java are similar in difficulty, but what about Scala and the Actor model? I think this is much easier to understand. There are no locks/transactions. The only hard part is making sure that you don't send the same mutable object to different actors. This is somewhat similar to figuring what to protect in Clojure, but it's simpler. You usually use immutable case classes for the messages sent between actors, but these are used all over the place in Scala. It's not some special language feature that is only used for concurrency. Ok, enough about easy to write/understand code, there are other important factors, such as efficiency:

Perhaps this should really be described as memory efficiency. In this case Java and the locking model is the most efficient. There is only copy of anything in such a system, as that master copy is always appropriately protected by locks. Scala, on the other hand, is far less efficient. If you send around messages between actors, they need to be immutable, which means a lot of copies of data. Clojure does some clever things around copies of data, making it more efficient than Scala. Some of this is lost by the overhead of STM, but it still has a definite advantage over Scala. Like in many systems, there is a tradeoff between memory and speed:

Scala is the clear king of speed. Actors are more lightweight than threads, and a shared nothing approach means no locking, so concurrency can be maximized. The Java vs. Clojure speed is not as clear. Under high write contention, Clojure is definitely slower than Java. The more concurrency there is, the more retries that are going on. However, there is no locking and this really makes a big deal if there are a lot more reads than writes, which is a common characteristic of concurrent systems. So I could definitely imagine scenarios where the higher concurrency of Clojure makes it faster than Java. Finally, let's look at reusability.

By reusability, I mean is how reusable (composable) is a piece of concurrent code that you write with each of these languages/paradigms? In the case of Java, it is almost never reusable unless it is completely encapsulated. In other words, if your component/system has any state that it will share with another component, then it will not be reusable. You will have to manually reorder locks, extend synchronization blocks, etc. Clojure is the clear winner in this arena. The absence of locks and automatic optimistic locking really shine here. Scala is a mixed bag. On one hand, the Actor model is very reusable. Any new component can just send the Actor a message. Of course you don't know if it will respond to the message or not, and that is a problem. The bigger problem is the lack of atomicity. If one Actor needs to send messages to two other Actors, there are no easy ways to guarantee correctness.
Back in June, I heard Martin Odersky says that he wants to add STM to Scala. I really think that this will be interesting. While I don't think STM is always the right solution, I think the bigger obstacle for adoption (on the Java platform) is the Clojure language itself. It's a big leap to ask people to give up on objects, and that is exactly what Clojure requires you to do. I think a Scala STM could be very attractive...

21 comments:

STM is one of three concurrency control mechanisms in Clojure. STM is for synchronous, coordinated state. Agents are a more actor-like mechanism for asynchronous and independent state. Atoms are somewhere between and are used for synchronous, independent state.

I think Clojure and Scala will (fairly quickly) pick up useful concurrency idioms from each other. Certainly Clojure already has several more choices than STM.

Another important axis to consider is emphasis on/support for persistent data structures. Java does poorly here, and whether Clojure or Scala wins depends on whether you think supporting in-place mutation is a good idea.

Have you seen Rich's video on persistent data and managed references at http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey? I would be curious to know how/if this influences your assessment.

"To write correct code in Clojure, you have to figure out what things need to be protected by a dosync macro, and make sure those things are declared as refs. I think that would be an easy thing to screw up."

IMHO in Clojure it is exactly the opposite because its easy to do the right thing and hard to screw up:

- If the data is immutable, there is no reason to protect it

- If the data is mutable, you _must_ put the data in a ref

- Once the data is in a ref, you are only able to modify it in a transaction

Therefore Clojure guarantees that data is only mutated in a transaction and your are not able to to step out of this!

For your speed comparison chart, you'd be better off comparing Clojure's agents to Scala's actors. Comparing on that level, I'd imagine Clojure would actually be the fastest because "Clojure's agents are reactive not autonomous. There's no imperative message loop and no blocking receive."

In regard to the difficulty of getting things right, Scala uses objects everywhere which are riddled with mutable state. If you forget to protect access via actors, you're going to run into problems. In contrast, Clojure's native types are all immutable, so the same danger doesn't exist.

This being said, I did enjoy the article, but I feel like a little more research should have been done ahead of time.

Hi Michael,I apprecie this article but I not understand your difference In "Efficiency" and "Speed": I thin kthat a speed language is efficient too.Scala requires the copy of data between object with the actor's model: it's a very inefficient operation. How it's possibile Scala is speedest language?

I'm curious about your speed comparison. Did you benchmark the three approaches and compare how they perform a similar operation, or are those placements on the axis based on your own assumptions and reasoning? Your line of reasoning makes sense to me, but I am skeptical about your conclusions in the absence of hard numbers. In particular I've read that Scala's actors are quite slow. If you *do* have any benchmarks, or know where we can find some, please let us know!

We are a third party technical support service. Avast Customer Support is here to help you out with the whole procedure to Download Avast Antivirus online, We not only fix your Avast Support related issues but will guide with how to get started with your new Avast product once it gets installed successfully.We at Avast Tech Support provides service to protect your PC from potential online threats and external attacks like viruses, Trojans, malwares, spywares and phishing scams. And Avast Refund. Call on our Avast Phone Number.

Gmail Customer service is a third party technical support service for Gmail users when they face any technical issue or error in their Gmail account. Our Gmail Customer Support team solves issues like forgot Gmail account password, Gmail configuration or Sync issues, recover deleted emails and many more. Toll Free number (800) 986-9271How you install or reinstall Office 365 or Office 2016 depends on whether your Office product is part of an Office for home or Office for business plan. If you're not sure what you have, see what office com setup products are included in each plan and then follow the steps for your product. The steps below also apply if you're installing a single, stand-alone Office application such as Access 2016 or Visio 2016. Need Help with office setup Enter Product Key? Call 1-800-000-0000 Toll FreeNorton Tech Support is a third party service provider and not in any way associated with Norton or any of its partner companies. We offer support for Norton products and sell subscription based additional warranty on computer and other peripheral devices. Call our Toll Free number 1 855 966 3855Other Services