Hi, I'm Tony Arcieri. You may remember me from such software projects as Celluloid, Reia, and Cool.io...

Wednesday, September 28, 2011

Object Oriented Concurrency: A Deep Dive into Celluloid (Part 1)

There's a quote from Alan Kay I've often cited as a source of inspiration in my various projects, from Revactor to Reia to Celluloid:

"I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages"--Alan Kay, creator of Smalltalk, on the meaning of "object oriented programming"

This conception of objects likens them to things in the world which are very much concurrent. Every cell in your body is running at the same time, and cells communicate with each other using chemical signals. Computers on a network are all running at the same time, sending each other messages via network protocols.

In Alan Kay's conception of object oriented programming, it should be a natural model for concurrency. However, most languages that describe themselves as "object oriented" today bolt on concurrency as an afterthought, just a thin abstraction layer around the concurrency primitives provided by the operating system, such as threads, conditions, and mutexes. The hardware and OS people went through the herculean effort of abstracting hardware-level concurrency into threads, and from there, for the most part the programming language people haven't done much, at least in mainstream object oriented languages.

The exception lies in the functional language domain, in languages like Haskell, Clojure, and Erlang. Erlang, in part inspired by Alan Kay's language Smalltalk, has abstracted concurrency in such as way that users of the language are able to reason about it intuitively, something which can't be said for most users of threads. The key to Erlang's success, its secret sauce, lies in its message-based concurrency system, a system inspired by Smalltalk.

Smalltalk is generally considered to be the forerunner of all object oriented programming languages, but when it comes to taking to heart Alan Kay's conception of a message-based system, Erlang beats the mainstream object oriented languages hands down. Erlang has also incorporated the idea of communicating sequential processes, first seen in languages like Occam. By combining the ideas of Smalltalk with those of communicating sequential processes, Erlang created a system where concurrency is baked into the core of how the language operates, enabling fully isolated processes that talk to each other with asynchronous messages.

If this idea works so well in Erlang, can it be added to other languages in a way that makes sense?

From Actors to Concurrent Objects

Given Erlang was vicariously inspired by Smalltalk, an object oriented language, is there any reason why Erlang's ideas can't be applied to present-day object oriented languages to give them concurrent objects? The way Alan Kay describes objects they should inherently be concurrent, but the way we're used to working with objects is inherently sequential.

A whole slew of projects, including one of mine, have tried to implement Erlang's concurrency model in Ruby. None of them have been popular. In my opinion, this is because these projects (including my own project Revactor) have stuck far too close to Erlang, trying to write an Erlang DSL in Ruby instead of trying to reimplement these concepts in a Ruby-like manner. This makes trying to learn them pretty confusing, to the point where the best way to learn them is to first learn Erlang, then after you understand how Erlang works, you can translate in your head from Erlang into these Erlangy Ruby DSLs. Seems bad...

I read a blog post recently about how in the Smalltalk objects that end in "-er" (e.g. Manager, Controller, Builder, Worker, Analyzer) are "bad." For a long time I've thought much the opposite, that there are fundamentally two different types of objects: those which are inanimate and represent objects in the world, and those which are animate and active and manipulate the inanimate objects. I see the programming world as consisting of two very distinct things: actors, and the objects which they act upon. It's not that objects that end in -er are inherently bad, it's simply that OOP languages (including Smalltalk) have never contained the proper primitive for modeling their behavior.

Celluloid is a new kind of concurrency library. It allows you to model concurrent (i.e. thread-backed) actors in a system but still interact with them as if they were plain old Ruby objects. All you need to do to create concurrent objects in Celluloid is to include the Celluloid module in any Ruby class:

That's it! Just by doing "include Celluloid" all instances of this class you create will not be normal Ruby objects, but instead fully concurrent Celluloid-powered objects. These objects will continue to function just like any other object in Ruby. When you call methods on them they return values. If a method raises an exception, the method call aborts and the exception will continue to propagate up the stack.

I want to emphasize this because I think it's incredibly important: users of concurrent objects should not need to be aware that they're concurrent. They should work as closely as possible to any other plain old sequential objects.

Concurrent Objects in Celluloid

I'm going to help you understand how Celluloid works by first starting from a diagram of normal Ruby object interactions and building up from there. Let's start with a diagram of one method calling another:

Hopefully this diagram is easy to understand. Calls come in, responses go out, never a miscommunication! A call consists of the name of a method to invoke and arguments. The response consists of a value to return, or an exception to be raised. Object oriented dogma would have you think of a call as the caller sending a message to the receiver, which in turn sends a response back.

Now let's look at a concurrent object as implemented by Celluloid:

A quick overview: the caller and proxy objects function like normal Ruby caller and receiver objects in the previous diagram. However, the actor circle (i.e. the giant olive-looking thing) represents Celluloid's encapsulation of concurrency. Each actor runs in a separate thread, and exchange messages with proxy objects through their mailboxes, which are thread-safe message queues. Let's dive in!

All communication between regular Ruby objects and Celluloid's actors passes through a proxy object which intercepts the calls. This proxy is responsible for the central illusion that makes Celluloid's concurrent objects as easy to use as any other object, despite them being off in a distant actor somewhere that you must communicate with via messages through thread-safe mailboxes. The proxy object abstracts that all away, and gives you the familiar object-dot-method syntax that you know and love.

The proxy object translates the method call from a normal invocation into a Celluloid::SyncCall object. This object contains the method to be invoked and the parameters. This message is sent from the proxy to the recipient actor by way of its mailbox (which is an object of the Celluloid::Mailbox class).

Mailboxes work like thread-safe message queues, allowing several pending method invocations that the actor can process one-at-a-time. After a method has been invoked on the receiver object, the return value is sent back to the proxy in the form of a Celluloid::Response object. This value is then returned from the original method call made to the proxy.

This gives us the same behavior as a normal method call. However, normal Ruby objects aren't concurrent. To gain concurrency, we need to go beyond the normal method call pattern.

Adding Concurrency: Async Calls and Futures

Celluloid provides two methods for breaking out of the typical request/response call pattern:

Asynchronous calls: send a Celluloid::Call to a recipient actor, but doesn't wait for a response. The method call to the proxy returns immediately. To make an async call, add "!" to the end of any method.

Futures: allow the value a method returns to be obtained later, while that method computes in the background. To make a future call to an actor, use the #future(:method_name, arg1, arg2... &block) syntax. This returns a Celluloid::Future object, whose value can be obtained by calling Celluloid::Future#value.

Let's look at these both to see how they actually work. First an asynchronous call, as we'd do it from Ruby code:

Even though ScarlettJohansson doesn't have an #oops! method, things are continuing to work as if you made two calls to the normal ScarlettJohansson#oops method, but no value is returned. Here's a diagram of what happens when you make an asynchronous call:

Asynchronous calls do not wait for responses, but forward a request for a method to execute, immediately returning back to the caller. No mechanism is provided to obtain a response to this call; async calls are a strictly fire-and-forget type of affair.

What if we want to invoke a method asynchronously but obtain the value the method returns at some point in the future? Enter futures!

Let's imagine we have a small army of Scarlett Johansson clones leaking photos on the Internet. Each one of them takes a little while drinking booze and fumbling around with their phone before they're actually able to post a nude photo. How can we tell them all to start doing this then reaping the results instantaneously when they're done? Let's use Celluloid::Actor#future to accomplish this task:

Our Scarlett Johanssons are now leaking photos to the Internet in parallel, and the wall clock time needed to have all of them get drunk and fumble around with their phone in order to post their nude photos is now barely more than the time each individual Scarlett took to post a couple photos. We're able to get the return value from ScarlettJohansson#oops after asynchronously requesting it be computed. How does this work exactly?

Celluloid implements futures through the Celluloid::Future object. Futures represent the result of a method which may or may not have finished executing before its result is requested. To obtain the result of a given future, call Celluloid::Future#value, but you need to be aware that Celluloid::Future#value will block until the method the future is associated with has finished executing.

Obtaining the value of a pending future is, like always, synchronized through a Celluloid::Mailbox object, which is used to provide a thread-safe interaction point for receiving the result of a method call if it hasn't already completed.

More to come!

I'd originally hoped for this blog post to contain a lot more, but honestly it's been sitting around half-finished for so long I thought I'd break it apart into a series.

In Part II, I'd like to cover what makes Celluloid truly unique (to my knowledge) among similar systems: "reentrant" method dispatch. This post will cover problems in existing RPC systems, including Erlang's gen_server, and discuss why Celluloid provides a more natural object abstraction. It will also explain why and how Celluloid uses Fibers to solve this problem, and the additional features that can be built upon Celluloid's reentrant method dispatch.

Part III (if I get that far) will provide a more in-depth look into some of the Erlangier features of Celluloid such as linking and supervision, and also discuss the roadmap for integrating some additional Erlang features into the Celluloid framework such as OTP-style applications and distributed actor systems. Stay tuned!

Sounds very interesting! Can you do everything you can do with Erlang with Celluloid instead or are there still cases where you need to do conventional message passing? Have you compared the performance of Celluloid vs. a traditional actor model?

Tony, could you provide more information how to perhaps integrate cool.io into celluloid. Revactor had TCP connection/socket classes that were usable. Are you planning on adding those features into celluloid?

Isaac: Erlang was vicariously inspired by Smalltalk through the Actor Model.

Roger: Actors are much heavier in Celluloid than they are in Erlang as Celluloid uses native OS threads on all of the Rubies it presently supports. This makes short-lived actors in particular rather inefficient. The next release of Celluloid will at least include a thread pool which collects dead actors and recycles their existing OS threads when creating new actors.

phil: right now part of the contract of futures is that you MUST get their results. I'd like to fix this.