Hi, I'm Tony Arcieri. You may remember me from such software projects as Celluloid, Reia, and Cool.io...

Tuesday, August 10, 2010

Multithreaded Rails is generally better than Async Rails, and Rainbows is cooler than Node

Once upon a time Rails was single threaded and could only process one request at a time. This meant for each concurrent request you wanted to process with Rails, you needed to run an entirely separate Ruby VM instance. This was not a good state of affairs, especially in cases where your Rails application was blocking on I/O when talking to a database or other external service. An entire instance of your application sat there useless as it was waiting for I/O to complete.

The lack of multithreading in Rails would lead Ezra Zygmuntowicz to write Merb, a thread-safe web framework for Ruby which certainly borrowed conceptually from Rails and would go on to serve as the core for the upcoming Rails 3.0 release. In the meantime, the earlier Rails 2.x branch would get support for a thread safe mode as well. This meant that web applications written in Ruby could process multiple requests using a single VM instance: while one thread was blocking on a response from a database or other service, the web application could continue processing other requests in other threads.

Even better, while Ruby originally started out with a "green threads" implementation which executed threads in userspace and could not provide multicore concurrency, newer, more modern Ruby implementations emerged which provided true native multithreading. JRuby and IronRuby, implementations of Ruby on the JVM and .NET CLR respectively, provided truly concurrent native multithreading while still maintaining Ruby's original threading API. Rubinius, a clean-room implementation of a Ruby VM based on the Smalltalk 80 architecture, has started to take steps to remove its global lock and allow concurrent multithreading as well.

With a multithreaded web framework like Merb, recent versions of Rails 2.x, or Rails 3.0, in conjunction with a Ruby VM that supports concurrent multithreading, you now need to only run one VM instance with a copy of your web application and it can utilize all available CPU cores in a server, providing true concurrent computation of Ruby code. No longer do you need a "pack of Mongrels" to serve your Rails application. Instead, you can just run a single VM and it will utilize all available system resources. This has enormous benefits in terms of ease-of-deployment, monitoring, and memory usage.

Ruby on Rails has finally grown up and works just like web applications in other more popular languages. You can run just one copy of any Ruby VM that supports native multithreading and utilize all available server resources. Rails deployment is no longer a hack. It Just Works.

But Wait, Threads Are Bad, And Async Is The New Hotness!

Threads have typically had a rather mired reputation in the programming world. Threads utilize shared state by default and don't exactly provide the greatest mechanisms for synchronizing bits of shared state. They're a leaky abstraction, and without eternal vigilance on the part of an entire development team and an excellent understanding of what's happening when you use thread synchronization mechanisms, sharing state between threads is error-prone and often difficult to debug.

The "threads are bad" cargo cult has often lead people to pursue "interesting" solutions to various concurrency problems in order to avoid using threads. Event-based concurrent I/O became an incredibly popular solution for writing network servers, an approach seen in libraries like libevent, libev, Python's Twisted, and in the Ruby world EventMachine and my own event library, Rev. This scheme uses a callback-driven approach, often with a central reactor core, dispatching incoming I/O asynchronously to various handlers. For strictly I/O-bound applications, things like static file web servers, proxies, and protocol transformers, this approach is pretty much the best game in town.

Node.js, a pretty awesome I/O layer for Google's V8 JavaScript interpreter, is something of the new hotness. It's certainly opened up the evented I/O approach to a new audience, and for I/O-bound tasks it provides a way to script in a dynamic language while remaining quite fast. But as others have noted, Node is a bit overhyped. If you write your server in Node, will it scale? It really depends on the exact nature of the problem. I'll get into that in a bit.

Ilya Grigorik recently presented at RailsConf and OSCON about em-synchrony, a set of "drivers" for EventMachine which facilitate various types of network I/O which present a synchronous interface but use Fibers to perform I/O asynchronously in the background. He had some rather impressive things to share there, including Rails running on top of EventMachine, dispatching requests concurrently using fibers instead of threads. This approach won't provide you the computational concurrency that truly multithreaded Rails as in JRuby and IronRuby (and Rubinius soon!), but it will provide you wicked fast I/O performance... at a price.

The New Contract

Programmers generally live in a synchronous world. We call functions which return values. That's the status quo. Some languages go so far as to make this the only possible option. Evented frameworks do not work like this. Evented frameworks turn the world upside down. For example, in Ruby, where you might ordinarily write something like:

response = connection.request params

In async land, you first have to initiate the request:

begin_request params

Then define a callback in order to receive the response:

defon_response(response) ...end

Rather than calling functions, you initiate side effects which will eventually call one of a number of callbacks. Exceptions no longer work. The context is lost between callbacks; you always start from just your arguments and have to figure out exactly what you were up to before, which generally necessitates breaking anything complex down into a finite state machine, instead of say, an imperative list of I/O commands to perform. It's a very different approach from the status quo.

The em-synchrony approach promises to save you from this by wrapping up all that ugly callback driven stuff with Fibers. I've been down that road and I no longer recommend it. In January 2008 I wrote Revactor, a Erlang-like implementation of the Actor Model for Ruby 1.9, using Fibers as the underlying concurrency primitive. It's the first case known to me of someone using this approach, and significantly more powerful than any of the other available frameworks. Where em-synchrony makes you write Fiber-specific code for each network driver, Revactor exposed an incomplete duck type of Ruby's own TCPSocket, which means that patching drivers becomes significantly easier as you don't need asynchronous drivers to begin with.

However, for the most part I stopped maintaining Revactor, largely because I began to think the entire approach is flawed. The problem is frameworks like Revactor and em-synchrony impose a new contract on you: evented I/O only! You aren't allowed to use anything that does any kind of blocking I/O in your system anywhere, or you will hang the entire event loop. This approach works great for something like Node.js, where the entire system was written from the ground-up to be asynchronous, in a language which has a heritage of being asynchronous to begin with.

Not so in Ruby. There are tons and tons of libraries that do synchronous I/O. If you choose to use async Rails, you can't use any library which hasn't specifically been patched with em-synchrony-like async-to-Fiber thunks. Since most libraries haven't been patched with this code, you're cutting yourself off from the overwhelming majority of I/O libraries available. This problem is compounded by the fact that the only type of applications which will benefit from the async approach more than the multithreaded approach are ones that do a lot of I/O.

This is a problem you have to be eternally vigilant about what libraries you use and make absolutely sure nothing ever blocks ever. Hmm, is this beginning to sound like it may actually be as problematic as threads? And one more thing: exceptions. Dealing with exceptions in an asynchronous environment is very difficult, since control is inverted and exceptions don't work in callback mode. Instead, for exceptions to work properly, all of the "Fibered" em-synchrony-like drivers must catch, pass along, and rethrow exceptions. This is left as an exercise to the driver writer.

Threads are Good

Threads are bad when they have to share data. But when you have a web server handling multiple requests concurrently with threads, they really don't need to share any data at all. When threads don't share any data, multithreading is completely transparent to the end user. There are a few gotchas in multithreaded Rails, such as some foibles with the initial code loading, but after you get multithreaded Rails going, you won't even notice the difference from using a single thread. So what cases would Async Rails be better than multithreaded Rails for? I/O bound cases. For many people the idea of an I/O bound application draws up the canonical Rails use case: a database-bound app.

"Most Rails apps are database bound!" says the Rails cargo cult, but in my experience, useful webapps do things. That said, Async Rails will have its main benefits over multithreaded apps in scenarios where the application is primarily I/O bound, and a webapp which is little more than a proxy between a user and the database (your typical CRUD app) seems like an ideal use case.

What does the typical breakdown of time spent in various parts of your Rails app look like? The conventional wisdom would say this:

But even this is deceiving, because models generally do things in addition to querying your database. So really, we need a breakdown of database access time. Evented applications benefit from being bound on I/O with little computation, so for an Async Rails app this is the ideal use case:

Here our application does negligible computation in the models, views, and controllers, and instead spends all its time making database queries. This time can involve writing out requests, waiting while the database does its business, and consuming the response.

This picture is still a bit vague. What exactly is going on during all that time spent doing database stuff? Let's examine my own personal picture of a typical "read" case:

For non-trivial read cases, your app is probably spending a little bit of time doing I/O to make the REQuest, most of its time waiting for the database QueRY to do its magic, and then spending some time reading out the response.

But a key point here: your app is spending quite a bit of time doing nothing but waiting between the request and the response. Async Rails doesn't benefit you here. It removes some of the overhead for using threads to manage an idle connection, but most kernels are pretty good about managing a lot of sleepy threads which are waiting to be awoken nowadays.

So even in this case, things aren't going to be much better over multithreaded apps, because your Rails app isn't actually spending a lot of time doing I/O, it's spending most of it's time waiting for the database to respond. However, let's examine a more typical use case of Rails:

Here our app is actually doing stuff! It's actually spending a significant amount of time computing, with some time spent doing I/O and a decent chunk spent just blocking until an external service responds. For this case, the multithreaded model benefits you best: all your existing Ruby tools will Just Work (provided they don't share state unsafely), and best of all, when running multithreaded on JRuby or IronRuby (or Rubinius soon!) you can run a single VM image, reduce RAM usage by sharing code between threads, and leverage the entire hardware stack in the way the CPU manufactures intended.

Why You Should Use JRuby

JRuby provides native multithreading along with one of the most compatible alternative Ruby implementations out there, lets you leverage the power of the JVM, which includes a great ecosystem of tools like VisualVM, a mature underlying implementation, some of the best performance available in the Ruby world, a diverse selection of garbage collectors, a significantly more mature ecosystem of available libraries (provided you want to wrap them via the pretty nifty Java Interface), and the potential to deploy your application without any native dependencies whatsoever. JRuby can also precompile all of your Ruby code into an obfuscated Java-like form, allowing you to ship enterprise versions to customers you're worried might steal your source code. Best of all, when using JRuby you also get to use the incredibly badass database drivers available for JDBC, and get things like master/slave splits and failover handled completely transparently by JDBC. Truly concurrent request handling and awesome database drivers: on JRuby, it Just Works.

Why not use IronRuby? IronRuby also gives you native multithreading, but while JRuby has 3 full time developers working on it, IronRuby only has one. I don't want to say that IronRuby is dying, but in my opinion JRuby is a much better bet. Also, the JVM probably does a better job supporting the platforms of interest for running Rails applications, namely Linux.

Is Async Rails Useful? Kinda.

All that said, are there use cases Async Rails is good for? Sure! If your app is truly I/O bound, doing things like request proxying or a relatively minor amount of computation as compared to I/O (regex scraping comes to mind), Async Rails is awesome. So long as you don't "starve" the event loop doing too much computation, it could work out for you.

I'd really be curious about what kinds of Rails apps people are writing that are extremely I/O heavy though. To me, I/O bound use cases are the sorts of things people look at using Node for. In those cases, I would definitely recommend you check out Rainbows instead of Async Rails or Node. More on that later...

Why I Don't Like EventMachine, And Why You Should Use Rev (and Revactor) Instead

em-synchrony is built on EventMachine. EventMachine is a project I've been using and have contributed to since 2006. I really can't say I'm a fan. Rather than using Ruby's native I/O primitives, EventMachine reinvents everything. The reason for this is because its original author, Francis "Garbagecat" Cianfrocca, had his own libev(ent)-like library, called "EventMachine", which was written in C++. It did all of its own I/O internally, and rather than trying to map that onto Ruby I/O primitives, Francis just slapped a very poorly written Ruby API onto it, foregoing any compatibility with how Ruby does I/O. There's been a lot of work and refactoring since, but even so, it's not exactly the greatest codebase to work with.

While this may have been remedied since last I used EventMachine, a key part of the evented I/O contract is missing: a "write completion" callback indicating that EventMachine has emptied the write buffer for a particular connection. This has lead to many bugs in cases like when proxying from a fast writer to a slow reader, the entire message to be proxied is taken into memory. There are all sorts of special workarounds for common use cases, but that doesn't excuse this feature being missing from EventMachine's I/O model.

It's for these reasons that I wrote Rev, a Node-like evented I/O binding built on libev. Rev uses all of Ruby's own native I/O primitives, including Ruby's OpenSSL library. Rev sought to minimize the amount of native code in the implementation, with as much written in Ruby as possible. For this reason Rev is slower than EventMachine, however the only limiting factor is developer motivation to benchmark and rewrite the most important parts of Rev in C instead of Ruby. Rev was written from the ground up to perform well on Ruby 1.9, then subsequently backported to Ruby 1.8.

Rev implements a complete I/O contract including a write completion event which is used by Revactor's Revactor::TCP::Socket class to expose an incomplete duck type of Ruby's TCPSocket. This should make monkeypatching existing libraries to use Revactor-style concurrency much easier. Rather than doing all the em-synchrony-style Fiber thunking and exception shuffling yourself, it's solved once by Revactor::TCP::Socket, and you just pretend you're doing normal synchronous I/O.

Revactor comes with all sorts of goodies that people seem to ask for often. Its original application was for a web spider, which in early 2008 was sucking down and scanning regexes on over 30Mbps of data using four processes running on a quad core Xeon 2GHz. I'm sure it was, at the time, the fastest concurrent HTTP fetcher ever written in Ruby. Perhaps a bit poorly documented, this HTTP fetcher is part of the Revactor standard library, and exposes an easy-to-use synchronous API which scatters HTTP requests to a pool of actors and gathers them back to the caller, exposing simple callback-driven response handling. I hear people talking about how awesome that sort of thing is in Node, and I say to them: why not do it in Ruby?

Why Rainbows Is Cooler Than Node

Both Rev and Revactor-style concurrency are provided by Eric Wong's excellent Rainbows HTTP server. Rainbows lets you build apps which handle the same types of use cases as Node, except rather than having to write everything in upside async down world in JavaScript, using Revactor you can write normal synchronous Ruby code and have everything be evented underneath. Existing synchronous libraries for Ruby can be patched instead of rewritten or monkeypatched with gobs of Fiber/exception thunking methods.

Why write in asynchronous upside down world when you can write things synchronously? Why write in JavaScript when you can write in Ruby? Props to everyone who has worked on solutions to this problem, and to Ilya for taking it to the mainstream, but in general, I think Rev and Revactor provide a better model for this sort of problem.

Why I Stopped Development on Rev and Revactor: Reia

A little over two years ago I practically stopped development on Rev and Revactor. Ever since discovering Erlang I thought of it as a language with great semantics but a very ugly face. I started making little text files prototyping a language with Ruby-like syntax that could be translated into Erlang. At the time I had outgrown my roots as an I/O obsessed programmer and got very interested in programming languages, how they work, and had a deep desire to make my own.

Erlang's model provides the best compromise for writing apps which do a lot of I/O but also do a lot of computation as well. Erlang has an "evented" I/O server which talks to a worker pool, using a novel interpretation of the Actor model. Where the original Actor model was based on continuations and continuation passing, making it vulnerable to the same "stop the world" scenarios if anything blocks anywhere, Erlang chose to make its actors preemptive, more like threads but much faster because they run in userspace and don't need to make a lot of system calls.

Reia pursues Erlang's policy of immutable state systemwide. You cannot mutate state, period. This makes sharing state a lot easier, since you can share a piece of state knowing no other process can corrupt it. Erlang uses a model very similar to Unix: shared-nothing processes which communicate by sending "messages" (or in the case of Unix, primitive text streams). For more information on how Erlang is the evolution of the Unix model, check out my other blog post How To Properly Utilize Modern Computers, which spells out a lot of the same concepts I've discussed in this post more abstractly. Proper utilization of modern computers is exactly what Reia seeks to do well.

Reia has been my labor of love for over two years. I'm sorry if Rev and Revactor have gone neglected, but it seems I may have just simply been ahead of my time with them, and only now is Ruby community interest in asynchronous programming piqued by things like Node and em-synchrony. I invite you to check out Rev, Revactor, and Reia, as well as fork them on Github and start contributing if you have any interest in doing advanced asynchronous programming on Ruby 1.9.

24 comments:

If your goal is to convince people, weak statements like 'most kernels are pretty good about managing a lot of sleepy threads' aren't going to help, especially if they're not backed by any evidence. Maybe you should run some benchmarks so instead of saying 'most kernels' you can say which kernels, and instead of 'pretty good' you can say exactly how good.

Numbers certainly speak louder than words, but in a post as long and rambly as this I really wasn't prepared to dig deep into modern thread performance.

While I realize it's an appeal to authority, Varnish architect Poul-Henning Kamp has written one of the fastest HTTP caches available using a thread-per-connection model, and in the Varnish performance-tuning documentation claims "Having a number of idle threads is reasonably harmless."

As for actual benchmarks, what's the use case? That's the point I've been trying to drive home in this entire post. Sure, I could make a synthetic benchmark comparing a threaded/blocking idle to an evented idle for large numbers of connections, but is that really useful?

Performance benchmarks on a real-world Rails app would be interesting, but not really applicable to other people as everyone's app will vary.

As you have correctly guessed, I prefer Ruby to JavaScript. However, one of the weird things about Node is that it has managed to attract a lot of people who would rather be programming Ruby to JavaScript. If you don't know/like Ruby but do like JavaScript, I can't criticize Node in that regard.

But still, there's the "upside-down world" issue. Some people really like programming that way. For the majority of programmers, it's pretty foreign and hard to reason about. Having to decompose an otherwise procedural description of a protocol into FSMs to implement complex protocols still sucks. It makes them more complex and error prone.

Ry has talked about adding coroutines to Node and presently seems to be opposed. Until then, in the Ruby world with Fibers you can get synchronous I/O on top of an underlying async stack. In Node you can't. The debate transcends mere Ruby vs JavaScript.

Have you checked out Zed Shaw's Mongrel2? It's a HTTP server based on Zed's original HTTP Parser and ZeroMQ, a really promising evented networking library. It is completely language agnostic, so it's like Node.js or EventMachine, only you can use it from any language. There are already bindings out for lots of languages, and bindings are quite easy to write. So it's a modern web server into which you can plug your favorite web app platform in your favorite language. It also has some very innovative features like supporting WebSockets, N handlers responding to M requests, plus a few others. Worth a look.

While you are mostly right, there is a significant flaw with assuming threads in Ruby solve anything: most machines can't run more than 2-3000 threads at any time. The scheduling cost kills it.

You'd need green threads multiplexed over native threads + thread pool for blocking I/O. But that just tries to fix the fact that I/O is generally sync by default, when it should be async by default with sync optionally implemented on top.

And Erlang gives you just that :) So yeah, Reia, LFE and Efene are cool.

I have ruby on rails app with me,but it does not support asynchronous mechanism.So I wanna integrate my app with JRuby to get the asynchronous feature.Would it be possible.If so how do I proceed. Any help would be greatly appreciated,thank you very much..