Igvita.com has recently published the article Rails Performance Needs an Overhaul. Rails performance… no, Ruby performance… no Rails scalability… well something is being criticized here. From my experience, talking about scalability and performance can be a bit confusing because the terms can mean different things to different people and/or in different situations, yet the meanings are used interchangeably all the time. In this post I will take a closer look at Igvita’s article.

Performance vs scalability

Let us first define performance and scalability. I define performance as throughput: the number of requests per second. I define scalability as the number of users a system can handle concurrently. There is a correlation between performance and scalability. Higher performance means each request takes less time, and so the system is more scalable, right? Sometimes yes, but not necessarily. It is entirely possible for a system to be scalable yet have a lower throughput than a system that’s not as scalable, or for a system to be uber-fast yet not very scalable. Throughout this blog post I will show several examples that highlight the difference.

“Scalability” is an extremely loaded word and people often confuse it with “being able to handle tons and tons of traffic”. Let’s use a different term that better reflects what Igvita’s actually criticizing: concurrency. Igvita claims that concurrency in Ruby is pathetic while referring to database drivers, Ruby application servers, etc. Some practical examples that demonstrate what he means are as follows.

Limited concurrency at the app server level

Mongrel, Phusion Passenger and Unicorn all use a “traditional” multi-process model in which multiple Ruby processes are spawned, each process handling a single request at a time. Thus, concurrency is (assuming that the load balancer has infinite concurrency) limited by the number of Ruby processes: having 5 processes allows you to handle 5 users concurrently.

Threaded servers, where the server spawns multiple threads, each handling 1 connection concurrently, allow more concurrency because it’s possible to spawn a whole lot more threads than processes. In the context of Ruby, each Ruby process needs to load its own copy of the application code and other resources, so memory usage increases very quickly as you spawn additional processes. Phusion Passenger with Ruby Enterprise Edition solves this problem somewhat by using copy-on-write optimizations which save memory, so you can spawn a few more processes, but not significantly (as in 10x) more. In contrast, a multi-threaded app server does not need as much memory because all threads share application code with each other, so you can comfortably spawn tens or hundreds of threads. At least, this is the theory. I will later explain why this does not necessarily hold for Ruby.

When it comes to performance however, there’s no difference between processes and threads. If you compare a well-written multi-threaded app server with 5 threads to a well-written multi-process app server with 5 processes, you won’t find either being more performant than the other. Context switch overhead for processes and threads is roughly the same. Each process can use a different CPU core, as can each thread, so there’s no difference in multi-core utilization either. This reflects back on the difference between scalability/concurrency and performance.

Multi-process Rails app servers have a concurrency level that can be counted on a single hand, or if you have very beefy hardware, a concurrency level in the range of a couple of tens, thanks to the fact that Rails needs about 25 MB per process. Multi-threaded Rails app servers can in theory spawn a couple of hundred threads. After that it’s also game over: an operating system thread needs a couple of MB of stack space, so after a couple of hundred threads you’ll run out of virtual address space on 32-bit systems even if you don’t actually use that much memory.
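To make the arithmetic behind these limits concrete, here is a back-of-the-envelope sketch. Only the 25 MB per-process figure comes from the text above; the amount of RAM, the usable 32-bit address space and the thread stack sizes are assumptions picked purely for illustration, so treat the results as orders of magnitude rather than measurements.

```ruby
MB = 1024 * 1024
GB = 1024 * MB

# Assumed values, for illustration only.
ram_for_app   = 1 * GB   # RAM set aside for Rails processes (assumption)
rails_process = 25 * MB  # per-process footprint cited in the text
address_space = 3 * GB   # rough userspace limit of a 32-bit process (assumption)

# Integer division: how many whole processes fit in the assumed RAM.
max_processes = ram_for_app / rails_process
puts "Processes that fit in RAM: #{max_processes}"

# The thread ceiling depends heavily on the stack reservation per thread,
# so try two assumed stack sizes.
max_threads = [2 * MB, 8 * MB].map do |stack_size|
  [stack_size / MB, address_space / stack_size]
end

max_threads.each do |stack_mb, threads|
  puts "Threads with a #{stack_mb} MB stack: #{threads}"
end
```

Under these assumptions the process limit lands at a few tens, and the thread ceiling moves between a few hundred and somewhat over a thousand depending on the stack size, which is why the exact number is so system-dependent.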

There is another class of servers, the evented ones. These servers are actually single-threaded, but they use a reactor style I/O dispatch architecture for handling I/O concurrency. Examples include Node.js, Thin (built on EventMachine) and Tornado. These servers can easily have a concurrency level of a couple of thousand. But due to their single-threaded nature they cannot effectively utilize multiple CPU cores, so you need to run a couple of processes, one per CPU core, to fully utilize your CPU.
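The reactor idea can be sketched in a few lines of plain Ruby. This is a toy illustration, not how Thin, EventMachine or Node.js are actually implemented: the `ToyReactor` class is a hypothetical name, pipes stand in for client sockets, and it relies on `IO.select`, so it inherits the select() limitations discussed later in this post.

```ruby
# A toy reactor: a single thread watches many I/O sources and runs a
# callback whenever one of them has data ready.
class ToyReactor
  def initialize
    @handlers = {} # IO object => callback
  end

  # Register a callback to run when +io+ becomes readable.
  def watch(io, &callback)
    @handlers[io] = callback
  end

  # Dispatch events until every watched IO has reached end-of-file.
  def run
    until @handlers.empty?
      ready, = IO.select(@handlers.keys)
      ready.each do |io|
        begin
          @handlers[io].call(io.read_nonblock(4096))
        rescue IO::WaitReadable
          next # spurious wakeup, try again on the next loop iteration
        rescue EOFError
          @handlers.delete(io)
          io.close
        end
      end
    end
  end
end

received = []
reactor = ToyReactor.new

# Pipes stand in for three client connections that each send one message.
3.times do |i|
  reader, writer = IO.pipe
  reactor.watch(reader) { |data| received << data }
  writer.write("client #{i}")
  writer.close
end

reactor.run
puts received.sort.inspect
```

The point of the sketch is that no callback ever blocks: the only place the thread waits is inside `IO.select`, and that wait covers all connections at once.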

The limits of Ruby threads

Ruby 1.8 uses userspace threads, not operating system threads. This means that Ruby 1.8 can only utilize a single CPU core no matter how many Ruby threads you create. This is why one typically needs multiple Ruby processes to fully utilize one’s CPU cores. Ruby 1.9 finally uses operating system threads, but it has a global interpreter lock, which means that each time a Ruby 1.9 thread is running it will prevent other Ruby threads from running, effectively making it the same multicore-wise as 1.8. This is also explained in an earlier Igvita article, Concurrency is a Myth in Ruby.

On the bright side, not all is bad. Ruby 1.8 internally uses non-blocking I/O while Ruby 1.9 unlocks the global interpreter lock while doing I/O. So if one Ruby thread is blocked on I/O, another Ruby thread can continue execution. Likewise, Ruby is smart enough to cause things like sleep() and even waitpid() to preempt to other threads.
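A quick way to see this behaviour is to let several threads wait at the same time and measure the total wall-clock time. The sketch below uses `sleep` as a stand-in for a blocking I/O call; the thread count and the duration are arbitrary.

```ruby
require "benchmark"

# Five threads each wait 0.2 seconds. If a waiting thread blocked the whole
# interpreter, this would take about 1 second; because Ruby switches to other
# threads while one is waiting, it takes roughly 0.2 seconds.
elapsed = Benchmark.realtime do
  threads = 5.times.map do
    Thread.new { sleep 0.2 } # stand-in for a blocking I/O call
  end
  threads.each(&:join)
end

puts format("5 threads, 0.2 s of waiting each: %.2f s in total", elapsed)
```

Note that this only shows overlapping waits. It says nothing about CPU-bound work, which is still serialized as described in the previous section.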

On the dark side however, Ruby internally uses the select() system call for multiplexing I/O. select() can only handle 1024 file descriptors on most systems so Ruby cannot handle more than this number of sockets per Ruby process, even if you are somehow able to spawn thousands of Ruby threads. EventMachine works around this problem by bypassing Ruby’s I/O code completely.

Naive native extensions and third party libraries

So just run a couple of multi-threaded Ruby processes, one process per core and multiple threads per process, and all is fine and we should be able to have a concurrency level of up to a couple hundred, right? Well not quite, there are a number of issues hindering this approach:

Some third party libraries and Rails plugins are not thread-safe. Some aren’t even reentrant. For example Rails < 2.2 suffered from this problem. The app itself might not be thread-safe.

Although Ruby is smart enough not to let I/O block all threads, the same cannot be said of all native extensions. The MySQL extension is the most infamous example: when executing queries, other threads cannot run.

Mongrel is actually multi-threaded, but in practice everybody uses it in multi-process mode (mongrel_cluster) exactly because of these problems. It is also the reason why Phusion Passenger has gone the multi-process route.

And even though Thin is evented, a typical Ruby web application running on Thin cannot handle thousands of concurrent users. This is because evented servers typically require a special evented programming style, such as the one seen in Node.js and EventMachine. A Ruby web app that is written in an evented style running on Thin can definitely handle a large number of concurrent users.

When is limited application server concurrency actually a problem?

Igvita is clearly disappointed at all the issues that hinder Ruby web apps from achieving high concurrency. For many web applications I would however argue that limited concurrency is not a problem.

Web applications that are slow, as in CPU-heavy, max out CPU resources pretty quickly so increasing concurrency won’t help you.

Web applications that are fast are typically quick enough at handling the load that even a large number of users won’t notice the limited concurrency of the server.

Having a concurrency of 5 does not mean that the app server can only handle 5 requests per second; it’s not hard to serve hundreds of requests per second with only a couple of single-threaded processes.
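The relationship between concurrency and throughput in the last point is simple arithmetic: if every worker is always busy, the ceiling on requests per second is the concurrency level divided by the time each request takes. The sketch below works through that; `max_throughput` is a hypothetical helper name and the response times are made-up examples, not measurements of any real app.

```ruby
# Upper bound on throughput (requests per second) when each of
# +concurrency+ workers handles one request at a time and a request
# takes +ms_per_request+ milliseconds.
def max_throughput(concurrency, ms_per_request)
  concurrency * 1000.0 / ms_per_request
end

fast_app = max_throughput(5, 20)     # 5 processes, 20 ms per request
slow_app = max_throughput(5, 10_000) # 5 processes, 10 s spent waiting on I/O

puts "Fast app: up to #{fast_app} requests/second" # 250.0
puts "Slow app: up to #{slow_app} requests/second" # 0.5
```

With the same concurrency of 5, the fast app tops out at hundreds of requests per second while the app that waits 10 seconds per request manages less than one, which is why limited concurrency mostly hurts I/O-bound apps.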

The problem becomes most evident for web applications that have to wait a lot for I/O (besides its own HTTP request/response cycle). Examples include:

Apps that have to spend a lot of time waiting on the database.

Apps that perform a lot of external HTTP calls that respond slowly.

Chat apps. These apps typically have thousands of users, most of them doing nothing most of the time, but they all require a connection (unless your app uses polling, but that’s a whole different discussion).

We at Phusion have developed a number of web applications for clients that fall in the second category, the most recent one being a Hyves gadget. Hyves is the most popular social network in the Netherlands and they get thousands of concurrent visitors during the day. The gadget that we’ve developed has to query external HTTP servers very often, and these servers can take 10 seconds to respond in extreme cases. The servers are running Phusion Passenger with maybe a couple tens of processes. If every request to our gadget also causes us to wait 10 seconds for the external HTTP call then we’d soon run out of concurrency.

But even supposing that our app and Phusion Passenger could have a concurrency level of a couple of thousand, all of those visitors would still have to wait 10 seconds for the external HTTP calls, which is obviously unacceptable. This is another example that illustrates the difference between scalability and performance. We solved this problem by aggressively caching the results of the HTTP calls, minimizing the number of external HTTP calls that are necessary. The result is that even though the application’s concurrency is fairly limited, it can still comfortably serve many concurrent users with a reasonable response time.
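As an illustration of the caching idea, here is a minimal time-based cache. It is a sketch under assumptions, not the code we used for the gadget: `TtlCache`, the 60-second lifetime and the cache key are hypothetical, and a lambda stands in for the slow external HTTP call. It also uses modern Ruby syntax rather than what was available on Ruby 1.8.

```ruby
# A small in-process cache whose entries expire after a fixed number of
# seconds. The block passed to #fetch is only run on a cache miss.
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
    @lock = Mutex.new
  end

  def fetch(key)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @lock.synchronize do
      entry = @entries[key]
      return entry[:value] if entry && entry[:expires_at] > now
    end

    # Cache miss: run the slow operation outside the lock so that other
    # threads are not held up while it completes.
    value = yield
    @lock.synchronize do
      @entries[key] = { value: value, expires_at: now + @ttl }
    end
    value
  end
end

external_calls = 0
# Stand-in for an external HTTP call that may take seconds to respond.
slow_lookup = lambda do
  external_calls += 1
  "profile data"
end

cache = TtlCache.new(60)
first  = cache.fetch("user:42") { slow_lookup.call }
second = cache.fetch("user:42") { slow_lookup.call }

puts "Results: #{first.inspect}, #{second.inspect}"
puts "External calls made: #{external_calls}"
```

Only the first `fetch` pays for the slow call; later requests for the same key are answered from memory until the entry expires. A cache like this lives inside one process, so in a multi-process setup each process would keep its own copy unless a shared store is used instead.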

This anecdote should explain why I believe that web apps can get very far despite having a limited concurrency level. That said, as Internet usage continues to increase and websites get more and more users, we may at some time come to a point where a much larger concurrency level is required than most of our current Ruby tools allow (assuming server capacity doesn’t scale quickly enough).

What was Igvita.com criticizing?

Igvita.com does not appear to be criticizing Ruby or Rails for being slow. It doesn’t even appear to be criticizing the lack of Ruby tools for achieving high concurrency. It appears to be criticizing these things:

Rails and most Ruby web application servers don’t allow high concurrency by default.

Many database drivers and libraries hinder concurrency.

Although alternatives exist that allow concurrency, you have to go out of your way to find them.

There appears to be little motivation in the Ruby community for making the entire stack of web framework + web app server + database drivers etc. scalable by default.

This is in contrast to Node.js where everything is scalable by default.

Do I understand Igvita’s frustration? Absolutely. Do I agree with it? Not entirely. The same thing that makes Node.js so scalable is also what makes it relatively hard to program for. Node.js enforces a callback style of programming and this can eventually make your code look a lot more complicated and harder to read than regular code that uses blocking calls. Furthermore, Node.js is relatively young – of course you won’t find any Node.js libraries that don’t scale! But if people ever use Node.js for things other than high-concurrency server apps, then non-scalable libraries will at some time pop up. And then you will have to look harder to avoid these libraries. There is no silver bullet.

That said, all would be well if at least the preferred default stack could handle high concurrency by default. This means e.g. fixing the MySQL extension and having the fix published upstream. The mysqlplus extension fixes this, but for some reason its changes haven’t been accepted and published by the original author, and so people end up with a multi-thread-killing database driver by default.

Is Node.js innovative? Is Ruby lacking innovation?

A minor gripe that I have with the article is that Igvita calls Node.js innovative while seemingly implying that the Ruby stack isn’t innovating. Evented servers like Node.js have actually been around for years, and the evented pattern was well-known long before Ruby or Javascript became popular. Thin is also evented and predates Node.js by several years. Thin and EventMachine also allow Node.js-style evented programming. The only innovation that Node.js brings, in my opinion, is the fact that it’s Javascript. The other “innovation” is the lack of non-scalable libraries.

Conclusion

Igvita appears to be criticizing something other than Rails performance, despite what his article’s title would imply.

I don’t think the concurrency levels that the Rails stack provides by default are that bad in practice. But as a fellow programmer, it does intuitively bother me that our laptops, which are a million times more powerful than supercomputers from two decades ago, cannot comfortably handle a couple of thousand concurrent users. We can definitely work towards something better, but in the meantime let’s not forget that the current stack is more than capable of Getting Work Done(tm).

Good read, but I find it odd that this article describes event-based web servers, but doesn’t even mention Nginx, part of your Passenger offering. Isn’t that web server evented? I would like to hear how that fits into the picture, even though Nginx is part of the native infrastructure beneath Ruby and Rails.

Does it need any special tricks to scale well, or concur well, or perform well? Or does it lose all of its evented goodness because it’s shoehorned into serving a web framework not optimised for it?

http://www.phusion.nl/ Hongli Lai

Yes Tor, Nginx is evented. However the Ruby/Rails processes pooled behind it are not evented, they’re single-threaded, so the number of processes Phusion Passenger is configured to spawn is also the maximum concurrency level, at least when it comes to Ruby web app requests. At this time the only fully evented setup is Nginx + Thin where your web app must also be explicitly written to be evented (e.g. with async Rack or Cramp or something like that).

Piyush

What about JRuby? It has system processes as threads and doesn’t have a GIL at all. Wonder why people do not even consider it?

http://www.phusion.nl/ Hongli Lai

I don’t know what you mean by “system processes as threads”. JRuby, or actually the JVM, uses native OS threads and JRuby does not have a global interpreter lock, so JRuby threads can make use of multiple cores. As far as I know Glassfish is multithreaded so JRuby+Glassfish should be able to handle a pretty high concurrency level.

I think the reason why JRuby is not popular is because of Java. The JVM has a relatively high memory footprint compared to MRI and it starts slowly. I think this is the reason why it is unpopular among many developers, and as a result, not as often used in production as MRI. The switch to JRuby is almost never 100% transparent because of compatibility issues; not all libraries are available for JRuby (e.g. ImageMagick and Nokogiri) and you cannot use some POSIX functions. Add to this my observation that many web apps do not need to be highly concurrent, and I think we’ve arrived at the reason why so many people do not run JRuby in production.

http://steve.dynedge.co.uk Steve Smith

For me I think the point that Igvita was getting at, or certainly what I took from it, was that Node.js is highly concurrent out of the box and its ‘default stack’ enforces an evented architecture, leading to a ‘why doesn’t rails aim for this concurrency’. However like you say that was never the intention for rails and that callback riddled model can be quite confusing. I do think that rails developers in general should take the time to learn a little more about what’s going on under the hood sometimes though.

I’m just happy that you guys are slugging it out and the community at large is learning from the points and can start to think about these things more. Having written evented and threaded servers before in the email domain I see the benefits of both. Hopefully in the future other developers will be able to make the same choices with a little more of the understanding that you guys have.

rubiii

Most popular libraries are running just fine on JRuby. For example: ImageMagick can be used through rmagick4j and of course there is a JRuby-compatible version of Nokogiri.

James Sanders

In my view, the major innovation in Node.js (besides using javascript, which is more clever than innovative) is the event loop as language feature. Writing evented code for any framework is somewhat more difficult than plain imperative code, but one is not required to understand event loops to write against Node.js, only how to use callbacks, which are already common in javascript. It is the transparency that makes it interesting and encourages libraries to remain asynchronous.

http://www.phusion.nl/ Hongli Lai

James Sanders, I don’t know why you claim that the event loop is a language feature in Node.js. Node.js is just plain Javascript with no additional constructs specifically designed for event loops. I take it that you’re referring to the use of callbacks as a way of handling evented code, as opposed to keeping a state machine as evented C code usually looks like. This too isn’t an innovation: Ruby EventMachine has used callbacks since the beginning and it predates Node.js by quite some time. Unless you consider the Javascript part an innovation by itself, of course.

http://www.igvita.com/ Ilya Grigorik

Hongli, thanks for the great rebuttal!

First off, while I picked on Passenger and Unicorn in the article and my RailsConf talk, let me say this: thank you guys for putting in all the hard work into REE and Passenger. Now to address a few of your points:

You’re right, my argument is more broad than just Rails – I picked on Rails because that’s where the majority of the community is today. Likewise, I know that a lot of companies and developers are struggling with the end-to-end Rails stack.

I believe that as a community we need a bit of a reset: we’ve been iterating on a sub-optimal strategy for some time, and I hope we can change that. As much as I love the work you guys are doing with REE, I would much rather see us move on to 1.9.2 – I do realize there are all kinds of technical and political barriers, but hey…

My point about node.js is simple: a great web-app DSL is a feature, great libraries is a feature, testing frameworks are a feature, and increasingly, “high-performance” is a feature (in all definitions of high performance). I believe Rails is on the edge of all but the last point, and the reason why more and more people are starting to play with node is exactly because they know that “performance” sometimes outweighs all the rest.

Now, I’m the last person to advocate switching to node – exactly for the reasons you mentioned. Any sizable async app is incredibly complex due to all the callbacks. Having said that, I think we can abstract a lot of that with Ruby 1.9 & coroutines. Nothing stops us from having a fully async stack, while preserving all the great things and code structure we love about Rails & Ruby.

Of course, 1.9.2 alone is hardly the cure. What we need is sane libraries, an app server that combines the ease of passenger + evented (or threaded even) backend that is actually capable of handling >1 concurrent requests, and education of the developers to recognize the pitfalls in what may be hurting their performance. Not an easy task, granted. But, I know that we can do it, because all the components are already there.

http://www.phusion.nl/ Hongli Lai

No worries Ilya, I didn’t consider it picking on. You had valid points in your blog post. In fact your blog post has enticed us to put more priority on increasing Phusion Passenger’s concurrency level support. I’m also looking forward to your async work.

J.A. Roberts Tunney

“Performance” is also a loaded word. I feel it’s much more helpful to measure web-app performance by response time rather than requests per second because that’s how humans perceive performance.

Simon Waddington

Didn’t you mean “Rails needs about 250 MB per process” not 25 MB ???

http://www.phusion.nl/ Hongli Lai

@Simon: No. It starts with about 25-30 MB. After some work maybe it’ll jump to 50 MB. 250 MB doesn’t occur unless:
1. You’re in development mode or
2. your app is badly written or
3. your app is just bloated.

For example we once debugged an app for a client. This app had processes that would use 300+ MB at some point. Turned out they were loading a 25 MB file on every access to a certain URL.

“From my experience, talking about scalability and performance can be a bit confusing because the terms can mean different things to different people”

Well, as far as I can make out there’s a good consensus on the basic meaning of scalability as opposed to performance, if not on the actual units used to measure it. Best not to make things even more confusing by ignoring that consensus.

The gist of it is: scalability doesn’t just describe the performance of a system as a one-off measurement (like concurrent users, as you define it, or req/sec); rather it describes the relationship between performance and available computing power for a given system, in particular how performance changes as available hardware is increased.

An observation like “doubling the amount of hardware doubles the amount of load the system can cope with” would be the essence of a measurement of scalability.

Scalability and performance may be completely orthogonal – a system may scale brilliantly (due to its architecture for handling concurrency, say) but perform terribly (due to choosing an inefficiently-implemented language or runtime).
