If you haven't heard by now, TorqueBox 2.x is powered by JBoss
AS7 which claims to be blazingly fast and lightweight. So,
naturally, we want to put those claims to the test and see how
TorqueBox 2.x stacks up against the competition.

Building on what we've learned from previous benchmarks (round
1, round 2), this latest round of benchmarking
compares the performance of Spree running under:

- TorqueBox 2.x and Trinidad on JRuby
- Passenger and Unicorn on both Ruby 1.9.2 and REE

Even if you're not a fan of JRuby, stick around to see how Ruby 1.9.2
compares to REE. From round 2 we know REE outperforms Ruby
1.8.7, but how does it compare to 1.9.2?

Why Spree?

Spree is a well-known Rails 3 application that can run under Ruby 1.8,
1.9, and JRuby. Based on feedback from our Redmine benchmarks, we
wanted to make sure the next application could run under Ruby 1.9 for
an accurate comparison of JRuby vs C Ruby performance.

The Setup

Spree is nice enough to ship with a set of sample data that we used
for benchmarking. The benchmark script simulates users browsing around
a few pages of the site, starting with a small number of concurrent
users and gradually increasing until it finishes after 80 minutes.
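For reference, Tsung expresses this kind of gradually increasing load
as a series of arrival phases in its XML configuration. A minimal
sketch of the idea — the phase durations and arrival rates below are
illustrative, not the values used in this benchmark:

```xml
<!-- tsung.xml (fragment) - illustrative ramp-up only -->
<load>
  <!-- phase 1: 20 minutes, 1 new user arriving per second -->
  <arrivalphase phase="1" duration="20" unit="minute">
    <users arrival_rate="1" unit="second"/>
  </arrivalphase>
  <!-- phase 2: 20 minutes, 4 new users arriving per second -->
  <arrivalphase phase="2" duration="20" unit="minute">
    <users arrival_rate="4" unit="second"/>
  </arrivalphase>
</load>
```

Each phase raises the arrival rate, so concurrency climbs steadily
until the final phase ends.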

More details about the benchmark and links to the raw results are at
the bottom of the post.

The Results

Top Servers

Ignoring the latency graph for a minute, it's obvious that the runtime
(JRuby vs Ruby 1.9.2) is the differentiator in throughput, CPU usage,
and free memory. TorqueBox and Trinidad have no appreciable difference
in these categories but both clearly outperform Passenger and
Unicorn. If you're concerned with maximizing throughput, minimizing
CPU usage, or minimizing memory usage under load then you can't go
wrong choosing either JRuby server.

However, what about the latency graph? It shows the average time taken
for each request - in other words, the average time a user would have
to wait for a page on the site to load. This is where the difference
between web servers, not runtimes, is readily apparent.

At peak load, TorqueBox has a lower latency than the nearest
competitor, Passenger, by a factor of 8 and beats out Trinidad by a
factor of 32. Note that the latency graph's y-axis has a logarithmic
scale. To help illustrate this point, here's the same latency graph
with a linear y-axis and Unicorn removed because its latency is so bad
at the end of the test.

Now consider a common real-world scenario: suppose our application is
required to keep its average response time under 1 second. How many
requests per second can each server handle while staying under that
mark? Looking at the latency and throughput graphs, we see that
Trinidad can handle 45 requests per second, Passenger 90, Unicorn 100,
and TorqueBox 130. At a peak load of 130 requests per second the
average response time from TorqueBox is only 256ms, well under our 1
second requirement.
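Pulling those graph readings together, the relative headroom works out
roughly as follows (the numbers are the per-server readings quoted
above):

```ruby
# Peak throughput (requests/second) each server sustains while keeping
# average latency under the 1 second requirement, as read from the graphs.
capacity = {
  "Trinidad"  => 45,
  "Passenger" => 90,
  "Unicorn"   => 100,
  "TorqueBox" => 130,
}

slowest = capacity.values.min
capacity.sort_by { |_, rps| rps }.each do |server, rps|
  printf("%-10s %3d req/s (%.1fx the slowest)\n", server, rps, rps.to_f / slowest)
end
```

By this measure TorqueBox sustains nearly three times the sub-second
throughput of Trinidad, despite both running the same JRuby runtime.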

If you were still skeptical about the performance benefits of
switching to JRuby, the above graphs should be convincing enough to
give it a shot.

TorqueBox 2.x vs TorqueBox 1.1.1

We've seen how TorqueBox 2.x stacks up against the competition, but
how does it compare to the latest 1.x stable release, TorqueBox 1.1.1?
Thanks in large part to AS7, TorqueBox 2.x has lower latency, higher
peak throughput, less CPU usage, and less memory usage than TorqueBox
1.1.1.

REE vs Ruby 1.9.2

Ruby 1.9.2 gives Passenger and Unicorn lower latency, higher
throughput, lower CPU usage, and more free memory than REE. From a
performance standpoint there's no reason why you shouldn't be using
1.9.2 if you must use a C Ruby.

The Details

All benchmarks were run on Amazon EC2 using an m1.large Tsung
client instance, a c1.xlarge server instance, and a db.m1.large MySQL
database instance. All instances were started in the same availability
zone and every benchmark started from a clean database loaded with
Spree's sample data. Each benchmark run was performed twice on
separate days, and the better of the two runs was used for the graphs.

TorqueBox and Trinidad were set to use a 2GB heap and a maximum of 100
HTTP threads to match the database connection pool size. Unicorn and
Passenger were both started with 50 workers. In our testing, 50 was
the sweet spot for maximum throughput; anything above that only
increased CPU and memory usage without any further gain in throughput.
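For context, the Unicorn worker count is set in its standard Ruby
config file. A minimal sketch — only the worker count comes from this
benchmark; the file path and the preload setting are assumptions about
a typical setup:

```ruby
# config/unicorn.rb -- minimal sketch; only worker_processes reflects
# the benchmark setup, the rest is a common baseline configuration.
worker_processes 50   # measured sweet spot: more workers only added CPU/memory cost

# Loading the app before forking lets workers share memory via
# copy-on-write (an assumed, commonly used setting, not from the post).
preload_app true
```

The JRuby servers' 2GB heap would similarly be set through the JVM's
standard -Xmx option.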