It has already been two years since we first released Phusion Passenger. Time sure flies, and we’ve come a long way since then. We were the first to implement a working Ruby web app deployment solution that integrates seamlessly into the web server, and all the features that we’ve developed over time – smart spawning and memory reduction, upload buffering, Nginx support, etc. – have served us well for a long time. Nevertheless, it is time to say goodbye to the old Phusion Passenger 2.2 codebase. In the past we focused primarily on three things:

Ease of use.

Stability.

Robustness.

Notice that “performance” is not on the above list. We strove to make Phusion Passenger “fast enough”, i.e. not ridiculously slower than the alternatives. Lately it would appear that competitors are once again focusing on performance, and we of course cannot afford to fall behind. We’ve been working on Phusion Passenger 3 for a while now. Today we begin unveiling the technology behind this new major Phusion Passenger version. This blog post is the first of several technology previews to come.

The performance test

It’s not very useful to benchmark Phusion Passenger performance using a Rails application because most of the time is spent in Rails and the application itself. Therefore we’ll benchmark with a simple Rack application. Consider the following hello world Rack application:
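
The exact application used in the benchmark is not reproduced here. As a rough sketch, a hello world Rack application usually amounts to little more than the following. The constant name `HelloWorld` and the response body are assumptions made for illustration, not the benchmarked code.

```ruby
# Sketch of a hello world Rack application (hypothetical, for illustration).
# A Rack app is any object that responds to #call(env) and returns
# [status, headers, body], where body responds to #each.
HelloWorld = lambda do |env|
  body = "hello world\n"
  headers = {
    "content-type"   => "text/plain",
    "content-length" => body.bytesize.to_s
  }
  [200, headers, [body]]
end

# In a config.ru file this would be followed by:
#   run HelloWorld
```

Because the application does almost no work of its own, nearly all of the measured time is spent in the deployment stack rather than in the app.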

A graph is worth more than a thousand words

Suffice it to say, even though Phusion Passenger was already pretty fast, we believe we’ve made some pretty significant performance improvements, and it will be interesting to see how the final version of Phusion Passenger 3 stacks up against the competition. Needless to say, we’ve performed our own benchmarks already and have concluded that “the self-proclaimed fastest deployment solution” really isn’t the fastest deployment solution compared to Phusion Passenger 3. 😉 That said, benchmarks are lies, damn lies of course, and your mileage may definitely vary, so we encourage you to run any kind of benchmark you’d like once we release Phusion Passenger 3. For us, the most important issue is still the trade-off of how much time you have to spend actually maintaining your setup, but as the graphs indicate, we’ve made some pretty monstrous improvements to performance as well.

How did we do it?

When it comes to optimizing software, there’s the saying that 20% of the code is responsible for 80% of the time. Not so with Phusion Passenger: we found that there were no obvious performance bottlenecks. Even profilers turned out to be totally useless because all the times were so small and so close to each other. Phusion Passenger was already pretty fast.

Instead, we optimized the hard way: with lots and lots of micro-optimizations. 2% here, 3% there, and so on. In other words, blood, sweat, tears and lots of sleepless nights. The optimizations can be summed up as follows:

Reducing system calls

System calls are pretty expensive compared to userspace computation, because each one requires a switch into the kernel and back. For example, all I/O operations (read(), write()) are system calls. We’ve performed an extensive code inspection, removed redundant system calls and coalesced many of the remaining ones.
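
To illustrate what coalescing means here, the sketch below compares writing a response fragment by fragment with assembling it in userspace and writing it once. `CountingIO` is a hypothetical stand-in for an unbuffered socket, where every call to `write` would correspond to one write() system call; it is not part of Phusion Passenger.

```ruby
# Hypothetical stand-in for an unbuffered socket: each call to #write
# represents one write() system call.
class CountingIO
  attr_reader :write_calls, :data

  def initialize
    @write_calls = 0
    @data = String.new
  end

  def write(chunk)
    @write_calls += 1
    @data << chunk
    chunk.bytesize
  end
end

RESPONSE_PARTS = [
  "HTTP/1.1 200 OK\r\n",
  "Content-Type: text/plain\r\n",
  "Content-Length: 12\r\n",
  "\r\n",
  "hello world\n"
].freeze

# Fragmented: one write per fragment.
def send_fragmented(io)
  RESPONSE_PARTS.each { |part| io.write(part) }
end

# Coalesced: join the fragments in userspace, then write once.
def send_coalesced(io)
  io.write(RESPONSE_PARTS.join)
end

fragmented = CountingIO.new
coalesced = CountingIO.new
send_fragmented(fragmented)
send_coalesced(coalesced)
puts "fragmented writes: #{fragmented.write_calls}, coalesced writes: #{coalesced.write_calls}"
```

Both variants send identical bytes; only the number of kernel round trips differs. The join does cost an extra copy in userspace, so this is a trade-off rather than a free win.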

The beginning of a zero-copy I/O architecture

CPUs are very fast nowadays. In fact they are so fast that RAM cannot keep up, which makes memory access relatively expensive. I/O-intensive applications such as web servers therefore benefit from copying I/O data as little as possible. In order to reduce memory traffic, we’ve implemented the beginning of a zero-copy I/O architecture. This architecture covers both the C++ and the Ruby parts of Phusion Passenger.

Less Ruby garbage production

The garbage collector in Ruby can be a significant bottleneck. We’ve heavily optimized the Ruby part of the request handler and reduced creation of Ruby objects to a minimum. This made the request handler significantly faster in our tests.
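
As an illustration of the kind of change this involves, the sketch below converts HTTP header names into CGI-style environment keys in two ways. The header list and the method names are assumptions made for this example; they are not taken from the Phusion Passenger source.

```ruby
# Naive: builds several temporary strings for every header of every request.
def cgi_key_naive(name)
  "HTTP_" + name.upcase.tr("-", "_")
end

# Precomputed table of frozen keys for common headers (hypothetical list).
# Lookups for these headers allocate no new strings at all.
KNOWN_KEYS = %w[Host Accept User-Agent Cookie].each_with_object({}) { |name, table|
  table[name] = cgi_key_naive(name).freeze
}.freeze

# Cached: reuse the frozen key when the header is known, and fall back
# to the allocating path only for unusual headers.
def cgi_key_cached(name)
  KNOWN_KEYS[name] || cgi_key_naive(name)
end

puts cgi_key_cached("User-Agent")
puts cgi_key_cached("X-Custom")
```

Each avoided allocation is tiny on its own, but a request handler runs this sort of code for every header of every request, so the saved garbage collector work adds up.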

Optimizing algorithms and optimizing Ruby code in C

Some algorithms have been optimized, e.g. some O(N) algorithms have been replaced by O(log N) or O(1) algorithms, and some key Ruby code has been replaced by C code. The former didn’t gain us much performance because none of the O(N) algorithms were doing a lot of work in the first place, but the latter gave us a much more noticeable boost.
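
A typical O(N) to O(1) replacement looks roughly like the sketch below: a linear scan over a list is swapped for a hash index that is built once. The `Worker` struct and its fields are hypothetical and exist only for this example.

```ruby
# Hypothetical record describing an application process.
Worker = Struct.new(:pid, :app_root)

WORKERS = [
  Worker.new(101, "/srv/blog"),
  Worker.new(102, "/srv/shop"),
  Worker.new(103, "/srv/wiki")
].freeze

# O(N): scan the whole list on every lookup.
def find_linear(pid)
  WORKERS.find { |worker| worker.pid == pid }
end

# O(1) on average: build an index once, then look up by key.
WORKER_INDEX = WORKERS.each_with_object({}) { |worker, index|
  index[worker.pid] = worker
}.freeze

def find_indexed(pid)
  WORKER_INDEX[pid]
end

puts find_indexed(102).app_root
```

With only a handful of entries the two lookups cost about the same, which matches the observation above that this class of change made little difference in practice.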

Reducing context switches

Phusion Passenger is heavily concurrent: it consists of multiple threads and service processes. However, some of the communication between those threads and processes required round trips, which caused more context switches than necessary. We’ve optimized our internal protocols and reduced context switching to a minimum.

The future

As stated above, this is just a glimpse of what we’ve got in store for you. As you’ve come to expect from us, we want to make sure that our findings hold up in real-life scenarios as well. After close to two years of field testing with Phusion Passenger 2 in some of our clients’ most demanding web hosting environments, we’ve spent the last few months forging that experience back into Phusion Passenger 3. Through beta testing in these high-demand Rails environments, we hope to ensure that Phusion Passenger 3 gives you the best experience, both in an enterprise environment and for personal use.

Performance has been touched upon in this blog post, and in the period leading up to the release of Phusion Passenger 3 we’ll unveil bit by bit what we’ve been tinkering on for the last few months. In particular, we’re looking forward to seeing how the zero-copy I/O architecture and the other optimizations hold up in real-life scenarios. We’re not done optimizing yet, but we will likely hit a ceiling at some point where further optimizations get harder and harder, especially if we want to retain the features, such as ease of use, that define Phusion Passenger.

One thing is for sure: we want this release to be nothing less than stunning, so we encourage you to submit your wish list to us as well. We’ve likely implemented a lot of those wishes already, but we want to make sure that we’re not missing anything.

Amazing! This is just too much joy to handle haha. With Ruby on Rails 3 giving a refreshing development environment, focussed on making things simpler, modular and more memory efficient/performant.. Now you guys come with Phusion Passenger 3 to improve performance to THAT extent, yet keeping it simple and maintainable.

All this really makes me so glad that I joined this community rather than stick with the previous ones. This is pure satisfaction, and apparently, it never stops!

Thanks so much for putting all your effort into this, can’t wait to read any updates!

You guys are relentless, which is great for the whole community. I hope more people pay attention to you guys because Passenger + REE already rocked, and now this new Passenger 3 is even more amazing. Thank you all for the hard work.

Steve: you may want to check your load balancer or routers on the way. We haven’t found anything in Phusion Passenger so far that could cause the EPIPE thing; on the other hand the cases that we were able to solve have all been related to broken load balancers/routers/proxies and that kind of stuff.

@Hongli – still suffering the problem that’s in 378 (which was merged into 435) – see the output from Jan 27 http://code.google.com/p/phusion-passenger/issues/detail?id=378. My setup is Ubuntu + latest Nginx/Passenger + Ruby 1.9 + Rails 2.3. No interesting load balancers, proxies etc. In short, once that second spawner appears, my app locks up. Happens about once a day. Since I saw a comment that says the spawner has been rewritten in Passenger 3, I’m hoping it will just go away.

@hongli – Phusion Passenger 2 is already pretty feature complete, and PP3 sounds great. Any further optimizations in REE, and Rails 3’s huge leaps in performance, will make PP3 an even better solution for Rails app deployment than it already is. Keep up the good work.

One thing that would be nice to see in PP3 though, is a “PassengerMaxInstances” settings for individual vhosts, which controls the amount of instances each site can spawn up. PP2 only has one for global, which means I might have two sites, one only needing two workers, one needing 10, but I can’t tell Passenger that.

seydar

Nice article. Would you be able to publish more statistics next time, such as the standard deviation, for instance?

“Phusion” and “Phusion Passenger” are registered trademarks of Phusion. “Rails”, “Ruby on Rails” and the Rails logo are registered trademarks of David Heinemeier Hansson. All other trademarks are property of their respective owners.