Phusion Passenger version 5.0.0 beta 3 was released today, with a number of changes to the turbocache. The turbocache is a component in Passenger 5 that automatically caches HTTP responses in an effort to speed up the application. The turbocache is not a feature-rich HTTP cache like Varnish, but instead it’s more like a “CPU L1 cache for the web” — small and fast, requires little configuration, but has fewer features by design.

The turbocache was implemented with HTTP caching standards in mind, namely RFC 7234 (HTTP/1.1 Caching) and RFC 2109 (HTTP State Management Mechanism). Our initial mindset in implementing the turbocache was like that of compiler implementors: if something is allowed by the standards, then we’ll implement it in order to pursue the maximum possible performance.

But it turns out that following the standards strictly may raise security concerns. A while ago, we were contacted by Chris Heald, who provided convincing cases on why the turbocache’s behavior is problematic from a security standpoint. Chris is a veteran Varnish user and has a lot of experience on the subject of HTTP caching.

The gist is that the turbocache made it too easy to accidentally cache responses which should not be cached. Imagine that a certain response contains sensitive information and is only meant to be served to one user. Such sensitive information may include security credentials or session cookies. If the application sets HTTP caching headers incorrectly, then the turbocache may serve that response to other users, resulting in an information leak.

Such problems are technically bugs in the application. After all, Passenger is just following the standards. But Chris asserted that such mistakes are very easily made. Indeed, he has seen quite a number of applications that send out incorrect HTTP caching headers. For this reason, he believes that the turbocache should be more conservative by default.

Cookies also deserve special attention. RFC 2109 mentions that cookies may only be cached when they are intended to be shared by multiple users. But it is impossible for Passenger to know the intention of the cookies without configuration from the application developer. At the same time, providing such configuration goes against the spirit of the turbocache, namely that it should be easy and automatic.

It was clear that something had to be done.

Turbocache changes

More conservative

After contemplating the issues, we’ve decided to make the turbocache more conservative in order to avoid the most common security issues.

In beta 1 and beta 2, the turbocache caches all “default cacheable responses”, as defined by RFC 7234. These are responses to GET requests; with status code 200, 203, 204, 300, 301, 404, 405, 410, 414 or 501; and for which no headers are set that prevent caching, e.g. “Cache-Control: no-store”. This means that a GET request which yields a 200 response with no caching headers is actually cacheable per the standards, and so it was cached by the turbocache.

In beta 3, responses are no longer cached unless the application explicitly sends caching headers. That means that a GET request which yields a 200 response is only cached if the application sends an “Expires” or a “Cache-Control” header (which must not contain “private”, “no-store”, etc).
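The header part of this rule can be sketched roughly as follows. This is a plain Ruby illustration, not Passenger's actual implementation. The method names are made up for this example, the headers hash is assumed to use exactly these key spellings, and only the “private” and “no-store” directives are checked, although other directives may also prevent caching.

```ruby
# Hypothetical sketch of the beta 3 header check; not Passenger's real code.
# Assumes the response is already "default cacheable" (GET, suitable status).

# Splits "max-age=60, private" into ["max-age", "private"].
def cache_control_directives(value)
  value.to_s.split(",").map { |part| part.strip.split("=", 2).first.to_s.downcase }
end

def explicitly_cacheable?(headers)
  directives = cache_control_directives(headers["Cache-Control"])
  # Directives that forbid a shared cache from storing the response.
  return false if directives.include?("private") || directives.include?("no-store")
  # Beta 3: the application must opt in with an explicit caching header.
  headers.key?("Expires") || headers.key?("Cache-Control")
end

explicitly_cacheable?({})                                       # no caching headers
explicitly_cacheable?({ "Cache-Control" => "max-age=60" })      # opted in
explicitly_cacheable?({ "Cache-Control" => "max-age=60,private" })
```

Under this sketch, the first and third calls return false and the second returns true.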

While it is still possible for some application responses to be unintentionally cached, this change solves the majority of the issues. The only way to avoid these issues entirely out of the box is by disabling caching completely, which has the downside of not being able to leverage said caching features. Instead, we believe it is reasonable to cache responses only when the application explicitly asks for it. A potential caveat is that even though the application and Passenger respect the specifications, an application developer may not have a full understanding of how caching behaves according to said specs. A developer should choose to disable caching entirely in such cases.

Bugs fixed

While investigating Chris’s case, we also uncovered some bugs in the turbocache, such as the incorrect handling of certain headers. These bugs have been fixed. The full details can be found in the Passenger 5 beta 3 release notes.

Using the turbocache and speeding up applications

Now that the turbocache has changed in behavior, here are some practical tips on what you can do to make good use of the turbocache.

Learn about HTTP caching headers

The first thing you should do is to learn how to use HTTP caching headers. It’s pretty simple and straightforward. Since the turbocache is just a normal HTTP shared cache, it respects all the HTTP caching rules.

Set an Expires or Cache-Control header

To activate the turbocache, the response must contain either an “Expires” header or a “Cache-Control” header.

The “Expires” header tells the turbocache how long to cache a response. Its value is an HTTP timestamp, e.g. “Thu, 01 Dec 1994 16:00:00 GMT”.

The Cache-Control header is a more advanced header that not only allows you to set the caching time, but also how the cache should behave. The easiest way to use it is to set the max-age flag, which has the same effect as setting “Expires”. For example, this tells the turbocache that the response is cacheable for at most 60 seconds:

Cache-Control: max-age=60

As you can see, a “Cache-Control” header is much easier to generate than an “Expires” header. Furthermore, “Expires” doesn’t work if the visitor’s computer’s clock is wrongly configured, while “Cache-Control” does. This is why we recommend using “Cache-Control”.
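To make the difference concrete, here is a small Ruby sketch that builds both header values for a 60 second lifetime. It uses `Time#httpdate` from Ruby's standard `time` library; the variable names are only for illustration.

```ruby
require "time" # adds Time#httpdate

ttl = 60 # seconds the response may be cached

# Expires needs an absolute HTTP timestamp computed from the server's clock.
expires_header = (Time.now + ttl).httpdate

# Cache-Control only needs the relative lifetime.
cache_control_header = "max-age=#{ttl}"

puts "Expires: #{expires_header}"
puts "Cache-Control: #{cache_control_header}"
```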

Another flag to be aware of is the private flag. This flag tells any shared caches — caches which are meant to store responses for many users — not to cache the response. The turbocache is a shared cache. However, the browser’s cache is not, so the browser can still cache the response. You should set the “private” flag on responses which are meant for a single user, as you will learn later in this article.

And finally, there is the no-store flag, which tells all caches — even the browser’s — not to cache the response.

Here is an example of a response which is cacheable for 60 seconds by the browser’s cache, but not by the turbocache:

Cache-Control: max-age=60,private

The HTTP specification specifies a bunch of other flags, but they’re not relevant for the turbocache.

Only GET requests are cacheable

The turbocache currently only caches GET requests. POST, PUT, DELETE and other requests are never cached. If you want your response to be cacheable by the turbocache, be sure to use GET requests, but also be sure that your request is idempotent.

Avoid using the “Vary” header

The “Vary” header is used to tell caches that the response depends on one or more request headers. But the turbocache does not implement support for the “Vary” header, so if you output a “Vary” header then the turbocache will not cache your response at all. Avoid using the “Vary” header where possible.

Common application caching bugs

Even though the turbocache has become more conservative now, it is still possible for application responses to be unintentionally cached if the application outputs incorrect caching headers. Here are a few tips on preventing common caching bugs.

Varying response by Ajax

A common pattern is to return a different response depending on whether or not it was an Ajax call. For example, consider this Rails app, which returns a JSON response for Ajax calls, HTML response otherwise:

This can cause the turbocache to return the JSON response for non-Ajax calls, or to return the HTML response for Ajax calls.

There are two ways to solve this problem:

Set the “Vary: X-Requested-With” header so that the cache knows the response depends on this header. Rails’s request.xhr? method checks this header. Note that the turbocache currently disables caching altogether upon encountering a “Vary” header.

Do not vary the response based on whether or not it is an Ajax call. Instead, vary the response based on the URI. For example, you can return JSON responses only if the URI ends with “.json”.
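The mix-up and the URI-based fix can be illustrated with a toy cache in plain Ruby. The handlers below are hypothetical stand-ins for a Rails action, and the hash is a simplified model of a shared cache that keys entries by path only.

```ruby
# Hypothetical handler standing in for a Rails action that branches on request.xhr?.
def render_by_header(headers)
  if headers["X-Requested-With"] == "XMLHttpRequest"
    '{"title":"Hello"}'   # JSON for Ajax calls
  else
    "<h1>Hello</h1>"      # HTML otherwise
  end
end

# Alternative handler that branches on the URI instead.
def render_by_uri(path)
  path.end_with?(".json") ? '{"title":"Hello"}' : "<h1>Hello</h1>"
end

# Toy shared cache keyed by path only, ignoring request headers.
cache = {}
ajax_headers = { "X-Requested-With" => "XMLHttpRequest" }

# An Ajax call arrives first and fills the cache entry for /posts/1.
ajax_response    = (cache["/posts/1"] ||= render_by_header(ajax_headers))
# A normal browser request for the same path now gets the cached JSON.
browser_response = (cache["/posts/1"] ||= render_by_header({}))

# With URI-based variation the two representations have different cache keys.
uri_cache = {}
json_response = (uri_cache["/posts/1.json"] ||= render_by_uri("/posts/1.json"))
html_response = (uri_cache["/posts/1"]      ||= render_by_uri("/posts/1"))
```

In the first half, `browser_response` ends up holding JSON, which is exactly the bug. In the second half, each representation has its own URI and therefore its own cache entry.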

Be careful with caching when setting cookies

If your application outputs cookies then you should be careful with your caching headers. You should only allow the turbocache to cache the response if all cookies may be cacheable by multiple users. If any of your cookies contain user-specific information, or if you’re not sure, then you should set the “private” flag in “Cache-Control” to prevent the turbocache from caching it.
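One conservative way to apply this advice is a small Rack-style wrapper that adds “private” to any response that sets a cookie. The sketch below is plain Ruby with no dependencies. The method name is hypothetical, header keys are assumed to be spelled exactly as shown, and the wrapper deliberately opts every cookie-setting response out of shared caching, which may be stricter than your app needs.

```ruby
# Hypothetical Rack-style wrapper: any response that sets a cookie gets the
# "private" directive unless it is already private or no-store.
def private_when_setting_cookies(app)
  lambda do |env|
    status, headers, body = app.call(env)
    if headers.key?("Set-Cookie")
      directives = headers["Cache-Control"].to_s.split(",").map(&:strip).reject(&:empty?)
      unless directives.any? { |d| d.casecmp?("private") || d.casecmp?("no-store") }
        directives << "private"
      end
      headers = headers.merge("Cache-Control" => directives.join(", "))
    end
    [status, headers, body]
  end
end

# Example app that sets a session cookie but marks the response as cacheable.
login_app = lambda do |_env|
  [200, { "Set-Cookie" => "session=abc", "Cache-Control" => "max-age=60" }, ["hi"]]
end

_status, response_headers, _body = private_when_setting_cookies(login_app).call({})
puts response_headers["Cache-Control"]
```

With the example app above, the resulting header is “max-age=60, private”, so the browser may still cache the response but a shared cache such as the turbocache will not.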

Set “Cache-Control: private” when working with sessions

If your application works with sessions then you must ensure that all your “Cache-Control” headers set the “private” flag so that the turbocache does not cache it.

Note that your app may work with sessions indirectly. Any Rails page which outputs a form will result in Rails setting a CSRF token in the session. Luckily Rails sets “Cache-Control: private” by default.

Performance considerations

The turbocache plays a major role in the performance of Passenger 5. The performance claims we made are with turbocaching enabled. Now that the turbocache has become more conservative, the performance claims still stand, but may require more application modifications than before.

By strictly following the caching standards, we had hoped that we’d be able to deliver performance for most apps out-of-the-box without modifications. But we also value security, and we value that more so than performance, which is why we made the turbocache more conservative.

But this raises a question: suppose that we can’t rely on the turbocache anymore as a major factor in performance, what else can we do to improve Passenger 5’s performance, and is it worth it?

There is in fact one more thing we can do, and that is introducing a new operational mode which bypasses a layer. In the current Passenger 5 architecture, all requests go through a process called the HelperAgent. This extra process introduces some context switching overhead and processing overhead. The overhead is negligible in most real-world scenarios, but it is very apparent in hello world benchmarks. It is possible to bypass this process, allowing clients to directly communicate with application processes. But doing so comes at a cost:

Some features — e.g. load balancing, multitenancy, out-of-band garbage collection, memory checking and statistics collection — are best implemented in the HelperAgent. Reimplementing them inside application processes is either difficult or less efficient.

Rearchitecting Passenger this way requires a non-trivial amount of development time.

So it remains to be seen whether bypassing the HelperAgent is worth it. Investing time in this means that we’ll have less time available to pursue other goals on the roadmap. What do you think about this? Just post a comment and let us know.

Conclusion

Passenger 5 beta 3 is the first “more or less usable in production” release of the Passenger 5 series. Previous releases were not ready for production, but with beta 3 we are confident enough that the most important issues are solved. The turbocache issue as described in this article is one of them. If you were on a previous Passenger 5 release, then we strongly recommend that you upgrade.

We would like to thank Chris Heald for his excellent feedback. We couldn’t have done this without him.

We expect the Passenger 5 final stable version to be released in February. Beta 3 is supposed to be the last beta. Next up will be Release Candidate 1, followed by the final stable release.

But the story doesn’t end there. We will publish a roadmap in the near future, which describes all the ambitious plans we have in store for Passenger. Passenger is constantly improving and evolving, so please stay tuned for updates.

We’ve just released version 5.0.0 beta 3 of the Phusion Passenger application server for Ruby, Python and Node.js. The 5.x series is also unofficially known under the codename “Raptor”, and introduces many major improvements such as performance enhancements, better tools for application-level visibility and a new HTTP JSON API for accessing Passenger’s internals.

So far, we’ve discouraged using 5.0 in production because it’s still in beta and because it was known to be unstable. But this changes with beta 3, which we consider “more or less ready for production”. This means that we’re confident that most of the major issues have been solved, but you should still exercise caution if you roll it out to production.

Final stable

In February, we will release the first Release Candidate version. If everything goes well, 5.0 final — which is officially ready for production — will also be released in February.

Changes in this version

Turbocaching updates

One of the major features in Phusion Passenger 5 is the turbocache, an integrated and high-performance HTTP response cache. It is responsible for a large part of the performance improvements in version 5. In beta 3, we’ve given the turbocache a few major updates:

We’ve been researching ways to improve the turbocache. Based on community feedback on the turbocache, we’ve found that the turbocache in its previous form wasn’t so useful. So we’ve come up with a few ways to allow apps to be better cacheable. These techniques are well-established and have been extensively used in advanced Varnish setups.

We’ve made the turbocache more secure and more conservative, based on excellent feedback from Chris Heald and the community. In previous versions, default cacheable responses (as defined by RFC 7234) were cached unless caching headers tell us not to. Now, default cacheable responses are only cached if caching headers explicitly tell us to. This change was introduced because there are many applications that set incorrect caching headers on private responses. This new behavior is currently not configurable, but there are plans to make it configurable in 5.0.0 release candidate 1.

Miscellaneous

A new configuration option, passenger_response_buffer_high_watermark (Nginx) and PassengerResponseBufferHighWatermark (Apache), has been introduced. This allows for configuring the behavior of the response buffering system. Closes GH-1300.

State introspection has been improved. This means that passenger-status --show=requests shows better and more detailed output now.

Installing or upgrading

The above is a very short excerpt of how to install or upgrade Phusion Passenger. For detailed instructions (which, for example, take users and permissions into account), please refer to the “RubyGems” section of the installation manuals:

Debian packages no longer require Ruby 1.9

Due to a bug in the package specifications, Debian packages used to require Ruby 1.9, even if you already have newer Ruby versions installed from APT (e.g. through the Brightbox repository). This bug has now been fixed. The Phusion Passenger Debian packages now require some Ruby interpreter, but they don’t care which version.

It’s been a while since we released the first beta of Phusion Passenger 5 (codename “Raptor”), the application server for Ruby, Python and Node.js web apps. We have received a lot of great feedback from the community regarding its performance, stability and features.

Passenger 5 isn’t production-ready yet, but we are getting close because 5.0 beta 3 will soon be released. But in the meantime, we would like to share a major new idea with you.

While Passenger 5 introduced many performance optimizations and is much faster than Passenger 4, the impact on real-world application performance varies greatly per application. This is because in many cases the overall performance is more dependent on the application than on the app server.

It’s obvious that just making the app server itself fast is not enough to improve overall performance. So what else can the app server do? After contemplating this question for some time, we believe we have found an answer in the form of a modified HTTP caching mechanism. Its potential is huge.

Update: Over the course of the day, readers have made us aware that some of the functionality can also be achieved through Varnish and through the use of Edge Side Includes, but there are also some ideas which cannot be achieved using only Varnish. These ideas require support from the app. Please read this article until the end before drawing conclusions.

Please also note that the point of this article is not to show we can “beat” Varnish. The point is to share our ideas with the community, to have a discussion about these ideas and to explore the possibilities and feasibility.

Turbocaching

One of the main new features in Passenger 5 is turbocaching. This is an HTTP cache built directly into Passenger so that it can achieve much higher performance than external HTTP caches like Varnish (update: no, we’re not claiming to be better than Varnish). It is fast and small, specifically designed to handle large amounts of traffic to a limited number of end points. For that reason, we described it as a “CPU L1 cache for the web”.

Turbocaching is a major contributor to Passenger 5’s performance

The turbocache has the potential to improve app performance dramatically, no matter how much work the app does. This is seen in the chart above. A peculiar property is that the relative speedup is inversely proportional to the app’s native performance. That is, the slower your app is, the bigger the speedup multiplier you can get from caching. At worst, caching does not hurt. In extreme cases, if the app is really slow, you can see a hundredfold performance improvement.

The limits of caching

So much for the potential of caching, but reality is more nuanced. We have received a lot of feedback from the community about the Passenger 5 beta, including feedback about its turbocache.

As expected, the turbocache performs extremely well in applications that serve data that is publicly cacheable by everyone, i.e. they do not serve data that is login-specific. This includes blogs and other sites that consist mostly of static content. The Phusion Passenger website itself is also an example of a mostly static site. But needless to say, this still makes the turbocache’s usefulness rather limited. Most sites serve some login-specific data, even if it’s just a navigation bar displaying the username.

Even the CloudFlare CDN — which is essentially a geographically distributed HTTP cache — does not help a lot with logged-in traffic. Although CloudFlare can reduce the bandwidth between the origin server and the cache server through its Railgun technology, it doesn’t reduce the load on the origin server, which is what we are after.

Update: some readers have pointed out that Varnish supports Edge Side Include (ESI), which is like a text postprocessor at the web server/cache level. But using ESI only solves half of the problem. Read on for more information.

A glimmer of hope

Hope is not all lost though. We have identified two classes of apps for which there is hope:

Apps which have more anonymous traffic than logged in traffic. Examples of such apps include Ted.com, Wikipedia, Imgur, blogs, news sites, video sites, etc. Let’s call these mostly-anonymous apps. What if we can cache responses by user, so that anonymous users share a single cache entry?

Apps which serve public data for the most part. Examples of such apps include: Twitter, Reddit, Discourse, discussion forums. Let’s call these mostly-public apps. Most of the data that they serve is the same for everyone. There are only minor variations, e.g. a navigation bar that displays the username, and secured pages. What if we can cache the cacheable content, and skip the rest?

Class 1: caching mostly-anonymous apps

There is almost a perfect solution for making apps in the first class cacheable: the HTTP Vary header. This header allows you to send a different cached response, based on the value of some header that is sent by the client.

For example, suppose that your app…

…serves gzip-compressed responses to browsers that support gzip compression.

…serves regular responses to browsers that don’t support gzip compression.

You don’t want a cache to serve gzipped responses to browsers that don’t support gzip. Browsers tell the server whether they support gzip by sending the Accept-Encoding: gzip header. If the application sets the Vary: Accept-Encoding header in its responses, then the cache will know that that particular response should only be served to clients with the particular Accept-Encoding value that it has received now.

The Vary response header makes HTTP caches serve different cached responses based on the headers the browsers send.
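A cache that honours Vary can be thought of as adding the named request headers to its cache key. The following Ruby sketch illustrates that idea for a generic HTTP cache. The key format and the method name are assumptions made for this example and do not describe any particular cache's internals.

```ruby
# Builds a cache key from the path plus every request header named in Vary.
# Generic illustration only; the key format is an assumption of this sketch.
def cache_key(path, request_headers, vary)
  varied = vary.to_s.split(",").map(&:strip).reject(&:empty?).map do |name|
    "#{name.downcase}=#{request_headers[name]}"
  end
  ([path] + varied).join("|")
end

gzip_key  = cache_key("/", { "Accept-Encoding" => "gzip" }, "Accept-Encoding")
plain_key = cache_key("/", {}, "Accept-Encoding")

# Without Vary both kinds of client would map to the same entry.
shared_key = cache_key("/", { "Accept-Encoding" => "gzip" }, nil)
```

Here `gzip_key` and `plain_key` differ, so a gzipped response is never served to a client that did not ask for gzip.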

In theory, we would be able to cache responses differently based on cookies (Vary: Cookie). Each logged in user would get its own cached version. And because most traffic is anonymous, all anonymous users can share cache entries.

Unfortunately, on the modern web, cookies are not only set by the main site, but also by third-party services which the site uses. This includes Google Analytics, Youtube and Twitter share buttons. The values of their cookies can change very often and often differ on a per-user basis, probably for the purpose of user tracking. Because these widely different values are also included in the cache variation key, they make it impossible for anonymous users to share cache entries if we were to try to vary the cache by the Cookie header. The situation is so bad that Varnish has decided not to cache any requests containing cookies by default.

Even using Edge Side Include doesn’t seem to help here. The value of the cookie header can change quickly even for the same user, so when using Edge Side Include the cache may not even be able to cache the previous user-specific response.

The eureka moment: modifying Vary

While the Vary header is almost useless in practice, the idea of varying isn’t so bad. What we actually want is to vary the cache by user, not by the raw cookie value. What if the cache can parse cookies and vary the cached response by the value of a specific cookie, not the entire header?

And this is exactly what we are researching for Passenger 5 beta 3. Initial tests with a real-world application — the Discourse forum software — show promising results. Discourse is written in Ruby. We have modified Discourse to set a user_id cookie on login.

The result is a Discourse where all anonymous users share the same cache entry. Uncached, Discourse performance is pretty constant at 97 req/sec no matter which app server you use. But with turbocaching, performance is 19 000 req/sec.
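The difference between varying on the whole Cookie header and varying on one named cookie can be sketched in Ruby as follows. This is an illustration of the idea, not Passenger's implementation. The method names, the key format and the sample tracking cookies are all made up for this example.

```ruby
# Parses "a=1; b=2" into {"a" => "1", "b" => "2"}.
def parse_cookies(cookie_header)
  cookie_header.to_s.split(";").each_with_object({}) do |pair, cookies|
    name, value = pair.strip.split("=", 2)
    cookies[name] = value.to_s unless name.nil? || name.empty?
  end
end

# Key built from the whole Cookie header (what "Vary: Cookie" amounts to).
def key_by_cookie_header(path, cookie_header)
  "#{path}|#{cookie_header}"
end

# Key built from one named cookie only; other cookies are ignored.
def key_by_cookie(path, cookie_header, name = "user_id")
  "#{path}|#{name}=#{parse_cookies(cookie_header)[name]}"
end

# Two anonymous visitors who only carry (made-up) third-party tracking cookies.
anonymous_visitors = [
  "_ga=GA1.2.111; tracker=abc",
  "_ga=GA1.2.222; tracker=def"
]

entries_by_header = anonymous_visitors.map { |c| key_by_cookie_header("/", c) }.uniq.size
entries_by_cookie = anonymous_visitors.map { |c| key_by_cookie("/", c) }.uniq.size
logged_in_key     = key_by_cookie("/", "user_id=42; _ga=GA1.2.333")
```

Keyed by the whole header, the two anonymous visitors need two cache entries. Keyed by the user_id cookie alone, they share one entry, while a logged-in user still gets a separate key.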

This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done*. The benefit that turbocaching adds in this scenario is exactly in line with our vision of a “CPU L1 cache” for the web. You can still throw in Varnish for extra caching on top of Passenger’s turbocaching, but Passenger’s turbocaching provides an irreplaceable service.

* Maybe Varnish’s VCL allows it, but we have not been able to find a way so far. If we’re wrong, please let us know in the comments section.

Class 2: caching mostly-public apps

Apps that serve pages where most data is publicly cacheable, except for small fragments, appear not to be cacheable at the HTTP level at all. Currently these apps utilize caching at the application level, e.g. using Rails fragment caching or Redis. View rendering typically follows this sort of pseudo-algorithm:
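A minimal sketch of such a flow, in plain Ruby, might look like this. The in-memory hash stands in for Rails fragment caching or Redis, and every name in it is hypothetical.

```ruby
# Returns the cached fragment for key, rendering it with the block on a miss.
def cached_fragment(cache, key)
  cache[key] ||= yield
end

def render_page(cache, user_name, topic_id)
  # User-specific fragment: rendered on every request, never shared.
  navbar = "<nav>Hello #{user_name}</nav>"
  # Public fragment: rendered once, then served from the cache for all users.
  topic = cached_fragment(cache, "topic/#{topic_id}") do
    "<article>Topic #{topic_id}</article>" # stands in for an expensive render
  end
  navbar + topic
end

fragment_cache = {}
page_for_alice = render_page(fragment_cache, "alice", 1)
page_for_bob   = render_page(fragment_cache, "bob", 1)
```

Both pages reuse the same cached topic fragment, yet each request still runs application code to assemble the page.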

However, this still means the request has to go through the application. If there is a way to cache this at the Passenger level then we can omit the entire application, boosting the performance even further.

We’ve come to the realization that this is possible, if we change the app into a “semi single page app”:

Instead of rendering pages on the server side, render them on the client side, e.g. using Ember. This way, the view templates can be simple static HTML files, which are easily HTTP cacheable.

PushState is then used to manipulate the location bar, making it feel like a regular server-side web app.

The templates are populated using JSON data from the server. We can categorize this JSON data in two categories:

User-independent JSON data, which is HTTP-level cacheable. For example, the list of subforums.

User-specific JSON data, which is not HTTP-level cacheable. For example, information about the logged in user, such as the username and profile information.

And here lies the trick: we only load this data once, when the user loads the page. When the user clicks on any links, instead of letting the browser navigate there, the Javascript loads the user-independent JSON data (which is easily cacheable), updates the views and updates the location bar using PushState.

By using this approach, we reduce the performance impact of non-cacheable fragments tremendously. Normally, non-cacheable page fragments would make every page uncacheable. But by using the approach we described, you would only pay the uncacheability penalty once, during the initial page load. Any further requests are fully cacheable.

And because of the use of HTML PushState, each page has a well-defined URL. This means that, despite the app being a semi-single-page app, it’s indexable by crawlers as long as they support Javascript. GoogleBot supports Javascript.

Discourse is a perfect example of an app that’s already architected this way. Discourse displays the typical “navigation bar with username”, but this is only populated on the first page load. When the user clicks on any of the links, Discourse queries JSON from the server and updates the views, but does not update the navbar username.

An alternative to this semi-single page app approach is by using Edge Side Include technology, but adoption of the technology is fairly low at this point. Most developers don’t run Varnish in their development environment. In any case, ESI doesn’t solve the whole problem: just half of it. Passenger’s cookie varying turbocaching feature is still necessary.

Even when there are some protected/secured subforums, the turbocache cookie varying feature is powerful enough to make even this scenario cacheable. Suppose that the Discourse content depends on the user’s access level, and that there are 3 access levels: anonymous users, regular registered members, staff. You can put the access level in a cookie, and vary the cache by that:

That way, all users with the same access level share the same cache entry.
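On the application side, this could look roughly like the Ruby sketch below. The cookie name “access_level”, the user hash layout and both method names are assumptions made for this example. Because clients control their own cookies, a real deployment would also need to sign or otherwise validate the value so that it cannot simply be forged.

```ruby
# Hypothetical mapping from a user record to one of the three access levels.
def access_level_for(user)
  return "anonymous" if user.nil?
  user[:staff] ? "staff" : "member"
end

# Value for a Set-Cookie header; the cookie name is an assumption.
def access_level_cookie(user)
  "access_level=#{access_level_for(user)}; Path=/"
end

# Five visitors: two anonymous, two regular members, one staff member.
visitors = [nil, { name: "alice" }, { name: "bob" }, { name: "carol", staff: true }, nil]
levels = visitors.map { |user| access_level_for(user) }
distinct_cache_entries = levels.uniq.size
```

Five visitors collapse into three cache entries per page, one for each access level.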

Due to time constraints we have not yet fully researched modifying Discourse this way, but that leads us to the following point.

Call for help: please participate in our research

The concepts we proposed in this blog post are ideas. Until tested in practice, they remain theory. This is why we are looking for people willing to participate in this research. We want to test these ideas in real-world applications, and we want to look for further ways to improve the turbocache’s usefulness.

Participation means:

Implementing the changes necessary to make your app turbo-cache friendly.

Benchmarking or testing whether performance has improved, and by how much.

Actively working with Phusion to test ideas and to look for further room for improvements. We will happily assist active participants should they need any help.

If you are interested, please send us an email at info@phusion.nl and let’s talk.

Phusion Passenger 4 is the current stable branch, in which we release bug fixes from time to time. At the same time there is also Phusion Passenger 5, which is the not-yet-ready-for-production development branch, with major changes and improvements in terms of performance and application behavior visibility. Version 5.0 beta 3 will soon be released, but until the 5.x branch is considered stable, we will keep releasing bug fixes under the 4.x branch.

Improved Ruby 2.2 support

Version 4.0.56 already introduced Ruby 2.2 support, but due to an issue in the way we compile the Phusion Passenger native extension, it didn’t work with all Ruby 2.2 installations. In particular, 4.0.56 worked with Ruby 2.2 installations that were compiled with a shared libruby, which is the case if you installed Ruby 2.2 with RVM or through operating system packages. But it did not work with Ruby 2.2 installations that were compiled with a static libruby, which is the case if you installed manually from source, or using rbenv and chruby, or when you are using Heroku.

At first, we suspected a bug in Ruby 2.2’s build system, but after feedback from the MRI core developers, it turned out to be an issue in our own build system. The issue is caused by a commit from 4 years ago, GH-168, which attempted to fix a different issue. It seems there is no way to fix Ruby 2.2 compatibility while at the same time fixing GH-168, so we had to make a choice. Since GH-168 is quite old and was made at a time when Ruby 1.8.6 was the latest Ruby version, we believe that the issue is no longer relevant. We reverted GH-168 in favor of Ruby 2.2 compatibility.

Installing or upgrading

The above is a very short excerpt of how to install or upgrade Phusion Passenger. For detailed instructions (which, for example, take users and permissions into account), please refer to the “RubyGems” section of the installation manuals:

We’ve just released version 4.0.56 of the Phusion Passenger application server for Ruby, Python and Node.js, which fixes a number of interesting and important bugs. They’re the kind of typical bugs that make me go “what the **** was I thinking?!” after I’ve analyzed them, because the fixes are very simple.

Leaking file descriptors

The first bug is a file descriptor leak. A file descriptor is a number which represents a kernel resource, such as an open file or a socket. Every time you call File.open or TCPSocket.new in Ruby, you get an IO object that’s internally backed by a file descriptor. In Node.js you often work with file descriptors directly: fs.open gives you a file descriptor, and many other fs functions accept one. Since Phusion Passenger is written in C++, we also work with file descriptors directly.

Schematically, it looks like this:

You’re probably familiar with memory leaks. A file descriptor leak is very similar. If you ever lose track of a file descriptor number, you’ve leaked it. In Ruby this is not possible because all IO objects own their file descriptor, and IO objects are garbage collected (thus closing the corresponding file descriptor). However in Node.js and in C++ this can easily happen if you’re not careful. When leaked, the kernel resource stays allocated until your process exits.
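In Ruby you can see the descriptor behind an IO object with `IO#fileno`. The short sketch below opens the null device (`File::NULL`) only so that it does not depend on any particular file existing.

```ruby
# Every IO object wraps a numeric file descriptor.
file = File.open(File::NULL)
fd = file.fileno          # integer handle assigned by the kernel
file.close                # releases the descriptor
closed = file.closed?

# The block form closes the descriptor when the block exits, even on an exception.
fd_in_block = File.open(File::NULL) { |f| f.fileno }
```

In C++ or Node.js there is no such owning object by default: the program only holds the integer, and forgetting to close it is what leaks the kernel resource.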

What went wrong

In Passenger, we leaked a file descriptor when creating an error report file. This file is created if your app can’t spawn for some reason (e.g. it throws an exception during startup). The code that was responsible for rendering the file looked like this, in semi C++ pseudocode:

Notice the guard variable. In C++, it is a so-called RAII object: “Resource Acquisition Is Initialization”. It is a common coding pattern in C++ to ensure that things are cleaned up when exceptions are thrown, kind of like the C++ equivalent of the ensure keyword in Ruby or the finally keyword in Javascript. When this function exits for any reason, be it a normal return or an exception, the guard destructor is called, which is supposed to close the file descriptor.

The facepalm moment was when Paul “popox” B reported that the guard was on the wrong line. The guard was created before the file descriptor was assigned to fd, so the guard did nothing all this time. Every time a report file was created, a file descriptor was leaked.
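The ordering problem can be mimicked in Ruby. The sketch below is only an analogy: Passenger's real code is C++ and looks different, and the descriptor number 3 and all names are made up. The guard receives the value of fd at the time it is created, just like a C++ guard that takes an int by value.

```ruby
# Returns a guard that "closes" the descriptor value it was created with.
def make_guard(fd, closed_log)
  -> { closed_log << fd unless fd.nil? }
end

def buggy_open(closed_log)
  fd = nil
  guard = make_guard(fd, closed_log) # created too early: captures nil
  fd = 3                             # stands in for open(); hypothetical descriptor
  fd
ensure
  guard.call                         # closes nothing, so descriptor 3 leaks
end

def fixed_open(closed_log)
  fd = 3
  guard = make_guard(fd, closed_log) # created after fd holds the real descriptor
  fd
ensure
  guard.call                         # closes descriptor 3
end

buggy_log = []
buggy_open(buggy_log)   # buggy_log stays empty
fixed_log = []
fixed_open(fixed_log)   # fixed_log becomes [3]
```

Moving the guard one line down is the entire fix, which is why the bug was easy to overlook.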

Node.js load balancing

The other issue fixed in 4.0.56 is a Node.js load balancing issue. In Passenger we load balance requests between application processes as much as possible. Traditionally, the reason for load balancing has been to minimize latency. This utilizes the concept of “application concurrency”: the maximum number of concurrent requests a single app process can handle. For Ruby apps, the concurrency is 1 (unless you configured multithreading, in which case the concurrency is equal to the number of threads). Since Ruby apps have finite I/O concurrency, Passenger load balances a request to a different process only if one process has run out of concurrency.

Node.js is different in that it’s fully asynchronous. It can effectively have an unlimited amount of concurrency.

Passenger orders processes in a priority queue by “busyness”. Load balancing is achieved by routing a new request to the process with the least busyness.
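The routing idea can be sketched with a min-ordered priority queue. This is only an illustration under assumptions: Entry, BusynessQueue, makeQueue and routeRequest are hypothetical names, and busyness is assumed to grow by one per routed request, which is not necessarily how Passenger computes it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (busyness, process index). std::greater puts the smallest busyness on top;
// ties are broken by the lower process index.
using Entry = std::pair<int, std::size_t>;
using BusynessQueue =
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Builds a queue from the current busyness of each process.
BusynessQueue makeQueue(const std::vector<int> &busyness) {
    BusynessQueue queue;
    for (std::size_t i = 0; i < busyness.size(); i++) {
        queue.push(Entry(busyness[i], i));
    }
    return queue;
}

// Routes one request to the least busy process and returns its index.
// The queue must not be empty.
std::size_t routeRequest(BusynessQueue &queue) {
    Entry least = queue.top();
    queue.pop();
    least.first += 1;  // assumption: one more session means one more busyness
    queue.push(least);
    return least.second;
}
```

With three processes whose busyness is 2, 0 and 1, four consecutive calls to routeRequest pick processes 1, 1, 2 and 0, in that order.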

What went wrong

What went wrong with the Node.js case is the fact that we had special rules for application processes with unlimited concurrency. The busyness for such processes is calculated as follows:

if (sessions == 0) {
    return 0;
} else {
    return 1;
}

sessions indicates the number of requests that a process is currently handling. This piece of code effectively sorted Node.js processes into only two categories: idle processes and non-idle processes.

From a concurrency point of view, there is nothing wrong with this. Node.js apps have unlimited concurrency after all. However this resulted in lots of requests “sticking” to a few processes, as Charles Vallières reported:

Then it dawned on me that I forgot something. An even distribution of requests is desirable here, because now the reason for load balancing becomes different. It’s to maximize CPU core usage, because a single Node.js process can only use 1 CPU core.
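The difference between the two goals can be shown with a small simulation. This is a sketch under assumptions, not the actual fix: twoBucketBusyness mirrors the rule quoted above, while sessionCountBusyness (busyness equals the number of sessions) is just one plausible metric that spreads requests evenly. The names, and the tie-breaking by lowest process index, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The rule quoted above: a process is either idle (0) or not (1).
int twoBucketBusyness(int sessions) {
    return sessions == 0 ? 0 : 1;
}

// Assumed alternative: busyness is simply the number of open sessions.
int sessionCountBusyness(int sessions) {
    return sessions;
}

// Routes `requests` long-running requests (none finish during the
// simulation) over `processes` processes, always choosing the first
// process with the lowest busyness. Returns the sessions per process.
// `processes` must be at least 1.
std::vector<int> distribute(std::size_t processes, int requests,
                            int (*busyness)(int)) {
    std::vector<int> sessions(processes, 0);
    for (int r = 0; r < requests; r++) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < sessions.size(); i++) {
            if (busyness(sessions[i]) < busyness(sessions[best])) {
                best = i;
            }
        }
        sessions[best]++;
    }
    return sessions;
}
```

With 3 processes and 6 requests, the two-bucket rule ends up with 4, 1 and 1 sessions: once every process is non-idle, all further requests stick to the first one. The session-count metric ends up with 2, 2 and 2, which keeps every CPU core busy.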

Phusion Passenger is a fast and robust web server and application server for Ruby, Python, Node.js and Meteor. Passenger takes a lot of complexity out of deploying web apps, and adds powerful enterprise-grade features that are useful in production. High-profile companies such as Apple, New York Times, AirBnB, Juniper and American Express already use it, as do over 350,000 websites.

Phusion Passenger is under constant maintenance and development. Version 4.0.55 is a bugfix release.

Executive summary

Phusion Passenger is an app server that supports Ruby. We have released version 5 beta 1, codename “Raptor”. This new version is much faster, helps you better identify and solve problems, and has a ton of other improvements. If you’ve followed the Raptor campaign then you may wonder why we held the campaign like this. Read on if you’re interested. If you just want to try it, scroll all the way down for the changelog and the installation and upgrade instructions.

Introduction

A month ago, we released a website in which we announced “Raptor”, supposedly a new Ruby app server that’s much faster than others. It has immediately received a lot of community attention. It was covered by Fabio Akita and by RubyInside’s Peter Cooper. From the beginning, “Raptor” was Phusion Passenger 5, a new major version with major internal overhauls. In the weeks that followed, we blogged about how we made “Raptor” fast.

Even though “Raptor” is Phusion Passenger, it doesn’t make the impact any less powerful. The performance improvements that we claim are real, and they are open source. Because “Raptor” is Phusion Passenger 5, it means that it automatically has a mature set of features:

Handle more traffic
Phusion Passenger 5 is up to 4x faster than other Ruby app servers, allowing you to handle more traffic with the same hardware.

Reduce maintenance
Automates more system tasks than other app servers. Spend less time micromanaging software, and more time building your business.

Identify & fix problems quickly
Why is your app behaving the way it does? What is it doing? Phusion Passenger 5 provides tools that give you the insights you need.

Keep bugs & issues in check
Limit the impact of bugs and issues, making downtime and user dissatisfaction less likely. Reduce pressure on developers while the root problem is being fixed.

Excellent support
We have excellent documentation and a vibrant community discussion forum. And with our professional and enterprise support contracts, you can consult our team of experts directly.

However, the authors behind “Raptor” remained unknown — until today. The reason why we ran the campaign like this is explained in this article.

A brief history

It is perhaps hard to fathom now, but in the early days of Ruby, getting an app into a production environment was a painful task in itself. Many hours were spent by developers on tedious tasks such as manually managing ports and performing other error-prone configuration sit-ups. The status quo of deployment back then wasn’t exactly in line with what Rails advocates through its “convention over configuration” mantra. Far from it in fact.

When we first introduced Phusion Passenger back in 2008, we wanted to “fix” this. We wanted Ruby deployment to be as easy as PHP so that developers could focus on their apps and lower the barrier of entry for newcomers.

Even though we have been able to help power some of the largest sites on the Internet over the past few years through Phusion Passenger, we have always remained vigilant so as not to become complacent: we have been eagerly listening to the community as to what they expect the next big thing to be.

The observations we have made over the years have eventually culminated in Phusion Passenger 5, which was codenamed Raptor for a number of reasons.

A greater focus on performance and efficiency

Whether you are deploying a small web app on a VPS or spinning up tens of thousands of instances to power your e-commerce business, we all want the most bang for our buck. Being able to reduce the number of required servers would be beneficial in reducing costs and it is for this reason that developers seek to employ the most efficient software stack currently available. When it comes to making that choice, benchmarks from third parties often seem to play an important part in the decision making process. Even though they are convenient to consult, it is easy to overlook a couple of important things that we would like to underline.

When it comes to performance benchmarks for example, it is not always clear how the results have been obtained and how they will affect the reader. This is mostly due to the fact that benchmarks are often performed on synthetic applications in synthetic environments that don’t take into consideration real world workloads and latencies. This often leads to skewed results when compared to real world workloads.

A good example of this is the “Hello World” benchmark, where people tend to benchmark app servers against a so-called “Hello World” application: an app that basically returns “Hello World”. Needless to say, this is hardly a real world application.

Anyone who has ever deployed a real world application will know that these kinds of benchmarks don’t really say anything useful as they effectively measure how fast an app server is at “doing nothing”.

In real world Ruby applications on the other hand, processing time quickly gets overtaken by the app itself, plus network overhead, rather than the app server: the differences in performance between the app servers basically become insignificant when compared to the time spent in the hosted app itself.

Despite this, benchmarks remain a popular source to consult when trying to figure out what the “best” software solution is. The danger here lies in the possibility that a developer might be tempted to base their decision solely on what bar chart sticks out the most, without so much as looking at what the other solutions all bring to the table.

There is a reason for example why Phusion Passenger — even though hot on the heels of its competitors — has not been leading these kinds of Hello World benchmarks in the past: we do much more than the competition does when it comes to ease of use, memory efficiency, security and features. All these things are not free, and this is basically what is being measured with said benchmarks.

When benchmarked against real world applications however, the differences in performance between app servers become almost indistinguishable. The focus should then be put on what feature-set is most suitable for your production app. We believe that on that front, Phusion Passenger is leading the pack.

We have tried many times to explain this in the comment sections of benchmarks, but have unfortunately had to infer that such explanations often fall on deaf ears. We think that’s a shame, but rather than continue to fight it, we have decided to try to beat our competitors on performance as well. This has led to a series of internal optimizations and innovations which we have documented in the Raptor articles. Not only do we believe we are now able to win these kinds of benchmarks, we believe we have been able to do so with mechanisms that are incredibly useful in real world scenarios too (e.g. Turbocaching).

A greater focus on showcasing the technology

Software that is easy to use runs the risk of being considered “boring” to hackers, or worse, “simple”. In the latter case, the underlying technology facilitating the ease of use gets taken for granted. Over the years, we felt this was happening to Phusion Passenger to the extent that we wanted to set the record straight.

A lot of thought went into using the right algorithms and applying the right optimizations to allow Phusion Passenger to do what it does best: being able to deliver an “upload-and-go” deployment experience second to none in a secure and performant manner is by no means a trivial task. We chose, however, to abstract these implementation details away from the end user, as we wanted them to be able to focus more on their app and business rather than the nitty gritty of how “the soup was made”. Who cares, right?

Well, as it turned out, a lot of hackers do. Articles about “Unicorn being Unix” sparked a lot of interest from hackers, allowing it to quickly garner a following. We thought articles such as these were great, but felt somewhat disappointed that people seemed to forget that Phusion Passenger was already doing the majority of what was written in such articles a full year earlier. It then dawned on us that we were no longer considered the shiny new thing, but rather part of the establishment.

In hindsight, it was perhaps also an error of judgement of us to focus our marketing efforts mostly on businesses rather than the grassroots hacker community we originated from ourselves: they are not mutually exclusive and instead of mostly underlining the business advantages, we should have underlined the technological advantages much more as well.

Besides the new optimizations and features found in Raptor, a lot of technology discussed in those articles was already available in Phusion Passenger 4 and its precursors. If there is a lesson to take from all this, it is that marketing is indeed the art of repetition. And that it probably helps to present it as the new kid on the block to get rid of any preconceived notions/misconceptions people might have had about Phusion Passenger.

Smoke and mirrors

Whether or not people would be just as excited if they knew that it was Phusion Passenger 5 all along is perhaps another discussion to be had: some actually found out ahead of time due to the similar writing style of our tech articles and / or through nslookups (a fedora hat-tip goes out to you folks! May your sense of scrutiny live long and prosper!).

What we do however know is that our Raptor approach over the past month has produced more subscribers to our newsletter than we have been able to accomplish over the past 6 years through the Phusion Passenger moniker. We still have a hard time comprehending this, but there is no denying the numbers: we — the community — seem to like shiny new things.

Truth be told, we didn’t really bother trying to cover up the fact that it was in fact Phusion all along that was behind Raptor. We kind of left it as an exercise to the reader to figure this out amidst all the “hype” and claims of “vaporware” to see if people still remembered Phusion Passenger’s fortes.

We were not disappointed when it came to that and felt incredibly proud that a lot of people questioned why Phusion Passenger was not included within the Raptor benchmarks and requested it to be included. Needless to say, this was something we were unable to do because it already was included all along as Raptor itself 😉 We were also happy to see that some even pointed out that some of the features look like they came straight out of Phusion Passenger.

What’s in a name?

You might be wondering why we chose to market Phusion Passenger 5 under the Raptor code name and jumped through so many hoops in doing so. To quote Shakespeare: “Conceal me what I am, and be my aid for such disguise as haply shall become the form of my intent”.

With all the new improvements and features pertaining to performance and introspection, we felt Phusion Passenger deserved new consideration from its audience in an objective manner. To circumvent any preconceived notions and/or misconceptions people may have had about Phusion Passenger over the years, we decided to market it as Raptor. We felt the codename was particularly appropriate for our renewed commitment to performance.

Just to be clear, Phusion Passenger will not be renamed to Raptor. Raptor is just a codename that has served its purpose by the time of this writing for Phusion Passenger 5. We will drop this name starting from today: from now on, “Raptor” is Phusion Passenger 5.

With a little help from our friends

The success of the Raptor campaign would not have been possible without the help and support of our friends. In particular, we would like to thank Peter Cooper and Fabio Akita for their in-depth write-ups on Phusion Passenger 5 / Raptor and its precursors. Their articles carried the necessary weight to allow us to focus on explaining and improving the technology itself rather than having to spend time on trying to debunk “vaporware” claims. In a similar manner, we would also like to thank David Heinemeier Hansson for helping us out via his tweets and feedback.

Lastly, we would like to thank the community and our customers for being so supportive. At the end of the day, it is you folks who make this all possible and we can’t wait to show you what we have in store for the future.

Support for Rails 1.2 – 2.2 has been removed, for performance reasons. Rails 2.3 is still supported.

Phusion Passenger now supports integrated HTTP caching, which we call turbocaching. If your app sets the right HTTP headers then Phusion Passenger can tremendously accelerate your app. It is enabled by default, but you can disable it with --disable-turbocaching (Standalone), PassengerTurbocaching off (Apache), or passenger_turbocaching off (Nginx).

Touching restart.txt will no longer restart your app immediately. This is because, for performance reasons, the stat throttle rate now defaults to 10. You can still get back the old behavior by setting PassengerStatThrottleRate 0 (Apache) or passenger_stat_throttle_rate 0 (Nginx), but this is not encouraged. Instead, we encourage you to use the passenger-config restart-app tool to initiate restarts, which has immediate effect.

Websockets are now properly disconnected on application restarts.

The Phusion Passenger log levels have been completely revamped. If you were setting a log level before (e.g. through passenger_log_level), please read the latest documentation to learn about the new log levels.

If you use out-of-band garbage collection, beware that the X-Passenger-Request-OOB-Work header has now been renamed to !~Request-OOB-Work.

When using Rack’s full socket hijacking, you must now output an HTTP status line.

[Nginx] The passenger_set_cgi_param option has been removed and replaced by passenger_set_header and passenger_env_var.

[Nginx] passenger_show_version_in_header is now only valid in the http context.

[Apache] The PassengerStatThrottleRate option is now global.

Minor changes:

The minimum required Nginx version is now 1.6.0.

The instance directory is now touched every hour instead of every 6 hours. This should hopefully prevent more problems with /tmp cleaner daemons.

Applications are now grouped not only on the application root path, but also on the environment. For example, this allows you to run the same app in both production and staging mode, with only a single directory, without further configuration. Closes GH-664.

The passenger_temp_dir option (Nginx) and the PassengerTempDir option (Apache) have been replaced by two config options. On Nginx they are passenger_instance_registry_dir and passenger_data_buffer_dir. On Apache they are PassengerInstanceRegistryDir and PassengerDataBufferDir. On Apache, PassengerUploadBufferDir has been replaced by PassengerDataBufferDir.

Command line tools no longer respect the PASSENGER_TEMP_DIR environment variable. Use PASSENGER_INSTANCE_REGISTRY_DIR instead.

passenger-status --show=requests has been deprecated in favor of passenger-status --show=connections.

Using the SIGUSR1 signal to restart a Ruby app without dropping connections is no longer supported. Instead, use passenger-config detach-process.

Introduced the passenger-config reopen-logs command, which instructs all Phusion Passenger agent processes to reopen their log files. You should call this after having rotated the web server logs.

Installing or upgrading

The above is a very short excerpt of how to install or upgrade Phusion Passenger. For detailed instructions (which, for example, take users and permissions into account), please refer to the “RubyGems” section of the installation manuals:

Phusion Passenger is under constant maintenance and development. Version 4.0.53 is a bugfix release.

“Phusion” and “Phusion Passenger” are registered trademarks of Phusion. “Rails”, “Ruby on Rails” and the Rails logo are registered trademarks of David Heinemeier Hansson. All other trademarks are property of their respective owners.