Traveling Ruby allows you to create self-contained, “portable” Ruby binaries which can run on any Windows machine, any Linux distribution and any OS X machine. This allows Ruby app developers to distribute a single package to end users, without needing end users to first install Ruby or gems.

We’ve released version 20150210, which marks a major milestone. It introduces Windows support, and support for Ruby 2.2.

Backstory

There’s a little bit of a backstory behind this release.

Last week I went to Amsterdam.rb’s MRI Implementors Panel meetup featuring Koichi Sasada, Terence Lee and Zachary Scott. Distribution of Ruby apps came up at some point. Terence and Koichi talked about the fact that MRuby — an alternative Ruby implementation that Yukihiro Matsumoto is working on — is able to precompile Ruby apps into a single self-contained binary with no dependencies. They asserted that the ability to produce such a binary is the reason why so many people are moving to Go: it makes deployment and distribution much easier. But I was unconvinced about the value of MRuby, because MRuby is a subset of Ruby. In my opinion, Ruby’s main power lies in its standard library and its rich ecosystem.

So I talked with Terence and with Eloy Durán (MacRuby/RubyMotion) about Traveling Ruby. Traveling Ruby is different in that it’s just a normal MRI Ruby interpreter, and not a subset of the language with limitations. You can even use many native extensions. However, Terence and Eloy were equally unconvinced, asserting that Traveling Ruby doesn’t support Windows, and can’t produce a single binary. Traveling Ruby currently generates a self-contained directory with no dependencies. This directory contains a single wrapper script plus a lib subdirectory that contains the bulk of the app, so it looks almost like a single self-contained binary. But that’s not good enough for them.

Then we contemplated how sad it is that so many parties are moving off Ruby towards Go.

So I implemented Windows support last weekend. This is just a minimum viable product: at present, there are lots of caveats and limitations in the Windows support. In a future release, I plan on introducing the ability to produce a single self-contained binary.

Although some people within Phusion are big Go fans, I am personally a big Ruby fan. I am in love with Ruby’s simplicity, elegance and productivity. There are high-quality libraries for almost every task imaginable. The only things Ruby isn’t good at are tasks which require very low memory consumption or very high performance, but most of the problems I’m solving do not require either. So with the continued development of Traveling Ruby, I am hoping that I can prevent more people from switching to Go due to distribution issues, or even switch some people back.

Changes

Ruby 2.2.0

Previous Traveling Ruby versions used Ruby 2.1.5. We now support Ruby 2.2.0 in addition to 2.1.5. In the future we may drop support for 2.1.5, but for now both versions are supported side by side. You may be interested in Ruby 2.2 because of its improved garbage collector and performance characteristics.

Windows support

We now support creating Windows packages. But there are currently a number of caveats:

Traveling Ruby supports creating packages for Windows, but it does not yet support creating packages on Windows. That is, the Traveling Ruby tutorials and the documentation do not work when you are a Ruby developer on Windows. To create Windows packages, you must use OS X or Linux.

This is because our documentation makes heavy use of standard Unix tools, which are not available on Windows. In the future we may replace the use of such tools with Ruby tools so that the documentation works on Windows too.

Only Ruby 2.1.5 is supported for Windows, not 2.2.0. This is because the RubyInstaller project hasn’t released Ruby 2.2.0 binaries yet.

Gem upgrades and other changes

Fixed a problem with the ‘rugged’ native extension on Linux. Closes GH-33.

Fixed a problem with the ‘charlock_holmes’ native extension on Linux. Closes GH-34.

Header files are no longer packaged. This saves 256 KB.

RDoc and various unnecessary Bundler files have been removed. This saves about 1.2 MB.

Phusion Passenger version 5.0.0 beta 3 was released today, with a number of changes to the turbocache. The turbocache is a component in Passenger 5 that automatically caches HTTP responses in an effort to speed up the application. The turbocache is not a feature-rich HTTP cache like Varnish, but instead it’s more like a “CPU L1 cache for the web” — small and fast, requires little configuration, but has fewer features by design.

The turbocache was implemented with HTTP caching standards in mind, namely RFC 7234 (HTTP 1.1 Caching) and RFC 2109 (HTTP State Management Mechanism). Our initial mindset in implementing the turbocache was like that of compiler implementors: if something is allowed by the standards, then we’ll implement it in order to pursue maximum possible performance.

But it turns out that following the standards strictly may raise security concerns. A while ago, we were contacted by Chris Heald, who provided convincing cases on why the turbocache’s behavior is problematic from a security standpoint. Chris is a veteran Varnish user and has a lot of experience on the subject of HTTP caching.

The gist is that the turbocache made it too easy to accidentally cache responses which should not be cached. Imagine that a certain response contains sensitive information and is only meant to be served to one user. Such sensitive information may include security credentials or session cookies. If the application sets HTTP caching headers incorrectly, then the turbocache may serve that response to other users, resulting in an information leak.

Such problems are technically bugs in the application. After all, Passenger is just following the standards. But Chris asserted that such mistakes are very easily made. Indeed, he has seen quite a number of applications that send out incorrect HTTP caching headers. For this reason, he believes that the turbocache should be more conservative by default.

Cookies also deserve special attention. RFC 2109 mentions that cookies may only be cached when they are intended to be shared by multiple users. But it is impossible for Passenger to know the intention of the cookies without configuration from the application developer. At the same time, providing such configuration goes against the spirit of the turbocache, namely that it should be easy and automatic.

It was clear that something had to be done.

Turbocache changes

More conservative

After contemplating the issues, we’ve decided to make the turbocache more conservative in order to avoid the most common security issues.

In beta 1 and beta 2, the turbocache cached all “default cacheable responses”, as defined by RFC 7234. These are responses to GET requests; with status code 200, 203, 204, 300, 301, 404, 405, 410, 414 or 501; and for which no headers are set that prevent caching, e.g. “Cache-Control: no-store”. This means that a GET request which yields a 200 response with no caching headers is actually cacheable per the standards, and so it was cached by the turbocache.

In beta 3, responses are no longer cached unless the application explicitly sends caching headers. That means that a GET request which yields a 200 response is only cached if the application sends an “Expires” or a “Cache-Control” header (which must not contain “private”, “no-store”, etc).

While it is still possible for some application responses to be unintentionally cached, this change solves the majority of the issues. The only way to rule out these issues entirely is to disable caching completely, which means giving up the benefits of the turbocache. Instead, we believe it is reasonable to cache responses only when the application explicitly asks for it. A remaining caveat is that even when both the application and Passenger follow the specifications, an application developer may not fully understand how caching behaves according to those specifications. In such cases the developer should disable caching entirely.

Bugs fixed

While investigating Chris’s case, we also uncovered some bugs in the turbocache, such as the incorrect handling of certain headers. These bugs have been fixed. The full details can be found in the Passenger 5 beta 3 release notes.

Using the turbocache and speeding up applications

Now that the turbocache has changed in behavior, here are some practical tips on what you can do to make good use of the turbocache.

Learn about HTTP caching headers

The first thing you should do is to learn how to use HTTP caching headers. It’s pretty simple and straightforward. Since the turbocache is just a normal HTTP shared cache, it respects all the HTTP caching rules.

Set an Expires or Cache-Control header

To activate the turbocache, the response must contain either an “Expires” header or a “Cache-Control” header.

The “Expires” header tells the turbocache how long to cache a response. Its value is an HTTP timestamp, e.g. “Thu, 01 Dec 1994 16:00:00 GMT”.

The Cache-Control header is a more advanced header that not only allows you to set the caching time, but also how the cache should behave. The easiest way to use it is to set the max-age flag, which has the same effect as setting “Expires”. For example, this tells the turbocache that the response is cacheable for at most 60 seconds:

Cache-Control: max-age=60

As you can see, a “Cache-Control” header is much easier to generate than an “Expires” header. Furthermore, “Expires” doesn’t work if the visitor’s computer’s clock is wrongly configured, while “Cache-Control” does. This is why we recommend using “Cache-Control”.
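To illustrate the difference, here is a minimal Ruby sketch that builds both headers for a 60-second lifetime. The helper names expires_header and cache_control_header are hypothetical and exist only for this example; the only library call used is Time#httpdate from Ruby's standard library.

```ruby
require 'time'  # adds Time#httpdate, which formats HTTP timestamps

# Hypothetical helper: "Expires" needs an absolute point in time,
# so it depends on the clock of whoever generates and reads it.
def expires_header(now, lifetime_seconds)
  (now + lifetime_seconds).httpdate
end

# Hypothetical helper: "Cache-Control" only needs a relative lifetime.
# Pass shared: false to add the "private" flag.
def cache_control_header(lifetime_seconds, shared: true)
  flags = ["max-age=#{lifetime_seconds}"]
  flags << "private" unless shared
  flags.join(",")
end

now = Time.utc(1994, 12, 1, 15, 59, 0)
puts "Expires: #{expires_header(now, 60)}"
puts "Cache-Control: #{cache_control_header(60)}"
```

The “Expires” variant has to compute and format a timestamp, while the “Cache-Control” variant is a plain string built from the lifetime.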

Another flag to be aware of is the private flag. This flag tells any shared caches — caches which are meant to store responses for many users — not to cache the response. The turbocache is a shared cache. However, the browser’s cache is not, so the browser can still cache the response. You should set the “private” flag on responses which are meant for a single user, as you will learn later in this article.

And finally, there is the no-store flag, which tells all caches — even the browser’s — not to cache the response.

Here is an example of a response which is cacheable for 60 seconds by the browser’s cache, but not by the turbocache:

Cache-Control: max-age=60,private

The HTTP specification specifies a bunch of other flags, but they’re not relevant for the turbocache.

Only GET requests are cacheable

The turbocache currently only caches GET requests. POST, PUT, DELETE and other requests are never cached. If you want your response to be cacheable by the turbocache, be sure to use GET requests, but also be sure that your request is idempotent.

Avoid using the “Vary” header

The “Vary” header is used to tell caches that the response depends on one or more request headers. But the turbocache does not implement support for the “Vary” header, so if you output a “Vary” header then the turbocache will not cache your response at all. Avoid using the “Vary” header where possible.
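The rules described so far can be summarized in a small Ruby sketch. This is only an illustration of the rules as stated in this article, not Passenger's actual implementation: the function name turbocache_candidate? is hypothetical, header names are assumed to be lowercase, and only the “private” and “no-store” flags are checked.

```ruby
# Returns true if, according to the rules described in this article,
# a response would be a candidate for caching by the turbocache.
# Assumption: response_headers is a Hash with lowercase header names.
def turbocache_candidate?(request_method, response_headers)
  # Only GET requests are cacheable.
  return false unless request_method == "GET"
  # Any "Vary" header disables caching altogether.
  return false if response_headers.key?("vary")

  # The application must explicitly send "Expires" or "Cache-Control".
  cache_control = response_headers["cache-control"]
  return false unless cache_control || response_headers.key?("expires")

  # "private" and "no-store" forbid caching by a shared cache.
  flag_names = cache_control.to_s.split(",").map do |flag|
    flag.strip.downcase.split("=", 2).first
  end
  !(flag_names.include?("private") || flag_names.include?("no-store"))
end

puts turbocache_candidate?("GET", {})
puts turbocache_candidate?("GET", { "cache-control" => "max-age=60" })
puts turbocache_candidate?("GET", { "cache-control" => "max-age=60,private" })
```

A response without any caching headers is rejected, which reflects the more conservative behavior introduced in beta 3.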

Common application caching bugs

Even though the turbocache has become more conservative now, it is still possible for application responses to be unintentionally cached if the application outputs incorrect caching headers. Here are a few tips on preventing common caching bugs.

Varying response by Ajax

A common pattern is to return a different response depending on whether or not the request was an Ajax call. For example, consider a Rails app that returns a JSON response for Ajax calls and an HTML response otherwise, both on the same URI.

This can cause the turbocache to return the JSON response for non-Ajax calls, or to return the HTML response for Ajax calls.

There are two ways to solve this problem:

Set the “Vary: X-Requested-With” header so that the cache knows the response depends on this header. Rails’s request.xhr? method checks this header. Note that the turbocache currently disables caching altogether upon encountering a “Vary” header.

Do not vary the response based on whether or not it is an Ajax call. Instead, vary the response based on the URI. For example, you can return JSON responses only if the URI ends with “.json”.
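The problem and the second solution can be simulated in plain Ruby. The sketch below uses a toy cache that is keyed by the URI only; it is an illustration under that assumption, not the turbocache's implementation, and ToyCache, header_based_app and uri_based_app are hypothetical names.

```ruby
# A toy shared cache keyed only by the request URI.
class ToyCache
  def initialize
    @store = {}
  end

  # Returns the cached response for the URI, or stores the block's result.
  def fetch(uri)
    @store[uri] ||= yield
  end
end

# Hypothetical app: the format depends on whether the call is an Ajax call.
def header_based_app(uri, ajax:)
  ajax ? "JSON for #{uri}" : "HTML for #{uri}"
end

# Hypothetical app: the format depends on the URI alone.
def uri_based_app(uri)
  uri.end_with?(".json") ? "JSON for #{uri}" : "HTML for #{uri}"
end

# Bug: an Ajax call fills the cache, then a normal request gets JSON.
cache = ToyCache.new
cache.fetch("/posts") { header_based_app("/posts", ajax: true) }
puts cache.fetch("/posts") { header_based_app("/posts", ajax: false) }

# Fix: each format has its own URI, so each gets its own cache entry.
cache = ToyCache.new
cache.fetch("/posts.json") { uri_based_app("/posts.json") }
puts cache.fetch("/posts") { uri_based_app("/posts") }
```

In the first case the second request is served the cached JSON response even though it was not an Ajax call. In the second case the two formats can never collide, because the cache key already distinguishes them.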

Be careful with caching when setting cookies

If your application outputs cookies then you should be careful with your caching headers. You should only allow the turbocache to cache the response if all cookies may be shared by multiple users. If any of your cookies contain user-specific information, or if you’re not sure, then you should set the “private” flag in “Cache-Control” to prevent the turbocache from caching it.

Set “Cache-Control: private” when working with sessions

If your application works with sessions then you must ensure that all your “Cache-Control” headers set the “private” flag so that the turbocache does not cache those responses.

Note that your app may work with sessions indirectly. Any Rails page which outputs a form will result in Rails setting a CSRF token in the session. Luckily Rails sets “Cache-Control: private” by default.
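If you are not sure whether your framework does this for you, one conservative option is to enforce it yourself. The following is a minimal sketch of a Rack-style middleware, written without any gems; the class name PrivateWhenCookieSet is hypothetical, and the policy of replacing the app's own “Cache-Control” value with “private” is an assumption chosen for safety, not something Passenger requires.

```ruby
# Rack-style middleware sketch. A Rack app is any object that responds to
# call(env) and returns [status, headers, body].
class PrivateWhenCookieSet
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    sets_cookie = headers.any? { |name, _| name.downcase == "set-cookie" }
    if sets_cookie
      # Drop whatever Cache-Control the app sent and force "private",
      # so that shared caches do not store a response that sets cookies.
      headers = headers.reject { |name, _| name.downcase == "cache-control" }
      headers = headers.merge("Cache-Control" => "private")
    end
    [status, headers, body]
  end
end

# Hypothetical app that sets a session cookie but marks the response cacheable.
app = ->(env) { [200, { "Set-Cookie" => "sid=abc", "Cache-Control" => "max-age=60" }, ["hi"]] }
_status, headers, _body = PrivateWhenCookieSet.new(app).call({})
puts headers["Cache-Control"]
```

Responses that do not set a cookie pass through unchanged, so pages that are safe to share can still be cached.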

Performance considerations

The turbocache plays a major role in the performance of Passenger 5. The performance claims we made are with turbocaching enabled. Now that the turbocache has become more conservative, the performance claims still stand, but may require more application modifications than before.

By strictly following the caching standards, we had hoped that we’d be able to deliver performance for most apps out-of-the-box without modifications. But we also value security, and we value that more so than performance, which is why we made the turbocache more conservative.

But this raises a question: suppose that we can’t rely on the turbocache anymore as a major factor in performance, what else can we do to improve Passenger 5’s performance, and is it worth it?

There is in fact one more thing we can do, and that is to introduce a new operational mode which bypasses a layer. In the current Passenger 5 architecture, all requests go through a process called the HelperAgent. This extra process introduces some context switching overhead and processing overhead. The overhead is negligible in most real-world scenarios, but the overhead is very apparent in hello world benchmarks. It is possible to bypass this process, allowing clients to directly communicate with application processes. But doing so comes at a cost:

Some features — e.g. load balancing, multitenancy, out-of-band garbage collection, memory checking and statistics collection — are best implemented in the HelperAgent. Reimplementing them inside application processes is either difficult or less efficient.

Rearchitecting Passenger this way requires a non-trivial amount of development time.

So it remains to be seen whether bypassing the HelperAgent is worth it. Investing time in this means that we’ll have less time available to pursue other goals on the roadmap. What do you think about this? Just post a comment and let us know.

Conclusion

Passenger 5 beta 3 is the first “more or less usable in production” release of the Passenger 5 series. Previous releases were not ready for production, but with beta 3 we are confident enough that the most important issues are solved. The turbocache issue as described in this article is one of them. If you were on a previous Passenger 5 release, then we strongly recommend that you upgrade.

We would like to thank Chris Heald for his excellent feedback. We couldn’t have done this without him.

We expect the Passenger 5 final stable version to be released in February. Beta 3 is supposed to be the last beta. Next up will be Release Candidate 1, followed by the final stable release.

But the story doesn’t end there. We will publish a roadmap in the near future, which describes all the ambitious plans we have in store for Passenger. Passenger is constantly improving and evolving, so please stay tuned for updates.

We’ve just released version 5.0.0 beta 3 of the Phusion Passenger application server for Ruby, Python and Node.js. The 5.x series is also unofficially known under the codename “Raptor”, and introduces many major improvements such as performance enhancements, better tools for application-level visibility and a new HTTP JSON API for accessing Passenger’s internals.

So far, we’ve discouraged using 5.0 in production because it’s still in beta and because it was known to be unstable. But this changes with beta 3, which we consider “more or less ready for production”. This means that we’re confident that most of the major issues have been solved, but you should still exercise caution if you roll it out to production.

Final stable

In February, we will release the first Release Candidate version. If everything goes well, 5.0 final — which is officially ready for production — will also be released in February.

Changes in this version

Turbocaching updates

One of the major features in Phusion Passenger 5 is the turbocache, an integrated and high-performance HTTP response cache. It is responsible for a large part of the performance improvements in version 5. In beta 3, we’ve given the turbocache a few major updates:

We’ve been researching ways to improve the turbocache. Based on community feedback on the turbocache, we’ve found that the turbocache in its previous form wasn’t so useful. So we’ve come up with a few ways to allow apps to be better cacheable. These techniques are well-established and have been extensively used in advanced Varnish setups.

We’ve made the turbocache more secure and more conservative, based on excellent feedback from Chris Heald and the community. In previous versions, default cacheable responses (as defined by RFC 7234) were cached unless caching headers tell us not to. Now, default cacheable responses are only cached if caching headers explicitly tell us to. This change was introduced because there are many applications that set incorrect caching headers on private responses. This new behavior is currently not configurable, but there are plans to make it configurable in 5.0.0 release candidate 1.

Miscellaneous

A new configuration option, passenger_response_buffer_high_watermark (Nginx) and PassengerResponseBufferHighWatermark (Apache), has been introduced. This allows for configuring the behavior of the response buffering system. Closes GH-1300.

State introspection has been improved. This means that passenger-status --show=requests shows better and more detailed output now.

Installing or upgrading

For detailed instructions on installing or upgrading Phusion Passenger (which, for example, take users and permissions into account), please refer to the “RubyGems” section of the installation manuals.

Traveling Ruby allows you to create self-contained, “portable” Ruby binaries which can run on any Linux distribution and any OS X machine. This allows Ruby app developers to distribute a single package to end users, without needing end users to first install Ruby or gems.

We’ve released version 20150130, which has some important changes. We advise everyone to upgrade!

OpenSSL upgraded to 1.0.1l

The builtin OpenSSL library has been upgraded to version 1.0.1l, which fixes some security vulnerabilities. This is why we strongly recommend everyone to upgrade.

Many new native extensions packaged

Many new native extensions have been packaged because they are being used by Elasticrawl and Octodown.

RedCloth

escape_utils

posix-spawn

nokogumbo

github-markdown

rugged

charlock_holmes

unf_ext

CA certificate fixes

The Linux version and the OS X version now use the same CA root certificates. This fixes GH-24.

Future plans

It is planned that this is the last version on Ruby 2.1. The next version is planned to use Ruby 2.2, as discussed on GH-28.

Debian packages no longer require Ruby 1.9

Due to a bug in the package specifications, Debian packages used to require Ruby 1.9, even if you already have newer Ruby versions installed from APT (e.g. through the Brightbox repository). This bug has now been fixed. The Phusion Passenger Debian packages now require some Ruby interpreter, but they don’t care which version.

We were early adopters of Docker, using Docker for continuous integration and for building development environments way before Docker hit 1.0. We developed Baseimage-docker in order to solve some problems with the way Docker works, most notably the PID 1 zombie reaping problem.

We figured that:

The problems that we solved are applicable to a lot of people.

Most people are not even aware of these problems, so things can break in unexpected ways (Murphy’s law).

It’s inefficient if everybody has to solve these problems over and over.

So in our spare time we extracted our solution into a reusable base image that everyone can use: Baseimage-docker. We didn’t want to see the community reinventing the wheel over and over. Our solution seems to be well-received: we are the most popular third party image on the Docker Registry, only ranking below the official Ubuntu and CentOS images.

Fat containers, “treating containers as VMs”

Over time, many people got the impression that Baseimage-docker advocates “fat containers”, or “treating containers as VMs”. The Docker developers strongly advocate small, lightweight containers where each container has a single responsibility. The fact that Baseimage-docker advocates the use of multiple processes seems to go against this philosophy.

However, what the Docker developers advocate is running a single logical service per container. Or more generally, a single responsibility per container. Baseimage-docker does not dispute this. Consider that a single logical service can consist of multiple OS processes. Baseimage-docker does not advocate fat containers or treating containers as VMs at all.

Does Baseimage-docker advocate running multiple logical services in a single container? Not necessarily, but we do not prohibit it either. Although the Docker philosophy advocates slim containers, we believe that sometimes it makes sense to run multiple services in a single container, and sometimes it doesn’t.

Why multiple processes?

The most important reason why Baseimage-docker advocates multiple OS processes is because it’s necessary to solve the PID 1 zombie reaping problem. If you’re not familiar with it, you should have a look.

The second reason is that splitting your logical service into multiple OS processes also makes sense from a security standpoint. By running different parts of your service as different processes with different users, you can limit the impact of security vulnerabilities. Baseimage-docker provides tools to encourage running processes as different users, e.g. the setuser tool.

The third reason is to automatically restart processes that have crashed. We saw that a lot of people use Supervisord for this purpose, but Baseimage-docker advocates Runit instead because we think it’s easier to use, more efficient and less resource-hungry. Before Docker 1.2, if your main process crashed then the container was down. With the advent of Docker 1.2, which introduced automatic restarts of containers, this reason has become less relevant. However, Runit is still useful for the purpose of running different parts of your service as different users, for security reasons. And sometimes it may make sense to restart only a part of the container instead of the container as a whole.

Baseimage-docker is about freedom

Although following the Docker philosophy is a good thing, we believe that ultimately you should decide what makes sense. We see Docker more as a general-purpose tool, comparable to FreeBSD jails and Solaris zones. Our primary use cases for Docker include:

Continuous integration.

Building portable development environments (e.g. replacing Vagrant for this purpose).

For these reasons, Baseimage-docker was developed to accept the Docker philosophy where possible, but not to enforce it.

How does Baseimage-docker play well with the Docker philosophy?

So when we say that Baseimage-docker modifies Ubuntu for “Docker friendliness” and that it “accepts the Docker philosophy”, what do we mean? Here are a few examples.

Environment variables

Using environment variables to pass parameters to Docker containers is very much in line with “the Docker way”. However, if you use multiple processes inside a container then the original environment variables can quickly get lost. For example, if you use sudo then sudo will nuke all environment variables for security reasons. Other software, like Nginx, nukes environment variables for security reasons too.

“docker logs” integration to become better

Baseimage-docker tries its best to integrate with docker logs where possible. Daemons have the tendency to log to log files or to syslog, but logging to stdout/stderr (which docker logs exposes) is much more in line with the Docker way.

In the next version of Baseimage-docker, we will adhere better to the Docker philosophy by redirecting all syslog output to docker logs.

SSH to be replaced by “docker exec”

Baseimage-docker provides a mechanism to easily login to the container using SSH. This also contributes to why people believe that Baseimage-docker advocates fat containers.

However, fat containers have never been the reason why we include SSH. The rationale was that there should be some way to login to the container for the purpose of debugging, inspection or maintenance. Before Docker 1.3, which introduced docker exec, there was no mechanism built into Docker for logging into a container or running a command inside a container, so we had to introduce our own.

There are people who advocate that containers should be treated as black boxes. They say that if you have to login to the container, then you’re designing your containers wrong. Baseimage-docker does not dispute this either. SSH is not included because we encourage people to login. SSH is included mainly to handle contingencies. No matter how well you design your containers, if it’s used seriously in production then there will come one day when you have to look inside it in order to debug a problem. Baseimage-docker prepares for that day.

Despite this, the SSH mechanism has been widely criticized. Before Docker 1.3, most critics advocated the use of lxc-attach and nsenter. But lxc-attach soon became obsolete because Docker 0.9 moved away from LXC as the default backend. Nsenter was a better alternative, but suffered from its own problems, such as the fact that it was not included in most distributions which were widely used back then, as well as the fact that using nsenter requires root access on the Docker host (which, depending on your requirements, may or may not be acceptable). Of course, SSH also had its own problems. We knew that there is no one-size-fits-all solution. So instead of replacing SSH with lxc-attach/nsenter, we chose to support both SSH and nsenter, and we clearly documented the pros and cons of each approach.

Docker 1.3 finally introduced the docker exec command. This command is like nsenter; indeed, it appears to be a wrapper around a slightly modified nsenter binary that is included by default with Docker. This is great: it means that for a large number of use cases, neither SSH nor nsenter is necessary. However, some of the issues that are inherent with nsenter are still applicable. For example, running docker exec requires access to the Docker daemon, but users who have access to the Docker daemon effectively have root access.

However, we definitely acknowledge “docker exec” as more in line with “the Docker way”. So in the next version of Baseimage-docker, we will adopt “docker exec” as the default mechanism for logging into a container. But because of the issues in “docker exec”, we will continue to support SSH as an alternative, although it will be disabled by default. And we will continue to clearly document the pros and cons of each approach, so that users can make informed decisions instead of blindly jumping on bandwagons.

Conclusion

Baseimage-docker is not about fat containers or about treating containers as VMs, and the fact that it encourages multiple processes does not go against the Docker philosophy. Furthermore, the Docker philosophy is not binary, but a continuum. So we are even actively developing Baseimage-docker to become increasingly in line with the Docker philosophy.

Is Baseimage-docker the only possible right solution?

Of course not. What Baseimage-docker aims to do is:

To make people aware of several important caveats and pitfalls of Docker containers.

To provide pre-created solutions that others can use, so that people do not have to reinvent solutions for these issues.

This means that multiple solutions are possible, as long as they solve the issues that we describe. You are free to reimplement solutions in C, Go, Ruby or whatever. But why should you when we already have a perfectly fine solution?

Maybe you do not want to use Ubuntu as base image. Maybe you use CentOS. But that does not stop Baseimage-docker from being useful to you. For example, our passenger_rpm_automation project uses CentOS containers. We simply extracted Baseimage-docker’s my_init and imported it there.

So even if you do not use, or do not want to use Baseimage-docker, take a good look at the issues we describe, and think about what you can do to solve them.


When building Docker containers, you should be aware of the PID 1 zombie reaping problem. That problem can cause unexpected and obscure-looking issues when you least expect it. This article explains the PID 1 problem, explains how you can solve it, and presents a pre-built solution that you can use: Baseimage-docker.

We were early adopters of Docker, using Docker for continuous integration and for building development environments way before Docker hit 1.0. We developed Baseimage-docker in order to solve some problems with the way Docker works. For example, Docker does not run processes under a special init process that properly reaps child processes, so that it is possible for the container to end up with zombie processes that cause all sorts of trouble. Docker also does not do anything with syslog so that it’s possible for important messages to get silently swallowed, etcetera.

However, we’ve found that a lot of people have trouble understanding the problems that we’re solving. Granted, these are low-level Unix operating system mechanisms that few people know about or understand. So in this blog article we will describe in detail the most important problem that we’re solving: the PID 1 zombie reaping problem.

We figured that:

The problems that we solved are applicable to a lot of people.

Most people are not even aware of these problems, so things can break in unexpected ways (Murphy’s law).

It’s inefficient if everybody has to solve these problems over and over.

So in our spare time we extracted our solution into a reusable base image that everyone can use: Baseimage-docker. This image also adds a bunch of useful tools that we believe most Docker image developers would need. We use Baseimage-docker as a base image for all our Docker images.

The community seemed to like what we did: we are the most popular third party image on the Docker Registry, only ranking below the official Ubuntu and CentOS images.

The PID 1 problem: reaping zombies

Recall that Unix processes are ordered in a tree. Each process can spawn child processes, and each process has a parent except for the top-most process.

This top-most process is the init process. It is started by the kernel when you boot your system. This init process is responsible for starting the rest of the system, such as starting the SSH daemon, starting the Docker daemon, starting Apache/Nginx, starting your GUI desktop environment, etc. Each of them may in turn spawn further child processes.

Nothing special so far. But consider what happens if a process terminates. Let’s say that the bash (PID 5) process terminates. It turns into a so-called “defunct process”, also known as a “zombie process”.

Why does this happen? It’s because Unix is designed in such a way that parent processes must explicitly “wait” for child process termination, in order to collect its exit status. The zombie process exists until the parent process has performed this action, using the waitpid() family of system calls. I quote from the man page:

“A child that terminates, but has not been waited for becomes a “zombie”. The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child.”

In everyday language, people consider “zombie processes” to be simply runaway processes that cause havoc. But formally speaking, from a Unix operating system point of view, zombie processes have a very specific definition. They are processes that have terminated but have not (yet) been waited for by their parent processes.

Most of the time this is not a problem. The action of calling waitpid() on a child process in order to eliminate its zombie, is called “reaping”. Many applications reap their child processes correctly. In the above example with sshd, if bash terminates then the operating system will send a SIGCHLD signal to sshd to wake it up. Sshd notices this and reaps the child process.
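To make the zombie state and the act of reaping concrete, here is a minimal Python sketch. It is only an illustration, not code from Baseimage-docker: the function names are made up for this example, and it assumes a Linux system with /proc mounted.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field follows the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child: terminate immediately

    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)       # "Z" on Linux: terminated, not yet waited for

    _, status = os.waitpid(pid, 0)       # reap: collect the exit status
    state_after = proc_state(pid)        # None: the process table entry is gone
    return state_before, os.WEXITSTATUS(status), state_after
```

Calling spawn_and_reap() returns the state before reaping, the collected exit code, and the state after reaping. The child only disappears from the process table once the parent has called waitpid().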

But there is a special case. Suppose the parent process terminates, either intentionally (because the program logic has determined that it should exit), or caused by a user action (e.g. the user killed the process). What happens then to its children? They no longer have a parent process, so they become “orphaned” (this is the actual technical term).

And this is where the init process kicks in. The init process — PID 1 — has a special task. Its task is to “adopt” orphaned child processes (again, this is the actual technical term). This means that the init process becomes the parent of such processes, even though those processes were never created directly by the init process.

Consider Nginx as an example, which daemonizes into the background by default. This works as follows. First, Nginx creates a child process. Second, the original Nginx process exits. Third, the Nginx child process is adopted by the init process.

You may see where I am going. The operating system kernel automatically handles adoption, so this means that the kernel expects the init process to have a special responsibility: the operating system expects the init process to reap adopted children too.

Although I used daemons as an example, this is in no way limited to just daemons. Every time a process exits even though it has child processes, it’s expecting the init process to perform the cleanup later on. This is described in detail in two very good books: Operating System Concepts by Silberschatz et al, and Advanced Programming in the UNIX Environment by Stevens et al.
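The reaping duty of an init process can be sketched as a loop around waitpid(-1, …), which collects any terminated child, whether the process spawned it or adopted it. The following Python sketch is an illustration under assumptions: reap_children is a hypothetical helper name, and because adoption only happens for PID 1, ordinary direct children stand in for adopted ones when you try it outside a container.

```python
import os


def reap_children(block=False):
    """Reap every terminated child and return {pid: exit_code}.

    With block=False only children that have already exited are collected.
    With block=True the call waits until no children are left at all.
    A child killed by a signal is reported as the negative signal number.
    """
    flags = 0 if block else os.WNOHANG
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, flags)   # -1 means: any child
        except ChildProcessError:
            break                                  # no children left
        if pid == 0:
            break                                  # children exist, none has exited yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)
    return reaped
```

An init process would typically call something like reap_children() whenever it receives SIGCHLD, so that adopted children never linger as zombies.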

Why zombie processes are harmful

Why are zombie processes a bad thing, even though they’re terminated processes? Surely the original application memory has already been freed, right? Is it anything more than just an entry that you see in ps?

You’re right, the original application memory has been freed. But the fact that you still see it in ps means that it’s still taking up some kernel resources. I quote the Linux waitpid man page:

“As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.”

Relationship with Docker

So how does this relate to Docker? Well, we see that a lot of people run only one process in their container, and they think that when they run this single process, they’re done. But most likely, this process is not written to behave like a proper init process. That is, instead of properly reaping adopted processes, it’s probably expecting another init process to do that job, and rightly so.

Let’s look at a concrete example. Suppose that your container contains a web server that runs a CGI script that’s written in bash. The CGI script calls grep. Then the web server decides that the CGI script is taking too long and kills the script, but grep is not affected and keeps running. Because its parent is gone, grep is adopted by PID 1 (the web server). When grep finishes, it becomes a zombie. The web server doesn’t know about grep, so it doesn’t reap it, and the grep zombie stays in the system.

This problem applies to other situations too. We see that people often create Docker containers for third party applications, let’s say PostgreSQL, and run those applications as the sole process inside the container. You’re running someone else’s code, so can you really be sure that those applications don’t spawn processes in such a way that they become zombies later? If you’re running your own code, and you’ve audited all your libraries and all their libraries, then fine. But in the general case you should run a proper init system to prevent problems.

But doesn’t running a full init system make the container heavyweight and like a VM?

An init system does not have to be heavyweight. You may be thinking about Upstart, Systemd, SysV init, etc., with all the implications that come with them. You may be thinking that a full system needs to be booted inside the container. None of this is true. A “full init system”, as we may call it, is neither necessary nor desirable.

The init system that I’m talking about is a small, simple program whose only responsibility is to spawn your application, and to reap adopted child processes. Using such a simple init system is completely in line with the Docker philosophy.

A simple init system

Is there already an existing piece of software that can run another application and that can reap adopted child processes at the same time?

There is almost a perfect solution that everybody has: plain old bash. Bash reaps adopted child processes properly. Bash can run anything. So instead of having this in your Dockerfile…

CMD ["/path-to-your-app"]

…you would be tempted to have this instead:

CMD ["/bin/bash", "-c", "set -e && /path-to-your-app"]

(The -e directive prevents bash from detecting the script as a simple command and exec()‘ing it directly.)

This would result in a process hierarchy in which bash runs as PID 1 and your app runs as its child process.

But unfortunately, this approach has a key problem. It doesn’t handle signals properly! Suppose that you use kill to send a SIGTERM signal to bash. Bash terminates, but does not send SIGTERM to its child processes!

When bash terminates, the kernel terminates the entire container with all processes inside. These processes are terminated uncleanly through the SIGKILL signal. SIGKILL cannot be trapped, so there is no way for processes to terminate cleanly. Suppose that the app you’re running is busy writing a file; the file could get corrupted if the app is terminated uncleanly in the middle of a write. Unclean terminations are bad. It’s almost like pulling the power plug from your server.

But why should you care whether the init process is terminated by SIGTERM? That’s because docker stop sends SIGTERM to the init process. “docker stop” should stop the container cleanly so that you can start it later with “docker start”.

Bash experts would now be tempted to write an EXIT trap handler that simply sends a signal to all child processes when bash exits.

Unfortunately, this does not solve the problem. Sending signals to child processes is not enough: the init process must also wait for child processes to terminate, before terminating itself. If the init process terminates prematurely then all children are terminated uncleanly by the kernel.

So clearly a more sophisticated solution is required, but full init systems like Upstart, Systemd and SysV init are overkill for lightweight Docker containers. Luckily, Baseimage-docker has a solution for this. We have written a custom, lightweight init system especially for use within Docker containers. For lack of a better name, we call this program my_init, a 350-line Python program with minimal resource usage.

Several key features of my_init:

Reaps adopted child processes.

Executes subprocesses.

Waits until all subprocesses are terminated before terminating itself, but with a maximum timeout.

Logs activity to “docker logs”.
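To give a feel for how little code such an init needs, here is a heavily simplified Python sketch of the same responsibilities. It is not my_init itself: the name mini_init, the kill_timeout parameter and the exit code convention (128 plus the signal number) are assumptions made for this illustration. It also only waits for the main app rather than for all subprocesses, and it does no logging.

```python
import os
import signal


def mini_init(argv, kill_timeout=10):
    """Run argv as a child, forward SIGTERM/SIGINT to it, and reap children.

    Returns the app's exit code, or 128 + N if it was killed by signal N.
    """
    app_pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)

    def signal_app(signum):
        try:
            os.kill(app_pid, signum)
        except ProcessLookupError:
            pass                               # the app is already gone

    def forward(signum, frame):
        signal_app(signum)                     # pass the stop request on to the app
        signal.alarm(kill_timeout)             # but do not wait forever

    def give_up(signum, frame):
        signal_app(signal.SIGKILL)             # timeout expired: force termination

    handlers = {signal.SIGTERM: forward, signal.SIGINT: forward,
                signal.SIGALRM: give_up}
    previous = {s: signal.signal(s, h) for s, h in handlers.items()}
    exit_code = None
    try:
        while exit_code is None:
            # Waiting on -1 reaps the app as well as any adopted children.
            pid, status = os.waitpid(-1, 0)
            if pid == app_pid:
                if os.WIFEXITED(status):
                    exit_code = os.WEXITSTATUS(status)
                else:
                    exit_code = 128 + os.WTERMSIG(status)
    finally:
        signal.alarm(0)                        # cancel a pending timeout
        for s, h in previous.items():
            signal.signal(s, h if h is not None else signal.SIG_DFL)
    return exit_code
```

The key difference from a naive signal-forwarding handler is the waitpid() loop: the init stays alive until the app has actually terminated, and only then returns.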

Will Docker solve this?

Ideally, the PID 1 problem would be solved natively by Docker. It would be great if Docker supplied some built-in init system that properly reaps adopted child processes. But as of January 2015, we are not aware of any effort by the Docker team to address this. This is not a criticism: Docker is very ambitious, and I’m sure the Docker team has bigger things to worry about, such as further developing their orchestration tools. The PID 1 problem is very much solvable at the user level. So until Docker has officially solved this, we recommend that people solve this issue themselves, by using a proper init system that behaves as described above.

Is this really such a problem?

At this point, the problem might still sound hypothetical. If you’ve never seen any zombie processes in your container then you may be inclined to think that everything is all right. But the only way you can be sure that this problem never occurs, is when you have audited all your code, audited all your libraries’ code, and audited all the code of the libraries that your libraries depend on. Unless you’ve done that, there could be a piece of code somewhere that spawns processes in such a way that they become zombies later on.

You may be inclined to think, I’ve never seen it happen, so the chance is small. But Murphy’s law states that when things can go wrong, they will go wrong.

Apart from the fact that zombie processes hold kernel resources, zombie processes that don’t go away can also interfere with software that checks for the existence of processes. For example, the Phusion Passenger application server manages processes. It restarts processes when they crash. Crash detection is implemented by parsing the output of ps, and by sending a 0 signal to the process ID. Zombie processes are displayed in ps and respond to the 0 signal, so Phusion Passenger thinks the process is still alive even though it has terminated.
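The effect on a signal 0 liveness check is easy to reproduce. The Python sketch below is an illustration only, not Phusion Passenger’s code, and the function names are hypothetical: it shows that a terminated but unreaped child still passes the check.

```python
import os


def looks_alive(pid):
    """Liveness check via signal 0: True if the PID exists in the process table."""
    try:
        os.kill(pid, 0)          # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # the process exists but belongs to another user
    return True


def zombie_liveness_demo():
    """Return (check result while zombie, check result after reaping)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                                     # child terminates at once
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)   # wait for exit, do not reap
    alive_as_zombie = looks_alive(pid)                  # still passes the check
    os.waitpid(pid, 0)                                  # reap the zombie
    alive_after_reap = looks_alive(pid)                 # the entry is gone now
    return alive_as_zombie, alive_after_reap
```

Only after the zombie has been reaped does the check report that the process is gone.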

And think about the trade-off. To prevent problems with zombie processes from ever happening, all you have to do is spend 5 minutes, either on using Baseimage-docker, or on importing our 350-line my_init init system into your container. The memory and disk overhead is minimal: only a couple of MB on disk and in memory to prevent Murphy’s law.

Conclusion

Is Baseimage-docker the only possible solution? Of course not. What Baseimage-docker aims to do is:

To make people aware of several important caveats and pitfalls of Docker containers.

To provide pre-created solutions that others can use, so that people do not have to reinvent solutions for these issues.

This means that multiple solutions are possible, as long as they solve the issues that we describe. You are free to reimplement solutions in C, Go, Ruby or whatever. But why should you when we already have a perfectly fine solution?

Maybe you do not want to use Ubuntu as base image. Maybe you use CentOS. But that does not stop Baseimage-docker from being useful to you. For example, our passenger_rpm_automation project uses CentOS containers. We simply extracted Baseimage-docker’s my_init and imported it there.

So even if you do not use, or do not want to use Baseimage-docker, take a good look at the issues we describe, and think about what you can do to solve them.

If you liked this article then maybe you would be interested in our newsletter. It’s low volume, but we regularly post interesting updates there. Just enter your email address and name. No spam, we promise.

We have recently been interviewed by the awesome guys at The Changelog, a weekly blog and podcast about open source projects. In this interview I explained a bit about Phusion’s history, what’s new in Phusion Passenger 5, how the Raptor campaign came to be, why we chose to run the campaign like that, the challenges of open source, future Passenger developments, etc.

Indeed, running an open source company has been a big challenge:

A lot of people say, “Hey, if you’re open source, and you want to make money, try selling support.” We tried doing that, but it didn’t work at all, because Passenger is too good.

It’s been a while since we released the first beta of Phusion Passenger 5 (codename “Raptor”), the application server for Ruby, Python and Node.js web apps. We have received a lot of great feedback from the community regarding its performance, stability and features.

Passenger 5 isn’t production-ready yet, but we are getting close because 5.0 beta 3 will soon be released. In the meantime, we would like to share a major new idea with you.

While Passenger 5 introduced many performance optimizations and is much faster than Passenger 4, the impact on real-world application performance varies greatly per application. This is because in many cases the overall performance is more dependent on the application than on the app server.

It’s obvious that just making the app server itself fast is not enough to improve overall performance. So what else can the app server do? After contemplating this question for some time, we believe we have found an answer in the form of a modified HTTP caching mechanism. Its potential is huge.

Update: Over the course of the day, readers have made us aware that some of the functionality can also be achieved through Varnish and through the use of Edge Side Includes, but there are also some ideas which cannot be achieved using only Varnish. These ideas require support from the app. Please read this article until the end before drawing conclusions.

Please also note that the point of this article is not to show we can “beat” Varnish. The point is to share our ideas with the community, to have a discussion about these ideas and to explore the possibilities and feasibility.

Turbocaching

One of the main new features in Passenger 5 is turbocaching. This is an HTTP cache built directly into Passenger so that it can achieve much higher performance than external HTTP caches like Varnish (update: no, we’re not claiming to be better than Varnish). It is fast and small, specifically designed to handle large amounts of traffic to a limited number of end points. For that reason, we described it as a “CPU L1 cache for the web”.

Turbocaching is a major contributor to Passenger 5’s performance

The turbocache has the potential to improve app performance dramatically, no matter how much work the app does. This is seen in the chart above. A peculiar property is that the relative speedup is inversely proportional to the app’s native performance. That is, the slower your app is, the bigger the speedup multiplier you can get from caching. At worst, caching does not hurt. In extreme cases, if the app is really slow, you can see a hundredfold performance improvement.

The limits of caching

So much for the potential of caching; reality is more nuanced. We have received a lot of feedback from the community about the Passenger 5 beta, including feedback about its turbocache.

As expected, the turbocache performs extremely well in applications that serve data that is publicly cacheable by everyone, i.e. they do not serve data that is login-specific. This includes blogs and other sites that consist mostly of static content. The Phusion Passenger website itself is also an example of a mostly static site. But needless to say, this still makes the turbocache’s usefulness rather limited. Most sites serve some login-specific data, even if it’s just a navigation bar displaying the username.

Even the CloudFlare CDN — which is essentially a geographically distributed HTTP cache — does not help a lot with logged-in traffic. Although CloudFlare can reduce the bandwidth between the origin server and the cache server through its Railgun technology, it doesn’t reduce the load on the origin server, which is what we are after.

Update: some readers have pointed out that Varnish supports Edge Side Include (ESI), which is like a text postprocessor at the web server/cache level. But using ESI only solves half of the problem. Read on for more information.

A glimmer of hope

Hope is not all lost though. We have identified two classes of apps for which there is hope:

Apps which have more anonymous traffic than logged in traffic. Examples of such apps include Ted.com, Wikipedia, Imgur, blogs, news sites, video sites, etc. Let’s call these mostly-anonymous apps. What if we can cache responses by user, so that anonymous users share a single cache entry?

Apps which serve public data for the most part. Examples of such apps include: Twitter, Reddit, Discourse, discussion forums. Let’s call these mostly-public apps. Most of the data that they serve is the same for everyone. There are only minor variations, e.g. a navigation bar that displays the username, and secured pages. What if we can cache the cacheable content, and skip the rest?

Class 1: caching mostly-anonymous apps

There is almost a perfect solution for making apps in the first class cacheable: the HTTP Vary header. This header allows you to send a different cached response, based on the value of some header that is sent by the client.

For example, suppose that your app…

…serves gzip-compressed responses to browsers that support gzip compression.

…serves regular responses to browsers that don’t support gzip compression.

You don’t want a cache to serve gzipped responses to browsers that don’t support gzip. Browsers tell the server whether they support gzip by sending the Accept-Encoding: gzip header. If the application sets the Vary: Accept-Encoding header in its responses, then the cache will know that that particular response should only be served to clients with the particular Accept-Encoding value that it has received now.

The Vary response header makes HTTP caches serve different cached responses based on the headers the browsers send.
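The way a cache uses Vary can be sketched as part of the cache key. The toy Python class below is an illustration under assumptions (the class and method names are made up, and real caches handle far more, such as expiry and Cache-Control): the cached entry is selected by the URL plus the values of the request headers that the response listed in Vary.

```python
class VaryCache:
    """Toy HTTP cache that honours the Vary response header."""

    def __init__(self):
        self._vary = {}      # url -> header names listed in the Vary header
        self._entries = {}   # (url, varied request header values) -> body

    @staticmethod
    def _key(url, vary_names, request_headers):
        lowered = {k.lower(): v for k, v in request_headers.items()}
        return (url, tuple((n, lowered.get(n, "")) for n in vary_names))

    def store(self, url, request_headers, response_headers, body):
        """Remember a response, keyed by the request headers named in Vary."""
        lowered = {k.lower(): v for k, v in response_headers.items()}
        vary = lowered.get("vary", "")
        names = tuple(sorted(n.strip().lower() for n in vary.split(",") if n.strip()))
        self._vary[url] = names
        self._entries[self._key(url, names, request_headers)] = body

    def lookup(self, url, request_headers):
        """Return the cached body for this request, or None on a cache miss."""
        if url not in self._vary:
            return None
        return self._entries.get(self._key(url, self._vary[url], request_headers))
```

With this keying, a response stored for a request carrying Accept-Encoding: gzip is never served to a client that did not send that header; that client gets a cache miss and its own entry.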

In theory, we would be able to cache responses differently based on cookies (Vary: Cookie). Each logged in user would get its own cached version. And because most traffic is anonymous, all anonymous users can share cache entries.

Unfortunately, on the modern web, cookies are not only set by the main site, but also by third-party services which the site uses. This includes Google Analytics, Youtube and Twitter share buttons. The values of their cookies can change very often and often differ on a per-user basis, probably for the purpose of user tracking. Because these widely different values are also included in the cache variation key, they make it impossible for anonymous users to share cache entries if we were to try to vary the cache by the Cookie header. The situation is so bad that Varnish has decided not to cache any requests containing cookies by default.

Even using Edge Side Include doesn’t seem to help here. The value of the cookie header can change quickly even for the same user, so when using Edge Side Include the cache may not even be able to cache the previous user-specific response.

The eureka moment: modifying Vary

While the Vary header is almost useless in practice, the idea of varying isn’t so bad. What we actually want is to vary the cache by user, not by the raw cookie value. What if the cache can parse cookies and vary the cached response by the value of a specific cookie, not the entire header?

And this is exactly what we are researching for Passenger 5 beta 3. Initial tests with a real-world application — the Discourse forum software — show promising results. Discourse is written in Ruby. We have modified Discourse to set a user_id cookie on login.
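The difference between varying on the whole Cookie header and varying on a single cookie can be sketched as two ways of building a cache key. This Python sketch is hypothetical: it does not show how Passenger implements or configures the feature, the function names are invented, and only the user_id cookie name is taken from the Discourse experiment described above.

```python
def parse_cookies(cookie_header):
    """Parse a Cookie request header such as "a=1; b=2" into a dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def key_by_whole_header(url, cookie_header):
    """What Vary: Cookie amounts to: every distinct header value is a new entry."""
    return (url, cookie_header)


def key_by_cookie(url, cookie_header, cookie_name="user_id"):
    """Vary on one named cookie only; requests without it share a single entry."""
    return (url, parse_cookies(cookie_header).get(cookie_name))
```

Two anonymous visitors with different tracking cookies produce different keys with key_by_whole_header, but the same key with key_by_cookie, because neither request carries a user_id cookie.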

The result is a Discourse where all anonymous users share the same cache entry. Uncached, Discourse performance is pretty constant at 97 req/sec no matter which app server you use. But with turbocaching, performance is 19 000 req/sec.

This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done*. The benefit that turbocaching adds in this scenario is exactly in line with our vision of a “CPU L1 cache” for the web. You can still throw in Varnish for extra caching on top of Passenger’s turbocaching, but Passenger’s turbocaching provides an irreplaceable service.

* Maybe Varnish’s VCL allows it, but we have not been able to find a way so far. If we’re wrong, please let us know in the comments section.

Class 2: caching mostly-public apps

Apps that serve pages where most data is publicly cacheable, except for small fragments, appear not to be cacheable at the HTTP level at all. Currently these apps utilize caching at the application level, e.g. using Rails fragment caching or Redis. View rendering typically follows this sort of pseudo-algorithm:
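A minimal Python sketch of that kind of fragment-caching flow is shown here. It is an illustration rather than Rails or Redis code: render_page, the fragment keys and the plain dict used as the cache are all assumptions made for this example.

```python
def render_page(fragments, cache):
    """Render a page fragment by fragment, reusing cached HTML where possible.

    `fragments` is a list of (cache_key, render_function) pairs. A cache_key
    of None marks a fragment that must be rendered on every request, such as
    a navigation bar showing the username.
    """
    html = []
    for cache_key, render in fragments:
        if cache_key is None:
            html.append(render())              # user-specific: never cached
        elif cache_key in cache:
            html.append(cache[cache_key])      # cache hit: skip rendering
        else:
            cache[cache_key] = render()        # cache miss: render and remember
            html.append(cache[cache_key])
    return "".join(html)
```

Even when every cacheable fragment is a hit, the loop itself still runs inside the application for each request.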

However, this still means the request has to go through the application. If there is a way to cache this at the Passenger level then we can omit the entire application, boosting the performance even further.

We’ve come to the realization that this is possible, if we change the app into a “semi single page app”:

Instead of rendering pages on the server side, render them on the client side, e.g. using Ember. This way, the view templates can be simple static HTML files, which are easily HTTP cacheable.

PushState is then used to manipulate the location bar, making it feel like a regular server-side web app.

The templates are populated using JSON data from the server. We can categorize this JSON data in two categories:

User-independent JSON data, which is HTTP-level cacheable. For example, the list of subforums.

User-specific JSON data, which is not HTTP-level cacheable. For example, information about the logged in user, such as the username and profile information.

And here lies the trick: we only load the user-specific data once, when the user loads the page. When the user clicks on any links, instead of letting the browser navigate there, the Javascript loads the user-independent JSON data (which is easily cacheable), updates the views and updates the location bar using PushState.

By using this approach, we reduce the performance impact of non-cacheable fragments tremendously. Normally, non-cacheable page fragments would make every page uncacheable. But by using the approach we described, you would only pay the uncacheability penalty once, during the initial page load. Any further requests are fully cacheable.

And because of the use of HTML PushState, each page has a well-defined URL. This means that, despite the app being a semi-single-page app, it’s indexable by crawlers as long as they support Javascript. GoogleBot supports Javascript.

Discourse is a perfect example of an app that’s already architected this way. Discourse displays the typical “navigation bar with username”, but this is only populated on the first page load. When the user clicks on any of the links, Discourse queries JSON from the server and updates the views, but does not update the navbar username.

An alternative to this semi-single page app approach is to use Edge Side Includes, but adoption of the technology is fairly low at this point. Most developers don’t run Varnish in their development environment. In any case, ESI doesn’t solve the whole problem: just half of it. Passenger’s cookie varying turbocaching feature is still necessary.

Even when there are some protected/secured subforums, the turbocache cookie varying feature is powerful enough to make even this scenario cacheable. Suppose that the Discourse content depends on the user’s access level, and that there are 3 access levels: anonymous users, regular registered members, staff. You can put the access level in a cookie, and vary the cache by that cookie.

That way, all users with the same access level share the same cache entry.

Due to time constraints we have not yet fully researched modifying Discourse this way, but that leads us to the following point.

Call for help: please participate in our research

The concepts we proposed in this blog post are ideas. Until tested in practice, they remain theory. This is why we are looking for people willing to participate in this research. We want to test these ideas in real-world applications, and we want to look for further ways to improve the turbocache’s usefulness.

Participation means:

Implementing the changes necessary to make your app turbo-cache friendly.

Benchmarking or testing whether performance has improved, and by how much.

Actively working with Phusion to test ideas and to look for further room for improvements. We will happily assist active participants should they need any help.

If you are interested, please send us an email at info@phusion.nl and let’s talk.

Phusion Passenger 4 is the current stable branch, in which we release bug fixes from time to time. At the same time there is also Phusion Passenger 5, which is the not-yet-ready-for-production development branch, with major changes and improvements in terms of performance and application behavior visibility. Version 5.0 beta 3 will soon be released, but until the 5.x branch is considered stable, we will keep releasing bug fixes under the 4.x branch.

Improved Ruby 2.2 support

Version 4.0.56 already introduced Ruby 2.2 support, but due to an issue in the way we compile the Phusion Passenger native extension, it didn’t work with all Ruby 2.2 installations. In particular, 4.0.56 worked with Ruby 2.2 installations that were compiled with a shared libruby, which is the case if you installed Ruby 2.2 with RVM or through operating system packages. But it did not work with Ruby 2.2 installations that were compiled with a static libruby, which is the case if you installed manually from source, or using rbenv and chruby, or when you are using Heroku.

At first, we suspected a bug in Ruby 2.2’s build system, but after feedback from the MRI core developers, it turned out to be an issue in our own build system. The issue is caused by a commit from 4 years ago, GH-168, which attempted to fix a different issue. It seems there is no way to fix Ruby 2.2 compatibility while at the same time fixing GH-168, so we had to make a choice. Since GH-168 is quite old and was made at a time when Ruby 1.8.6 was the latest Ruby version, we believe that the issue is no longer relevant. We reverted GH-168 in favor of Ruby 2.2 compatibility.
