
Over on his blog, Roberto Ostinelli published “A comparison between Misultin, Mochiweb, Cowboy, NodeJS and Tornadoweb.” I was going to write a reply comment there, but it got pretty long, so I decided to publish it here instead. I’m going to ignore the non-Erlang web servers it discusses and focus entirely on Erlang. I’m not trying to pick on Roberto specifically here; rather, I decided to finally write something I’ve been meaning to write for a while now about Erlang web servers and benchmarking.

First, I second the request made by one commenter for including Yaws in the measurements. Roberto, if you need help with the code or setup, just let me know. If one insists on writing these kinds of benchmarks, which as you’ll learn if you read this whole entry is something I question, the least he or she could do is include Yaws since it’s the granddaddy of all Erlang web servers.

Based on the benchmark code Roberto published, I wrote the following simple Yaws module to conform to the problem statement and registered it in my Yaws configuration as a “/” appmod:

I then measured it on my Ubuntu 10.10 two-core system using Roberto’s published httperf command against the misultin and Mochiweb code he published, and found that Yaws definitely holds its own, even though it’s a full-featured web server and does not claim to be just a lightweight library offering (sometimes partial) HTTP support as some frameworks do. For some tests Yaws outperforms misultin, and for others it doesn’t. This is interesting, considering that neither Klacke nor I have made any attempts at performance improvements in Yaws recently.

Second, the benchmarks do not compare apples to apples. Both Mochiweb and Yaws, for example, produce replies that are larger in size than misultin’s replies, primarily because they both include Server and Date headers. As I’ve learned from years of helping maintain Yaws, date calculations can noticeably and surprisingly impact Erlang web server performance, yet simply leaving Date headers out isn’t an option for real-world apps since HTTP 1.1 pretty much requires them (section 13.2.3 of RFC 2616 states, “HTTP/1.1 requires origin servers to send a Date header, if possible, with every response, giving the time at which the response was generated…”). Caches use Date headers for several reasons, for example in the absence of cache-control headers to help heuristically calculate content expiration. Even ignoring the date calculation requirements, just creating and delivering larger replies due to the presence of the Server header will negatively impact any comparisons based on request/second measurements.

Third, the benchmarking approach includes no application “think time.” How many real-world apps just blast request after request down a connection without any intervening time to handle replies? If the goal is to measure something akin to real-world apps, then the benchmarks should at least be using something like httperf’s --wsess option to simulate client think time. And unfortunately doing that is hard to get right for generic benchmarks, since different client apps will have different think times.

On a related note, what exactly is the goal of these benchmarks? To imply that faster is better? That’s unfortunately a commonly-held fallacy. Given that the blog entry states that the target is dynamic applications, then consider the fact that the performance of a real-world dynamic application is often dominated by something other than the web server — perhaps some back-end service from which page data is being fetched, for example. A real-world setup greatly concerned with performance is likely to have nginx out in front, probably with a local cache, to handle fast-path requests, shunting only those requests it can’t fulfill off to the slower back-end server. Such benchmarking games are therefore often misguided as far as real-world dynamic apps are concerned because they end up measuring something that isn’t even in the critical path in a real setup.

I don’t agree with Kyle Drake’s comment on Roberto’s blog about code ugliness, since the Erlang code posted there is very clear and would look like “garbage” only to someone who doesn’t know the language. But I do agree with the sentiment, which is that for dynamic apps, what often matters is what kind of code, and how much, you have to write and maintain to support your app. Given that Erlang web servers tend to make use of the underlying Erlang/OTP facilities for HTTP parsing and socket handling, then all things considered you’re just not going to get a huge variation in performance among them, assuming they’re written halfway decently. What matters for dynamic apps are the stability of the web server/library and the programming model it offers. These are what Roberto should really be benchmarking, but of course that’s basically impossible since stability would take a long time to prove, and programming model is a matter of taste that can’t be conveniently measured using artificial benchmarking tools. This reminds me of one of my old columns on this very issue as applied to enterprise middleware, entitled “The Performance Presumption” (PDF); the short version is that people often measure performance simply because performance is relatively easy to measure. The lesson is that you shouldn’t rely on generic benchmarks, but rather you should take the time to create specific benchmarks that mimic the app you want to develop, and base your decisions on the results of that exercise.

On top of all that, I don’t really understand the desire to keep writing new Erlang web frameworks for performance reasons. As I stated earlier, if a framework uses Erlang’s built-in packet decoding and socket handling, it won’t perform a great deal better than any other Erlang web framework. OTOH, if someone writes a new framework with the hope of providing a really nice new programming model — webmachine is a fantastic example of this — then they shouldn’t be “proving” how good the programming model is by trying to show how fast it is. Ever seen webmachine being advertised via performance benchmarks? Neither have I.

Let’s face it, the Erlang web development community isn’t large enough to support numerous web servers and frameworks. I’m sure some will disagree, but publishing artificial benchmarks designed to “prove” which is best IMO results mostly in just fragmenting the community. If you really have an itch to write a fast Erlang web server, you’d help the community much more by contributing to an existing one, such as the inets web server that ships with Erlang/OTP and now powers the Erlang website. For Yaws, Klacke and I often take patches and suggestions from our users, and we gladly welcome solid contributions intended to improve Yaws performance. If you’re just dying to show off your chops, note that improving performance in a long-lived and highly stable codebase like Yaws without breaking anyone’s code is far more challenging than writing another new server that basically doesn’t differ much from what already exists.

Or perhaps better yet, contribute to the Erlang core. IMO the next major performance improvements in Erlang web servers will come not from minor tweaks in handling binaries or such things, but rather via radical improvements in the Erlang TCP driver or even from developing a whole new HTTP-specific driver. Unlike a war of artificial benchmarks among Erlang web servers, these approaches have a great chance to improve the lot of all Erlang web systems.

In my recent QCon talk I talked about accidentally crashing an Erlang process on a customer’s subscription streaming video website running live in production. The code involved had not been used in production before, and the customer had decided somewhat unexpectedly to turn on a new feature that required it. The developer who wrote it had not tested it and had long since left the company.

The purpose of the code was to monitor bandwidth and session usage for each video subscriber to make sure they weren’t streaming more than they’d paid for. Concerned about the viability of the code, a colleague and I logged into the customer site (with their permission, of course), chose a subscriber at random, and, in an Erlang shell, I interactively invoked a function in the code in question to check that subscriber’s current bandwidth and session count. After a second check, we saw the numbers dropping, potentially indicating the subscriber was logging out, and we wanted to make sure all went well when the subscriber completely stopped streaming. After waiting a bit, I interactively called the function again, and — BAM! — the process holding session state for all paying customers crashed.

The original developer had used an Erlang ets table, an in-memory data store, to hold the subscriber data, and wrote something like this for lookups:

[SubscriberData] = ets:lookup(Table, Subscriber),

My interactive call from the shell looked up a nonexistent subscriber, so the result was the empty list [] rather than [SubscriberData], which caused a pattern mismatch and a badmatch exception. Uncaught, the exception crashed the process. Since the process owned the ets table, when it went down it took the ets table and all subscriber session data with it. It wasn’t so bad, since all it meant was that for a few hours a few subscribers potentially got a bit more video than they’d paid for, but still, it’s not at all the kind of design Erlang’s “Let It Crash” philosophy actually encourages. Crashing a process when something unexpected occurs is perfectly fine, since coding defensively introduces problems of its own, but you can still avoid losing your ets tables like this relatively easily.
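The failure mode described above can be reproduced in a few lines. This is only a sketch: the module name ets_crash_demo, the subscribers table and the alice/bob keys are made up for illustration and have nothing to do with the customer's code.

```erlang
-module(ets_crash_demo).
-export([demo/0]).

%% Hypothetical reproduction: an owner process crashes on a strict lookup
%% and its ets table disappears along with it.
demo() ->
    Parent = self(),
    {Owner, Ref} = spawn_monitor(fun() ->
        Table = ets:new(subscribers, [set, public]),
        true = ets:insert(Table, {alice, 3}),
        Parent ! {table, Table},
        receive
            {lookup, Subscriber} ->
                %% Same style as the lookup in the text: badmatch on []
                [_SubscriberData] = ets:lookup(Table, Subscriber)
        end
    end),
    Table = receive {table, T} -> T end,
    %% While the owner is alive, the data is there.
    [{alice, 3}] = ets:lookup(Table, alice),
    %% Look up a subscriber that does not exist.
    Owner ! {lookup, bob},
    receive
        {'DOWN', Ref, process, Owner, {{badmatch, []}, _Stack}} -> ok
    end,
    %% The table died with its owner.
    undefined = ets:info(Table),
    ok.
```

Calling ets_crash_demo:demo() from a shell also prints an error report for the crashed owner process, which is expected here.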

Name an Heir

When you create an ets table you can also name a process to inherit the table should the creating process die:

TableId = ets:new(my_table, [{heir, SomeOtherProcess, HeirData}]),

If the creating process dies, the process SomeOtherProcess will receive a message of the form

{'ETS-TRANSFER', TableId, OldOwner, HeirData}

where TableId is the table identifier returned from ets:new, OldOwner is the pid of the process that owned the table, and HeirData is the data provided with the heir option passed to ets:new. Once it receives this message, SomeOtherProcess owns the table.
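As a minimal sketch of the heir option in action, assume the calling process acts as SomeOtherProcess and a short-lived process creates the table and then dies. The module name, the some_heir_data term and the simulated_crash exit reason are placeholders chosen for this example.

```erlang
-module(ets_heir_demo).
-export([demo/0]).

demo() ->
    Heir = self(),
    {Owner, Ref} = spawn_monitor(fun() ->
        %% The creating process names the caller of demo/0 as heir.
        TableId = ets:new(my_table, [{heir, Heir, some_heir_data}]),
        true = ets:insert(TableId, {alice, 3}),
        exit(simulated_crash)
    end),
    receive
        {'ETS-TRANSFER', TableId, Owner, some_heir_data} ->
            %% The heir now owns the table, and the data survived.
            [{alice, 3}] = ets:lookup(TableId, alice),
            Heir = ets:info(TableId, owner),
            receive {'DOWN', Ref, process, Owner, simulated_crash} -> ok end,
            true = ets:delete(TableId),
            ok
    after 1000 ->
        timeout
    end.
```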

Give It Away

Alternatively, you can create an ets table and then give it to some other process to keep it:

TableId = ets:new(my_table, []),
ets:give_away(TableId, SomeOtherProcess, GiftData),

As soon as ets:give_away is called, the process SomeOtherProcess will receive a message of the form

{'ETS-TRANSFER', TableId, OldOwner, GiftData}

where TableId is the table identifier returned from ets:new, OldOwner is the pid of the process that owned the table, and GiftData is the data provided in the ets:give_away call. Once it receives this message, SomeOtherProcess owns the table.
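A small sketch of a give-away between two processes might look like the following. The keeper process, the some_gift_data term and the module name are assumptions made for the example; ets:give_away/3 itself is the standard ets call.

```erlang
-module(ets_give_away_demo).
-export([demo/0]).

demo() ->
    Creator = self(),
    %% The keeper waits for the table, reports what it received, then idles.
    Keeper = spawn(fun() ->
        receive
            {'ETS-TRANSFER', TableId, OldOwner, GiftData} ->
                NewOwner = ets:info(TableId, owner),
                Creator ! {received, TableId, OldOwner, GiftData, NewOwner},
                receive stop -> ok end
        end
    end),
    TableId = ets:new(my_table, []),
    true = ets:insert(TableId, {alice, 3}),
    true = ets:give_away(TableId, Keeper, some_gift_data),
    receive
        %% All variables here are already bound, so this checks each field.
        {received, TableId, Creator, some_gift_data, Keeper} -> ok
    after 1000 ->
        exit(timeout)
    end,
    Keeper ! stop,
    ok.
```

Unlike the heir case, the transfer happens immediately, while the creating process is still alive.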

Table Manager

Instead of naming an heir or giving a table away, you can just have your Erlang supervisor process create a child process whose sole task is to own the table. This process creates the table as a named public table, thus allowing other processes to know its name and read/write it directly, with ets’s built-in concurrency protection dealing with any concurrency issues. Since the owner process does nothing more than create the table and then wait to be told to shut down, the likelihood of it crashing and taking the table with it is practically nil. The drawback here, though, is that the process actually using the table may have to coordinate with the owner process to ensure the table is available, and worse, it ends up using what is essentially a global variable (the table name), which can make code harder to read and maintain.
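One way such an owner process could be written is as a do-nothing gen_server. This is a sketch under assumptions: the module name table_owner and the table name subscribers are hypothetical, and the supervisor that would start it is left out.

```erlang
-module(table_owner).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Intended to be started as a child of a supervisor.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Named and public: other processes read and write it by name.
    subscribers = ets:new(subscribers, [named_table, public, set]),
    {ok, no_state}.

%% The owner does no real work; it only keeps the table alive.
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Once the owner is running, any process can call ets:insert(subscribers, {alice, 3}) or ets:lookup(subscribers, alice) directly, which is exactly the global-name coupling noted above.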

A Combination Approach

A nice way of managing ets tables, though, is to use a combination of the three previous techniques:

The Erlang supervisor creates a table manager process. Since all this process does is manage the table, the likelihood of it crashing is very low.

The table manager links itself to the table user process and traps exits, allowing it to receive an EXIT message if the table user process dies unexpectedly.

The table manager creates a table, names itself (self()) as the heir, and then gives it away to the table user process.

If the table user process dies, the table manager is informed of the process death and also inherits the table back.

Once it inherits the table, the table manager can then for example wait until the supervisor recreates the table user process, and then repeat the steps above to give the table to the new table user process. Other variations on this approach, like maybe a small pool of child process clones that cooperate to transfer the table between them in case of error, are of course also possible. Even though there are still process coordination issues here (but nothing difficult), I like this approach because it avoids global named tables and takes advantage of Erlang's supervision hierarchy.
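The steps above can be sketched with plain processes. Several simplifications are assumed here: there is no real supervisor, the manager itself spawns the replacement table user as a stand-in for waiting on the supervisor, and the message names (given, put, get, crash, stop) are invented for the demo.

```erlang
-module(ets_manager_demo).
-export([demo/0]).

%% Table manager: creates the table with itself as heir, then hands it out.
manager(Report) ->
    process_flag(trap_exit, true),
    TableId = ets:new(my_table, [{heir, self(), returned}]),
    give(TableId, Report).

give(TableId, Report) ->
    %% Stand-in for "wait until the supervisor recreates the table user".
    User = spawn_link(fun user/0),
    true = ets:give_away(TableId, User, handed_over),
    Report ! {given, User},
    manager_loop(User, Report).

manager_loop(User, Report) ->
    receive
        {'EXIT', User, _Reason} ->
            %% The link reported the death; the heir message returns the table.
            receive
                {'ETS-TRANSFER', TableId, User, returned} ->
                    give(TableId, Report)
            end;
        stop ->
            exit(shutdown)
    end.

%% Table user: waits for the table, then serves simple requests.
user() ->
    receive
        {'ETS-TRANSFER', TableId, _Manager, handed_over} ->
            user_loop(TableId)
    end.

user_loop(TableId) ->
    receive
        {put, Key, Value} ->
            true = ets:insert(TableId, {Key, Value}),
            user_loop(TableId);
        {get, Key, From} ->
            From ! {value, ets:lookup(TableId, Key)},
            user_loop(TableId);
        crash ->
            exit(simulated_crash)
    end.

demo() ->
    Demo = self(),
    Manager = spawn(fun() -> manager(Demo) end),
    User1 = receive {given, U1} -> U1 end,
    User1 ! {put, alice, 3},
    User1 ! crash,
    %% A second table user gets the same table back from the manager.
    User2 = receive {given, U2} -> U2 end,
    User2 ! {get, alice, Demo},
    Result = receive {value, V} -> V end,
    Manager ! stop,
    Result.
```

In this sketch demo/0 returns whatever the second table user finds under the key alice, so the row written before the crash should still be there.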

The title of my QCon talk was "Let It Crash...Except When You Shouldn't." This scenario is an example of "when you shouldn't" — losing ets data due to a process crash is easily avoided.

In my “Functional Web” March/April 2011 column (PDF), I explore potential process bottleneck problems within Erlang web applications, specifically around the issue of process fan-in. I’ve seen developers create Erlang web applications that suffer from bottlenecks related to interprocess communication as explained in the column, yet as far as I’ve seen this issue is rarely if ever discussed among Erlang developers.

As always, all constructive feedback on the column is welcomed. Just post your comments here or email me.

I’ve now rewritten the functions in a new Erlang library application named erlsha2, available on GitHub. The erlsha2 implementation uses Erlang NIFs, making it significantly faster than the original. The original Erlang implementations are still there too, but they’re automatically overridden by the NIF library when it loads.

Compared to the original module, the exported functions in this module have been renamed; they used to have names like hexdigest224 but they now have shorter names like sha224 to more closely match the Erlang crypto module. I also implemented the init/update/final function groups for each hash algorithm to allow data to be incrementally hashed, also to match the crypto module.
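To illustrate what an init/update/final group buys you, here is a sketch using the incremental hashing functions of the crypto module in current Erlang/OTP releases (crypto:hash_init/1, crypto:hash_update/2 and crypto:hash_final/1). It deliberately does not call erlsha2, whose exact function names are not shown here, and the module name is made up.

```erlang
-module(incremental_hash_demo).
-export([demo/0]).

%% Hash data in two chunks and compare with hashing it in one go.
demo() ->
    Ctx0 = crypto:hash_init(sha224),
    Ctx1 = crypto:hash_update(Ctx0, <<"hello, ">>),
    Ctx2 = crypto:hash_update(Ctx1, <<"world">>),
    Incremental = crypto:hash_final(Ctx2),
    OneShot = crypto:hash(sha224, <<"hello, world">>),
    Incremental =:= OneShot.
```

The incremental form is useful when the data arrives in pieces, for example from a socket or a large file, and never has to be held in memory all at once.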