Hi, I'm Saša Jurić, a software developer with 10+ years of professional experience building web and desktop applications in various languages, such as Elixir, Erlang, Ruby, JavaScript, C#, and C++. I'm also the author of the Elixir in Action book. In this blog you can read about Elixir, Erlang, and other programming related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Observing low latency in Phoenix with wrk

2016-06-12

Recently there were a couple of questions on Elixir Forum about the observed performance of a simple Phoenix based server (see here for example). People reported some unspectacular numbers, such as a throughput of only a few thousand requests per second and latencies of a few tens of milliseconds.

While such results are decent, a simple server should be able to give us better numbers. In this post I’ll try to demonstrate how you can easily get some more promising results. I should immediately note that this is going to be a shallow experiment. I won’t go into deeper analysis, and I won’t deal with tuning of VM or OS parameters. Instead, I’ll just pick some low-hanging fruit and rig the load test by providing the input which gives me good numbers. The point of this post is to demonstrate that it’s fairly easy to get (near) sub-ms latencies with a decent throughput. Benchmarking a more realistic scenario is more useful, but also requires a larger effort.

Building the server

The server itself will do something simple: read and decode the request body, perform a small computation, and produce an encoded JSON response. It’s not spectacular, but it will serve the purpose. This makes the operation mostly CPU bound, so under load I expect to see CPU usage near 100%.
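The endpoint itself isn’t shown here, but as a rough sketch (the module name, action, and payload shape are my assumptions, not necessarily what the original project used), a CPU-bound JSON action could look like this:

```elixir
# Illustrative controller (names and payload are assumptions).
# Plug.Parsers places a top-level JSON array under the "_json" key.
defmodule BenchPhoenix.BenchController do
  use BenchPhoenix.Web, :controller

  def sum(conn, %{"_json" => numbers}) do
    # Decoding the body, summing, and re-encoding the response
    # keeps the work mostly CPU bound, as described above.
    json(conn, %{sum: Enum.sum(numbers)})
  end
end
```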

So let’s build the server. First, I’ll create a basic mix skeleton:

$ mix phoenix.new bench_phoenix --no-ecto --no-brunch --no-html

I don’t need ecto, brunch, or html support, since I’ll be exposing only a simple API interface.

Now I need to change some settings to make the server perform better. In prod.exs, I’ll increase the logger level to :warn:

config :logger, level: :warn

By default, the logger level is set to :info meaning that each request will be logged. This leads to a lot of logging under load, which will cause the Logger to start applying back pressure. Consequently, logging will become a bottleneck, and you can get crappy results. Therefore, when measuring, make sure to avoid logging all requests, either by increasing the logger level in prod, or by changing the log level of the request to :debug in your endpoint (with plug Plug.Logger, log: :debug).

Another thing I’ll change is the value of the max_keepalive Cowboy option. This number specifies the maximum number of requests that can be served on a single connection. The default value is 100, meaning that the test would have to open new connections frequently. Increasing this value to something large will allow the test to establish the connections only once and reuse them throughout the entire test. Here’s the relevant setting in prod.exs:
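The snippet isn’t reproduced here, but the option is passed to Cowboy through `protocol_options`; a sketch of the setting (the concrete value is my choice, anything suitably large works) looks like this:

```elixir
# config/prod.exs
config :bench_phoenix, BenchPhoenix.Endpoint,
  # max_keepalive is a Cowboy protocol option: the max number of
  # requests served on a single connection before it's closed.
  http: [port: 4000, protocol_options: [max_keepalive: 5_000_000]]
```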

Notice that I have also hardcoded the port setting to 4000 so I don’t need to specify it through the environment.

I also need to tell Phoenix to start the server when the system starts:

config :bench_phoenix, BenchPhoenix.Endpoint, server: true

I plan to run the system as an OTP release. This is a recommended way of running Erlang systems in production, and it should give me better performance than iex -S mix. To make this work, I need to add exrm as a dependency:

defp deps do
[..., {:exrm, "~> 1.0"}]
end

Finally, I need to set up the load-test script. I’ll be using the wrk tool, so I’ll create a wrk.lua script:
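The script contents aren’t shown above; a minimal wrk.lua for a JSON POST might look like this (the request body and the target endpoint are my assumptions):

```lua
-- wrk.lua: turn every request into a POST with a JSON body
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = "[1, 2, 3, 4, 5]"
```

It would then be invoked with something like `wrk -t4 -c64 -d30s --latency -s wrk.lua http://localhost:4000/api/sum`, where the thread and connection counts are exactly the knobs discussed below.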

The parameters here are rigged to make the results attractive. I’m using as few connections as needed (the number was chosen after a couple of trial runs) to get close to the server’s max capacity. Adding more connections would cause the test to issue more work than the server can cope with, so consequently the latency would suffer. If you’re running the test on your own machine, you might need to tweak these numbers a bit to get the best results.

I’ve observed a throughput of ~ 24k requests/sec, with 99th percentile latency below 1ms, and the maximum observed latency at 3.05ms. I also started htop and confirmed that all cores were near 100% usage, meaning the system was operating near its capacity.

For good measure, I also ran a 5 minute test, to verify that the results are consistent:

Looking at htop, I observed that the CPU was fully maxed out, so the system was using all the available hardware and operating at its max capacity. The reported latencies are now noticeably larger, since we’re issuing more work than the system can handle on the given machine.

Assuming the code is optimized, the solution could be to scale up and put the system on a more powerful machine, which should restore the latency. I don’t have such a machine available, so I wasn’t able to prove this.

It’s also worth considering guarding the system against overloads by making it refuse more work than it can handle. Although that doesn’t seem like a perfect solution, it can allow the system to operate within its limits and thus keep the latency within bounds. This approach would make sense if you have a fixed upper bound on the acceptable latency. Accepting requests which can’t be served within the given time frame doesn’t make much sense, so it’s better to refuse them upfront.
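As an illustration of that idea (entirely my sketch — none of these names come from the project), a simple plug could track in-flight requests in an ETS counter and refuse work over a limit:

```elixir
# Hypothetical load-shedding plug. Assumes a public ETS table
# :load_shedder, seeded with {:in_flight, 0}, created at startup.
defmodule BenchPhoenix.LoadShedder do
  import Plug.Conn

  @max_in_flight 1_000

  def init(opts), do: opts

  def call(conn, _opts) do
    if :ets.update_counter(:load_shedder, :in_flight, 1) > @max_in_flight do
      # Over capacity: undo the increment and refuse the request upfront.
      :ets.update_counter(:load_shedder, :in_flight, -1)
      conn |> send_resp(503, "overloaded") |> halt()
    else
      # Decrement the counter once the response is about to be sent.
      register_before_send(conn, fn conn ->
        :ets.update_counter(:load_shedder, :in_flight, -1)
        conn
      end)
    end
  end
end
```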

Conclusion

I’d like to stress again that this was a pretty shallow test. The main purpose was to prove that we can get some nice latency numbers with a fairly small amount of effort. The results look promising, especially since they were obtained on my personal box, which was running both the load tester and the server, as well as other applications (mail client, browser, editor, …).

However, don’t be tempted to jump to conclusions too quickly. A more exhaustive test would require a dedicated server, tuning of OS parameters, and playing with the emulator flags such as +K and +s. It’s also worth pointing out that synthetic tests can easily be misleading, so be sure to construct an example which resembles the real use case you’re trying to solve.

Where is the comments section?

Until I figure out the GDPR implications, the comments are disabled. In the meantime, if you have some questions, you can find me at the Elixir Forum. Either tag me in a public post, or send me a DM, and I'll do my best to respond in a timely manner :-)