Symfony Benchmarks: Scaling PHP by adding CPU & RAM

In the previous article in this series we took a look at how different runtimes affect Symfony performance by comparing PHP 5.6, HHVM 3.11 and PHP 7.0.1. The conclusion was that both HHVM and PHP 7 offer significant improvements in performance without adding server resources. In this article we'll look at how adding server resources affects performance.

The simplest way to improve processing performance is to make a single CPU ever faster, but diminishing increases in single core performance have been the norm for over a decade now. This is why hosting lingo focuses on the number of CPU cores, rather than the clock speed.

Luckily, PHP is by nature well suited to scaling by adding CPU and other resources. It is relatively straightforward to scale just by throwing more resources at it:

The shared-nothing architecture of PHP where each request is completely distinct and separate from any other request leads to infinite horizontal scalability in the language itself.
-- http://techpatterns.com/forums/about567.html

Adding multiple servers does add complexity, but adding CPU and RAM to a single virtual server is nowadays drop-dead simple. Doubling the available CPU won't make individual requests run twice as fast, but in theory will allow you to serve twice as many requests at the same time.

That is the theory, but let's see how it holds up in practice with some tests.

The server is operating with PHP 7.0.1 on PHP-FPM and Nginx 1.9.9 for all of the tests. For added detail, I ran passes with different max_children values for PHP-FPM to see if, and how, that affects performance. As before, tests are repeated three times and average values are reported.
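
The test setup described above can be sketched as a small matrix of runs. This is an illustrative sketch only: the core counts, max_children values and concurrency levels are the ones mentioned in this article (the full original test plan may have differed), while the function names and the sample throughput figures are hypothetical placeholders, not measured results.

```python
from itertools import product
from statistics import mean

# Values mentioned in this article; the full original test plan may differ.
CPU_CORES = [1, 2, 4, 8]
MAX_CHILDREN = [5, 10, 20]
CONCURRENCY = [1, 10, 50]
PASSES = 3  # each combination is run three times and averaged


def test_matrix():
    """Return every (cores, max_children, concurrency) combination."""
    return list(product(CPU_CORES, MAX_CHILDREN, CONCURRENCY))


def average_result(passes):
    """Average the requests-per-second figures of the repeated passes."""
    if len(passes) != PASSES:
        raise ValueError(f"expected {PASSES} passes, got {len(passes)}")
    return mean(passes)


if __name__ == "__main__":
    print(len(test_matrix()), "combinations to benchmark")
    # Hypothetical req/s values for one combination, for illustration only.
    print(average_result([100.0, 110.0, 120.0]))
```

Averaging three passes smooths out run-to-run noise, but it does not remove systematic effects from the hosting environment.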

Front Page without Symfony Proxy

Charts: 1 CPU core, 2 CPU cores, 4 CPU cores, 8 CPU cores

Average requests per CPU core (Concurrency of 10)

For unproxied full page loads the results are as expected, growing rather linearly with added CPU resources. A maximum of 5 PHP-FPM child processes seems to be the best option until hitting 8 cores, at which point 10 child processes is the clear winner. A maximum of 20 PHP-FPM children offers no advantage.

Performance peaks at 161 req/s with 8 CPU cores, 10 child processes and 50 concurrent users. Excluding concurrency of 1, per CPU core results range from 24 req/s at 1 Core to 19.625 req/s at 8 Cores. This illustrates that there is added overhead, other bottlenecks or room for configuration optimisations.
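
To make the drop in per-core throughput concrete, here is a minimal sketch that turns the two per-core averages quoted above into a scaling efficiency. The function names are my own, and the only inputs are the 24 req/s and 19.625 req/s figures from this article.

```python
def scaling_efficiency(per_core_single, per_core_scaled):
    """Share of the single-core per-core throughput retained after scaling."""
    return per_core_scaled / per_core_single


def total_throughput(per_core, cores):
    """Aggregate req/s implied by a per-core average."""
    return per_core * cores


# Per-core averages for the unproxied front page, taken from the article.
FRONT_PAGE_1_CORE = 24.0
FRONT_PAGE_8_CORES = 19.625

if __name__ == "__main__":
    efficiency = scaling_efficiency(FRONT_PAGE_1_CORE, FRONT_PAGE_8_CORES)
    measured = total_throughput(FRONT_PAGE_8_CORES, 8)
    ideal = total_throughput(FRONT_PAGE_1_CORE, 8)
    print(f"Efficiency at 8 cores: {efficiency:.1%}")
    print(f"Implied average: {measured} req/s, ideal linear scaling: {ideal} req/s")
```

In other words, each of the 8 cores delivers roughly four fifths of what the single core did, so the averaged throughput lands at about 157 req/s instead of the 192 req/s that perfectly linear scaling would give.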

Front Page with Symfony Proxy

Charts: 1 CPU core, 2 CPU cores, 4 CPU cores, 8 CPU cores

Average requests per CPU core (Concurrency of 10)

Proxied page results are in line with high-level expectations. On single and dual core setups the highest output is delivered at a concurrency of 50 requests, whereas with 4 and 8 cores this evens out. A max children value of 10 is the best on average, though with 8 cores 20 children offer a slight advantage.

API with Symfony Proxy

Charts: 1 CPU core, 2 CPU cores, 4 CPU cores, 8 CPU cores

Average requests per CPU core (Concurrency of 10)

Results for proxied API calls remain largely consistent. Notably, a concurrency of 50 yields the highest results regardless of core count. For child processes 10 is the best overall value, with a curiously significant drop for 20 child processes with 8 cores. This likely indicates an underlying issue that now surfaces because of the tiny transfer payload and short processing time.

A combination of 8 CPU cores, 10 child processes and a concurrency of 50 takes the performance crown with 9758 req/s. Excluding concurrency of 1, per CPU core results range from 1364 req/s with 1 core to 1032.125 req/s with 8 cores. Again, this is a significant drop that shows that CPU scaling is not linear by default.
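
One rough way to quantify this non-linearity is to ask what Amdahl's law would imply. Note that this is my own back-of-the-envelope model, not something the benchmarks measured: it assumes all lost throughput comes from a fixed non-parallelisable share of the work, and it is fitted to just the two per-core averages quoted above. The function names are hypothetical.

```python
def speedup(per_core_single, per_core_scaled, cores):
    """Observed speedup over a single core, derived from per-core averages."""
    return cores * per_core_scaled / per_core_single


def serial_fraction(observed_speedup, cores):
    """Solve Amdahl's law S = 1 / (s + (1 - s) / n) for the serial share s."""
    if cores < 2:
        raise ValueError("need at least 2 cores to estimate a serial fraction")
    return (cores / observed_speedup - 1) / (cores - 1)


# Per-core averages for the proxied API calls, taken from the article.
API_1_CORE = 1364.0
API_8_CORES = 1032.125

if __name__ == "__main__":
    s8 = speedup(API_1_CORE, API_8_CORES, 8)
    print(f"Speedup at 8 cores: {s8:.2f}x (ideal: 8x)")
    print(f"Implied serial fraction: {serial_fraction(s8, 8):.1%}")
```

Under these assumptions the 8-core setup is about 6 times faster than the single core, which corresponds to a serial share of under 5 percent. Treat that figure as an illustration of how little shared overhead it takes to bend the curve, not as a measured property of the application.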

Scaling PHP by adding RAM

Adding processing capacity is quite straightforward to understand when your application is CPU bound with enough RAM at its disposal. For memory-strapped environments, adding RAM is expected to give a significant speedup thanks to reduced swapping.

What about excess RAM? Can you have too much of it? Linux servers utilise memory efficiently, as unused memory is wasted memory. Let's see how our example application behaves when adding more RAM in the UpCloud environment.
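
The point about unused memory can be checked on any Linux server by reading /proc/meminfo. Below is a minimal sketch that separates memory that is truly free from memory that is merely available (free plus reclaimable cache). The helper names are hypothetical and the sample values are made up for illustration; they are not taken from the benchmark servers.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(meminfo):
    """Express free, available and cache memory as percentages of the total."""
    total = meminfo["MemTotal"]
    return {
        "free_pct": 100 * meminfo["MemFree"] / total,
        "available_pct": 100 * meminfo["MemAvailable"] / total,
        "cache_pct": 100 * (meminfo["Buffers"] + meminfo["Cached"]) / total,
    }


# Made-up sample values, for illustration only.
SAMPLE = """MemTotal:        1000000 kB
MemFree:          100000 kB
MemAvailable:     700000 kB
Buffers:           50000 kB
Cached:           550000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    print(memory_summary(parse_meminfo(text)))
```

A low free percentage together with a high available percentage simply means the kernel is using spare RAM for caching, which is the behaviour described above.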

The test ranges from a mere 512 megabytes to a total of 8 gigabytes, all while keeping the CPU core count at 8 and the PHP-FPM process count at 10.

Front Page without Symfony Proxy

With unproxied, CPU heavy page loads the 0.5 GB setup falls behind once concurrency goes higher, likely because it is starved of memory. After remaining rather stable for 1, 2 and 4 GB, the numbers get a small but noticeable boost at 8 GB for one reason or another.

Front Page with Symfony Proxy

With proxied page loads with a large payload the 0.5 GB setup stays closer to the setups with more generous amounts of RAM. Curiously, the 8 GB setup consistently falls behind its 1, 2 and 4 GB counterparts. I'm speculating that this has something to do with the hosting environment architecture rather than the Linux server itself.

API without Symfony Proxy

Similar to the unproxied page loads, the unproxied API results clearly indicate that the 0.5 GB setup falls behind. Again, the 1, 2 and 4 GB setups are very stable, whereas there is a noticeable bump upwards at 8 GB.

API with Symfony Proxy

For the low processing, high throughput proxied API calls the results are somewhat unexpected. The 0.5 GB setup falls somewhat behind at higher concurrencies, but for a concurrency of 10 the results are rather strange. This could indicate something in the underlying hosting architecture, as the numbers even out again at 50 concurrent requests.

Conclusions

As expected, scaling CPU bound performance by adding CPU cores works. The results don't grow linearly, but adding cores is an easy way forward when PHP runtime tweaks and application optimisations still have untried paths.

Increasing the PHP-FPM max_children value has only a limited effect on performance. Tweaks can be done, but unless your number of children is below the number of CPU cores, don't expect significant improvements from increasing it.

It is fine to get warnings such as this one in your logs occasionally:

This is preferable to trying to avoid them completely by increasing the max_children setting. A greedy max_children can lead to PHP-FPM processes hogging all the available RAM and the system starting to kill processes, such as the mysql daemon in this case:

Scaling PHP by adding more memory is more of a mixed bag. Obviously, if you don't have enough memory to run your PHP application alongside the available CPU, you'll need more. In our example case memory usage remains rather low and 1 GB seems to be enough according to the benchmarks.
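
The risk of a greedy max_children value mentioned above can be estimated with a simple rule of thumb: the workers should fit in whatever RAM is left after the other services have taken their share. The sketch below is only that, a rule of thumb. The function name and every number in it (memory reserved for other services, memory per PHP-FPM child) are hypothetical and need to be measured on your own server.

```python
def max_children_for_ram(total_mb, reserved_mb, per_child_mb):
    """Largest worker count whose combined memory fits in the RAM left over.

    total_mb:     RAM on the server
    reserved_mb:  RAM set aside for the OS, Nginx, MySQL and so on (assumption)
    per_child_mb: typical memory use of one PHP-FPM child (assumption)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    return usable_mb // per_child_mb


if __name__ == "__main__":
    # Hypothetical figures: 512 MB reserved, 50 MB per PHP-FPM child.
    for total_mb in (512, 1024, 2048, 4096, 8192):
        limit = max_children_for_ram(total_mb, 512, 50)
        print(f"{total_mb} MB RAM -> at most {limit} children")
```

With these made-up figures a 1 GB server tops out at 10 children, so a max_children of 20 would already be asking for more memory than the machine has.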