In part one of this series, we introduced xhprof, while in part two we took an in-depth look at the Xhgui UI and now in our third and final installment, let’s put our xhprof/Xhgui knowledge to work!

Performance Tuning

The only great code is code that never has to run. Everything else is just good code. So, to start performance tuning, the best option is to first ensure that as little code is run as possible.

OpCode Caching

The first, quickest, and easiest option is to enable an opcode cache. More information on OpCode Caches can be found here.

Here we see what happens when enabling Zend OpCache, the first row is our baseline, without the cache enabled.

The middle row we see a small performance improvement, and a big reduction in memory usage. The small performance gain is (probably) from the optimizations performed by Zend OpCache in addition to the opcode caching.

The top row is then with both optimizations and the opcode caching, where we see a large performance gain.

Now we take a look at APC before and after. Contrast to Zend OpCache we see a performance degradation on the initial (middle) request as the cache is built, both in time taken and the memory used.

We then see a similar performance increase with the cached opcodes.

Content Caching

The second thing we can do is to cache content — this is an easy option for WordPress with a number of easy to install plugins to do this for us, including WP Super Cache. WP Super Cache will create static versions of your site that expire automatically when things like comments occur depending on your setup (in very high-load situations for example, you may want to disable cache expiration for any reason).

Content caching only works effectively when there are few writes operations (that invalidate cache) compared to reads.

You should also be caching content received by the application from third party APIs, thereby limiting both your latency and reliance on the availability of the API.

Query Caching

Another caching option is query caching. For MySQL there is a general query cache which can help tremendously; for other databases, and in addition to the query cache for MySQL, caching result sets in a memory cache like memcached, or cassandra can prove very beneficial.

As with content caching, it is most effective in read-heavy scenarios. The MySQL query cache in particular should not be relied upon for performance as it is easy to invalidate large segments of the cache with minor changes to data.

Query Caching may help performance when generating content caches.

As can be seen below, when we turn on query caching we see wall time reduced by 40%, although memory usage does not change in any meaningful way.

There are three types of caching options available, controlled by the query_cache_type setting.

Setting the value to 0 or OFF will disable the cache

Setting the value to 1 or ON will cache all selects except those that start with SELECT SQL_NO_CACHE.

Setting the value to 2 or DEMAND will only cache selects that start with SELECT SQL_CACHE

Also, you should set the query_cache_size to a non-zero value. Setting it to zero will disable the cache regardless of the query_cache_type setting.

The primary issue with the MySQL query cache is that it is global, any changes to a table that is part of a cached result set will cause that cache to be invalidated. On high-write applications, this can make the cache next to useless.

However, there are other options which you can use to build more intelligent caching based on your needs and knowledge of your data-set, such as memcached, riak, cassandra or redis.

Query Optimization

As noted earlier, database queries are a common cause of slow-downs, performing query optimization will likely have more low hanging fruit and give you more benefit than optimizing code.

Query optimization will help performance when generating content caches, as well as helping with the worst case scenario of not being able to cache.

In addition to profiling, there is another option for MySQL that can help identify slow queries, the slow query log. The slow query log will log any query that takes longer than a specified time, and optionally queries that don’t use indexes.

You can enable the log using the following configuration in your my.cnf.

Code Optimization

Typically only once you are to the point that you are no longer being hampered by PHP itself (by using an opcode cache), have cached as much content as possible, and optimized your queries, are you are now ready to start tweaking code.

Code and query optimization is also where you will need to have the most performance to create the other caches; the better performant your code is in the worst situation (no cache), the more stable your app will be, and the quicker it will be able to rebuild caches.

Lets walk through (potentially) optimizing our WordPress install.

First, let’s take a look at our slowest functions:

Surprisingly to me, the first item in our list is not MySQL (in fact mysql_query() is fourth), but is instead the apply_filters() function.

For those who don’t know the WordPress codebase, it features an event-based filter system to perform multiple transformations on data sequentially as added by both core and plugins as callbacks.

The apply_filters() function is where these callbacks are applied.

The first thing you might notice is that the function is called 4194 times. If we click on it to view further details, we can then order the “Parent functions” table by “Call Count” descending, to reveal there are 778 calls to apply_filter() from the translate() function.

This is interesting as I don’t actually use any translations given that I (and I suspect the majority of users) use the WordPress software in its native language: English.

So let’s take a further look at what this translate() function is doing by clicking through its details.

Here we see two interesting things, first we see that of the parents, one is called 773 times, __().

Looking at the source code for this function, we see that it is an wrapper for translate().

As a rule of thumb, function calls are expensive and should be avoided; given that we are always calling __() instead of translate(), we should instead flip the alias, so that translate() is the alias to preserve backwards compatibility, and __() doesn’t call an unnecessary function.

Realistically though, this change isn’t going to make much of a difference and is a micro-optimization — but it does improve code readability and simplify the call-graph.

Moving on from this, let’s look at the Child functions:

We are now getting into the meat of this function, we see that there are 3 functions/methods called, 778 times each:

get_translations_for_domain()

NOOP_Translations::translate()

apply_filters()

Ordering by inclusive wall time descending, we can see that apply_filters() is by far the most expensive call.

What this code is doing is retrieving a translation object, then passing the result of $translations->translate() into apply_filters(). We can tell from then list that $translations is an instance of the NOOP_Translations class.

Based on the name alone (NOOP), and confirmed by a helpful comment in the code, we see that our translator is in fact not doing anything!

<?php/**
* Provides the same interface as Translations, but doesn't do anything
*/classNOOP_Translations{?>

So perhaps we can avoid this code entirely!

With a little debugging on the code, we can see that we are using the default domain, so let’s change the code to ignore the translator when using this:

Next we profile again, be sure to run it at least twice this ensures that all caches have been built and it’s a fair comparison!

This run is now faster! But how much and why?

We can find this out by using the XHGui compare runs feature. Going back to our original run, we click the “Compare this run” button at the top-right and choose our new run from the list.

Doing this we can see that we have reduced the number of function calls by 3%, the Inclusive Wall time by 9%, and the Inclusive CPU time by 12%!

We can then order the details table by Call Count in ascending order, this confirms that (as expected) we see a reduction in calls to get_translations_for_domain() and NOOP_Translations::translate(). Just as importantly, we can confirm that no unexpected changes occurred.

A 9-12% performance improvement is great for about 30 minutes of work; this translates to real world performance gains, even after we’ve applied our opcache.

We can now repeat this procedure for other functions, and keep doing so until we find no more to optimize.

Note: This change was submitted to WordPress.org and has been updated. You can follow the discussion and see the process in practice on the WordPress Bug Tracker. It is scheduled for inclusion in WordPress 4.1.

Other Tools

In addition to the fantastic XHProf/XHGui, there are several other great tools available.

New Relic

New Relic provides both front and backend performance analytics; with phenomenal insights into the entire backend stack including SQL queries and code, as well as on the frontend with DOM and CSS rendering as well as Javascript.

uprofiler

uprofiler is an as-yet unreleased fork of Facebooks XHProf, with the intent of removing the CLA required by Facebook. At present they are identical in terms of feature-set, with just some renaming taking place.

XHProf.io

XHProf.io is an alternative UI for XHProf. XHProf.io uses MySQL for profile storage and is not as user friendly as XHGui.

Xdebug

Before XHProf came onto the scene, there was Xdebug — Xdebug is an active profiler, which means that it should never be used in production, but can offer some fantastic insights into your code.

However, it must be paired with another tool to read the profiler output, such as KCachegrind which can be very difficult to install on non-Linux machines, and the much easier alternative Webgrind.

Webgrind does not provide nearly the same feature-set as KCachegrind, but is a PHP web app and easy to install pretty much anywhere.

When paired with KCachegrind you can very easily explore and find performance issues (in fact, this is my preferred view of any profiling tool!)

In Closing

Profiling, and performance tuning are very complex subjects. With the right tools, and a good understanding of how to use them, we can examine and improve our code tremendously — even on codebases we don’t have much experience with.

It is definitely worthwhile taking the time to explore and learn these tools.

About Davey Shafik

Davey Shafik is a full time PHP Developer with 12 years experience in PHP and related technologies. A Community Engineer for Engine Yard, he has written three books (so far!), numerous articles and spoken at conferences the globe over.