Benchmarking magic

Submitted by Larry
on 4 November 2007 - 12:38am

The day is nearly upon us! Drupal 7 will open up developers to PHP 5 functionality when it is released next year. Already, there is talk of how, and if, to leverage PHP 5's object handling now that we don't need to deal with the weirdness of PHP 4's object model. Of course, because it's Drupal, our army of performance czars want to know just what the cost is for object handling, and especially advanced object magic like __get(), __call(), the ArrayAccess interface, and so forth.

So let's find out. :-)

Benchmarking methodology

The exact numbers in the following tests aren't particularly interesting. What's interesting is their relative value. All tests are run in a single script (available at the bottom of this post) on the following system:

Because it's a fairly beefy system, all tests are run 2,000,000 times so that we have worthwhile numbers to compare. All times listed below are in seconds. Of course, any such tests will vary a bit between runs, and even between two tests in the same script. We're looking for overall trends here, not exact numbers, but it's important to keep in mind that micro-benchmarks are an inexact science. Also keep in mind that I don't know the internals of the PHP engine well at all, so my analysis is based on logical extrapolation, not actual knowledge of the PHP engine itself.
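For readers who want to try something similar, here is a minimal sketch of a timing loop. It is not the script used for this post: the bench() helper and the iteration count are assumptions made for illustration, and it assumes a reasonably recent PHP (7.0 or later). Note that calling a closure on every pass adds its own overhead, which is why the empty baseline is measured too.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function bench(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Smaller than the 2,000,000 runs used in the post, to keep the sketch quick.
$iterations = 200000;

// An empty closure measures the cost of the loop and the call itself.
$baseline = bench(function () {}, $iterations);

// Anything measured afterwards should be read relative to that baseline.
$strlen = bench(function () { strlen('benchmark'); }, $iterations);

printf("baseline: %.4f s\nstrlen:   %.4f s\n", $baseline, $strlen);
```

Only the ratio between the two numbers means anything, and even that will wobble from run to run.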

As others have noted before, call_user_func*() is extremely slow. Unfortunately, it's also the main way to do function-level polymorphism in PHP.
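To make the options concrete, here is a small sketch of the usual ways to call a function whose name is only known at run time. The hook_example() function is hypothetical. All four forms produce the same result; how their costs compare depends on the PHP version, so measure before drawing conclusions.

```php
<?php
// Hypothetical function standing in for a dynamically chosen callback.
function hook_example(int $a, int $b): int
{
    return $a + $b;
}

$name = 'hook_example';

$direct   = hook_example(1, 2);                    // name fixed in the source
$variable = $name(1, 2);                           // variable function call
$cuf      = call_user_func($name, 1, 2);           // arguments passed one by one
$cufa     = call_user_func_array($name, [1, 2]);   // arguments passed as an array

var_dump($direct === $variable && $variable === $cuf && $cuf === $cufa);
```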

Methods

Moving on to the object-oriented code, our main interest here, let's look at three different ways of calling methods: directly, via __call(), and via __call() with a generic pass-through using call_user_func_array():

It looks like __call() is indeed not a speed demon, but it's not as slow as I previously thought. Rather, it is call_user_func_array() that is the real killer. Between the two of them, call_user_func_array() has more overhead than __call() does. Mixing them is a performance nightmare.
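The three call styles can be sketched roughly as follows. The class names are hypothetical and this is not the original benchmark script; it assumes PHP 7.4 or later for the typed property. The timings it prints are only meaningful relative to each other.

```php
<?php
// Sketch only: class names are hypothetical, not taken from the post's script.
class DirectCall
{
    public function ping(): string
    {
        return 'pong';
    }
}

class MagicCall
{
    // __call() is invoked for any method that is not defined or not accessible.
    public function __call(string $name, array $args)
    {
        return 'pong';
    }
}

class MagicPassThrough
{
    private DirectCall $target;

    public function __construct()
    {
        $this->target = new DirectCall();
    }

    // Generic pass-through: forward whatever was called to the wrapped object.
    public function __call(string $name, array $args)
    {
        return call_user_func_array([$this->target, $name], $args);
    }
}

$iterations = 100000;
$times = [];
$results = [];
foreach ([new DirectCall(), new MagicCall(), new MagicPassThrough()] as $object) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $result = $object->ping();
    }
    $times[get_class($object)] = microtime(true) - $start;
    $results[get_class($object)] = $result;
}

foreach ($times as $class => $seconds) {
    printf("%-18s %.4f s\n", $class, $seconds);
}
```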

Properties

PHP 5 also includes some magic overrides for properties: __get() and __set(). Let's see how expensive those are.

So we can determine 3 things here. One, setting a variable is a bit more expensive than reading it, but not enormously so. That's not surprising. Two, arrays are very slightly faster than objects for just reading a public property directly, but again not by much and probably not enough to worry about (especially when there are plenty of more expensive operations, as we are finding). Three, the ArrayAccess interface eats your CPU.

At first that seems surprising, but consider that each array access must first detect that it's using the extra language magic, then call a method, and in our case that method is not just trivially returning as it did in the earlier tests. It's doing an array lookup and returning an actual value. Still, a 4.6x-5x increase in time feels high. It's even a bit more expensive than __get() and __set().
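For reference, the access styles being compared look roughly like this. It is a sketch under assumptions: the class names are made up, and the return types follow PHP 8.0 or later rather than the PHP 5 signatures current when this was written. The point is that every read or write through __get(), __set(), or ArrayAccess is a method call in disguise.

```php
<?php
// Hypothetical classes for illustration; not the post's benchmark code.
class PublicProps
{
    public $title;
}

class MagicProps
{
    private array $data = [];

    // Called when reading a property that is not accessible.
    public function __get(string $name)
    {
        return $this->data[$name] ?? null;
    }

    // Called when writing a property that is not accessible.
    public function __set(string $name, $value): void
    {
        $this->data[$name] = $value;
    }
}

class ArrayLike implements ArrayAccess
{
    private array $data = [];

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;   // supports $object[] = $value
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

$array = ['title' => 'Benchmarking magic'];   // native array
$public = new PublicProps();
$public->title = 'Benchmarking magic';        // plain property write
$magic = new MagicProps();
$magic->title = 'Benchmarking magic';         // goes through __set()
$arrayLike = new ArrayLike();
$arrayLike['title'] = 'Benchmarking magic';   // goes through offsetSet()

// All four reads return the same string; only the machinery differs.
var_dump($array['title'], $public->title, $magic->title, $arrayLike['title']);
```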

Inheritance

What about simple inheritance? There are many ways to do polymorphism. So far we've determined that call_user_func_array() is a really lousy one from a performance perspective, and a wrapping function is going to cost an extra function call each time. What if we use inheritance for more traditional, "classic" polymorphism? Let's have a go.

Finally, some good news! Not really surprising news, either. When all is said and done, inheritance is basically free as far as CPU cycles go. That should not come as a surprise. Properties and methods of an object are inherited at creation time, not call time, so once the object is created it doesn't really matter how it was created. At least from a performance perspective, then, inheritance is not a concern.
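As a minimal sketch of the shape being tested (the class names are hypothetical), the child simply overrides the parent's method and the caller invokes it like any other method:

```php
<?php
// Hypothetical classes illustrating an overridden method.
class BaseHandler
{
    public function handle(): string
    {
        return 'base';
    }
}

class ChildHandler extends BaseHandler
{
    // Replaces the parent's implementation entirely; the parent is never called.
    public function handle(): string
    {
        return 'child';
    }
}

$handler = new ChildHandler();
$result = $handler->handle();   // one method call, same as on a class with no parent

var_dump($result);
```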

Composition

Of course, the blogosphere has been hopping recently about how inheritance is evil and inflexible and composition is so much better and more flexible. The catch, though, is that composition does incur a run-time cost in terms of extra method calls. Let's see what that cost is.

Wrapping a method via composition roughly doubles the performance cost, which is exactly what we'd expect from adding one more method call to the stack. No surprises here, either. Consider that the cost of composition. At least it's cheaper than call_user_func_array(). :-)
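The extra call is easy to see in a sketch. The Engine and Car classes below are hypothetical, and the typed property assumes PHP 7.4 or later; the wrapper's start() does nothing but forward to the wrapped object, so each call from outside becomes two method calls.

```php
<?php
// Hypothetical classes illustrating delegation through composition.
class Engine
{
    public function start(): string
    {
        return 'started';
    }
}

class Car
{
    private Engine $engine;

    public function __construct(Engine $engine)
    {
        $this->engine = $engine;
    }

    // Forwarding method: this is the one extra call that composition costs.
    public function start(): string
    {
        return $this->engine->start();
    }
}

$car = new Car(new Engine());
$result = $car->start();   // Car::start() plus Engine::start()

var_dump($result);
```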

Iterators

The last test we'll make involves iterators. SPL includes a huge collection of iterators, but we're only going to look at two of them. We'll compare iterating over a native array with an object that uses an internal iterator, using the Iterator interface, and one using an external Iterator via IteratorAggregate and ArrayIterator.

Oh dear god make it stop! A trivially-simple internal iterator has a performance hit of more than an order of magnitude over a native array. An external iterator is cheaper, but still not cheap.

Let's consider why that is, though. Using the Iterator interface, we're forcing PHP to call into user-space 2-3 times per iteration. (I'm not sure of the exact internals, but at minimum it would need to call next() and valid() each iteration, plus key() if we're requesting it.) That's three method calls per iteration, not counting the behind-the-scenes engine code to make the magic work. Maybe it's not so surprising then. The external iterator is faster here because we're using the ArrayIterator object, provided by SPL and implemented entirely in C. If we used a user-space external iterator, I would expect results similar to those for the internal iterator.

The moral of the story here is, as always, C is faster than PHP. The more you can do in C, the faster your code will be. (Hey, that rhymes!) If possible, use IteratorAggregate and ArrayIterator over an internal iterator. If that's not possible for some reason, say you're iterating over some external resource like a file handle or database result set, be aware that it's going to cost you.
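Here is a rough sketch of the two approaches side by side. The class names are hypothetical and the method signatures assume PHP 8.0 or later. The internal iterator delegates to PHP's array pointer functions, which is an assumption about how such a benchmark might be written rather than a copy of the original script.

```php
<?php
// Internal iterator: every step of foreach calls back into these PHP methods.
class InternalIterator implements Iterator
{
    private array $a;

    public function __construct(array $a)
    {
        $this->a = $a;
    }

    public function current(): mixed { return current($this->a); }
    public function key(): mixed     { return key($this->a); }
    public function next(): void     { next($this->a); }
    public function rewind(): void   { reset($this->a); }
    public function valid(): bool    { return key($this->a) !== null; }
}

// External iterator: one user-space call to getIterator(), then SPL's
// ArrayIterator (implemented in C) does the stepping.
class ExternalIterator implements IteratorAggregate
{
    private array $a;

    public function __construct(array $a)
    {
        $this->a = $a;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->a);
    }
}

$data = range(1, 1000);
$sums = [];
foreach ([$data, new InternalIterator($data), new ExternalIterator($data)] as $subject) {
    $label = is_array($subject) ? 'array' : get_class($subject);
    $start = microtime(true);
    $sum = 0;
    foreach ($subject as $key => $value) {
        $sum += $value;
    }
    $sums[$label] = $sum;
    printf("%-18s %.6f s\n", $label, microtime(true) - $start);
}
```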

Summary

So what have we learned? We've learned that there is no such thing as a free lunch, unless you're getting it from your parents. (Amazing how programming parallels real life, isn't it?) All of PHP 5's advanced object-oriented features have a cost, and sometimes that cost is non-trivial.

Does that mean we should avoid using them? Of course not! Magic methods, iterators, ArrayAccess, and the like make solving certain types of problems far easier and faster for the programmer. In many cases, throwing more CPU at the code is cheaper than writing more, clunkier, harder-to-maintain code. And if the advanced features are not used in critical sections of the program, you may not even notice the difference. These benchmarks should be used as guidelines only; moving your database server from the same computer to a dedicated database box will likely yield a bigger performance boost than expunging all traces of __get() from your code, and will almost certainly cost far less to do.

There's one other important observation that we haven't really mentioned. One of the big complaints about PHP 4's object model was that it was dog slow compared to procedural code. Well, whatever the truth of that, it is no longer the case. Calling a function and calling a method are virtually identical in cost, at least under modern versions of PHP. Polymorphic code can even be faster if using inheritance over composition or function-level composition (or, God forbid, call_user_func_array()), although as always beware the inheritance trap. As with anything else, use wisely.

The raw data is available below, as is a graph of the results courtesy of OpenOffice.org Calc. I've left the internal iterator out of the graph as it would visually throw everything else off. The complete benchmark script used is available below as well.

Let me second Wez's comments and take it a bit further. Create the code you want without regard to speed, then profile the code to figure out where it can be made faster. If you spend all of your time "optimizing" your code by finding workarounds to __get/__set instead of just using them, by the time you get to the end you may have no more time or inclination to go in and profile the code, which might reveal that your "non-trivial" performance drainers account for less than 5% of the total execution time.

Of course, since you're using solid test driven principles, you'll have the full safety net to refactor with impunity. ;-)

or at least potentially better programming. Larry Garfield is starting a conversation about the speeds of specific functions, an important contribution for something that has to be as optimized as Drupal.

So how do we start getting rid of "call_user_function" ;-) ?

Of course, my understanding is that the first order of optimization is avoiding a call to the database wherever possible, as a general principle?

The pros and cons of an OOP codebase are all well-established and discussed to death. The performance profile of Drupal core is something that gets a LOT of attention, and any conversion of any portion of it to OOP must receive this kind of attention. The approaches we take when re-architecting any procedural code will be influenced by the speed implications. Imagine, for example, converting the current hook system to something that uses __call(), internal iterators, and caps things off with call_user_func_array(). ;) While the code may be elegant (hypothetically speaking), the performance would be nightmarish.

Larry, this is awesome work. Thanks for helping map out the landmines in advance.

It's very easy to get hung up on call_user_func being slow compared to other invocation mechanisms, but how significant is that to the Drupal runtime compared to, say, the SQL being issued on a typical page load?

It's important to understand what's fast, but it's more important to realize that speed is relative; function invocation is almost certainly lightning fast compared to SQL or disk IO, and this is where profiling comes in. Find out what is significant using a profiler, then come up with alternative approaches and benchmark those, and fold back in, then repeat.

Hi Wez! Profiling to determine what the slowest parts of your system are is definitely important. Micro-optimizations usually don't yield huge performance benefits. However, knowing their general cost is important, too. For instance, do you really want to base a critical section of code that runs 100x more than any other routine on a user-space iterator? Probably not, if you can help it.

Similarly, if you can reduce the number of SQL calls you make by leveraging ArrayAccess as part of an abstraction layer, it's good to be aware that you're not eliminating time, you're just moving it elsewhere. That may still end up being a big win, but it's not a free win. It's important to know going in what the trade-off costs are, so you can decide if a certain refactoring is going to be worthwhile.

Right now, actually, Drupal's biggest time sink is simply loading and parsing code. (So says both the profiling I've done and the profiling Rasmus has done for us, because he's cool.) That's an area that should get better in Drupal 6, and I have plans to reduce that even more in Drupal 7. We'll get there. :-)

MySQL queries can easily be profiled with the built-in query logger, and with MySQL 5.1.21+ (or a patched 5.0) even with microsecond granularity.

Just run the application as normal and filter the Slow Query Log afterwards with an easy though customizable script:
php mysql_filter_slow_log.php --exclude-user --no-duplicates linux-slow.log > mysql-slow-queries.log

Activate the Slow Query Log in my.ini:
log-slow-queries
# Log only those queries that run 0.3s or more (in microseconds)
long_query_time=300000
# Additionally log all queries which do not use indexes
log-queries-not-using-indexes
# MySQL 5.1.6 through 5.1.20 had a default value of log-output=TABLE, so you should force
log-output=FILE

Hi, this is a great post. However, I want to point out that your Inheritance example is actually not very meaningful, as it simply overrides the whole parent method. Most of the time in PHP OOP development we extend the parent class's method in the child by calling parent::originalMethod() in the child's method. If the parent actually did something instead of just return; you would find that it's slower than composition.
Cheers.

For these tests I was specifically looking just at the language overhead of the operation itself. If you call parent::originalMethod(), then your time will increase by the cost of one method call. If you do anything in the child or parent method, it will increase by that amount. In that test I was looking only at the lookup table cost of the override itself, not the myriad ways that it could be used in practice.

Nice and detailed article. We are thinking about using a kind of lazy loading mechanism in the database->OO mapper. Especially for 1-N and N-N relations, lazy loading can be done using an object that implements the ArrayAccess / Iterator interfaces and only queries the database relation when it is actually requested in the code. It would be nice to see some benchmarks on using ArrayAccess / iterators (lazy fetching) vs. always querying (eager fetching).

I've been working on a lazy-load magic-function based OO layer as well. That's one of the things that prompted me to look into just how expensive magic is. One of these days I need to get it published. :-)

The tricky part with a lazy-load database layer, though, is avoiding the SELECT N+1 trap. It's very easy to set yourself up to run 500 nearly-identical queries, and even prepared statements won't help you out of that one, performance-wise.
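To illustrate the trap, here is a toy sketch. FakeDb is a hypothetical in-memory stand-in that only counts how many "queries" it receives; no real database API is involved, and the query that fetches the list of nodes in the first place is left out of the count.

```php
<?php
// Hypothetical stand-in for a database: counts how many "queries" run.
class FakeDb
{
    public int $queries = 0;
    private array $comments = [1 => ['a', 'b'], 2 => ['c'], 3 => []];

    // One query per node, as a naive lazy loader would issue.
    public function commentsFor(int $nodeId): array
    {
        $this->queries++;
        return $this->comments[$nodeId] ?? [];
    }

    // One batched query for many nodes (think WHERE nid IN (...)).
    public function commentsForMany(array $nodeIds): array
    {
        $this->queries++;
        return array_intersect_key($this->comments, array_flip($nodeIds));
    }
}

$nodeIds = [1, 2, 3];

// Lazy: each node triggers its own query, N queries in total.
$lazyDb = new FakeDb();
$lazy = [];
foreach ($nodeIds as $id) {
    $lazy[$id] = $lazyDb->commentsFor($id);
}

// Eager: a single batched query returns the same data.
$eagerDb = new FakeDb();
$eager = $eagerDb->commentsForMany($nodeIds);

printf("lazy: %d queries, eager: %d query\n", $lazyDb->queries, $eagerDb->queries);
```

With three nodes the difference is trivial; with 500 it is the difference between one round trip and 500.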

Benchmarking lazy vs. eager fetching is tricky, because there are many ways of doing both. Some versions of each are guaranteed to suck. :-)

Schema API may be an excellent place to keep notational information on lazy-loading. For example, relationships that authors *know* will be used infrequently can be marked as lazy-loaded, while others can be undetermined.

I am new to PHP's magic methods and this article is a little bit complicated for me, but it seems very helpful. I'll read some starter articles first and then read this one again. By the way, thanks for your great effort.

Since we already make heavy use of PHP 5's magic methods, these deflating results give us a hint not to use this coding style in performance-critical parts of our applications.
Thx for all the stats!! :-)

I am involved in a project that developed its own active_record implementation. Since we expect the project to be subject to some really heavy-duty server load, we are looking for ways to optimize it and boost its performance. Now, reading your article (which is really great) got me thinking about whether using SPL in our active record is a good thing. The most frequently used interface is Iterator, and judging from the benchmarks you posted it seems like using it is a step in the wrong direction. I checked (briefly) the active record implementations of CakePHP (the model.php5 one), Zend Framework (Zend_Db_Select) and Solar (Solar_Sql_Model_Record), and they are not using Iterator either. What would your advice be: ditch it to gain a performance boost, right?

On the benchmark for the internal iterator, you are probably making the iterator do more work than it needs to do. Specifically, you are just calling the next(), key(), reset(), etc. on each item in an array. The idea with implementing your own iterator is that presumably *you* are going to be managing that.

And instead of next using next($this->a), it should be something like ++$this->index.

current(), then, should return $this->a[$this->index], etc.

Not sure this will help a lot performance-wise. In fact, this may actually hurt performance. But it is more in-line with standard usage of the Iterator interface. And it should get rid of the overhead of calling another set of functions to do the work for you.

(For a Dictionary/Map data structure, where key() is likely to return something other than an int, the exercise changes, and the code depends on the type of data structure you are attempting to implement.)
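A sketch of what that index-based version might look like, for a plain list. The class name and the array_values() call, which guarantees consecutive integer indexes, are assumptions added for the example; the signatures assume PHP 8.0 or later.

```php
<?php
// Hypothetical index-based iterator over a list; manages its own position
// instead of delegating to next(), key(), reset() on the array.
class IndexedIterator implements Iterator
{
    private array $a;
    private int $index = 0;

    public function __construct(array $a)
    {
        // Assumption: re-index so positions 0..n-1 are always valid offsets.
        $this->a = array_values($a);
    }

    public function current(): mixed { return $this->a[$this->index]; }
    public function key(): mixed     { return $this->index; }
    public function next(): void     { ++$this->index; }
    public function rewind(): void   { $this->index = 0; }
    public function valid(): bool    { return $this->index < count($this->a); }
}

$collected = [];
foreach (new IndexedIterator(['x', 'y', 'z']) as $key => $value) {
    $collected[$key] = $value;
}

var_dump($collected);
```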

Finally, might it be worth suggesting that built-in iterators be extended where possible? Say there is a situation where we need to loop through a set of key/value pairs. We have the keys, but the values have to be constructed. One might consider passing an array of keys to an extended ArrayIterator (e.g. MyArrayIterator extends ArrayIterator) and only overriding the current() method to get the value from another data source. Then you only incur one userland hit -- the call to current() -- instead of multiple hits.

There are several advantages to doing this. More details can be hidden from users of your API. Loading of objects can be deferred until they're needed (e.g. during iteration -- and depending on your looping structure, sometimes not at all). And you still get many of the speed advantages (cough) of using the straight ArrayIterator.
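Something along these lines, as a rough sketch: the LazyValueIterator name and the loader callback are hypothetical, standing in for whatever other data source builds the values, and the signature of current() assumes PHP 8.0 or later.

```php
<?php
// Hypothetical subclass: ArrayIterator walks the keys in C, and only
// current() drops into user space to build the value on demand.
class LazyValueIterator extends ArrayIterator
{
    /** @var callable */
    private $loader;

    public function __construct(array $keys, callable $loader)
    {
        parent::__construct($keys);
        $this->loader = $loader;
    }

    public function current(): mixed
    {
        // parent::current() is the stored key; the loader turns it into a value.
        return ($this->loader)(parent::current());
    }
}

$loaded = [];
$loader = function (string $key) use (&$loaded) {
    $loaded[] = $key;          // record which keys were actually requested
    return 'value-' . $key;
};

$values = [];
foreach (new LazyValueIterator(['a', 'b'], $loader) as $value) {
    $values[] = $value;
}

var_dump($values);
```

Nothing is loaded until the loop asks for it, so breaking out of the loop early skips the remaining loads.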