Doctrine Performance Revisited

In our never-ending quest to provide a powerful, flexible and yet
performant ORM experience we are often confronted with benchmarks.
We have been covering performance topics since last year in talks
at several conferences, and Roman has shared his opinion on such
benchmarks on this blog.

Recently Francois Zaninotto, lead developer of the soon-to-be
released Propel 1.5 (currently in beta), wrote a blog post
comparing the performance mainly of the different Propel 1.x
versions, with and without caching, against a PDO benchmark. The
benchmark also contains a test for Doctrine 1.2.

It is important to note that the PDO test only shows the “baseline”
performance, that is, it does not even remotely “do the same thing”
as the others. No object creation, no hydration of objects from
result rows, no identity management, no change tracking, nothing.
So don't get the numbers wrong. If you wanted to get at least
remotely the same result as the ORMs provide with a raw PDO/SQL
“benchmark”, you would need quite some custom coding and, if you
don't want to copy/paste all day, introduce some abstraction.
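
To make the gap concrete, here is a minimal sketch of what even a
tiny bit of that custom coding looks like: first the raw PDO
baseline, then a hand-rolled lookup that hydrates an object and
keeps an identity map. The Author class, the table layout and the
findAuthor() helper are assumptions made up for this illustration
(PHP 8 with pdo_sqlite), not code from the benchmark.

```php
<?php
// Sketch only: Author, the author table and findAuthor() are hypothetical.

final class Author
{
    public function __construct(
        public int $id,
        public string $firstName,
        public string $lastName
    ) {}
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE author (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)');
$pdo->exec("INSERT INTO author (id, first_name, last_name) VALUES (1, 'John', 'Doe')");

// Baseline: execute the query and fetch a plain row. Nothing else happens.
$stmt = $pdo->prepare('SELECT id, first_name, last_name FROM author WHERE id = ?');
$stmt->execute([1]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Hand-rolled hydration with an identity map keyed by primary key.
function findAuthor(PDO $pdo, array &$identityMap, int $id): ?Author
{
    if (isset($identityMap[$id])) {
        return $identityMap[$id]; // same row, same object instance
    }
    $stmt = $pdo->prepare('SELECT id, first_name, last_name FROM author WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    // Map column names to field names and convert types.
    return $identityMap[$id] = new Author(
        (int) $row['id'],
        $row['first_name'],
        $row['last_name']
    );
}

$identityMap = [];
$a = findAuthor($pdo, $identityMap, 1);
$b = findAuthor($pdo, $identityMap, 1);
var_dump($a === $b); // both lookups return the same instance
```

And this still has no change tracking, no associations and no
generic mapping, which is exactly the work the PDO numbers leave
out.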

Scenario 5: Lookup a record and hydrate it together with its
related record in another table. Tests join hydration speed.

I reproduced the complete table of results here for comparison
since my machine is generating very different overall times than
the ones generated by Francois. Each scenario is executed several
times and the sum of execution times is printed. After each run the
identity maps are wiped so that objects are not reused. All the
tests use an SQLite in-memory database, are run on PHP 5.3 and
of course use an opcode cache (APC).

A first version of the corresponding Doctrine 2 benchmarks was
added today to the SVN repository by Roman. They can all be run
from your machine directly after checkout.

Doctrine 2 Insert Performance

This is mainly a result of the rather strange test. It's basically
a mass-insert. All the insert tests seem to use a single database
transaction, so it's comparable to a mass-insert on a single
request. As such the result is not surprising since we know that
Doctrine 2 can effectively batch inserts. Mind you that
mass-inserts are not really a focus of an ORM and not a realistic
scenario in most applications. So take this test with a grain of
salt, it's a mass-insert test. If you’re looking for the ORM with
the fastest mass-inserts, you can stop now, you found it.
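
For readers who wonder what such a test boils down to at the
database level, here is a rough sketch of a mass-insert in a single
transaction with plain PDO. It is not Doctrine's implementation;
the book table and the row count are assumptions chosen for the
example.

```php
<?php
// Sketch only: the book table and the 1000 rows are hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, isbn TEXT)');

$pdo->beginTransaction();
try {
    // Prepare once, execute many times: the statement is parsed only once.
    $stmt = $pdo->prepare('INSERT INTO book (title, isbn) VALUES (?, ?)');
    for ($i = 1; $i <= 1000; $i++) {
        $stmt->execute(["Book $i", sprintf('%010d', $i)]);
    }
    $pdo->commit(); // a single commit covers all rows
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM book')->fetchColumn();
```

One transaction and one reused prepared statement is about as
friendly as a workload gets, which is why this scenario says little
about typical request handling.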

Doctrine 2 Find By Primary Key Performance

Doctrine 2 Find Entity By Primary Key performance seems to be
roughly three times as slow as handcrafted PDO (that doesn't do
anything besides executing the query, mind you...). The good
results in this test, especially compared to Doctrine 1, come from
the fact that there is not much abstraction for all kinds of
find*() operations going on. SQL is created, executed and the
results turned into objects without much hoopla.

Doctrine 2 Complex Query Performance

The complex query is a scalar count query. See the Doctrine 2 code
for this scenario:

The getSingleScalarResult() method that executes the query uses a
very minimalistic hydration mode that only grabs the first value of
the first result column. Therefore, in combination with the DQL to
SQL Query Parser Cache (Doctrine2WithCacheTestSuite), we get a
result almost as fast as the handcrafted PDO scenario, because we
essentially get the transformed SQL query from the cache for this
DQL, execute it and grab the value.
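
The idea behind such a parser cache can be sketched in a few lines
of plain PHP. This is only an illustration of the principle: the
QueryCache class and its translateToSql() method are made up for
this example and do not parse real DQL.

```php
<?php
// Sketch only: QueryCache and translateToSql() are hypothetical stand-ins.

final class QueryCache
{
    /** @var array<string, string> */
    private array $sqlByQuery = [];
    public int $parseCount = 0;

    public function getSql(string $query): string
    {
        $key = md5($query);
        if (!isset($this->sqlByQuery[$key])) {
            // Cache miss: pay for the translation exactly once.
            $this->parseCount++;
            $this->sqlByQuery[$key] = $this->translateToSql($query);
        }
        return $this->sqlByQuery[$key];
    }

    private function translateToSql(string $query): string
    {
        // Stand-in for lexing, parsing and SQL generation.
        return str_replace('Author a', 'author a', $query);
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO author (name) VALUES ('A'), ('B'), ('C')");

$cache = new QueryCache();
$query = 'SELECT COUNT(a.id) FROM Author a WHERE a.id > ?';

$counts = [];
for ($i = 0; $i < 3; $i++) {
    $stmt = $pdo->prepare($cache->getSql($query));
    $stmt->execute([1]);
    // Minimal "hydration": take the first column of the first row.
    $counts[] = (int) $stmt->fetchColumn();
}
```

After the first iteration the loop does nothing but fetch a string
from an array, execute the SQL and read one value, which is
essentially what the handcrafted PDO scenario does.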

Hydration Performance (Scenario 4 and 5)

In the field of hydration Doctrine 2 is either equally fast or
seems “only” up to 40% slower than Propel 1.4 or Propel 1.5 based
on the two scenarios. The main reason here is really only that
since Doctrine 2 provides transparent persistence, it cannot
provide lazy-loading through base classes. Instead it needs to
inject proxy objects as stubs into the entities. That simply means
Doctrine needs to create more objects than Propel, that's it. Note
that once the objects are actually lazy-loaded, Propel needs to
create these objects, too. The difference is that Doctrine needs to
create them beforehand. When they lazy-load, no new object is
created, the proxies simply populate themselves with the data.
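
A stripped-down sketch of that proxy idea follows. The Author and
AuthorProxy classes and the loader callback are assumptions for
illustration only and are far simpler than the proxies Doctrine 2
actually generates.

```php
<?php
// Sketch only: Author, AuthorProxy and the loader callback are hypothetical.

class Author
{
    protected ?string $name = null;

    public function getName(): ?string
    {
        return $this->name;
    }
}

class AuthorProxy extends Author
{
    private bool $initialized = false;
    /** @var callable(int): array */
    private $loader;

    public function __construct(private int $id, callable $loader)
    {
        $this->loader = $loader;
    }

    public function getName(): ?string
    {
        $this->load();
        return parent::getName();
    }

    private function load(): void
    {
        if ($this->initialized) {
            return;
        }
        // The proxy fills its own fields; no second object is created.
        $row = ($this->loader)($this->id);
        $this->name = $row['name'];
        $this->initialized = true;
    }
}

$loads = 0;
$loader = function (int $id) use (&$loads): array {
    $loads++; // stands in for a SELECT by primary key
    return ['name' => "Author $id"];
};

$author = new AuthorProxy(7, $loader); // created up front, no query yet
$loadsBefore = $loads;
$name = $author->getName();            // first access triggers the load
$author->getName();                    // already initialized, no second load
```

The cost measured in the hydration scenarios is the construction of
objects like $author; the load itself only happens if the
association is ever touched.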

A main difference, however, is that the hydration code of Doctrine
is completely generic. That means this same code can handle all
kinds of different SQL results correctly, no matter how many nested
joins, scalar values or aggregate values there are in the result,
and it can even deal with strangely ordered collections in result
sets (you get such stuff with multiple order by clauses on
different fields which order in different directions; combine such
ordering with joining collections and you get a pretty funky SQL
result set).

The general approach of the algorithms in the Doctrine 1.2
Hydrators was reused in Doctrine 2. However, optimizations in the
data structures and use of the fastest internal PHP functions (as
fast as you can get with PHP, you know ;)) made it possible to
optimize the code to yield the shown results.

Interesting here is maybe that Doctrine 2 without caching is all in
all still a lot faster than Doctrine 1 with caching, so this looks
like a good improvement. Furthermore, the query cache in Doctrine 2
is very effective and almost completely removes all the overhead of
DQL. The query cache is what allows us to provide this extremely
powerful abstraction that is immensely flexible. If you don't like
DQL yet, you should read up on domain-specific languages and object
query languages in particular. It’s a gem and cornerstone of this
project and if you don't like it we can’t help you.

Hydration with non Object Results

Putting aside the boring Propel comparisons, let's get to something
Doctrine-specific. Because we know that read performance is very
important and object instances are not necessary all the time,
Doctrine 2, just like Doctrine 1, provides many different levels of
abstraction in between objects and raw PDO/SQL result sets that you
can go up and down as you wish.

The main two intermediate levels are array graphs and flat, scalar
result sets (which are still not the same as the raw SQL result
sets because type conversions and column name to field name
conversions still take place).

The first method “Without Proxies” still creates object instances,
however, it does not replace loose ends of the object graph with
lazy-load proxies. Be careful with such optimizations in practice
because partial objects can be fragile to work with. The important
point here is that different levels of optimization are there when
needed, before you need to finally drop all abstraction and deal
with PDO/SQL directly (which is not bad, you know, just often not
very convenient, flexible and/or robust against refactorings or
schema changes).

The Array Hydration (getArrayResult()) returns a nested array
structure that is comparable to an object graph. Most of the time
you can think of it as a performant read-only “view” of an object
graph. In the case of Books with Authors the result looks like:

These array graphs can be built from basically any query. They are
backed by roughly the same algorithm that allows the arbitrary
object hydration with indefinite joins and even scalar and
aggregate values in between.
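
To give a feeling for what building such a graph involves, here is
a small sketch that folds the flat rows of a join into nested
arrays. Table names, column aliases and result keys are assumptions
for this example, and unlike the real hydrator this code is
hard-wired to one specific query (authors joined with their books).

```php
<?php
// Sketch only: schema, aliases and result keys are hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)');
$pdo->exec("INSERT INTO author (id, name) VALUES (1, 'Jane'), (2, 'John')");
$pdo->exec(
    "INSERT INTO book (id, title, author_id)
     VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Third', 2)"
);

// The join repeats the author columns once per book.
$rows = $pdo->query(
    'SELECT a.id AS a_id, a.name AS a_name, b.id AS b_id, b.title AS b_title
     FROM author a JOIN book b ON b.author_id = a.id
     ORDER BY a.id, b.id'
)->fetchAll(PDO::FETCH_ASSOC);

$index = []; // author id => position in $graph
$graph = [];
foreach ($rows as $row) {
    $authorId = (int) $row['a_id'];
    if (!isset($index[$authorId])) {
        // First row for this author: create the root element once.
        $index[$authorId] = count($graph);
        $graph[] = ['id' => $authorId, 'name' => $row['a_name'], 'books' => []];
    }
    // Every row contributes one book to the author's collection.
    $graph[$index[$authorId]]['books'][] = [
        'id'    => (int) $row['b_id'],
        'title' => $row['b_title'],
    ];
}
```

The lookup through $index is what keeps the fold correct even when
rows of the same author are not adjacent in the result set.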

In the case where your objects implement ArrayAccess, you can often
use object and array results interchangeably without the need to
update view code.
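
As a sketch of that last point, the following hypothetical Book
class implements PHP's built-in ArrayAccess interface, so a view
helper written with array syntax accepts both kinds of results. The
class, its fields and renderTitle() are made up for this example;
the signatures assume PHP 8.

```php
<?php
// Sketch only: Book and renderTitle() are hypothetical.

final class Book implements ArrayAccess
{
    public function __construct(public string $title, public string $isbn) {}

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->$offset);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->$offset;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if (!property_exists($this, (string) $offset)) {
            throw new OutOfBoundsException("Unknown field: $offset");
        }
        $this->$offset = $value;
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Fields cannot be unset.');
    }
}

// "View" code that only uses array syntax.
function renderTitle(array|ArrayAccess $book): string
{
    return htmlspecialchars($book['title']);
}

$fromObject = renderTitle(new Book('Doctrine 2', '1234567890'));
$fromArray  = renderTitle(['title' => 'Doctrine 2', 'isbn' => '1234567890']);
```

Switching a query from object to array hydration then only changes
the code that runs the query, not the templates that display it.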

Conclusion

What all of this mainly means is that if you have an application
that looks (almost) exactly like the benchmarking code used here,
then you (maybe) have some useful numbers to look at, otherwise ...
not.

Apart from that we hope this convinces you that we’re not wasting
your CPU cycles on purpose. Doctrine 2 is a huge balancing act
between flexibility, features and performance and it has worked out
well so far.