Let me begin by saying that I have used and continue to use various ORM frameworks such as hibernate, ibatis, propel and activerecord in applications and websites that have a user base ranging from a couple hundred to 500k users. Especially for projects that have to be up and running in a short duration of time, ORM frameworks significantly reduce the effort required to manipulate and persist OOP objects by providing time saving facilities such as automatically generated model objects, integrated unit testing, secure variable substitution, etc. Hibernate even supports horizontal data partitioning via Hibernate Shards.

However, the lay of the land is significantly different in the rarefied space occupied by applications needing to support millions of users. Profiling an application at this level and paying particular attention to the operations needed to move data to and from the database, it becomes evident that a significant portion of the operations are API related, whereby the ORM framework is traversing the abstraction layer built between the application logic and the native methods that ultimately interact with the database. I see a couple of problems with this level of abstraction and for the purpose of this discussion, I will purposely ignore caching for the sake of keeping the scope succinct.

1. The process of optimizing database queries is as much an art as it is a science and I am yet to see an ORM framework that does this well. In the case of mysql, optimization involves using facilities such as explain, benchmark, analyze table, show index, and the slow queries log to identify non-performing queries and tweak them to extract the leanest performance. These optimizations necessarily work best when applied as close as possible to the bare metal, so to speak, and the abstraction of an ORM framework negates to an extent the benefits of optimization. The devil remains in the details and the further away you are from the details, the lesser a chance you have to find and square with the devil.

2. At the end of the day, an ORM framework is essentially middleware. My reading of some of the real life architectures presented on this sites seems to reinforce the assessment that middleware will only take you so far, beyond which you have to roll your own. This makes perfect sense. ORM frameworks are built to serve as wide an audience as possible and while their success is unquestionable in the commodity/middle market, they are not and cannot possibly be tooled to accommodate the atypical demands of high scalability architecture. That would be akin to running with hares and hunting with the hounds. Building a framework for hight scalability would also require that the builders have a front and center seat in an enterprise where they are exposed to the machinery and day to day operations of a high scalability site. A situation for which you would be hard pressed to find another installation bearing similar characteristics or with similar requirements. Additionally, and without putting down the developers who contribute to these frameworks, a majority of them would not have the exposure to a bona fide high scalability architecture to be able to bring their experience to bear on the framework code base.

3. Just as with kernel developers, I have a significant amount of faith in the folks that spend their every waking hour coding database engines such as MySQL, Postgres, Oracle, MS SQL etc. Consequently, when the main goal is ultimate performance and scalability, I generally frown upon efforts to introduce a middle man between the wicked fast database and the application logic. And having invested the time and effort over many years to learn the intricacies of a database engine, I am more apt to cast my lot with the devil that I know than abdicate control to a framework, however versatile.

One could argue that it makes sense to start off with an ORM framework and as the demands for the site begin to eclipse what the framework can provide, gradually transition to a custom built solution. In my experience, refactoring on the database tier for a site that has a significant amount of data and needs to be operational 24x7 is pure hell. So much so that a more feasible option would be to build a parallel site then migrate and switch over. Of course this could be mitigated by using a service oriented architecture and thereby giving yourself some degree of maneuverability, but at the end of the day, there will be thousands of operations trying to read and write to the db every second. You are had, whichever which way you turn.

Taking a look at the mediawiki source code that powers the Wikimedia sites including Wikipedia, there are two classes, DatabaseMySQL and DatabasePostgress which encapsulate the native PHP functions that talk to MySQL or PostgreSQL respectively. The other main classes such as the Article class then use these database classes to interact with the db. Simple and straight forward and in my opinion, the best way to get maximum performance and throughput.

Reader Comments (2)

I used to use custom queries previously, but recently I've converted to using Symfony to develop all my sites.

In my experience, only a few queries on a site are actually responsible for most of the database locking and performance loss. The ORM is good for simple queries and simple joins which occur most of the time. However if you're trying to retreive a large dataset (more than 1000 items at once) then the ORM will result in bad performance if used naively.

For instance, most ORM lazy load data-relations. So if you collect records from table A and need a join to table B, you'll need to tell the ORM to perform a join when it gets the records or else you will see an extra DB call everytime you request an $A->B data call.

In that case, you can remain within the ORM layer and still optimize. In other cases (like for complex one-off subqueries) it might be better to ditch the ORM and query the database directly.

Symfony and Propel working together let you monitor all the database queries that are used on each page, and how long they took. That kind of logging you'd have to roll yourself if you used your own system.

Also, the ORM basically acts as the base for the Model in MVC. You can write all of your business logic into the Model. Validation and complex transformations from what's in the DB to what the client needs, can be isolated to the objects the ORM spits out.

> My reading of some of the real life architectures presented on this sites seems to reinforce> the assessment that middleware will only take you so far, beyond which you have to roll > your own.

One powerful approach that is the best of both worlds IMHO is code generation from your own specification. You can create code that fits snugly into your framework, does exactly what you want it to do, removes the tedious bits, and allows you to optimize exactly what needs optimizing.