As I blogged recently, we started working on CouchDB support on top of the Doctrine2 infrastructure back at FrOSCamp. For me it's been a while since I have taken a leading role in the development of an OSS component. Ever since I left PEAR, I have mainly been helping out here and there, and with php.net I was doing organizational stuff. So it's kinda fun to be back working on some OSS code with significant contributions in terms of code. There is still a lot of work ahead, so I wouldn't mind a few helping hands. But you may wonder why you should even bother. I was in #couchdb on freenode the other day and David asked a fairly legitimate question: "can you enlighten me as to why you'd need an ORM on top of native json object"? In this blog post I will try to explain why it makes sense to add a model-based infrastructure on top of a NoSQL database.

To me the first advantage of using model classes managed by an ODM is that it ensures a centralization of the data structures for different pieces of information. This is one of the concerns I raised earlier with NoSQL: To avoid needless differences in stored content (isDeleted vs. removed etc.) the code needs to manage the schema. Plus you get convenience stuff like lazy loading as well as bulk loading of relations for an entire collection. Changing queries to use either approach is very easy and can also be done on the model level.
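To make the centralization point concrete, here is a minimal sketch in plain PHP. The `Article` class and its `toDocument()` and `fromDocument()` helpers are hypothetical names made up for illustration, not the API of the CouchDB ODM. The only point is that the name of the "deleted" flag is decided in exactly one place.

```php
<?php

// Hypothetical model class: the only place that knows how an article
// is laid out as a JSON document.
final class Article
{
    public $id;
    public $title;
    public $deleted = false;

    public function __construct($id, $title)
    {
        $this->id = $id;
        $this->title = $title;
    }

    // Build the array that would be JSON-encoded and sent to the database.
    public function toDocument()
    {
        return array(
            '_id' => $this->id,
            'type' => 'article',
            'title' => $this->title,
            'isDeleted' => $this->deleted,
        );
    }

    // Rebuild the model from a decoded JSON document.
    public static function fromDocument(array $doc)
    {
        $article = new self($doc['_id'], $doc['title']);
        $article->deleted = !empty($doc['isDeleted']);
        return $article;
    }
}

$article = new Article('article-1', 'Why an ODM?');
$article->deleted = true;

// Round trip through JSON, as it would happen when talking to the database.
$json = json_encode($article->toDocument());
$copy = Article::fromDocument(json_decode($json, true));
```

Code elsewhere in the application only ever touches `$article->deleted`, so nobody can accidentally store a second spelling such as `removed`.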

Model classes also allow you to do validation of the structures in the client. Obviously CouchDB for example has the ability to do server side validation. But with model classes you can move this to the client entirely or at the very least get rid of round trips for failed validation. Maybe we will eventually support some way to sync rules between the model classes and the CouchDB server, aka import and export of these rules in both directions.
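As a rough sketch of what client-side validation could look like, assuming a hypothetical `validateArticle()` function and made-up rules (none of this is existing ODM functionality):

```php
<?php

// Hypothetical validator: returns a list of error messages, empty when valid.
function validateArticle(array $doc)
{
    $errors = array();

    if (!isset($doc['title']) || !is_string($doc['title']) || trim($doc['title']) === '') {
        $errors[] = 'title is required';
    }
    if (isset($doc['isDeleted']) && !is_bool($doc['isDeleted'])) {
        $errors[] = 'isDeleted must be a boolean';
    }

    return $errors;
}

$errors = validateArticle(array('title' => '  ', 'isDeleted' => 'yes'));

if ($errors) {
    // Nothing is sent to the server, so a failed validation costs no round trip.
    echo implode("\n", $errors), "\n";
}
```

The same rules could still be enforced on the server; the client-side check just catches the obvious failures before any HTTP request is made.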

Obviously the great advantage of NoSQL is that you can more easily change your schema. But what happens to the data already stored? Either you have to migrate it all at once or you use something that Jonathan and I came up with back when he started working on MongoDB support in Doctrine2: Eventual migration! The idea here is that instead of having to migrate old data to the new format, you just place rules into the model for how the data is migrated when read, so that when it is then stored it's migrated to the new data structure.
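A minimal sketch of the idea, assuming a hypothetical `schemaVersion` field and a made-up rename from `removed` to `isDeleted`. The function name `migrateOnRead()` is invented for this example and is not part of either ODM.

```php
<?php

// Hypothetical schema version that the current code writes.
const CURRENT_VERSION = 2;

// Upgrade an old document in memory when it is read. Version 1 used the
// field name "removed"; version 2 uses "isDeleted".
function migrateOnRead(array $doc)
{
    // Documents written before versioning existed count as version 1.
    $version = isset($doc['schemaVersion']) ? $doc['schemaVersion'] : 1;

    if ($version < 2) {
        $doc['isDeleted'] = !empty($doc['removed']);
        unset($doc['removed']);
    }

    $doc['schemaVersion'] = CURRENT_VERSION;
    return $doc;
}

$old = array('_id' => 'article-1', 'title' => 'Old post', 'removed' => true);
$new = migrateOnRead($old);
// $new is what gets written back the next time this document is saved.
```

Documents that are never read again simply stay in the old format, and documents already in the current format pass through unchanged.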

Also just like with an ORM the advantage of such an ODM is that you can do some pretty neat performance optimizations at a higher level. Doctrine2 is basically a persistence manager forgoing the old ActiveRecord type approach of Doctrine1:
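A minimal sketch of that persist-then-flush pattern follows. `SketchDocumentManager` is a hypothetical in-memory stand-in written for this post, not Doctrine's actual class or API; it only illustrates that changes are collected first and written in one go.

```php
<?php

// Hypothetical stand-in for a persistence manager. It tracks plain PHP
// objects and only works out what changed when flush() is called.
final class SketchDocumentManager
{
    private $managed = array();   // id => object
    private $snapshots = array(); // id => state as of the last flush
    public $written = array();    // stands in for the database

    public function persist($id, $object)
    {
        $this->managed[$id] = $object;
        $this->snapshots[$id] = null; // not stored yet
    }

    // Compare every managed object with its snapshot and write all
    // changed documents in one batch. Returns the change set.
    public function flush()
    {
        $changeSet = array();
        foreach ($this->managed as $id => $object) {
            $state = get_object_vars($object);
            if ($state !== $this->snapshots[$id]) {
                $changeSet[$id] = $state;
                $this->snapshots[$id] = $state;
            }
        }
        // A real implementation could pick a bulk API here, or hand the
        // whole change set to event listeners such as an audit log.
        foreach ($changeSet as $id => $state) {
            $this->written[$id] = $state;
        }
        return $changeSet;
    }
}

$user = new stdClass();   // a plain object, no base class required
$user->name = 'alice';

$dm = new SketchDocumentManager();
$dm->persist('user-1', $user);
$user->name = 'Alice';    // no database access yet

$first = $dm->flush();    // one write containing the final state
$second = $dm->flush();   // nothing changed, so nothing to write
```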

So the idea is that you mess around with your models and then once you are done you flush the changes to the database. Now this enables quite intelligent use of transactions. If available you can also make use of bulk change APIs by simply introspecting the changes to be flushed and then deciding what approach to use. In the MongoDB ODM, for example, Jonathan implemented in-place updates which could be used during a flush() operation. Furthermore during the flush() operation you can trigger events that take into account the entire change set, which can be useful for managing an audit log for example.

One concern with all of this is how much overhead this adds, not only in terms of raw speed, but also in terms of code one has to load up. Doctrine2 is actually split up into much smaller pieces than the old Doctrine1. The ORM basically uses a set of common classes, the DBAL and the ORM on top. The latter two are not used in either the MongoDB or the CouchDB ODM. The main stuff in common that is used is the annotation reader. We might later see if we can extract some common code between the two ODMs. In terms of performance, in Doctrine2 you basically use plain PHP objects without any requirement to inherit from a base class. Of course there is still some overhead with all this flexibility. The advantages of this flexibility include several performance optimization angles with much less code, so while you may lose some performance, you can win a lot more.

I guess I share David's concerns. ;-) And I think we already talked about something like "JSON schema" when you were in Berlin earlier this year.

I still don't see a huge problem, but I guess migrating a document to a new structure is something interesting - but I'm still wondering if this is a theoretical problem for many. Hence I second Benjamin - not so much the killer feature.

(Regardless - I still want to check out your ODM some time, it sounds interesting and if it's really lightweight and adds some convenience - why not. :-))

I'm now in my second year of running CouchDB in production, and I never had the need for either a schema or migrating documents because of changes to the schema. Part of the reason why I never ran into this is that I like to keep my objects flat. No need for overall complexity.

Think:

if (isset($doc->prop)) {
    // new doc
}

I think migrations are an RDBMS concept. I could be wrong, but I think people who are used to relational databases and the concept of normalizing data etc. just don't [like to] keep it simple.

Yeah, eventual migration isn't for all use cases of course. Sometimes you do need to migrate it all at once.

Do note however that especially with CouchDB you tend to plan quite carefully which fields you query on and which fields you don't. So if you need to migrate fields that you don't query on anyway, but you have a lot of documents containing them, it could be a convenient solution to not have to bog down the server with migrating all the data (especially if a lot of it is old and likely to never be read again).