Sat, 14 Mar 2009

Your favorite ORM sucks

Over the past year or so I have been learning and enjoying Django, which is a large set of python modules that are impressively nice for creating dynamic Web sites. It's the latest generation in a set of tools that have been written to solve this very problem, going back all the way to the original CGI libraries of the mid-1990s.

You can get more details on the Django Web site, but the basic pieces you interact with are these:

Object-Relational Mapper:

This saves you from having to type in COBOL-esque SQL queries and manually connect the results to the data structures in your code.

Template Language:

This saves you from having to type in the punctuation-heavy HTML or XHTML to render your pages, and lets you focus more on the content.

URL Dispatcher:

This saves you from having to build trees of scripts or put argument-based flow control spaghetti in your code, and lets you restructure the layout of your URLs in a nice clean manner.

Form Handler:

This ties the previous three together to safely simplify one of the more dangerous aspects of programming: input validation.

There are more subsystems (such as the libraries of HTTP handling functions available to you in your “view” code), but those four seem like the legs of the table to me.

Feelings of Inadequacy

As I've been working with this, I've run into a number of conversations on IRC that seem to go something like this:

Really? I find it wonderful to have all that coordination among the pieces.

nerd:

Django's ORM sucks and is too Django-specific!SQLAlchemyforever! Behold The Power!

Okay, so what can you do? After enough of these conversations, I had this sense that I was kind of using the My First Sony of Web development systems. And the Loosely Coupled, Tightly Integrated motto has been pointed out to have the effect that while you can swap a new ORM into your Django app, few people ever use the Django ORM in anything else.

Eventually it came time to do some work on a non-Web application and I needed a database. With all these conversations with Riot Nrrds like the above, I figured I'd save myself a lot of trouble by going straight to SQLAlchemy.

The Reason We Have ORMs

There's a reason we have these “Object-Relational Mapper” things, and it's because of a problem known as the Object-Relational Impedance Mismatch. Basically, the formal mathematical model for databases used to ensure that they stay intact follows a system of tables with rows and columns and references to other tables, while data structures in most programming languages we use today manipulate data in nested tree-like structures. It's rather like the difference between a spreadsheet and an XML document, or between a ledger and a family tree diagram.

For a long time people just suffered through it: building SQL queries in strings and shoving them over the wire to databases, then manually parsing the results and populating the local data structures, all the while hoping nothing got mistranslated in the process. But in the argument over why we keep relational databases around in this object-oriented day and age anyway, people noticed that there was a formal and automatic mapping you could perform between the two.

So fundamentally the ORM is there to allow you to manipulate all your data in one single format, save you the trouble of re-inventing the necessary mapping techniques between the two, and cut down the complexity of your code thereby reducing your exposure to bugs. It's a fantastic thing, since relational databases are still important for their speed and reliability, and every reasonably-expressive programming language since LISP (which is from the 1950s, after all!) has had an implicit bias toward hierarchical representation of data.

Behold The Power!

Having fiddled with Django's ORM, and looked into Ruby's ActiveRecord system and read a bit about the history of Java's Hibernate ORM, I decided to see what was so hot about all the other Python ORMs out there.

I quickly learned that my sample of three had been somewhat biased toward a particular state of mind: namely those who want the impedance mismatch to actually be solved for them. The majority of ORMs used by the Riot Nrrd set instead seem to be written by people who enjoy the power of SQL and demand lots of advanced SQL features. Nothing to actually make your job as a programmer easier ranks anywhere on their priority list. One of the high-end python ORMS even brags in its Features list about how it can't do schema generation!

The result of all this talking out of school was that I had a project using SQLAlchemy and it was a horrible bureaucratic mess. I had session object setup and tear-down all over, manual coupling between table objects and actual useful objects for some reason, and it barely worked at all. I was forking processes off that needed to do a bit of database work, and my transactions and sessions kept stepping on each other. Hell, even the simple single-process jobs were hard to get right!

I'm sure this is where the Riot Nrrd contingent would step in and scream about how incompetent I must be. Why, a truly skilled SQL craftsman would be able to foresee all the intricate concurrency and relational integrity performance-critical session collision transaction issues that would arise, and work around them!

Back to the Tinker-Toys

Frustrated and eyeing a looming deadline warily, I decided to see if maybe I could just stick to what I knew and come up with a better solution. Hey, maybe I'd bite the bullet and set up the whole thing as a Web app inside the much-derided all-singing all-dancing framework (a term that to me typically refers to an unfinished project, rather than a useful set of libraries).

James Bennett is one of those hackers you wish there were more of on the Internet. He's knowledgeable and proficient while remaining reasonable and courteous. He reminds me of an old Jesuit brother I once knew who never seemed to open his mouth unless he was saying something that would help someone out.

Anyway, Bennett wrote an entry of his own on how to do standalone django scripts, and that was in 2007! In 2007, Django was still in version 0.96, and there was a lot of isolating of components to go before the 1.0 release. And damn if the settings.configure() trick isn't simple and straightforward!

…and then my database code lives in myapp/models.py and is something like ⅓ the length of the corresponding SQLAlchemy horror. It also gains a little input validation magic from some application-level data types (such as “IP Address”) that assert constraints not present in the database itself.

The Return of Session Management

Ah, but there comes a wrinkle! You just knew there'd be a wrinkle, didn't you? Remember how I was forking off asynchronous handler processes? So here's where that sinking feeling of inadequacy returned. I mean, Django's implicit session handling was causing me a headache, which sounds like precisely the sort of thing the Riot Nrrds were kvetching about!

Django's DB access model is very conservative: it won't chat with the database basically until the query you're building is at a stage where it needs information from the DB itself to proceed any further. It's a very lazy approach and I love it to bits. The session handling is all done behind the scenes, and any DB-bound “model” object will try to use a pre-existing connection if possible, but will open a new session if needed.

So I rolled up my sleeves and prepared to throw manual and bureaucratic session-management code all over my app, and ended up with the following:

That's it! Two lines to just drop the current database connection before forking, and one of them is an import! Then the parent and the child each get a new session the next time they perform a query or update operation on any Django model object. I'm still picking my jaw up off the floor over how simple this was.

Django For The Win

So I got to do a massive bit of coding with the delete key, and I can now be confident about the correctness of all my database code. Furthermore, I did it without dragging in any baggage from the rest of the Django set (unless you count the brief use of the django.conf.settings module, which is a fair point).

So where does all this hatred for Django come from in the Riot Nrrd set? I suspect it is partly a quick look at early (pre-0.96) versions that were still in a “we're working on separating our framework from the Ellington CMS” state, and largely a reaction against the horrible “type a bit of python code into this browser input box” model of early Zope releases. I think that Zope in the late 90s left a lot of programmers with a bad taste in their mouths.

But in the end none of the complaints about Django held up, and SQLAlchemy simply wasn't fit for purpose. I ended up feeling kind of cheated, like I'd lost a week because I listened to some uninformed Real Programmer posturing that did nothing but make trouble. It could have been that I just hit a bunch of DBAs who use Python occasionally instead of people like me who are Python programmers who occasionally need a database, but I know that's not really the case.

Okay Maybe Only 99.9%

Right now my only complaint about Django is that the #django channel on Freenode contains an op by the nickname of Magus who is the absolute opposite of James Bennett. Every time Magus answers someone's question, he does so in the most condescending way possible. Someone noticed this and set up a twitter feed to log each time he uses “obvious” or “of course”, but in February he changed his nick to avoid it. I'm sure this guy is a useful contributor to Django, but answering even naïve questions like that is simply not helpful.