
Object-Relational Mapping is the Vietnam of Computer Science

I had an opportunity to meet Ted Neward at TechEd this year. Ted, among other things, famously coined the phrase "Object-Relational mapping is the Vietnam of our industry" in late 2004.

It's a scary analogy, but an apt one. I've seen developers struggle for years with the huge mismatch between relational database models and traditional object models. And all the solutions they come up with seem to make the problem worse. I agree with Ted completely; there is no good solution to the object/relational mapping problem. There are solutions, sure, but they all involve serious, painful tradeoffs. And the worst part is that you can't usually see the consequences of these tradeoffs until much later in the development cycle.

Ted posted a much-anticipated blog entry analyzing the ORM problem in minute detail. It's a long post. But unless you're a battle-scarred veteran of the ORM wars, I highly recommend at least skimming through it so you're aware of the many pitfalls you can run into trying to implement an ORM solution. There are a lot of magic bullets out there, and no shortage of naive developers.

Ted's post is excellent and authoritative, but it's a little wordy; I felt like I was experiencing a little slice of Vietnam while reading it. Let's skip directly to the summary at the end, which provides a great list of current (and future) solutions to the ORM problem:

Abandonment. Developers simply give up on objects entirely, and return to a programming model that doesn't create the object/relational impedance mismatch. While distasteful, in certain scenarios an object-oriented approach creates more overhead than it saves, and the ROI simply isn't there to justify the cost of creating a rich domain model. ([Fowler] talks about this to some depth.) This eliminates the problem quite neatly, because if there are no objects, there is no impedance mismatch.

Wholehearted acceptance. Developers simply give up on relational storage entirely, and use a storage model that fits the way their languages of choice look at the world. Object-storage systems, such as the db4o project, solve the problem neatly by storing objects directly to disk, eliminating many (but not all) of the aforementioned issues; there is no "second schema", for example, because the only schema used is that of the object definitions themselves. While many DBAs will faint dead away at the thought, in an increasingly service-oriented world, which eschews the idea of direct data access but instead requires all access go through the service gateway thus encapsulating the storage mechanism away from prying eyes, it becomes entirely feasible to imagine developers storing data in a form that's much easier for them to use, rather than DBAs.

Manual mapping. Developers simply accept that it's not such a hard problem to solve manually after all, and write straight relational-access code to return relations to the language, access the tuples, and populate objects as necessary. In many cases, this code might even be automatically generated by a tool examining database metadata, eliminating some of the principal criticism of this approach (that being, "It's too much code to write and maintain").

Acceptance of ORM limitations. Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an ORM to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an ORM would create problems. Doing so carries its own fair share of risks, however, as developers using an ORM must be aware of any caching the ORM solution does within it, because the "raw" relational access will clearly not be able to take advantage of that caching layer.

Integration of relational concepts into the languages. Developers simply accept that this is a problem that should be solved by the language, not by a library or framework. For the last decade or more, the emphasis on solutions to the O/R problem have focused on trying to bring objects closer to the database, so that developers can focus exclusively on programming in a single paradigm (that paradigm being, of course, objects). Over the last several years, however, interest in "scripting" languages with far stronger set and list support, like Ruby, has sparked the idea that perhaps another solution is appropriate: bring relational concepts (which, at heart, are set-based) into mainstream programming languages, making it easier to bridge the gap between "sets" and "objects". Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such as functional/object hybrid languages like Scala or F#, as well as direct integration into traditional OO languages, such as the LINQ project from Microsoft for C# and Visual Basic. One such effort that failed, unfortunately, was the SQL/J strategy; even there, the approach was limited, not seeking to incorporate sets into Java, but simply allow for embedded SQL calls to be preprocessed and translated into JDBC code by a translator.

Integration of relational concepts into frameworks. Developers simply accept that this problem is solvable, but only with a change of perspective. Instead of relying on language or library designers to solve this problem, developers take a different view of "objects" that is more relational in nature, building domain frameworks that are more directly built around relational constructs. For example, instead of creating a Person class that holds its instance data directly in fields inside the object, developers create a Person class that holds its instance data in a RowSet (Java) or DataSet (C#) instance, which can be assembled with other RowSets/DataSets into an easy-to-ship block of data for update against the database, or unpacked from the database into the individual objects.
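To make two of the options in Ted's list concrete, here is a minimal sketch contrasting manual mapping with a row-backed object. It is written in Python with the standard sqlite3 module rather than the JDBC or ADO.NET APIs Ted mentions. The person table, its columns, and every class and function name below are assumptions invented for illustration; none of them come from Ted's post.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    """Manual mapping: a plain object whose fields are copied out of a row."""
    id: int
    name: str
    email: str


class RowBackedPerson:
    """Framework-style variant: the object keeps the row and reads through it."""

    def __init__(self, row: sqlite3.Row) -> None:
        self._row = row

    @property
    def name(self) -> str:
        return self._row["name"]

    @property
    def email(self) -> str:
        return self._row["email"]


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical person table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO person (name, email) VALUES (?, ?)",
        [("Ada", "ada@example.com"), ("Grace", "grace@example.com")],
    )
    return conn


def load_people(conn: sqlite3.Connection) -> list[Person]:
    """Hand-written mapping: run the SQL, then populate objects tuple by tuple."""
    rows = conn.execute("SELECT id, name, email FROM person ORDER BY id")
    return [Person(id=r["id"], name=r["name"], email=r["email"]) for r in rows]


def load_row_backed(conn: sqlite3.Connection, person_id: int) -> RowBackedPerson:
    """Wrap the fetched row instead of copying its values into fields."""
    row = conn.execute(
        "SELECT id, name, email FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no person with id {person_id}")
    return RowBackedPerson(row)
```

In the first style, every new column means touching both the SQL and the Person class, which is the "too much code to write and maintain" criticism in miniature. The second style avoids the copy, but the object is now only as rich as the row underneath it.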

Ted quickly posted a followup entry which addressed common criticisms of his original post. If you have an itchy left mouse finger poised over the "comment" link right now, you may want to read that first.

Personally, I think the only workable solution to the ORM problem is to pick one or the other: either abandon relational databases, or abandon objects. If you take the O or the R out of the equation, you no longer have a mapping problem.

It may seem crazy to abandon the traditional Customer object – or to abandon the traditional Customer table – but picking one or the other is a totally sane alternative to the complex quagmire of classes, objects, code generation, SQL, and stored procedures that an ORM "solution" typically leaves us with.
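As a rough illustration of what giving up the Customer object can look like, here is a small sketch in Python using the standard sqlite3 module. The customer and purchase tables, their columns, and the function names are all hypothetical. The database does the set-based work, and the application consumes plain rows without ever defining a domain class.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical customer/purchase tables."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows behave like read-only mappings
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE purchase (
            id           INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customer(id),
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO purchase (customer_id, amount_cents)
        VALUES (1, 1500), (1, 2500), (2, 700);
        """
    )
    return conn


def totals_by_customer(conn: sqlite3.Connection) -> list[dict]:
    """Let SQL do the join and aggregation; return rows as plain dicts."""
    query = """
        SELECT c.name AS name, SUM(p.amount_cents) AS total_cents
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_cents DESC
    """
    return [dict(row) for row in conn.execute(query)]
```

There is no Customer class to keep in sync with the schema here. The tradeoff is that any behavior you would have hung on that class now lives in SQL or in free functions that operate on rows.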

Both approaches are certainly valid. I tend to err on the side of the database-as-model camp, because I think objects are overrated.