It is often stated that it is hard to store application objects in relational databases - the object-relational impedance mismatch - and that this is why document databases are better.

However, is there really an impedance mismatch? An object has a key (albeit one that may be hidden away by the runtime as a pointer into memory), a set of values, and foreign keys to other objects. An object maps as naturally onto tables as it does onto a document. Neither really fits.

I can see a use for databases that model the data into specific shapes for particular scenarios in the application - e.g. to speed up lookups and avoid joins - but wouldn't it be better to keep the data as normalized as possible at the core, and transform it as required?

Did you read the article? It lists the differences quite clearly. And simply saying "but on the lowest levels it's all just bits and bytes" doesn't make them go away. There are conceptual differences that are important. Semantics is not a curse word.
– Joachim Sauer, Apr 26 '12 at 9:11


-1 "and that is why Document databases are better" - this opinion is not offered in the linked article. Neither do you offer any other evidence for this assertion, nor do you clarify whether you mean specifically for storing objects or for data in general. Further, as pointed out above, the linked article discusses the issue in some detail.
– Mark Bannister, Apr 26 '12 at 9:27


The link to the document was added by the editor @gnat, not by the original poster.
– Lee Atkinson, Apr 26 '12 at 9:58

4 Answers

There is definitely object-relational impedance. I had to deal with it for years on a product that kept its data in a database but was object-oriented when it ran.

As a vastly simplified example, consider three DB tables. Two are just lists, one containing people and one containing jobs. The third is a link between people and jobs, with a column for each. (And all three have lots of columns for lots of subsidiary data.)
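A minimal sketch of that three-table layout, with invented table and column names (sqlite used purely for illustration, not what the original product did):

```python
import sqlite3

# Purely illustrative schema; names are invented, not from the original system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
        -- ...plus lots of columns for subsidiary data
    );
    CREATE TABLE jobs (
        job_id INTEGER PRIMARY KEY,
        title  TEXT NOT NULL
        -- ...plus lots of columns for subsidiary data
    );
    -- The link table: one row per (person, job) pairing.
    CREATE TABLE assignments (
        person_id      INTEGER REFERENCES people(person_id),
        job_id         INTEGER REFERENCES jobs(job_id),
        hours_per_week REAL
        -- ...plus lots of columns for subsidiary data
    );
""")
```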

In the UI, a user wants to look at the data in terms of a job and the people working on it. A look at the DB, if it's large, is going to be worthless--too much disorganized info. But in the UI, a job object can serve as the basis for good display. Even if the user has a list of a million jobs, he can scroll to the one he wants and see all the relevant info: who's working on it and when and how hard and with what skills, etc.

So in the DB you have three tables. In the OO program, you have job objects, people objects and the links between them. The jobs and people lists relate directly to the DB tables, but there is no list of links. The links are part of, or at any rate tied to, the job objects. To create a link table from the OO data, you have to work through all the jobs. To create a proper job object, you have to read through the link table and do some awkward rearranging of data.
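A rough sketch of that translation in code, reusing the invented schema above (the class and function names are mine, not the product's):

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    person_id: int
    name: str

@dataclass
class Job:
    job_id: int
    title: str
    people: list = field(default_factory=list)  # the links live inside the job object

def load_jobs(conn):
    """Build Job objects: read the link table and fold each row into its job."""
    people = {pid: Person(pid, name)
              for pid, name in conn.execute("SELECT person_id, name FROM people")}
    jobs = {jid: Job(jid, title)
            for jid, title in conn.execute("SELECT job_id, title FROM jobs")}
    for pid, jid in conn.execute("SELECT person_id, job_id FROM assignments"):
        jobs[jid].people.append(people[pid])   # the awkward rearranging
    return jobs

def save_links(conn, jobs):
    """Go the other way: rebuilding the link table means walking every job."""
    conn.execute("DELETE FROM assignments")
    for job in jobs.values():
        for person in job.people:
            conn.execute("INSERT INTO assignments (person_id, job_id) VALUES (?, ?)",
                         (person.person_id, job.job_id))
```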

This is the essence of the impedance. This example is simple in order to be easily understood, but it drastically understates the real-world difficulty in translating between the two concepts.

OO is much faster. It's easier to use once it has been set up. However, it forces a certain organization on the data. If this organization is not done right, or is not adequate, there can be trouble. For instance, in my example the key object is the job; if you want to know what work an individual is doing, it's no good. In this simple example it is obvious that there needs to be a people object that contains all the work a person is doing, but in a real system the number of possible structures approaches infinity; you have to choose the limited set of structures that will let the program do its job.

OO bogs down when the random-access aspect of memory becomes slow. OO depends on a lot of direct links between bits of data. How is a large OO database supposed to work? A document database makes more sense, but it will either have to really be an OO DB or duplicate vast quantities of data: note that a job object includes people objects, and those people objects include the job object.
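As a hypothetical illustration of that duplication, a job document and a person document each end up embedding a copy of the other's data:

```python
# Invented documents, not from any real system.
job_doc = {
    "job_id": 1,
    "title": "Install turbine",
    "people": [
        {"person_id": 7, "name": "Alice", "skills": ["welding"]},
    ],
}

person_doc = {
    "person_id": 7,
    "name": "Alice",
    "skills": ["welding"],
    "jobs": [
        {"job_id": 1, "title": "Install turbine"},  # the same data again, stored twice
    ],
}
```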

Relational data is much slower, but it is without prejudices. A bit of SQL and you can look at the data any way you want. The other, overwhelming, advantage it has is that slow, not-so-random memory--which will bring OO to a halt--doesn't bother it. So if you have terabytes of data, you want your data stored relationally, not object-orientedly.
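For instance, the same three invented tables from the sketch above can be read per person just as easily as per job, with no new object design:

```python
# A per-person view of exactly the same data: just a different query.
rows = conn.execute("""
    SELECT p.name, j.title
    FROM people p
    JOIN assignments a ON a.person_id = p.person_id
    JOIN jobs j        ON j.job_id    = a.job_id
    ORDER BY p.name
""").fetchall()

for name, title in rows:
    print(f"{name} works on {title}")
```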

So I think we're stuck with relational databases and object-oriented programs and doing a lot of gyrations to get data from the one to the other and back again.

Also: I have recently done some work on a system that uses in-memory relational data. It is very slow, but not so slow as to be seriously annoying (it's not a computer game!) and the original developers, while they had to design a DB, got to skip worrying about an OO design. I considered OO'ing a bit of it, but that meant a lot of programming versus writing just a bit of fancy SQL (LINQ, actually). It clued me in that relational could sometimes compete with OO on its own turf.

It also points out that OO forces organization on the data. If the original developers had known what I needed to do and had used OO to do it, the program would have run faster and I could have modified it with a lot less effort. (SQL is tricky.) But if they hadn't OO'ed it the way I needed it done, I'd have been right where I was, but without the in-memory relational DB.

Summary: OO and relational are two different things, and converting between them involves impedance--it's a pain. Each has its uses, and I think we're often going to be stuck using both.

@Izkata Django and many other ORM frameworks are designed to deal with the impedance mismatch. Having a packaged solution doesn't mean that the impedance isn't there, nor does it mean that the frameworks can deal with all of the issues.
– BobDalgleish, May 12 '14 at 12:20

The problem of impedance mismatch between applications written in OO languages and relational databases is not a problem of discrepancy between the object-oriented and relational approaches themselves. Its real causes lie in the habitual way the OO approach is implemented. A direct comparison of the two approaches cannot be used as a basis for concluding that they are discrepant or mismatched. Experimental evidence of the absence of any contradiction between the object-oriented paradigm and the relational data model has also been presented.

The wiki article provides a reasonable overview but not necessarily real world experience.

The complexities of mapping a relational database to an OO scheme are vast and there is no such thing as "better".

It depends on a lot of things, and consequently I often end up creating my own ORM implementations. A document database is only useful if your access pattern respects the one-way tree structure that a document database permits. If you need something else you may find yourself needing a graph database, which won't prove any better than relational databases for typical usage scenarios.

Keeping data as normalised as possible gives the most flexibility, but not necessarily the best performance. Personally, I tend to find document databases more useful as advanced result caches than for anything else. That can be misleading, though: if you want to use your document data in any way other than the one your model caters to, it may still be possible, but it will be much slower. Performance depends on your access patterns, and there is no "best" solution. I would recommend that anyone start with a relational database, as this is likely to cater to most needs and give the best flexibility. It will also help give a measure of your access patterns, which will make any later choice to use a document database a reasonable one.

I think that actually, apart from minor problems, it's very easy to match objects to relational data and vice versa. This is assuming you're not using any complicated features such as triggers.

The real problems occur during runtime and in the dataflow. You have problems with consistency, concurrency, performance, transactions, etc.

There are two hard problems that I always have with ORM:

You are storing data from the database in RAM, so there's an instant mismatch there, because another process can come along in the background and change something. This is partly why even object databases are difficult to use. Sure, you can lock things, but if you look a little deeper you can see why that might not work out so well.

Your libraries are writing most of the SQL for you just in time, but they have no chance in hell of optimising as well as you can by hand. This is a "just what you need" scenario versus "pull out all the stops just in case". An ORM cannot anticipate exactly what you need, so you must either carefully tune it or it must try to cater for every scenario. In short, ORM is naive.

Consider that you have a table with one very large column and a number of smaller, basic statistical columns. Most of the time you don't want to fetch the large column, for performance. Here you must either find a way to tell the ORM to fetch that column only on demand (in which case you have a potential consistency issue with the data already acquired), stick with poor performance, risk having to lock naively, change the schema to better suit the ORM with vertical partitioning, and so on. It's not impossible to deal with, but it is one of hundreds of problems, some more difficult than others. The point here is that in plain old SQL you would not have to think twice about this: you would just get what you need when you need it, without having to anticipate and unravel the ORM. The automatic nature of ORM makes it that much harder to anticipate, especially since there is no one ORM pattern or implementation, so you have to learn how your ORM operates under the hood to really use it properly. There are cases where ORMs can perform better, but in more than 90% of those cases you can achieve the same with a query cache or something similarly simple, as ultimately that is what the ORM is doing in those cases.
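A sketch of the "just what you need" version in plain SQL; the table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented table: one very large blob column alongside small statistical columns.
conn.execute("""
    CREATE TABLE measurements (
        id        INTEGER PRIMARY KEY,
        mean      REAL,
        stddev    REAL,
        raw_trace BLOB    -- very large, rarely needed
    )
""")

# Hand-written SQL simply leaves the big column out of the common query...
stats = conn.execute("SELECT id, mean, stddev FROM measurements").fetchall()

# ...and fetches it only for the one row that actually needs it, on demand.
def load_trace(conn, row_id):
    row = conn.execute("SELECT raw_trace FROM measurements WHERE id = ?",
                       (row_id,)).fetchone()
    return row[0] if row else None
```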

ORM is meant to do things for you, but it really can't anticipate all your needs. It ends up working like handwriting recognition on a touch screen: half the time, rather than it recognising your handwriting, you have to produce writing that it can recognise (some letters won't work unless you write them backwards, such as e, etc.). It's very common with ORM to be constrained to a specific schema layout, beginning with primary keys for everything (but who doesn't do that anyway).

I have found that personally, what works best for me in most cases is a system of models for each table that auto generate the obvious queries that is inherited to provide custom queries for anything that needs it and a few separate models to provide categorisations to avoid god classes or misplaced queries. On top of this, I only really like to use ORM for view data where consistency tends to not matter as much anyway since it's going to be sent to a client where we have no control over and it's a last step process that occurs after all processing so will not directly affect writes.