OOP was the sexy programming model, and relational set theory seemed so quaint. Once you are using Objects, why wouldn’t you just want to persist them instead of having to drop down to this crazy SQL? Inner joins instead of just person.name.first? Fools.

Well, it didn’t quite work out that way, of course. Instead we got halfway measures such as object-relational systems, and the huge quagmire (as Ted Neward would put it) of the ORM years, which continue to do well.

Why did object databases fail? If I remember correctly, there were a few problems:

They were slow at first.

People had a crap load of tools built around the relational world.

It was fine to do some simple work, but what about reporting? Where was the Business Objects for the Object database? I remember working with a huge bank that used Versant AND Oracle, and they had a nightmare involving syncing between the two.

OK, so the object database failed. What is the new attack?

The Cloud-y Web

SQL is an enterprise victory that managed to make its way into the consumer Web and application space. A lot of people knew SQL, and it seemed obvious to have a LAMP stack or a Java / .NET stack backed by an RDBMS.

Is this really the right choice for Web applications? Why was Rails so successful? It was due to the productivity gain. How much of that is due to ActiveRecord vs. the other Action* pieces that make up Rails? I would argue a large percentage. Working with the database was actually a big pain in the tuches, and ActiveRecord together with migrations helped a lot. It gave us a nice middle ground between a full ORM and the SQL that we know and... know.

What if the database piece didn’t need to be that painful? Part of the pain comes from the paradigm shift between the various worlds, but a huge part of it is scalability. When you have to scale your website, it can be fairly easy to make your application stateless, and then the bottleneck becomes the poor database. This is when you break out master / slave replication, think about partitioning the application, and add caching layers (Tangosol Coherence, memcached). Now you have to really think about an architecture ;)
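To make the caching-layer idea a little more concrete, here is a minimal cache-aside sketch: check the cache first and only fall back to the database on a miss. It assumes a plain dict standing in for memcached or Coherence and a hypothetical `slow_db_lookup` function standing in for the database; neither is a real client API.

```python
import time


class CacheAside:
    """Minimal cache-aside wrapper: check the cache first, fall back to the DB.

    The dict below is a stand-in for a real cache such as memcached;
    it is not a memcached client.
    """

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader        # function that hits the "database"
        self._ttl = ttl_seconds
        self._store = {}             # key -> (expires_at, value)
        self.db_hits = 0             # how often we had to bother the database

    def get(self, key):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: the database is not touched
        self.db_hits += 1
        value = self._loader(key)    # cache miss: load from the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after a write so readers do not keep seeing stale data.
        self._store.pop(key, None)


def slow_db_lookup(user_id):
    # Hypothetical stand-in for a SELECT against the master database.
    return {"id": user_id, "name": f"user{user_id}"}


cache = CacheAside(slow_db_lookup)
cache.get(42)
cache.get(42)
print(cache.db_hits)  # prints 1
```

The second read never reaches the database, which is exactly the load you are trying to take off the poor RDBMS; the price is that every write path now has to remember to invalidate.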

Google had to do this thinking a long time ago, as they obviously have to scale their applications to a huge degree. Scaling the fairly read-only search operation is one thing, but as soon as you get to read-write operations you have much more of a headache. Scaling an MMORPG astounds me: being that real-time, with the world constantly changing. Wow. At least there is some separation by location (world X can live on its own cluster of machines).
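The “world X lives on this cluster” trick is partitioning (sharding) by key. A minimal sketch, assuming hypothetical cluster names and simple modulo hashing of the world ID; a real deployment would also need a rebalancing story (for example consistent hashing) when clusters are added, which this ignores.

```python
import hashlib

# Hypothetical cluster names; in practice these would be connection pools.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def cluster_for_world(world_id: str, clusters=CLUSTERS) -> str:
    """Route every read and write for one world to the same cluster.

    Uses a stable hash (MD5 of the ID) rather than Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(world_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


# All traffic for "world-7" lands on one cluster, so changes inside that
# world do not need cross-cluster coordination.
print(cluster_for_world("world-7"))
```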

Now we get to Bigtable, the engine that Google built to scale in the cloud. Amazon has its new SimpleDB, and there are others.

What these guys are all doing is revisiting the database story. Maybe it is time to ask whether an RDBMS really is the no-brainer choice.

When Google App Engine launched, I thought a lot of people would say “oh man, I just want MySQL instead of this new thing”. I barely heard that; instead I heard thoughts along the lines of “It is great to be able to use the scalable database that Google uses internally.” In fact, when you start using it and see that it is schema-less, you feel a bit of relief. You can build your model, and even use an Expando to store highly dynamic data in the backend. You go along your way, iterating on your code and model, and you don’t have to spend time writing up and down migration methods. Doesn’t that remind you a little of the OODBMS dreams? But this time it is fast and scalable!
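The appeal of an Expando is that declared properties sit alongside ad-hoc ones. The snippet below is not the App Engine API; it is a plain-Python imitation, with a hypothetical `ExpandoLike` class and an in-memory dict playing the datastore, just to show why adding a field needs no migration.

```python
import itertools


class ExpandoLike:
    """Plain-Python imitation of a schema-less entity (not the App Engine API)."""

    _datastore = {}                 # key -> dict of properties; the fake backend
    _ids = itertools.count(1)       # key generator

    def __init__(self, **props):
        self.key = None
        self.props = dict(props)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for dynamic fields.
        try:
            return self.__dict__["props"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name in ("key", "props"):
            object.__setattr__(self, name, value)
        else:
            self.props[name] = value    # any new property is simply stored

    def put(self):
        if self.key is None:
            self.key = next(ExpandoLike._ids)
        ExpandoLike._datastore[self.key] = dict(self.props)
        return self.key

    @classmethod
    def get(cls, key):
        entity = cls(**ExpandoLike._datastore[key])
        entity.key = key
        return entity


p = ExpandoLike(name="Ada")
p.twitter = "@ada"        # brand-new field: no ALTER TABLE, no migration script
k = p.put()
print(ExpandoLike.get(k).twitter)  # prints @ada
```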

Resting on the Couch

While interest in Bigtable via App Engine pushes thinking from one direction, CouchDB is pushing from the other end: the end that asks what a RESTful approach to a database would look like.
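CouchDB’s answer is that documents are JSON resources addressed by URL (such as `/database/docid`) and manipulated with HTTP verbs, and an update must quote the revision (`_rev`) it is replacing, otherwise CouchDB answers 409 Conflict. The toy below mimics that idea in memory only; the `ToyCouch` class, its `handle` method, and the revision format are simplified assumptions, not CouchDB’s code, and no HTTP server is involved.

```python
import hashlib
import json


class ToyCouch:
    """In-memory imitation of CouchDB-style document semantics (not CouchDB)."""

    def __init__(self):
        self.docs = {}   # doc id -> stored document, including _id and _rev

    @staticmethod
    def _new_rev(old_rev, body):
        # Simplified "<generation>-<hash>" revision string.
        generation = int(old_rev.split("-")[0]) + 1 if old_rev else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return f"{generation}-{digest}"

    def handle(self, method, path, body=None):
        """Return (status, response_dict) for e.g. ("PUT", "/blog/post-1", {...}).

        Only one database is modelled, so the database segment of the path is ignored.
        """
        doc_id = path.rstrip("/").split("/")[-1]
        current = self.docs.get(doc_id)
        if method == "GET":
            if current is None:
                return 404, {"error": "not_found"}
            return 200, dict(current)
        if method == "PUT":
            body = dict(body or {})
            if current is not None and body.get("_rev") != current["_rev"]:
                return 409, {"error": "conflict"}   # missing or stale _rev
            body.pop("_rev", None)
            rev = self._new_rev(current["_rev"] if current else None, body)
            self.docs[doc_id] = {**body, "_id": doc_id, "_rev": rev}
            return 201, {"ok": True, "id": doc_id, "rev": rev}
        return 405, {"error": "method_not_allowed"}


couch = ToyCouch()
status, resp = couch.handle("PUT", "/blog/post-1", {"title": "Resting on the Couch"})
print(status)  # prints 201
status, _ = couch.handle("PUT", "/blog/post-1", {"title": "edit without _rev"})
print(status)  # prints 409
status, _ = couch.handle("PUT", "/blog/post-1", {"title": "edit", "_rev": resp["rev"]})
print(status)  # prints 201
```

The revision check is optimistic concurrency control: nobody holds a lock, and a writer working from a stale copy is simply told to re-read and retry.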

It is great to see new ideas and thinking about the storage of data. The RDBMS isn’t going anywhere, of course. There is still a ton of tooling and legacy code built around it, and we all know that:

Data stays where it lies.

It is much easier to implement a new application that talks to the old datastore than to migrate the datastore itself. It is like taking out the foundation. Also, SQL is getting new life in places too.

SQLite

I recently saw an application that used GWT on the client and JavaScript on the server, which reminded me of my comic above. I wonder if we may end up with another flip, with SQL used on the client, and other systems like CouchDB, Bigtable, etc. used in the enterprise / on the server.

It is happening on the client. SQLite seems to be everywhere: your operating system, phone, browser, applications, everywhere. I bet I have around 20 SQLite engines on my system right now, and the number is growing. Why is this happening? Well, instead of coming up with your own data format, parser, and search engine, why not just use SQLite and be done with it? It is very fast and perfect for single-user mode, so everyone is a winner.
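Here is roughly what “just use SQLite and be done with it” looks like from Python’s standard-library `sqlite3` module: the whole database is one file (or, here, memory) with no server process. The table and data are made up for illustration.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a filename such as "app.db"
# would give you a single-file, on-disk application format instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (title TEXT PRIMARY KEY, visits INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO bookmarks (title, visits) VALUES (?, ?)",
    [("Ajaxian", 12), ("CouchDB", 3), ("SQLite", 7)],
)
conn.commit()

# Instead of a home-grown file format, parser, and search code: one query.
rows = conn.execute(
    "SELECT title FROM bookmarks WHERE visits > ? ORDER BY visits DESC", (5,)
).fetchall()
print(rows)  # prints [('Ajaxian',), ('SQLite',)]
```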

So, SQL has a looooong future ahead of it, but it will be interesting to see how the RDBMS weathers the latest storm.

FWIW, the persistence API in Qi4j (qi4j.org) is entirely based on the schema-less notions of SimpleDB et al. I’ve never had to use an ORM to date, and I haven’t missed them one bit, whereas my friends who use them seem less than happy. We’ll see how it goes.

I agree that object databases haven’t become very popular. As someone who’s been using a Python-based object database (the ZODB) for about 10 years now, I wouldn’t say they’ve been a complete failure. Plone is a rather popular system that uses an object database. So, they are still alive and well in various niches. That’s not to say relational databases don’t have advantages compared to an object database, and the other way around as well. I also agree SQL isn’t about to go away, though it’s good to see various alternative approaches to data storage being explored.

True enough, migration is already painful when moving from MySQL to PostgreSQL. But think of it this way: with a painless ‘schema’ and JSON-or-similar methods of getting data out, you can simply _write an application_ to migrate the data. Pull it out of CouchDB (or whatever) and insert it into an RDBMS (or another ODB) however you can (ORM, more JSON, etc.).
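A minimal sketch of “write an application to migrate the data”, assuming the documents have already been exported as a list of JSON strings (a hypothetical export; no CouchDB client is used) and using SQLite as the target RDBMS. Documents missing a field simply become NULLs; fields the target table does not know about are dropped, which a real migration would have to decide about explicitly.

```python
import json
import sqlite3

# Hypothetical export from a schema-less store: each document is JSON text,
# and not every document has every field.
exported = [
    '{"_id": "p1", "name": "Ada", "email": "ada@example.com"}',
    '{"_id": "p2", "name": "Alan"}',
    '{"_id": "p3", "name": "Grace", "email": "grace@example.com", "rank": "RADM"}',
]


def migrate(json_docs, conn):
    """Copy documents into a fixed relational table; unknown fields are dropped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people (id TEXT PRIMARY KEY, name TEXT, email TEXT)"
    )
    for raw in json_docs:
        doc = json.loads(raw)
        conn.execute(
            "INSERT INTO people (id, name, email) VALUES (?, ?, ?)",
            (doc["_id"], doc.get("name"), doc.get("email")),  # missing -> NULL
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(exported, conn)
print(conn.execute("SELECT id, email FROM people ORDER BY id").fetchall())
# prints [('p1', 'ada@example.com'), ('p2', None), ('p3', 'grace@example.com')]
```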

The only logical organization would use the same language in the browser as on the server. I’d love to force the browser makers to implement Ruby, but I think they have too much power. So the only logical organization uses JavaScript everywhere.

Fortunately, the world of computers is run by people, and not by logic…

Great stuff. I remember one big problem with ODB systems in their early days was the lack of a standard query language like SQL. Sure, there were some standards, but they were far from the SQL standard.

I think it’s neat to see more of these flexible storage platforms. One I love when it comes to just indexing content for searches is http://lucene.apache.org. I’m very impressed with its performance.

SQL is just incredibly powerful as a query language. Nothing has ever come close. OO querying (as in Hibernate) is just unreadable and unwritable, and not nearly as powerful as SQL.

The main problems with RDBMSs have to do not with SQL, but with the schema. Creating a schema, foreign keys, etc. is a pain. Changing it is a pain. Add to that the corporate policy that goes with changing such schemas and forget about it.

Elimination of the schema, or simplification of its creation, should be the goal.

Object-oriented programming was invented to control the increasing complexity that resulted from ever-growing programs in a world where the quantity of memory doubled every 18 months or so. The programmer’s intellectual capacity was the limit, not the size or speed of memory.

Things were totally different in the database world. There, complexity was not the first limit; the first limit was the slow speed of access to the huge set of required data.

Did OODBs provide a response to that problem of access speed? Not at all.

Quite the contrary, actually: OODBs just made the problem bigger by increasing the size and complexity of database entities, thanks to their ability to hide complexity behind encapsulation!

SQL is the machine language of databases. Databases are not fast enough to support a higher level concept.

Things are changing. The amount of RAM is rapidly increasing, to the point where lots of databases can reside in memory rather than on disk. OTOH, the focus has shifted from single-user programs to massively multi-user programs that require distributed systems.

I foresee a future for OODBs, but first we need to solve the multi-core issue in OOP, with some new means to handle the complexity of parallelization. That’s a tricky issue too, maybe more so than the database one.

Another option is to implement in an RDBMS a generic structure able to store any type of data object (maybe with some restrictions) and the relations between the different instances. Add a Java DAO layer to access it and you have the best of both worlds: an object-oriented abstraction for flexibility and ease of development when working in a graph-oriented approach, and a SQL backend to perform more complex, non-sequential searches.

Believe it or not, I’ve done it. And it was a pain to develop, but performance, while not amazing, wasn’t that bad once we wrote the PL/SQL code to access an object. Any object. And it is really easy to maintain: we can add attributes and even new types of entities without changing the table definitions, just the associated (custom) metadata. That includes relationships between objects, without any need to set new foreign keys. Since the structure is standard for all objects, we can use the same PL/SQL (and a bit of dynamic SQL) to perform custom searches.

Of course, migration to another database is extremely easy; the hard part is migrating the PL/SQL procedures. Everything can be done in plain SQL, though (our first version was), but then performance suffers.

And I have to admit that it gets funny when I have to look at the data directly using a SQL client instead of our DAO layer.
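The generic structure described above is usually called entity-attribute-value (EAV). The sketch below uses SQLite instead of Oracle and PL/SQL, with hypothetical table and column names, to show both halves of the trade-off: new attributes, types, and relationships need no ALTER TABLE, but a plain SQL client sees rows of (object, attribute, value) rather than recognizable records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE attrs (
        object_id INTEGER REFERENCES objects(id),
        name      TEXT NOT NULL,
        value     TEXT,
        PRIMARY KEY (object_id, name)
    );
    -- Relationships between any two objects, with no per-type foreign keys.
    CREATE TABLE links (from_id INTEGER, rel TEXT, to_id INTEGER);
""")


def save(obj_type, **fields):
    """Store any object: one row in objects, one row per attribute."""
    cur = conn.execute("INSERT INTO objects (type) VALUES (?)", (obj_type,))
    oid = cur.lastrowid
    conn.executemany(
        "INSERT INTO attrs (object_id, name, value) VALUES (?, ?, ?)",
        # Values are stored as text; restoring types is the DAO layer's job.
        [(oid, k, str(v)) for k, v in fields.items()],
    )
    return oid


def load(oid):
    """Reassemble an object from its attribute rows (what the DAO layer does)."""
    rows = conn.execute("SELECT name, value FROM attrs WHERE object_id = ?", (oid,))
    return dict(rows)


alice = save("person", name="Alice", city="Oslo")
acme = save("company", name="Acme", founded=1999)   # new type: no schema change
conn.execute("INSERT INTO links VALUES (?, 'works_for', ?)", (alice, acme))

print(load(alice)["city"])  # prints Oslo

# A generic search that works for every type: objects whose city is Oslo.
print(conn.execute(
    "SELECT object_id FROM attrs WHERE name = 'city' AND value = 'Oslo'"
).fetchall())  # prints [(1,)]
```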

My approach to SQL, and RDBMSs specifically, has been on my mind quite a lot recently. I am an ASP.NET developer, and technologies such as LINQ have made me rethink how I interact with storage. I actually like working with SQL and feel safe within structured stored procedures, but I can’t help thinking “there’s a better way.”

This was a good post and it has given me food for thought. I’m not sure I’m ready to ditch SQL Server just yet though.

Yup, fast too. Do a Google search for anything you want. In the top right corner of the page, I usually see my query done in a tenth of a second. Not bad for a database that stores a good portion of the Internet. No RDBMS has done that. I’m not saying Bigtable is perfect (relational DBs will always have their place), but if you have problems that map well to this space then you have quite the tool at your disposal.

What about post-relational databases like Caché? It provides an OO interface, and you can still use SQL or even dig down into direct storage for lots of speed.
The database does have a built-in Ajax-type solution to get your data in and out.

I tend to think that OODBMS just got eaten by the web. It wasn’t just OODBMS that got eaten alive; it was almost any heavyweight DB, including RDBMSs, as developers flocked to extremely lightweight tools like MySQL. OODBMS in particular had issues, but I tend to think they would have been addressed and OODBMS would have seen a slow ascent, had it had a market to sustain growth.

This is an interesting article. I especially liked the cartoon series that starts with Java on the client, JS on the server, turns back, and turns back again.

I don’t “get” this BigTable stuff. I appreciate that Google has built fantastic applications based on this database. However, I am a SQL and ER modelling guy, and I just don’t understand all of this schema-less BigTable/SimpleDB stuff. I have a few big comprehension issues.

How do these databases handle update anomalies for information models with some form of information duplication or redundancy? How do you do ad-hoc queries on a schema-less database? When filling in a row of a table, how do BigTable programmers handle information elements that cannot be null? Without schemas, how can you extract-transform-load (ETL) a BigTable database into a Business Intelligence multi-dimensional cube? How can you build an application driven by the structure of the information model (like Ruby/Rails) when there is no schema to drive the structure of the application? How are BigTable et al. different from the old BerkeleyDB (because they look the same to me)?

I haven’t seen any of these questions answered on the web. If you could write about where to find answers, or why the questions are not relevant, that would be great!

What if the database were made so simple that you took it for granted and forgot what model it is based on? Would you still worry about objects vs. relations? I’m talking about focusing on building the application, with the database as just a store: be it object or relational, it just works.

ODBs have not failed, but have carved out a very good niche. Being an optimist, I think there is still a chance for them to become more mainstream. Versant is leading that charge, both with our open source offering, db4o, and with our core Object Database. By the way, Oracle is one of our customers; if ODBs are so slow, why do we get a check from them every 3 months? If you know anything about computer science, less code is always more efficient. Given the right model, an ODB is 10 to 20 times faster than relational databases: SQL Server, MySQL, Oracle, DB/2, Sybase, whatever. I know it has been a while since the post, but I would be interested in starting the thread again.

Funny, I read this article just as I threw in the towel on formatting my own data and decided to embed SQLite (from C#) right within my program.

I do not think I will go to the extent of up/downloading the database content raw because I do need to process the data on the server (which at the moment is Google because it’s CHEAP) but I do like the idea that databases are nowadays ubiquitous and make our life easier.

OODBs appeared when the problems to be solved using OODBs accumulated.

Up to a certain point, you had rather simple, straightforward, although sometimes large data structures, on which you needed to perform pretty simple operations, so that each record could be processed mostly the same way.

As OO became mainstream, more complex processing became possible, applications became smarter, and people started to expect more from their data stores. This created pressure on programmers, who constantly had to deal with the impedance mismatch between OOP and RDBMSs.

Two solutions emerged: ORM solutions, which tried to provide a generic workaround for the impedance mismatch, and OODBs, which tried to eliminate the mismatch altogether. The problem is that only a few problems were entirely solvable with OODBs; for most problems only a small part fit OODBs well, while most of the data was still ideally handled in a relational way. That’s why ORM solutions still thrive, whereas OODBs aren’t used that much.

Then came Google. Google wants to index everything. They don’t care if it’s a picture, a document or whatever; they want you to be able to search for it. How does this fit into the RDBMS concept? I’d say pretty badly: what do a JPG picture of a dog and a GIF containing a UML diagram have in common? This doesn’t even fit into the OO concept, since you can barely identify classes, at least not at the source code level; your application has to do the classification for you. Plus, the amount of data is huge, and most RDBMSs or OODBs can’t support that much of it. That is why they invented BigTable: essentially a container into which you can write stuff, very much the same way you write bulleted lists while brainstorming.
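For a rough idea of what that “container” looks like, the Bigtable paper (Chang et al., OSDI 2006) describes the data model as a sparse, distributed, persistent multidimensional sorted map indexed by row key, column key (“family:qualifier”), and timestamp, with uninterpreted byte strings as values. The in-memory sketch below only illustrates that map shape (no distribution, persistence, or tablets); the `TinyTable` class is hypothetical, though the reversed-URL row key follows the paper’s Webtable example.

```python
import bisect
import time


class TinyTable:
    """In-memory sketch of Bigtable's data model: (row, column, timestamp) -> bytes."""

    def __init__(self):
        self._rows = {}          # row key -> {"family:qualifier": [(ts, value), ...]}
        self._sorted_keys = []   # row keys kept sorted, so range scans are cheap

    def put(self, row, column, value, ts=None):
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        cells = self._rows[row].setdefault(column, [])
        cells.append((ts if ts is not None else time.time_ns(), value))
        cells.sort(reverse=True)            # newest version first

    def get(self, row, column):
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None   # latest version, or None

    def scan(self, start, end):
        """Yield row keys in [start, end) in sorted order."""
        i = bisect.bisect_left(self._sorted_keys, start)
        while i < len(self._sorted_keys) and self._sorted_keys[i] < end:
            yield self._sorted_keys[i]
            i += 1


t = TinyTable()
t.put("com.cnn.www", "contents:", b"<html>v1", ts=1)
t.put("com.cnn.www", "contents:", b"<html>v2", ts=2)
t.put("com.cnn.www", "anchor:cnnsi.com", b"CNN")
t.put("org.example/dog.jpg", "image:jpeg", b"\xff\xd8")  # unrelated columns, same table
print(t.get("com.cnn.www", "contents:"))   # prints b'<html>v2'
print(list(t.scan("com.", "com/")))        # prints ['com.cnn.www']
```

Rows only carry the columns they actually have, which is why a dog picture and a UML diagram can share one table without a common schema.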

Is this a new attack on RDBMSs? I wouldn’t say so; it’s simply a new solution to a new problem, one which RDBMSs are ill equipped to solve. Of course, Google could have created a new mapping library that would just store any BigTable inside a large cluster of RDBMS servers. But what’s the point? You don’t get efficiency by shoehorning a data structure into a storage system that doesn’t support it well.
