Web 2.0 and the relational database

Yes, this is yet another rant about how people incorrectly dismiss state-of-the-art databases. (Famous people have done it, why shouldn’t I?) It’s amazing how much the Web 2.0 crowd abhors relational databases. Some people have declared real SQL-based databases dead, while others have proclaimed them to be not cool any more. Amazon’s SimpleDB, Google’s BigTable and Apache’s CouchDB are trendy, bloggable ideas that, to be honest, are ideal only for very specific, specialized scenarios. Most other use cases, and that covers 95 out of 100 web startups, can do just fine with a memcached + Postgres setup, but there seems to be a constant attitude of “nooooo if we don’t write our code like Google they will never buy us…!” that just doesn’t go away, spreading like a malignant cancer throughout the web development community. The constant arguments are “scaling to thousands of machines” and “machines are cheap”. What about the argument “I just spent an entire day implementing the equivalent of a join and group by using my glorified key-value-pair library”? And what about the mantra “smaller code that does more”?
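To make the join-and-group-by complaint concrete, here is a minimal sketch of the same question ("how much has each user spent?") answered both ways. Everything in it is an assumption for illustration: the `users` / `orders` data, the function names, and the use of Python's built-in SQLite as a stand-in for Postgres. It is not anyone's production code, just the shape of the problem.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data (not from any real system).
USERS = [(1, "alice"), (2, "bob"), (3, "carol")]        # (user_id, name)
ORDERS = [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)]  # (order_id, user_id, amount)


def totals_key_value(users, orders):
    """Hand-rolled join + group by, as a plain key-value store forces on you."""
    user_by_id = {uid: name for uid, name in users}  # home-made index on users
    totals = defaultdict(float)
    for _order_id, uid, amount in orders:            # scan every order
        name = user_by_id.get(uid)                   # the "join"
        if name is not None:
            totals[name] += amount                   # the "group by" plus SUM
    return sorted(totals.items())


def totals_sql(users, orders):
    """The same question as one declarative query (SQLite standing in for Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        "SELECT u.name, SUM(o.amount) FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.name ORDER BY u.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(totals_key_value(USERS, ORDERS))
    print(totals_sql(USERS, ORDERS))
```

Both functions return the same rows for this data. The difference is that the hand-rolled version has to be rewritten for every new question (a second join, a HAVING clause, a different aggregate), while the SQL version only needs a different query string.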

Jon Holland (who shares his name with the father of genetic algorithms) performs a simple analysis which points out a probable cause: People are just too stupid to properly use declarative query languages, and hence would rather roll their own reinvention of the data management wheel, congratulating themselves on having solved the “scaling” problem because their code is ten times simpler. It’s also a hundred times less useful, but that fact is quickly shoved under the rug.

It’s not that all Web-related / Open Source code is terrible. If you look at Drupal code, you’ll notice the amount of sane coding that goes on inside the system. JOINs are used where needed, caching / throttling is assumed as part of core, and the schema allows for flexibility to do fun stuff. (Not to say I don’t have a bone to pick with Drupal core devs; the whole “views” and “workflow” ideas are soon going to snowball into a reinvention of Postgres’s ADTs, all written in PHP and running on top of a Postgres setup hidden behind a database abstraction layer.)

If Drupal can do this, why can’t everyone else? Dear Web 2.0, I have a humble request. Pick up the Cow book if you have access to a library, or attend a database course in your school. I don’t care if you use an RDBMS after that, but at least you’ll reinvent the whole thing in a proper way.

What other people have to say:

Vagif Verdi (not verified) said:

What about the argument “I just spent an entire day implementing the equivalent of a join and group by using my glorified key-value-pair library”? And what about the mantra “smaller code that does more”?

That’s a strawman argument. Why do we have to use a glorified key-value-pair library when we can use a sophisticated and powerful OODB?
With an OODB you do not need to implement joins, because the object graph works there just fine. And OODBs have had indexes for a long time. They are not an exclusive feature of relational databases.

Now mind you, OODBs are very different from SimpleDB or CouchDB. And they are a perfect fit for complex domains with a very rich data hierarchy, something relational databases handle very poorly.

arnab said:

Vagif, you are right in pointing out the maturity of some OODB products; I am not claiming one is superior to another. My claim extends to all established, state-of-the-art database products (relational or non-relational) that are extremely capable of handling a large majority of popular use cases. Each need will have its own ideal data model (RDBMS, Native XML DB, OODBs…), and it is important not to forget these well-developed solutions amidst the hype of new ideas that are just starting out, and are hence extremely primitive in many, many ways.

The three basic types of databases, Hierarchical, Relational, and Network, are constantly being re-invented.

When Google decided to make searching the web fast, they used the fast type of database, a Hierarchical index re-invented as BigTable.

When they decided to make money from their service, they used a relational database, notably MySQL, to collect billions in revenue from their ad programs. When they needed to scale MySQL, they chose to introduce sharding. SEC and SOX compliance require Google to be not just really fast, but also accurate in their financial reporting. This is the main strength of an RDBMS.

If someone knows how they are using network databases, OODBs or XML DBs, I’d like to hear about it.

Kieran

arnab said:

Kieran, thanks for your comments! As you pointed out, relational databases are great for analytics / financial stuff. They provide transactional correctness, as you suggest, and they also give the programmer a rich, expressive query layer that lets you put forth complex ideas as small declarative queries. From a startup’s perspective, this expressiveness saves a LOT of development time, assuming the developer knows how to use it. Sadly, people still use MySQL / Postgres as a primitive object store.