Apr 16, 2010

My Bets on Relational Databases

Relational database engines emerged because they could process large and complex data sets faster than ISAM engines. Notice the subtle qualification – large and complex data sets. Today relational engines scale down and compete well on small and flat data sets too, but that wasn’t always the case. For years those of us who wanted to use database servers in our work had to prove to our management that we would deal with data sets large enough to justify the purchase of database software as well as appropriate hardware for it. Younger software developers probably don’t appreciate that, since they can download SQL Server Express or Oracle Database Express Edition for free, not to mention the open source database servers. In short, database servers have become a commodity.

ISAM engines are extinct today because the pioneers of the database industry foresaw the continuous growth of the data volumes that would need to be processed. While that growth has been steady in absolute terms, it has reached a tipping point relative to the capacity of a single computing box. For the first 25 years of their existence database engines were predominantly single-box, because the data could fit on a single hard disk and a single CPU was sufficient to process it. About 10 years ago RAID controllers multiplied disk capacity several times, and SANs became popular shortly after. At the same time multi-processor architectures increased the processing power of the machines. While that extended the life of single-box architectures by another decade, it also revealed that the end was approaching.

About 5 years ago a new pattern emerged – Map Reduce. It gives up the ability to execute arbitrary relational queries in favor of the ability to distribute storage across a large number of machines as well as to process those large data sets in parallel. Both proprietary and community implementations of the Map Reduce pattern have been growing and improving their feature sets. The first question that arises is: will relational databases continue to exist?
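To make the pattern concrete, here is a minimal single-process sketch of Map Reduce in Python. It is an illustration only, not any vendor's implementation: the function names (map_phase, shuffle, reduce_phase, map_reduce) and the word-count task are my own assumptions, and each inner list merely stands in for the data held by a separate machine.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(partitions):
    # Each partition stands in for the data stored on one machine.
    # In a real cluster the map calls would run in parallel, one per node.
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: two "machines", each holding a few text records.
partitions = [["large data sets", "complex data"], ["large clusters"]]
counts = map_reduce(partitions)
print(counts)
```

Notice what the sketch cannot do: there is no way to ask an arbitrary question of the data. Every new question needs a new pair of map and reduce functions, which is the trade-off described above.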

Relational databases will not go away. They will continue serving their mission, which is to execute complex relational queries against large data sets.

The main factor in my bet is the demand for executing complex queries. Our civilization is complex and so it has complex needs. The Map Reduce pattern seems to represent the next generation of ISAM. We’ve already witnessed ISAM yielding to relational databases. In fact, both Google’s and Hadoop’s implementations of Map Reduce are already accompanied by structured, table-oriented storage engines – Bigtable and HBase respectively – which are not fully relational but are a step in that direction. The next question is: what will relational database servers look like?

Relational database servers will become relational database clusters with hundreds of storage and relational nodes. Those relational clusters are likely to employ a non-uniform architecture with regard to storage, i.e. each storage node is likely to have a set of dedicated relational nodes that will execute queries against data from that storage node and eventually join it with data from other nodes.
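Here is a rough sketch, in Python, of how such a non-uniform cluster might answer a query. Everything in it is hypothetical: the StorageNode and RelationalNode classes, the hash_join helper, and the sample orders and customers data are my own illustrations of the idea, not a description of any existing product.

```python
from dataclasses import dataclass

@dataclass
class StorageNode:
    """Holds one slice of the data (hypothetical)."""
    name: str
    rows: list  # list of dicts, one dict per row

@dataclass
class RelationalNode:
    """A query node dedicated to a single storage node (hypothetical)."""
    storage: StorageNode

    def scan(self, predicate):
        # Filter close to the data so that only matching rows leave the node.
        return [row for row in self.storage.rows if predicate(row)]

def hash_join(left, right, key):
    """Join two row sets that came from different nodes on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Two storage nodes, each with its own dedicated relational node.
orders_node = RelationalNode(StorageNode("orders", [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 40},
    {"customer_id": 1, "amount": 120},
]))
customers_node = RelationalNode(StorageNode("customers", [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bob"},
]))

# Step 1: each relational node works only against its own storage node.
big_orders = orders_node.scan(lambda row: row["amount"] > 100)
customers = customers_node.scan(lambda row: True)

# Step 2: the partial results are joined across nodes.
result = hash_join(big_orders, customers, "customer_id")
print(result)
```

The point of the design is that filtering happens next to the data, so only the rows that survive the filter have to travel between nodes for the join.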

Here are some clarifications. I assume that if a system can scale out to 200 nodes, it can scale out to 2,000 nodes. I cannot guess whether today’s relational database servers will evolve incrementally or whether they will be rewritten from scratch based on a scalable pattern like Map Reduce.