I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.

The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:

SaaS (Software as a Service).

A low-volume static website.

A network focused on office software.

A single cheap server, likely running a single instance of a general-purpose RDBMS.

Those enterprises, however, are generally not who I write for or about.

The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:

A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.

MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.

In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
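The "fits affordably in RAM" constraint is easy to make concrete with back-of-envelope arithmetic. The figures below (compression ratio, RAM per server) are illustrative assumptions, not vendor numbers:

```python
import math

# Back-of-envelope check: how many servers would it take to hold a
# data set entirely in memory? All constants are illustrative guesses.

def servers_needed(raw_tb, compression_ratio=3.0, ram_per_server_gb=512):
    """Servers required to hold the compressed data entirely in RAM."""
    compressed_gb = raw_tb * 1024 / compression_ratio
    return math.ceil(compressed_gb / ram_per_server_gb)

# A 100 TB warehouse at 3x compression on 512 GB servers:
print(servers_needed(100))  # 67 servers just for the data itself,
                            # before working space, replicas, or growth
```

Scaling the assumed warehouse size up or the compression ratio down quickly makes an all-in-RAM design an expensive proposition, which is the point of the objection.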

Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.

Tighter integration of analytics and OLTP. Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.

Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
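The leaderboard case above can be sketched in a few lines. This is a pure-Python stand-in for the pattern a write-optimized NoSQL store would play (for instance, a sorted-set structure); the class and names are invented for illustration:

```python
import heapq
from collections import defaultdict

# Toy stand-in for a write-optimized leaderboard, the role a NoSQL
# store plays in the casual-game example above. Writes are a single
# cheap in-place update; the top-N read avoids a full sort.

class Leaderboard:
    def __init__(self):
        self.scores = defaultdict(int)

    def record(self, player, points):
        # Short-request write path: one in-place counter update.
        self.scores[player] += points

    def top(self, n):
        # Read path: heapq.nlargest returns the n best without
        # sorting the whole score table.
        return heapq.nlargest(n, self.scores.items(), key=lambda kv: kv[1])

board = Leaderboard()
for player, pts in [("ada", 50), ("bob", 30), ("ada", 20), ("cy", 60)]:
    board.record(player, pts)
print(board.top(2))  # [('ada', 70), ('cy', 60)]
```

The design point is that both the write and the top-N read are cheap short requests; anything beyond that (say, cohort analysis over score history) is exactly the analytics such a store will not cover.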

*I’ve long used “tabular” to cover both relational and MOLAP structures, the point being that in both cases you have a neat and regular schema, well-represented as a set of arrays.
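The footnote's "well-represented as a set of arrays" point can be shown in miniature with the same toy data in both shapes (plain Python lists standing in for real storage):

```python
# The same "tabular" data as relational rows and as a dense
# MOLAP-style array. Dimension values index directly into the cube;
# nothing about the data is irregular.

regions = ["east", "west"]   # dimension 1
quarters = ["Q1", "Q2"]      # dimension 2

# Relational form: one row per (region, quarter, sales) fact.
rows = [("east", "Q1", 100), ("east", "Q2", 120),
        ("west", "Q1", 90),  ("west", "Q2", 150)]

# MOLAP-style form: a dense array indexed by dimension position.
cube = [[0, 0], [0, 0]]
for region, quarter, sales in rows:
    cube[regions.index(region)][quarters.index(quarter)] = sales

# Both structures answer the same question.
q2_total_rows = sum(s for _, q, s in rows if q == "Q2")
q2_total_cube = sum(cube[r][quarters.index("Q2")]
                    for r in range(len(regions)))
assert q2_total_rows == q2_total_cube == 270
```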

What could change this picture would be a future in which:

All your tabular business data fits into RAM.

Also, the OLTP/analytic DBMS distinction becomes less important.

In that case, it might be reasonable to get by with:

A single in-memory relational DBMS, handling OLTP and some analytics alike.

If all the data is in RAM, I/O bottlenecks are indeed lessened. Also, Oracle Exadata is dedicated to the premise that, even using conventional computer parts, I/O bottlenecks can be reduced with enough hardware; price aside, it seems to work. Still, if you talk with analytic RDBMS designers, you repeatedly hear that it’s not that simple — even getting data efficiently out of RAM is different in the analytic and OLTP cases.
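The designers' point about RAM access differing by workload can be sketched with two in-memory layouts of the same toy table (pure Python, illustrative only):

```python
# Why "in RAM" isn't one thing: the same table in row-oriented layout
# (good for OLTP's fetch-one-record pattern) and column-oriented
# layout (good for analytic scans touching one attribute of every
# record).

row_store = [  # each tuple is (order_id, customer, amount)
    (1, "acme", 250.0),
    (2, "zenith", 75.5),
    (3, "acme", 410.0),
]

col_store = {  # one contiguous list per column
    "order_id": [1, 2, 3],
    "customer": ["acme", "zenith", "acme"],
    "amount":   [250.0, 75.5, 410.0],
}

# OLTP-style point lookup: the row store hands back the whole record
# in one access.
order_2 = row_store[1]

# Analytic-style aggregate: the column store sums one tight array and
# never touches the other columns' memory at all.
total = sum(col_store["amount"])
print(order_2, total)  # (2, 'zenith', 75.5) 735.5
```

In a real engine the difference shows up as cache-line and prefetch behavior rather than Python list traversal, but the layout tension is the same.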

Query planning/execution, data movement, and workload management go together — they’re all about getting the most work done with the least machine effort, and they all depend on determining which specific execution choices might be synergistic or anti-synergistic with each other. Taken together, they form a very tough optimization challenge, which is different in the OLTP and analytic cases. Adding in analytic platform capabilities adds yet more difficulty to the optimization problem. And so:

A fast analytic database manager is a hard thing to build; expecting it to be fast at OLTP as well may be too much to ask for.
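The planner's dilemma behind that claim can be caricatured in a toy cost model; the cost constants here are invented purely for illustration:

```python
# A caricature of query planning: the cheapest way to answer the same
# query flips with selectivity, which is one reason OLTP-tuned and
# analytics-tuned engines diverge. Cost constants are made up.

def plan(table_rows, matching_rows,
         seek_cost=100.0, scan_cost_per_row=1.0):
    """Pick index lookups (OLTP-ish) or a full scan (analytic-ish)."""
    index_cost = matching_rows * seek_cost      # one seek per match
    scan_cost = table_rows * scan_cost_per_row  # touch every row once
    return "index" if index_cost < scan_cost else "scan"

# Fetch 5 rows out of 10 million: index lookups win.
print(plan(10_000_000, 5))          # index
# Aggregate over 40% of the table: scanning wins.
print(plan(10_000_000, 4_000_000))  # scan
```

A real optimizer weighs many more interacting choices (join order, data movement, concurrent workload) at once, which is what makes serving both regimes well so hard.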

Given that, the discussion pivots to:

OK, but can we overprovision the RAM by so much that suboptimal performance doesn’t matter?

My guess is “Not any time soon” — because efficiency is always a good thing, databases will always grow, and RAM will never be free.

Comments

10 Responses to “Do you need an analytic RDBMS?”

Thomas W Dinsmore on
November 6th, 2012 9:15 am

Analytic databases will always be with us in part due to organizational conflict between the needs of business stakeholders and the IT organization. Analysts whose careers depend on speed and agility are often at odds with IT, which tends to be control-oriented and focused on cost.

There are entire industries that depend on this divide. In 1995, Gartner predicted that improvements to enterprise data warehouses would put Marketing Service Organizations out of business. MSOs today continue to grow and expand.

From the point of view of a start-up that has all its computing resources on the Amazon cloud, I’d like to make a couple of points:

– Access to elastic MapReduce resources (Hadoop and Amazon Elastic MapReduce) is very important. However, MapReduce coding (which needs to be flexible, quick, and “versioning-friendly”) is not trivial, and deploying the environment takes time.

– A large analytical DBMS with advanced SQL capabilities can accomplish most of the required (and quite advanced) processing, leaving only what’s absolutely necessary for Hadoop and other non-database, non-MapReduce processing.
We use Greenplum CE (on the largest single-cluster instance in EC2, deployed with multiple EBS volumes) and it does the job beautifully, even with the poor I/O typical of EC2. It also integrates very easily with S3 storage and non-database processing.
Data partitioning, compression, advanced querying, and everything else PostgreSQL has to offer, combined with parallel processing, is very hard to beat.
This setup allows for a very flexible, very capable, and very cost-effective infrastructure that would be very hard to achieve on any single platform.

My company runs its database with a database company. We also use Greenplum CE and it does a wonderful job as well; it is very flexible. Our IT department has been very appreciative of the new setup. Thanks for the great post!

The MOLAP ship is alive and well, but perhaps has found its place in the spectrum of BI solutions: essentially batch-based and conceived around pre-aggregation as the strategy for increasing performance, it is retreating into the mid-data/complex structured data arena, leaving the lower-latency, continuous reporting of very big data to the big appliance vendors.

In particular, the rich (but complex) language that can define sets and calculated members is very useful for encapsulating complex logic within the cube structure, for use in slice-and-dice interfaces. Additionally, the MDX language, being devoid of JOIN syntax, is far better suited to reporting queries than SQL is.

MattK on
November 28th, 2012 11:25 am

> Performance can be gotten without MOLAP rigidity.

Is this perhaps what is driving the Microsoft BISM / Tabular mode and DAX?