I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:

Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big.

Simply reciting all that, however, raises the question of whether one should still care about analytic RDBMS at all.

My answer, in a nutshell, is:

Analytic RDBMS — whether on premises in software, in the form of data warehouse appliances, or in the cloud — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.

To see why, let’s start by asking: “With what do you want to integrate your analytic SQL processing?”

If you want to integrate with relational OLTP (OnLine Transaction Processing), your OLTP RDBMS vendor surely has a story worth listening to. Memory-centric offerings MemSQL and SAP HANA are also pitched that way.

If you want to integrate with your SAP apps in particular, HANA is the obvious choice.

If you want to integrate with other work you do in the Amazon cloud, Redshift is worth a look.

Beyond those cases, a big issue is integration with … well, with data integration. Analytic RDBMS got a lot of their workloads from ELT or ETLT, which stand for Extract/(Transform)/Load/Transform. I.e., you’d load data into an efficient analytic RDBMS and then do your transformations, vs. the “traditional” (for about 10-15 years of tradition) approach of doing your transformations in your ETL (Extract/Transform/Load) engine. But in bigger installations, Hadoop often snatches away that part of the workload, even if the rest of the processing remains on a dedicated analytic RDBMS platform such as Teradata’s.
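As a concrete sketch of the ELT pattern described above: the raw extract lands in the database first, and the transformations then run as SQL inside the engine rather than in an external ETL tool. The table and column names below are invented for illustration, and SQLite stands in for a real analytic RDBMS, which would run the same pattern at much larger scale.

```python
# Minimal ELT sketch: load raw, then transform in-database with SQL.
# Table/column names are hypothetical; sqlite3 is a stand-in engine.
import sqlite3

conn = sqlite3.connect(":memory:")

# "Extract/Load": the raw extract lands as-is, untransformed.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1001", "250.00", "east"), ("1002", "75.50", "west"), ("1003", "120.25", "east")],
)

# "Transform": cleansing and typing run as SQL inside the database,
# exploiting the engine's parallelism, not an external ETL engine.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(region)             AS region
    FROM raw_orders
""")

totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)  # {'EAST': 370.25, 'WEST': 75.5}
```

This is the workload Hadoop has been taking over in bigger installations: the transform step is just bulk computation, and it doesn't need an expensive analytic RDBMS to run.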

And suppose you want to integrate with more advanced analytics — e.g. statistics, other predictive modeling/machine learning, or graph analytics? Well — and this both surprised and disappointed me — analytic platforms in the RDBMS sense didn’t work out very well. Early Hadoop had its own problems too. But Spark is doing just fine, and seems poised to win.

My technical observations around these trends include:

Advanced analytics commonly require flexible, iterative processing.

Spark is much better at such processing than earlier Hadoop …

… which in turn is better than anything that’s been built into an analytic RDBMS.

Open source/open standards and the associated skill sets come into play too. Highly vendor-proprietary DBMS-tied analytic stacks don’t have enough advantages over open ones.
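To illustrate the iterative-processing point: a PageRank-style power iteration, sketched below in plain Python over an invented three-node graph, loops until convergence, a control flow that Spark expresses naturally but that is painful to force into a single SQL statement.

```python
# Sketch of iterative analytics: PageRank power iteration.
# The graph and damping factor are made up for illustration; the point
# is the loop-until-convergence shape, which general-purpose engines
# like Spark handle naturally and SQL engines do not.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {node: 1.0 / len(links) for node in links}
damping = 0.85

for _ in range(50):  # iterate until (approximate) convergence
    contribs = {node: 0.0 for node in links}
    for node, outlinks in links.items():
        share = ranks[node] / len(outlinks)
        for target in outlinks:
            contribs[target] += share
    ranks = {node: (1 - damping) / len(links) + damping * c
             for node, c in contribs.items()}

print({node: round(r, 3) for node, r in ranks.items()})
```

Each pass reuses the previous pass's output; in Spark that intermediate state can stay cached in memory across iterations, which is a large part of why it beat earlier Hadoop at this kind of work.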

Notwithstanding the foregoing, RDBMS-based platforms can still win if a big part of the task lies in fancy SQL.

And finally, if a task is “partly relational”, then Hadoop or Spark often fit both parts.

They don’t force you into using SQL for everything, nor into putting all your data into relational schemas, and that flexibility can be a huge relief.

Even so, almost everybody who uses those systems uses some SQL, at least for initial data extraction. Those systems are also plenty good enough at SQL for joining data to reference tables, and all that other SQL stuff you’d never want to give up.

Analytic RDBMS generally still provide the best performance, or performance/concurrency combination, for the cost, although YMMV (Your Mileage May Vary).

One has to load the data in and immediately structure it relationally, which can be an annoying contrast to Hadoop alternatives (where database administration can be just-in-time) or to OLTP integration (less or no re-loading).

Other integrations, as noted above, can also be weak.
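The loading contrast a couple of points above can be sketched as schema-on-write versus schema-on-read. In the toy example below (record fields invented), raw records land untouched and structure is imposed only at query time; a relational load would have to resolve the missing field the moment the data arrives.

```python
# Sketch of schema-on-read vs. schema-on-write. Field names are
# hypothetical; the contrast, not the schema, is the point.
import json

raw_lines = [
    '{"user": "u1", "event": "click", "ms": 12}',
    '{"user": "u2", "event": "view"}',          # missing field: fine when raw
]

# Schema-on-read (Hadoop-style): land the raw lines as-is and impose
# structure per query, just-in-time.
events = [json.loads(line) for line in raw_lines]
clicks = [e["user"] for e in events if e["event"] == "click"]

# Schema-on-write (analytic RDBMS): every row must fit the declared
# columns at load time, so the second record must be handled
# (defaulted, rejected, or re-loaded) immediately.
schema = ("user", "event", "ms")
rows = [tuple(e.get(col) for col in schema) for e in events]

print(clicks)   # ['u1']
print(rows[1])  # ('u2', 'view', None)
```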

Suppose all that is a good match for your situation. Then you should surely continue using an analytic RDBMS, if you already have one, and perhaps even acquire one if you don’t. But for many other use cases, analytic RDBMS are no longer the best way to go.

Finally, how does the cloud affect all this? Mainly, it brings one more analytic RDBMS competitor into the mix, namely Amazon Redshift. Redshift is a simple system for doing analytic SQL over data that was in or headed to the Amazon cloud anyway. It seems to be quite successful.

Bottom line: Analytic RDBMS are no longer in their youthful prime, but they are healthy contributors in middle age. Mainly, they’re still best-of-breed for supporting demanding BI.

Comments

We use Vertica on AWS and we are pretty happy with it. We load raw and use parallel SQL to transform. We prefer it over Redshift for several reasons – partitioning, multiple projections per table, extensibility (UDF framework), and transactions actually work as expected.

“I hear about Vertica more as a technology to be replaced” – I’m curious what it is usually being replaced by?

Joel Wittenmyer on
August 29th, 2016 9:02 am

Curt,
2009 is when I began reading your posts. I was re-architecting a data warehouse. You left one out of your list that is my favorite, and which came to my attention via your posts of that time: Exasol. With GoldenGate Change Data Capture to deliver our database updates in near-real-time, and a Data Vault Integration Hub on Exasol, we will virtualize the delivery layer for BI and do analytics on the Data Vault. How does that strike you?

I’d also add Google BigQuery as an interesting technology, as well as Presto, an open-source DB from FB that is similar to Impala. I’ve seen Presto start to eat away workloads previously handled by Vertica.

I think the situation is quite different between cloud and on-premises. I believe that in the cloud, elastic allocation of resources is the right way to do analytics over big data…
On-premises, where compute is actually a scarce, non-elastic resource, the efficiency per pound of hardware of MPP databases is an important factor.

cerberus on
August 30th, 2016 10:34 am

Actian is axing its analytical databases; it has already fired the dev teams.

Actian’s website seems to be in a bit of disarray. I was tipped off last night that it’s hard to find material about ParAccel or VectorWise there. However, what used to be Pervasive DataRush and related technologies still seems to be promoted.

But then, I can’t find any Ingres references either, and Actian is surely still in the Ingres business, so we’ll have to wait and see.

I’d be interested in hearing from Actian as to what’s up. I’ve blacklisted them for years due to some lies they told publicly about previous conversations with me, and I still wouldn’t want a product-feature briefing for that reason. But if they want to tell me what businesses they basically are or aren’t still in, I’m curious.

If you look back, I covered Exasol years ago. It seemed like a sensibly-architected technology, but without the maturity to be a top-tier competitor, and without the momentum (Germany excepted) to change that.

An informant in whom I have great confidence tells me that Actian was shopping Vector/Vectorwise and Matrix/ParAccel for quite a while, unsuccessfully. The informant also supplied revenue numbers that were amazingly low.

And Exasol is putting out minor press releases like a company that is still chugging along. Ditto Kognitio. While I may be wrong about this, I get the impression that each of Infobright, Kognitio and Exasol is telling a “query accelerator” story as much as it is telling an analytic RDBMS one.

What will that mean for the product category and these products going forward? Are the big players all going to follow suit and abandon them, consolidating the DBMS space again to what it looked like in the nineties (Oracle, Teradata…) plus some niche players?

Trafodion seems to be getting stable and may not be a bad choice for an analytic RDBMS. Well, almost an RDBMS! We’re doing some benchmarks on it in our data sciences lab, so I can comment better once that exercise is over. Curious as to how its MVCC-based scheme scales.

I believe we will have RDBMS for a while, but their footprint will definitely shrink. In the Hadoop ecosystem we have yet to provide the fast connection that exists in RDBMS for supporting dashboards and customer-facing insights.

What I see working right now is to offload ETL from the data warehouse, experiment and find new insights, then push the completed data model back out to the data warehouse.

I started as Actian’s SVP of Marketing on Monday, Oct 24th. I am sorry that you had a bad experience with Actian previously. I would love to talk one on one, so please reach out to me and suggest some days/times that work for you. lennard.fischer@actian.com