When should one use a traditional RDBMS (most likely Oracle, DB2, or SQL Server)?

The details vary with context — e.g. sometimes MySQL is a traditional RDBMS and sometimes it is a new kid — but the general class of questions keeps coming. And that’s just for short-request use cases; similar questions for analytic systems arise even more often.

My general answers start:

Sometimes something isn’t broken, and doesn’t need fixing.

Sometimes something is broken, and still doesn’t need fixing. Legacy decisions that you now regret may not be worth the trouble to change.

Sometimes — especially but not only at smaller enterprises — choices are made for you. If you operate on SaaS, plus perhaps some generic web hosting technology, the whole DBMS discussion may be moot.

Performance (scale-out). DBMS written in this century often scale out better than ones written in the previous millennium. Also, DBMS with fewer features find it easier to scale than more complex ones; distributed join performance is a particular challenge.

Geo-distribution. A special kind of scale-out is geo-distribution, which is sometimes a compliance requirement, and in other cases can be a response-time nice-to-have.

Other stack choices. Couchbase gets a lot of its adoption from existing memcached users (although they like to point out that the percentage keeps dropping). HBase gets a lot of its adoption as a Hadoop add-on.

Licensing cost. Duh.

NoSQL products commonly make sense for new applications. NewSQL products, to date, have had a harder time crossing that bar. The chief reasons for the difference are, I think:

Programming model!

Earlier to do a good and differentiated job in scale-out.

Earlier to be at least somewhat mature.

And that brings us to the 762-gigabyte gorilla — in-memory DBMS performance — which is getting all sorts of SAP-driven marketing attention as a potential reason to switch. One can of course put any database in memory, provided only that it is small enough to fit in a single server’s RAM, or else that the DBMS managing it knows how to scale out. Still, there’s a genuine category of “in-memory DBMS/in-memory DBMS features”, principally because:

In-memory database managers can and should have a very different approach to locking and latching than ones that rely on persistent storage.
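To make that point a bit more concrete, here is a toy Python sketch of one lock-free alternative: optimistic, version-stamped writes, where a transaction validates at commit time that nothing it read has changed. The class and method names are invented for illustration; this is not how Hekaton or any other product is actually implemented.

```python
# Toy sketch (not any product's actual design): optimistic concurrency
# control instead of locks. Each row carries a version number; a
# transaction records the versions it read, does its work, then
# validates those versions before installing its writes.

class OptimisticStore:
    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def read(self, key):
        return self.rows.get(key, (0, None))  # (version, value)

    def commit(self, read_set, write_set):
        """read_set: {key: version seen}; write_set: {key: new value}.
        Returns True on commit, False if validation fails (caller retries)."""
        for key, seen_version in read_set.items():
            current_version, _ = self.rows.get(key, (0, None))
            if current_version != seen_version:
                return False  # a row we read changed underneath us: abort
        for key, value in write_set.items():
            old_version, _ = self.rows.get(key, (0, None))
            self.rows[key] = (old_version + 1, value)
        return True

store = OptimisticStore()
v, _ = store.read("balance")
assert store.commit({"balance": v}, {"balance": 100})      # commits
assert not store.commit({"balance": v}, {"balance": 200})  # stale read: aborts
```

No latches, no lock table; the cost is that losers of a conflict must retry, which is a reasonable trade when there is no disk I/O to wait on.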

I lack detail, but I gather that Hekaton has some serious in-memory DBMS design features. Specifically mentioned were the absence of locking and latching.

My level of knowledge about Hekaton hasn’t improved in the interim; still, it would seem that in-memory short-request database management is not a reason to switch away from Microsoft SQL Server. Oracle has vaguely promised to get to a similar state one of these years as well.

Of course, HANA isn’t really a short-request DBMS; it’s an analytic DBMS that SAP plausibly claims is sufficiently fast and feature-rich for short-request processing as well.* It remains to be seen whether that difference in attitude will drive enough sustainable product advantages to make switching make sense.

*Most obviously, HANA is columnar. And it has various kinds of integrated analytics as well.

First, I wouldn’t at all say that NewSQL products are in-memory. VoltDB and MemSQL are in-memory, but Clustrix, Tokutek et al. are no more “in-memory” than NoSQL systems are (and less so than Aerospike). Note also that Couchbase grew out of an in-memory technology as well.

As for competition between NoSQL and NewSQL — absolutely, most especially in the case of aggregations simple enough that NoSQL could conceivably handle them. E.g., MemSQL’s flagship deal is to a considerable extent at Couchbase’s expense, namely at Zynga.

aaron on
April 1st, 2014 11:00 am

I’m not following how you are parsing the differences. As I see it:
– There are many *for purpose* database engines focusing on specific use cases of data access. You call them NewSQL if they have a SQL API and don’t front a big-data stack. A better name for these may be niche vendors; I’m not sure why anyone using them would care whether they are in-memory or not. More interesting is which use cases they manage well and how they manage them.
– There are a lot of DBMS APIs, generally SQL, on top of big-data stacks. These are generally big-data vendors trying to move up the food chain from commodity storage/compute services, though some are distributed databases branded as big data. These are niche too, but the key here is an MR-like processing layer.

My experience is that for 95% of applications the choice doesn’t really make much of a difference, and people reasonably do what works for them and fits into their existing stacks, as you describe. What is interesting is where there is friction.

The friction could be cost, for example if a company is attempting to compete in a data-driven business on price of service.

The friction could be a programmer- and change-driven development march. Programmers, especially in agile environments, have different goals from data modelers. This leads to database-as-persistence designs; dynamic schemas and XML and JSON storage are artifacts of that. This optimizes for app innovation; it uses the database for locking and recovery, but not much for query or data consistency.
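A minimal sketch of that database-as-persistence pattern, using Python’s json module; the field names here are invented for illustration. Two releases of an app write differently shaped records into the same collection, and every reader must tolerate missing fields:

```python
import json

# Sketch of "database-as-persistence": the application serializes
# whatever shape of object it currently has; no DBA-managed schema
# mediates between app versions.

customers = []  # stands in for a document collection or JSON column

# Two app releases write different shapes into the same collection:
customers.append(json.dumps({"id": 1, "name": "Ada"}))
customers.append(json.dumps({"id": 2, "name": "Bob",
                             "prefs": {"email_opt_in": True}}))

# Reads must tolerate absent fields -- both the flexibility and the cost:
for raw in customers:
    doc = json.loads(raw)
    opted_in = doc.get("prefs", {}).get("email_opt_in", False)
    print(doc["name"], opted_in)
```

This is exactly the trade-off: schema change costs nothing at write time, and the bill comes due in every query and every consumer of the data.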

The glamor outliers are the scale issues. These create performance and license and stability and special case management issues….

I very much like your posts on dynamic schemas… but I wonder what the trade-off is. If you say dynamic vs. static, then why would anyone want static? There must be a downside, I guess? Any ideas? That would help with the theme of your post on when to pick what.

I agree that an in-memory DBMS is not just a DBMS with its data in memory. It is a DBMS engine that is optimized around the advantages of doing no I/O. The no-latch approach you mention is a great example, but there are others, including optimizations to utilize the processor cache, the ability to use vector instruction sets, and optimizations for NUMA. There will be more coming, I expect, including optimizations around nonvolatile memory.
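Pure Python can’t demonstrate real SIMD or cache behavior, but a small sketch can at least show the columnar layout those processor-cache and vector-instruction optimizations depend on: each attribute stored as one contiguous, typed array instead of scattered per-record objects. The table contents are invented for illustration.

```python
from array import array

# Row store: each record is a separate Python object scattered in memory.
rows = [{"id": i, "price": float(i), "qty": i % 5} for i in range(1000)]

# Column store: each attribute is one contiguous, typed array --
# the layout that cache-friendly, vectorizable scans build on.
price_col = array("d", (float(i) for i in range(1000)))
qty_col = array("i", (i % 5 for i in range(1000)))

# SUM(price * qty): the columnar version touches only the two columns
# it needs, in strictly sequential memory order.
row_total = sum(r["price"] * r["qty"] for r in rows)
col_total = sum(p * q for p, q in zip(price_col, qty_col))
assert row_total == col_total
```

In a real engine the columnar loop is where vector instructions and cache prefetching pay off; the row-store loop chases pointers instead.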

Queries are easiest and most flexible against a fixed, relational schema. That’s the essence of Ted Codd’s invention.

What we’re seeing now is people trying to invent things that have rather competitive RDBMS functionality and performance while also having rather competitive functionality and performance in dynamic schema uses — and which furthermore have all that goodness against the same data.
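As a minimal illustration of the Codd point (the table and data are invented, using Python’s built-in sqlite3): once the schema is fixed and declared, an ad hoc query nobody anticipated at design time just works, with no application code mediating between record shapes.

```python
import sqlite3

# With a declared relational schema, any query over the declared
# columns is fair game -- no reader needs to know how the app
# happened to shape its records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Ada", 40.0), (2, "Bob", 25.0), (3, "Ada", 35.0)])

# An aggregation nobody planned for at design time:
top = conn.execute("""SELECT customer, SUM(amount)
                      FROM orders GROUP BY customer
                      ORDER BY SUM(amount) DESC""").fetchone()
print(top)  # ('Ada', 75.0)
```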

John on
April 6th, 2014 6:45 pm

Curt,

NoSQL has primarily taken databases back by a couple of decades. It’s good for puts and gets (basically single-row operations). Users have to spend time upfront designing a model that serves a particular application. Schema-on-read sounds more like a feature. Reporting and analytics obviously are not in the sweet spot of NoSQL engines. For example, DataStax provides a Hadoop node and Hive for analytics, which is questionable. Will the user experience the speed using Hive? Cassandra doesn’t have much to offer when it comes to scanning the data, and Hive on top seems like a toy.

I write about these trade-offs often. If it’s easy to design a relational schema that will serve well, one probably should do so, and use an RDBMS. But that’s not always the case, in practice and often even in theory.

Using a hammer to drive in a screw is rarely optimal, and often the mismatch is extreme enough for that metaphor to be apt.

John on
April 7th, 2014 4:26 pm

Agreed. A question:

Why would a user choose Cassandra or MongoDB over MemSQL, VoltDB, or Clustrix? All three of those vendors claim schema-on-read support with JSON data types, and they are blazingly fast.

I liked Akiban’s hybrid approach, but customers evidently didn’t agree, and the company was acquired.

WibiData Kiji is another kind of best-of-both-worlds attempt.

MarkLogic is also trying best-of-all-worlds, but I haven’t kept up with them since soon after the latest management team change.

And of course there are the various schema-on-need efforts, but those are analytically focused, and I suspect that’s not what you’re asking about.

John Held on
April 10th, 2014 7:53 pm

Very useful discussion.

Scale-out + licensing + can-live-without-joins lines up with what I’ve seen most recently in evaluations of traditional SQL versus NoSQL.

I think a more dynamic schema is also important when the rate of change is high (as it is for product hierarchies, content taxonomies, customer profile data, and many other domains).

Great article!

Charles Fox on
April 23rd, 2014 10:49 pm

Hi Curt,
I enjoyed the article, I am writing a research paper for a graduate school class on NoSQL and this is very helpful.
I have a question regarding using NoSQL. What criteria would you use to determine which NoSQL database to use? More specifically MongoDB vs. Cassandra. Also, regardless of the type of NoSQL, what would be the biggest challenges to a company implementing one for the first time?
I appreciate any feedback you can provide!
Thanks!
Charles

A friend on the Redmond campus told me that for many quarters Oracle, DB2, and SQL Server had been number 1, 2, and 3 by revenue, until Q1 2014, when this flipped. Perhaps this reflects NoSQL products starting to take revenue away from the higher-end databases.