Why I Love/Hate NoSQL and RDBMS Databases (not a versus post)

I was browsing around last weekend and ran into a post on a fairly big tech site about how the databases you’re probably using right now “aren’t the best”. Then it went on to explain how great and cutting edge NoSQL databases are. Another veiled “NoSQL vs. RDBMS” argument really. Now I kind of got into it a bit with the author in the comments, and for the sake of tact I’m not going to post the link here.

However, I felt like another article needed to be written to offset the negativity. So here it is….

Personally I don’t see a problem being a “fan” of SQL Server/Oracle in the same breath as MongoDB/Cassandra.

So what is the fuss? Why does this have to be a religious debate? There are certain facts in play. Yes, a NoSQL database in all likelihood is going to outperform a RDBMS database, probably by a factor of something. But can you do OLAP cubes with NoSQL? Does it safeguard its transactions as well? Probably not without hampering the performance (which is the whole point right?)

Here are some back-of-napkin reasons I love/hate both:

Why I love RDBMS:

Rock solid (it’s ACID man)

Proven and established

Incredible toolsets (reporting, analysis, BI, etc etc)

Ability to pull bits of data together to make something meaningful (JOINS, OLAP)

* The “impedance mismatch”, unless your data is naturally tabular, like financial records etc, the relational model is not very well suited for representing object data with one-to-many and many-to-many relations, hierarchical structures, … hence the need for ORMs, aka the Vietnam War of systems development

What I like about NOSQL:

* Simple and intuitive data structures which are easily matched to programming constructs

Lots of bullet points with, mostly, no justification.
Here’s the things you need to tighten up to get better (but not full) marks for this piece.

1. Cost: You relate NoSQL to cheap and RDBMS to expensive. From your preamble it’s clear you equate ‘real’ (my emphasis) RDBMS to SQLServer and Oracle. You should check out the PostgreSQL, which out performs both in this regard. They don’t call PostgreSQL the Open Source Oracle without good reason. However, have you tried scaling NoSQL? Assuming Open Source solutions, the cost is one of hardware. Until we have ‘OpenHardware’ where it all comes for free, the costs for both RDBMS and NoSQL are equally constrained.
2. RDBMS is difficult to learn (mountains of user documentation): Here NoSQL and RDBMS are the same, with fully described use cases for RDBMS far exceeding those for NoSQL by several orders of magnitude. NoSQL will catch up, but not for a while.
3. Proprietary syntax: Each RDBMS has proprietary extensions that often cause migration issues with specific types (such as GIS) and generated ordinals (sequences/auto-increment), but at least SQL queries across all RDBMS are reasonably consistent and allow for good portability. Moving from one NoSQL DB to another is a massive undertaking. Not only the store, but your code will need to change to accommodate new syntax. Generally, if you change the underlying storage engine for an RDBMS, no code changes are required unless you go for a less featured store, such as ACID compliant store to ISAM. I am aware this is an edge case for RDBMS, but with NoSQL in its infancy such changes will be frequent.
4. You equate horizontal scaling to sharding. OUCH! Sharding should be a last resort. For both RDBMS and NoSQL you can use other techniques (such as distributed database mapping) to put functionally different parts of the database on different instances.
5. With all database engine, physical memory (RAM) means better performance. However, whilst RDBMS can survive with quite a modest memory footprint (64MB will work slower but it will still work) NoSQL databases expect large amounts of memory for in memory indexes. One of key reasons NoSQL is so fast is because large amounts of data are instantly accessible in RAM. Take away the memory and NoSQL performance suffers more than RDBMS. I am sure other posters will be able to recount tales of MongoDB running in 4bits of memory, but in reality real-world performance is poor for large datasets with memory below 2GB on NoSQL. RDBMS’s, by their nature also perform better with more memory, but the decrease in performance between 96GB and 2GB for RDBMS is not as drastic as for NoSQL. By real world you need to create stores over 4TB – go on, give it go for yourself.
6. Which brings us to the problem of small databases. The system requirements for small database should be small. However, if you try to set up one of the current crop of NoSQL DBs you fine they quickly are more trouble than they are worth. RDBMS is the same if you go for any of the mainstream engines. If you really have small data sets that will remain as small datasets then use something more appropriate such as SQLite, or files with filenames structured as keys. MongoDB is the only mature NoSQL DB that can be embedded ‘easily’, but it is so ugly and has horrendous licensing implications.
7. RDBMS is bulky: Yes, they definitely can be! I have a postcode table of 1.74M rows that takes only 300MB of storage, but the index is 3.2GB with reverse index – that’s a big ouch. In this case NoSQL provides for smaller indexes. This could be an example of NoSQL scaling better, but only for a single use case.

I personally ran hybrid environments to allow key/value storage and rapid search engine requirements on one hand, and OLAP functionality on the other. Once I remembered about hstore in PostgreSQL, I dropped Riak for now. The overhead of two systems both from a development/maintenance and system resources point of view was not justified – but I keep watching for the tipping point. The next stage will be to generate a pre-aggregated dataset using some kind of non-RDBMS to increase search performance but keep the ACID compliant RDBMS in the background to ensure full data integrity. This approach will work for my application (where a NoSQL pre-aggregated snapshot can be updated asynchronously), but it won’t for all – YMMV. In particular, to maintain OLAP functionality the pre-aggregated data will be much larger than that in the RDBMS which would not require denormalisation to provide the same functionality. However, the NoSQL denormalised pre-aggregated data *should* be faster to interrogate.

Thanks for the comment and spending the time with your response. I really appreciate the feedback.

I loved that you brought up the fact of SQL being consistent across all platforms and your point on RAM is absolutely true. To my bad I did not include PostgreSQL, Maria/MySQL, etc.

I AM curious on what you meant by licensing issues with NoSQL (you mention Mongo in particular) as I have had not heard of any issues. It *is* licensed AGPL but to my knowledge this isn’t an issue unless you are looking at making derivitive works (such as being a cloud provider of Mongo for example). Would love to read any sources you might have on this issue. Feel free to post and/or e-mail me.

In any case. Other than the fact I left out some popular RDBMS I feel like my post still stands when you’re talking about small to medium sized installations. Admittedly it was rather “back-of-the-napkin” so I should also clarify that I am not making a point-to-point comparison of NoSQL/RDBMS with my bullets. They are simply things I like and don’t like. They do not necessarily have a counterpoint with the other technology.

My intent with this post was to explain why I might choose a RDBMS or NoSQL for a given solution. The real point is to get the wheels turning in understanding the implications for the application at hand.

I find the whole database revolution going on right now fascinating quite frankly.

My hope is that people find yours/my comments and do the research for what makes sense for them.