A common thread in technology — or for that matter, just about anything humans attempt over time — is evolution. It's not unusual, then, to see things grow and adapt as the world around them changes. That very adaptation means that sometimes older variations of our favorite tools remain, while newer variations appear alongside them to better serve certain, specific purposes.

The database world is no exception. Since the days of E.F. Codd and his introduction of the relational data model, we've seen data stores and their requirements grow from purely transactional models doing ACID (Atomic, Consistent, Isolated, Durable), to requiring more flexibility, ability to scale (via distributed deployment), and speed (in terms of latency and throughput). These improvements may, in many cases, be at the very expense of what made RDBMS solutions preferable in the first place. But how to know?

In this piece, we'll refer to traditional SQL databases — relational database systems — as RDBMSs. Their counterpart, in our comparison, is the category of databases — known as NoSQL — which includes many subtypes.

Overview of SQL/RDBMS

RDBMSs have been the default choice for information storage in financial, manufacturing, logistical, personnel, and other systems since the 1980's. Since Oracle's first commercial offering, by what was then known as Relational Software, was introduced in 1979, the world saw a phasing out of their predecessors — legacy hierarchical and network databases — partly because of ease of use and administration. Examples of RDBMS databases available today include Oracle, MySQL, Microsoft SQL Server, and PostgreSQL.

RDBMSs improve on their predecessors via a few other attributes that make them indispensable: support for the ACID transaction, and the facilitation of referential integrity.

The fault-tolerance, accuracy, and trustworthiness born of these attributes have made RDBMSs the go-to for transactional data. However, with the advent of the web in the 90's and "Big Data" in the mid-aughts, new requirements appeared.

To deal with the exponentially-growing, semi-structured flood of data that was, by then, becoming commonplace, admins asked questions. Could RDBMSs be distributed, or were they capacity-limited to the hardware they resided on? How did distribution affect throughput? Could RDBMSs support multi-tenancy, and did this threaten transaction isolation? Could relational systems support very wide columns, variable-length records, semi- or even unstructured data?

Often, the next evolutionary step is to tear down your existing assumptions and start over, with your new requirements in mind. Enter NoSQL.

Overview of NoSQL

Actually, a few "NoSQL" solutions had been around a while already, but hadn't gained much traction outside of specific domains. As early as 1966, the unfortunately-named MUMPS programming language supported a key-value database (with ACID transactions) for, ironically, the healthcare industry. Another early key-value store (for Unix) was Ken Thompson's dbm (of which a current successor is BerkeleyDB).

NoSQL is an umbrella term for any database that stores data in a way other than RDBMS's rigidly-typed, schema-immutable, relational tables. Today, there are several common types of NoSQL databases, best suited to different purposes. Among them:

Document-oriented databases: databases that, like key-value databases, use a key to access an associated value; in a document-oriented database however, the data in the value is not opaque to the system and may be used to optimize the query. Document-oriented databases may store their data in a notation like JSON or XML.

Wide-Column stores: databases that use tables, rows, and columns, like an RDBMS, but where names and formats of columns can vary by row within the same table.

Time-Series databases: database optimized for storing and manipulating time series data (where timestamps are best treated as discrete quantities rather than continuous mathematical dimensions).

Text Search: these databases let you store data in unstructured format or even as plain text.

It's worth mentioning that many NoSQL databases out there today provide features from multiple NoSQL types. For example, both Apache Cassandra and ScyllaDB are wide-column stores and distributed, tunably-consistent databases.

Now that many NoSQL databases achieve their (com)promises, many, if not most, are going back to an SQL-like dialect, to grow their user base with users already familiar with SQL. CQL (Cassandra Query Language), Spark, Hive, or Presto are all notable examples. Elasticsearch has its own SQL variant. So does MongoDB. Even streaming platforms like Kafka are getting in on the SQL act!

Key Differences Between NoSQL and SQL/RDBMS

In addition to these SQL (-like) query languages, NoSQL and RDBMS databases share at least one more thing in common: the ability to store data. The difference is in how.

NoSQL databases make certain compromises to get a leg up on flexibility, ingestion speed and query throughput. As noted earlier, for example, a tunably-consistent database offers users an adjustable tradeoff on consistency if higher availability or partition tolerance is preferred.

Also, NoSQL databases may be missing or may compromise some of the features inherent to RDBMS, for example, primary keys, foreign keys, rigid schemas, and (hardcoded) data types.

Here's a look at some of the key differences between the RDBMS and NoSQL categories, generally:

NoSQL

SQL / RDBMS

Non-relational

Relational

Many types are not table-based, if they are, schemas are not usually fixed

Fixed schema, table-based

Horizontally-scalable

Vertically-scalable

If distributed, follows CAP theorem (user can adjust the tradeoff between consistency, availability, and partition-tolerance.)

Which Is Best, When?

If you could answer the question "Why NoSQL?" with only a few words, they would probably boil down to scale (including flexibility of scale), flexibility (including of schema and type constraints), or latencyandperformance (including throughput). Secondary considerations may include the size of the community (and the support and documentation you're likely to find) or the level of industry adoption (as a measure of popularity, and thus reliability and fitness for purpose) of a given solution.

Of course, there's more to it than just that. While NoSQL offerings are built around these concepts, RDBMS databases remain the preferred choice in industries and applications involving transactional data, where ACID-based durability is essential: user authentication and access-rights management, healthcare, banking, resource allocation and inventory management, and shopping carts, to name but a few. RDBMS databases come with the long-established promises of isolation, security, and referential integrity, and there's no seeing those requirements go away any time soon.

However, in applications involving high-volume, fast-moving analytics data from sensors, applications, or complementary systems, a NoSQL solution with different features or attributes might be a better fit. What kind of needs do you have? A fast-lookup with caching features, like a key-value store? A more flexible document-store for searching through larger records, or even text search for unstructured data? Maybe a distributed wide-column store with tunable consistency, or a store specialized for time-series data?

NoSQL is an evolution from RDBMS databases to better serve specific needs. The "best" in any case is what suits your needs; remember to assess your existing infrastructure and team to find areas where gaps can become strengths.