NoSQL: Not Going Anywhere For a While?

Everyone seems to have their own problems with the NoSQL term, so here’s mine: it doesn’t mean anything. Not that terms have to be specific to be useful: is the term database really that descriptive?

But the challenge with NoSQL is that the name implies that it means something, and that’s enough for folks new to the space to form opinions on the matter. For better, maybe, but mostly for worse.

The reason NoSQL exists is simple: the long-standing assumption that if persistence is the question, a relational database is the answer. As far back as March 2005, I’ve been skeptical of the sustainability of that assumption and expecting increased acceptance of non-relational datastores. The predicted adoption took a wee bit longer than I anticipated, but it’s here now. This outcome was inevitable.

Not because relational databases are inherently flawed and poised to go the way of the dinosaur: they’re going to be around as long as I’m in this business. Adoption was inevitable because, just as in every other walk of life, there are different tools for different jobs in the technology world. Which brings us to the issue at hand: when different jobs refers to any workload where a relational database is a less than ideal solution, your bucket is too big. As we see, daily, from the inquiries.

Lumped into the “NoSQL” bucket right now are tools as diverse as column databases, distributed databases, distributed filesystems, document databases, key value stores and even graph oriented databases. Even those categories blur: what’s the difference between a distributed database and a column database?

Exactly.

What do they have in common? Not a lot. Wikipedia implies that it’s about big data, and indeed some of the NoSQL stores scale remarkably well. But some don’t, intentionally. For all the talk of avoiding joins, eventual consistency, non-ACID compliance and such, the real common denominator for NoSQL stores is that they are generally not row/table oriented and they are mostly SQL ignorant. Except that, as Brian Aker points out, these distinctions may be nothing more than semantics. And as if that wasn’t complicated enough, SQL-like features are periodically being reintroduced via projects such as Pig.

Whatever your feelings on whether or not NoSQL is actually about SQL – Michael Stonebraker certainly doesn’t think so – defining an entire category of software by what it doesn’t do rather than what it does seems like a problem.

Which is part of the challenge for projects like Cassandra, CouchDB, Hadoop, HBase, HyperTable, InfiniDB, Memcache, MongoDB, Redis, Riak, Tokyo Cabinet/Tyrant, Voldemort et al. And part of the challenge, frankly, for folks that do what we do.

The good news is that the need for such tools is very real, as we’ve seen with projects like Drizzle, which was forked specifically because the design trajectory was not meeting the needs of a certain class of customer. Flawed though it may be, the NoSQL term is being applied to a real and accelerating pattern of adoption, and we’re seeing spikes in interest across the board. Which is why I anticipate strong, though not mainstream, uptake of the tools in 2010.

I wish we had a better designation than NoSQL, but I know better than to try to push that rock up a hill. Besides, some smart folks are only too happy to leave SQL behind. Looking further out, as we see heavier adoption within individual NoSQL software types, the unhelpful umbrella term may yet be retired, but until it is, expect NoSQL to be a trending topic in 2010. Love it or hate it, the term isn’t going anywhere for a while. Even if it doesn’t mean anything.

Disclosure: Basho, a commercial backer of Riak; Cloudera, a commercial backer of Hadoop; and IBM, a commercial backer of CouchDB, Hadoop and Cassandra, are RedMonk customers.

8 comments

Gordon Haff says:

>The reason NoSQL exists is simple: the long time assumption that if persistence is the question, a relational database is the answer.

And, in fact, for a lot of situations for which persistence, at least long-term persistence, isn’t necessary. (Which is where memcached comes in.)

I generally agree with your sentiments. On the one hand it’s useful to have an umbrella term that refers to a bunch of stuff that isn’t a relational database. OTOH, NoSQL is getting used to cover very dissimilar techs which I’ve seen cause confusion.

The thing is, SQL is an interface to a database. It implies something about the data structure (rows, columns, NULL works the way SQL says it should, a certain vocabulary of types), but it’s really mainly about a way to express database operations: INSERT, SELECT, DELETE, UPDATE, etc.

There’s no reason why a SELECT statement can’t be implemented against a distributed eventually-consistent column store!
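To make that concrete, here is a minimal sketch in Python of a SELECT-style read over a plain key-value store. Everything in it is an illustrative assumption: the `select` helper, the `users` data and the field names are hypothetical, and an in-memory dict stands in for a real distributed store.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


def select(
    store: Dict[str, Row],
    columns: List[str],
    where: Optional[Callable[[Row], bool]] = None,
) -> Iterator[Row]:
    """Rough analogue of: SELECT columns FROM store WHERE where(row)."""
    for key in sorted(store):  # sorted keys give a stable result order
        row = store[key]
        if where is None or where(row):
            # Fields missing from a record come back as None, much like NULL.
            yield {column: row.get(column) for column in columns}


# Hypothetical data: records in the same "table" may carry different fields.
users = {
    "u1": {"name": "Ada", "city": "London"},
    "u2": {"name": "Linus", "city": "Helsinki"},
    "u3": {"name": "Grace"},  # no "city" field at all
}

result = list(select(users, ["name"], where=lambda r: r.get("city") == "London"))
print(result)
```

The read path only needs to scan and filter, which is why the query syntax is largely independent of how the data is stored underneath.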

Now, implementing INSERT and UPDATE in such an environment is trickier, mainly because of eventual consistency: updates might not occur in the same order on all replicas, unique-key constraint violations may not be detected until later, etc. However, the *syntax* doesn’t have much to do with that; just the *semantics* of what the application expects. Indeed, MySQL replication is as eventually consistent as any new-fangled NoSQL database; but it still uses SQL.
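The ordering problem can be shown with a small Python sketch. It assumes each update carries a timestamp and uses last-write-wins as the reconciliation rule; that is one common approach chosen here for illustration, not a description of how MySQL replication or any particular NoSQL store resolves conflicts. All names and values are hypothetical.

```python
def apply_in_arrival_order(updates):
    """Naive replica: whichever update arrives last overwrites the value."""
    value = None
    for _timestamp, new_value in updates:
        value = new_value
    return value


def apply_last_write_wins(updates):
    """Replica that keeps the value carrying the highest timestamp seen."""
    value, latest = None, -1
    for timestamp, new_value in updates:
        if timestamp > latest:
            value, latest = new_value, timestamp
    return value


# Two updates to the same key, as (timestamp, value) pairs.
update_a = (1, "alice@old.example")
update_b = (2, "alice@new.example")

# The same updates reach the two replicas in different orders.
replica_1 = [update_a, update_b]
replica_2 = [update_b, update_a]

naive = (apply_in_arrival_order(replica_1), apply_in_arrival_order(replica_2))
lww = (apply_last_write_wins(replica_1), apply_last_write_wins(replica_2))

print("arrival order:  ", naive)  # the two replicas end up disagreeing
print("last write wins:", lww)    # the two replicas converge
```

In both cases the statement the application issued is the same UPDATE; what differs is the semantics the store guarantees once replicas see writes in different orders.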

Also, you can use SQL alongside other interfaces to the same data, and pick the right tool for the right part of each application.

What we have done at GenieDB, therefore, is to write our nice replicated key-value store with its native API, then write a MySQL storage engine that backs into it. The resulting update semantics are pretty close to normal local tables, and we document the differences. And so people can access the same tables through SQL (subject to the restriction of a fixed schema) or they can go through the native API (for lower latency, and the ability to have different fields in different records of the same table, etc).