Sunday, August 07, 2011

Although its been a few years in the making, the noise / buzz around NoSQL has now reached fever pitch. Or to be more precise, the promise of something better / faster / cheaper / more scalable than standard RDBMSs has sucked in a lot of people (plus getting to use MapReduce in an application even if it's not needed is a temptation very hard to resist..). And pretty recently, the persistence hydra has grown another head - NewSQL. NewSQL adherents essentially believe that NoSQL is a design pig and that a better approach is to fix relational databases. In turn, NewSQL claims have been open to counter-claim on the constraints inherent in the NewSQL approach. It's all very fascinating (props for working Lady Gaga into a technical article as well..).

As it turns out, traditional RDBMSs are sometimes slow for valid reasons, and while you can certainly speed things up by relaxing constraints or optimising heavily for a specific use case, that's not a panacea or global solution to the problem of a generic, fast way to store and access structured data. On the other hand, the assertion that Oracle, MySQL and SQL Server have become fat and inefficient because of backwards compatibility requirements definitely strikes a chord with me personally.

The sheer variety of NoSQL candidates (this web page lists ~122!) is evidence that the space is still immature. I don't have a problem with that (every technology goes through the same cycle), but it does raise one nasty problem: what happens if you back the wrong candidate now in 2012 that has disappeared in 2015?

The current NoSQL marketplace demands a defensive architecture approach - it's reasonable to expect that over the next three years some promising current candidates will lose momentum and support, others will merge and still others will be bought up by a commercial RDBMS vendor, and become quite costly to license.

What we need is a good, implementation-independent abstraction layer to model the reading and writing from and to a NoSQL store. No hard coding of specific implementation details into multiple layers of your application - instead segregate that reading and writing code into a layer that is written with change in mind - we're talking about pluggable modules, sensible use of interfaces and design patterns to make the replacement of your current NoSQL squeeze as low-pain as possible if and when that replacement is ever needed.

If the future shows that the current trade-offs made in the NoSQL space (roughly summed up as - a weaker take on A(tomicity),C(onsistency), I(solation) or D(urability), plus with your own favourite blend of Brewer's CAP theorem) are rendered unnecessary by software and hardware advances (as is very likely to be the case), then the API should ideally insulate our application code from this change.

There are interesting moves afoot that demonstrate that the community is actively thinking about this, specifically the very recent announcement ) of UnQL (the NoSQL equivalent to SQL - i.e. a unified NoSQL Query Language). That's good, but UnQL is young enough to shrivel and die just like any of the NoSQL implementations themselves. Also, we know that what has inspired UnQL - SQL - is itself fragmented / with vendor-specific extensions like T-SQL from Microsoft and PL/SQL from Oracle.

So then, in part one of this two-parter, I've worked to justify what's coming in part two - a minimal set of Java classes and interfaces to provide a concrete implementation of the abstract ideas discussed above.