NoSQL Ecosystem

Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as “NoSQL databases.”

Businesses, including The Rackspace Cloud, need to find new ways to store and scale large amounts of data. I recently wrote a post on Cassandra, a non-relational database we have committed resources to. There are other non-relational databases being worked on and collectively, we call this the “NoSQL movement.”

The “NoSQL” term was actually coined by a fellow Racker, Eric Evans when Johan Oskarsson of Last.fm wanted to organize an event to discuss open source distributed databases. The name and concept both caught on.

Some people object to the NoSQL term because it sounds like we’re defining ourselves based on what we aren’t doing rather than what we are. That’s true, to a degree, but the term is still valuable because when a relational database is the only tool you know, every problem looks like a thumb. NoSQL is making people aware that there are other options out there. But we’re not anti-relational-database for when that really is the best tool for the job; it’s “Not Only SQL,” rather than “No SQL at all.”

One real concern with the NoSQL name is that it’s such a big tent that there is room for very different designs. If this is not made clear when discussing the various products, it results in confusion. So I’d like to suggest three axes along which to think about the many database options: scalability, data and query model, and persistence design.

I have chosen 10 NoSQL databases as examples. This is not an exhaustive list, but the concepts discussed are crucial for evaluating others as well.

Scalability

Scaling reads is easy with replication, so when we’re talking about scaling in this context, we mean scaling writes by automatically partitioning data across multiple machines. We call systems that do this “distributed databases.” These include Cassandra, HBase, Riak, Scalaris, Voldemort, and more. If your write volume or data size is more than one machine can handle then these are your only options if you don’t want to manage partitioning manually. (You don’t.)

There are two things to look for in a distributed database: 1) support for multiple datacenters and 2) the ability to add new machines to a live cluster transparently to your applications.

Non-distributed NoSQL databases include CouchDB, MongoDB, Neo4j, Redis, and Tokyo Cabinet. These can serve as persistence layers for distributed systems; MongoDB provides limited support for sharding, as does a separate Lounge project for CouchDB, and Tokyo Cabinet can be used as a Voldemort storage engine.

Data and Query Model

There is a lot of variety in the data models and query APIs in NoSQL databases.

The columnfamily model shared by Cassandra and HBase is inspired by the one described by Google’s Bigtable paper, section 2. (Cassandra drops historical versions, and adds supercolumns.) In both systems, you have rows and columns like you are used to seeing, but the rows are sparse: each row can have as many or as few columns as desired, and columns do not need to be defined ahead of time.

Document databases are essentially the next level of Key/value, allowing nested values associated with each key. Document databases support querying those more efficiently than simply returning the entire blob each time.

Neo4J has a really unique data model, storing objects and relationships as nodes and edges in a graph. For queries that fit this model (e.g., hierarchical data) they can be 1000s of times faster than alternatives.

Scalaris is unique in offering distributed transactions across multiple keys. (Discussing the trade-offs between consistency and availability is beyond the scope of this post, but that is another aspect to keep in mind when evaluating distributed systems.)

Persistence Design

By persistence design I mean, “how is data stored internally?”

The persistence model tells us a lot about what kind of workloads these databases will be good at.

In-memory databases are very, very fast (Redis achieves over 100,000 operations per second on a single machine), but cannot work with data sets that exceed available RAM. Durability (retaining data even if a server crashes or loses power) can also be a problem; the amount of data you can expect to lose between flushes (copying the data to disk) is potentially large. Scalaris, the other in-memory database on our list, tackles the durability problem with replication, but since it does not support multiple data centers your data will be still be vulnerable to things like power failures.

Memtables and SSTables buffer writes in memory (a “memtable”) after writing to an append-only commit log for durability. When enough writes have been accepted, the memtable is sorted and written to disk all at once as a “sstable.” This provides close to in-memory performance since no seeks are involved, while avoiding the durability problems of purely in-memory approaches. (This is described in more detail in sections 5.3 and 5.4 of the previously-referenced Bigtable paper, as well as in The log-structured merge-tree.)

B-Trees have been used in databases since practically the beginning of time. They provide robust indexing support, but performance is poor on rotational disks (which are still by far the most cost-effective) because of the multiple seeks involved in reading or writing anything.

An interesting variant is CouchDB’s append only B-Trees, which avoids the overhead of seeks at the cost of limiting CouchDB to one write at a time.

Conclusion

The NoSQL movement has exploded in 2009 as an increasing number of businesses wrestle with large data volumes. The Rackspace Cloud is pleased to have played an early role in the NoSQL movement, and continues to commit resources to Cassandra and support events like NoSQL East.

About the Author

I thought Scalaris definitely handled the adding of machines live. There is certainly code written for it, and the papers written on Scalaris talk about nodes joining and automatic load balancing (I’ve never used Scalaris myself, so there may be things I’m unaware of).

Also, I don’t think describing Scalaris as an in-memory database in the traditional sense hits the mark. Scalaris has a fail-stop model where availability is determined by the number of replicas; if a majority of the replicas are on-line, the data is accessible. This implies >2 replicas per item. Avaliable memory is per-cluster, and can also include disk space (one storage option is disk-based tokyo cabinet), but in this case even the disk space is regarded as volatile. This of course makes the ‘very very fast’ claim a bit relative when applied to Scalaris. It offers performance through massive scalability – not necessarily per node, and one can imagine that latency can be substantial sometimes, esp. for writes.

Lastly, I am not sure why Scalaris would be limited to one data center, given the above. Could you expand on that?

By Multi-DC support I mean more than the naive “it works if you put all the machines on a vlan” approach: it needs to be able to route requests from clients to replicas in the same DC (or, preferably, same rack), and ideally it should allow waiting only for writes to nodes in the same DC as well. I couldn’t find anything indicating Scalaris does either, but I could have missed something.

You’re also right about optionally using TC (I’m trying for “broad strokes” in this article :), although I would note that TC’s performance itself falls off a cliff when it grows out of RAM, so I’m skeptical as to how useful that is in practice.

Check out TSnosql on twistedstorage.sourceforge.net – it’s a redis like server written in Python.

http://solimap.com soli

mongodb is awesome ! very fast and really easy to use it within Rails

Brian Hammond

tsnosql looks like a rebranding of redis. are you serious?

http://twistedstorage.com Chuck Wegrzyn

It isn’t redis but has the same interface and written in Python making it portable across all OS platforms.

http://twistedstorage.com Chuck Wegrzyn

Also TSnosql is like Cassandra in ‘adding machines live’ and multi-DC support.

Pingback: story of a sysadmin » Overview of the NoSQL Ecosystem()

J. Chris Anderson

I know we often disagree about definitions (eg “distributed”), but I’m fairly certain CouchDB should have a checkbox for multi-datacenter support. Not in the Hadoop-style sense of rack awareness, but in the sense that lots of people are using CouchDB replication to provide multi-datacenter redundancy for their applications. And there are even more folks who are using CouchDB in a single datacenter, knowing that when it’s time to get more geographically distributed, their data-layer won’t need changing.

anirudh

I thought the very reason that B-trees are used in relational databases is so that the number of seeks are reduced as B-trees are essentially *flat* trees.

IBM User

Have you considered IBM’s IMS? It’s been around long before relational databases existed, and it handles terabytes of banking records with good performance. It’s not open source, but it’s still what many businesses use for data scalability.

I agree: RDF triple stores are faster than SQL, more intuitive, and can exploit standard ontologies (vice private schemas).

I am a happy user of Allegro Store which is ripping fast.

http://www.hyperswitching.com Joris

AFAIK tokyo cabinet supports multiple types of databases. It also allows you to query for data instead of doing get/puts

Eve Jekyll

Hi,

what about backing up a NoSQL database?
Please don’t tell me that backups are not required e.g., because the database is replicated.

Cheers,
EJ

http://twistedstorage.com Chuck Wegrzyn

In the case of TSnosql (and redis) you can issue a command that will back it up to disk. On restart the servers will re-read the data stored in the file.

dg

Meh. Either go old (IMS) or go new (FusionIO), but don’t waste time re-inventing the wheel.

http://www.mgateway.com rob

Old means tried and tested, not old-fashioned, when it comes to NoSQL. GT.M should be added to your list of NoSQL databases. See http://bit.ly/2fS2jy

mainframe truth

How ironic that when high volume transaction work is honestly explored, an old idea comes back to light. I wonder if anyone has looked at ACF, TPF or zTPF, particularly ACFDB/TPFDB. This is IBM’s high volume transaction processing operating system and database component used in airline reservation systems and more. The database component is most assuredly a NoSQL database.

anthony

Let’s not forget Model 204.

http://www.legeek.net sylware

Proud member of the nosql movement.
I tried to connect my data stores to a MySQL engine… storage plugin API is C++… erk!!

Kjetil Valstadsve

This might be as good a place to ask as any…

I’ve heard some people complain that EC2 (Amazon) is a sub-optimal environment for Cassandra, because of the I/O characteristics of their virtual machines. Anyone here with experiences to share?

http://www.katkovonline.com Igor

Yes, I/O sucks, even with EBS (your best choice) cassandra was 2-4 times slower than running on dedicated hardware
Of course it all depends on HDD characteristics so you mileage might vary.
Although crucial, I/O is not the only thing you should be worried about – it’s don’t let your nodes GC or become un-responsive in any other way. It’s like domino effect.

david nicol

is anyone working on decoupling extent management from higher levels of abstraction? That is, treating “extent” as the basic unit of persistence instead of “sector” or “spindle” or “block”, and re-engineering up from that instead of from SCSI

Working with MySQL Cluster, I would have loved to have it part of the comparison. MySQL Cluster is a distributed hash-table based on a relational model, we support both in-memory data (with (optional) checkpointing/logging) and (since quite recently) data stored on disk.
We have a native api, with which can give very good throughput (http://jonasoreland.blogspot.com/2008/11/950k-reads-per-second-on-1-datanode.html) and control, and is also a storage engine for MySQL to provide sql-access. And, we support multiple data-centers using ordinary MySQL replication, and online adding of nodes (providing full transaction semantics during the adding)

End of advertising
/Jonas

Craig Bennett

Actually there is a long history of key/value or document store databases starting with Pick in the 70s. They are not opensource but they are current products supported and maintained with rich query languages and a range of APIs. Have a look at UniVerse, UniData or other multi-value databases.

stacy

I believe the point here is OpenSource. Being able to fiddle with the code and or have other developers in the wild fiddle/fix code is very important in today’s world. It might benefit some of these mature applications if they joined the movement.

http://www.netsight.co.uk Matt Hamilton

We’ve been using the ZODB for all our web projects for the past 10 years now. Its a object oriented, transactional, clusterable, etc etc.

Its strangely absent from this list, seeing as it is one of the oldest non-sql databases in general use.

-Matt

Daniel D

So.. our data in the future will essentially just be one, gigantic spreadsheet?