Big data showdown: Cassandra vs. HBase

HBase has also introduced "coprocessors," which allow the execution of user code in the context of the HBase processes. The result is roughly comparable to the relational database world's triggers and stored procedures. Cassandra currently has no counterpart to HBase's coprocessors.
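To make the trigger and stored-procedure analogy concrete, here is a minimal sketch in plain Python. It does not use the HBase coprocessor API; `ToyTable`, `register_observer`, `register_endpoint`, `stamp_row`, and `count_rows_with_prefix` are hypothetical names invented for illustration. The observer hook plays the role of a trigger (it runs on every write), and the endpoint plays the role of a stored procedure (it runs next to the data on request).

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class ToyTable:
    """In-memory stand-in for a table. Hypothetical; not the HBase API."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}
        self._pre_put_hooks: List[Callable[[str, Row], Row]] = []
        self._endpoints: Dict[str, Callable[..., object]] = {}

    def register_observer(self, hook: Callable[[str, Row], Row]) -> None:
        # Trigger-like: the hook runs before every put.
        self._pre_put_hooks.append(hook)

    def register_endpoint(self, name: str, func: Callable[..., object]) -> None:
        # Stored-procedure-like: user code invoked by name, on the "server".
        self._endpoints[name] = func

    def put(self, row_key: str, columns: Row) -> None:
        # Each hook receives a copy of the columns and returns the version to store.
        for hook in self._pre_put_hooks:
            columns = hook(row_key, dict(columns))
        self.rows.setdefault(row_key, {}).update(columns)

    def call_endpoint(self, name: str, *args: object) -> object:
        # The endpoint gets direct access to the stored rows.
        return self._endpoints[name](self.rows, *args)


def stamp_row(row_key: str, columns: Row) -> Row:
    # Example observer: add a derived column to every written row.
    columns["meta:key_length"] = str(len(row_key))
    return columns


def count_rows_with_prefix(rows: Dict[str, Row], prefix: str) -> int:
    # Example endpoint: aggregate where the data lives instead of shipping rows out.
    return sum(1 for key in rows if key.startswith(prefix))


table = ToyTable()
table.register_observer(stamp_row)
table.register_endpoint("count_prefix", count_rows_with_prefix)
table.put("user:alice", {"info:city": "Oslo"})
table.put("user:bob", {"info:city": "Lima"})
table.put("order:1", {"info:total": "42"})
```

After these writes, every stored row carries the extra `meta:key_length` column added by the observer, and `table.call_endpoint("count_prefix", "user:")` counts the matching rows without the caller ever reading them.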

Cassandra's documentation is noticeably better than HBase's, and good documentation certainly flattens the learning curve. In my experience, setting up a development Cassandra cluster is simpler than setting up an HBase cluster. Of course, this is only important for development and testing purposes.

An HBase master node hosts a Web interface on port 60010. Here you can browse information such as the node's execution history, tables managed by the node, and region servers in the master's domain.
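A quick way to confirm that the master's Web interface is reachable is to test the port before opening a browser. The sketch below assumes the port 60010 cited above (later HBase releases changed the default to 16010, and any cluster may override it); the helper names `master_ui_url` and `is_port_open` are hypothetical, as is any host name you pass in.

```python
import socket

# Port cited in the article; the actual value depends on HBase version and configuration.
HBASE_MASTER_UI_PORT = 60010


def master_ui_url(host: str, port: int = HBASE_MASTER_UI_PORT) -> str:
    """Build the base URL of the master's Web interface."""
    return f"http://{host}:{port}/"


def is_port_open(host: str, port: int = HBASE_MASTER_UI_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `is_port_open("hbase-master.example.com")` (a placeholder host) returns `False` when the master is down or the port is blocked, in which case there is no point pointing a browser at `master_ui_url("hbase-master.example.com")`.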

The win column

The real work appears when you must tune a cluster for your particular application. Given the size of the data sets involved and the complexity of building and managing a multinode cluster (that often spans multiple data centers), tuning is hardly straightforward. It demands a solid understanding of the interplay of the cluster's memory caching, disk storage, and internode communications, and it requires careful monitoring of cluster behavior.

It's true that HBase's reliance on ZooKeeper, a separate application, introduces an additional point of failure (and the attendant difficulty of troubleshooting the source of a problem) that Cassandra avoids. But it isn't the case that tuning a Cassandra cluster is orders of magnitude less difficult. In the end, comparing the travails of cluster tuning for both databases, it's probably a wash.

Which means, as usual, there is no clear winner or loser. You'll find zealots for both databases, and each camp will present compelling evidence demonstrating the superiority of their system. And as usual, you'll face the chore of taking each for a test drive and benchmarking them against your target application. But given the scope of these technologies, could there be any other way?

Symmetric architecture makes it relatively easy to create and scale large clusters