Likelihood to Recommend

Apache Kafka

Apache Kafka is extremely well suited to near real-time scenarios and to high-volume or multi-location projects. It can solve escalation problems for a fraction of the cost of other solutions, and it has the flexibility of open source.

Cassandra

Apache Cassandra is a NoSQL database that is well suited where you need high availability, linear scalability, tunable consistency and high performance across varying workloads. It has worked well for our use cases, and I shared my experience of using it effectively at the last Cassandra Summit: http://bit.ly/1Ok56TK

It is a NoSQL database, but you can still tune it to be strongly consistent and use it successfully as such. However, that is not the usual pattern, because you pay for it in latency. It works well if you require that. If your use case needs a strongly consistent environment with the semantics of a relational database, or a data warehouse, or NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice.

As a Java-based NoSQL database, it has the greatest community and adoption. Coupled with great Apache Hadoop, Apache Spark and Solr integration and a strong tools ecosystem (unit tests, stress testing), it is an unbeatable combination!

As a hybrid that combines a masterless architecture, as in DynamoDB, with a column-family data model, as in BigTable, it hits the bull's-eye!

It has best-in-class performance across different kinds of read, write and mixed workloads. It provides linear scalability, which delivers the best performance, lowest latency and highest throughput.

Its tunable consistency model lets you choose the level of consistency your platform or application needs.
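To make the tuning concrete, here is a minimal sketch of the usual rule of thumb: reads see the latest write when the number of replicas contacted on read plus the number contacted on write exceeds the replication factor (R + W > RF). The function names are hypothetical helpers written for this illustration, not part of any Cassandra driver, and the sketch assumes a single datacenter (it ignores levels such as LOCAL_QUORUM and EACH_QUORUM).

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a majority, as used by the QUORUM level."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given level.

    Hypothetical helper; covers only a few single-datacenter levels.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set: R + W > RF."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, is_strongly_consistent(read, write, rf))
```

With a replication factor of 3, QUORUM reads and QUORUM writes overlap on at least one replica, while ONE/ONE does not. The stricter levels wait on more replicas per request, which is where the extra latency comes from.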

If configured correctly, there is no downtime and no data loss. These are key criteria in critical domains.

Apache Cassandra lacks some features that DataStax provides in the Enterprise version, for example security and advanced tools like OpsCenter. These would be a great addition to open source Apache Cassandra.

At times we noticed that some versions had issues not known in advance, for example LostNotificationError on repair of nodes. However, the newer releases have steadily become better and more stable.

The examples for the DataStax native driver with Cassandra 2.1 could be improved, as they do not cover all the scenarios one would need in production.

If you prefer to work with an open source project and be hands-on, Apache Cassandra is one of the best. However, if you need a managed, Cassandra-like service where you do not even want to configure, deploy, back up or restack, a service such as DynamoDB would be preferable.

Cassandra is a JVM-based NoSQL database, so garbage collector tuning is a key aspect. Garbage collection is better with JDK 8 and the G1 garbage collector (G1GC); otherwise, configure the Concurrent Mark Sweep (CMS) garbage collector in an optimum manner.

In our POC Cassandra satisfies all our needs and expectations. We would like to do an additional POC to test its cross-continent cluster level replication features, measuring the performance and data consistency level to help us finally decide how to move to production.

Alternatives Considered

Kafka is faster and more scalable, and also "free" as open source (albeit we deploy using a commercial distribution). Infrastructure tends to be cheaper. On the other hand, projects must adapt to Kafka APIs that sometimes change, and BAU (business-as-usual) work increases until a major 1.x version comes out and adds stability to the product.