Cassandra vs. MongoDB

Are you considering Cassandra or MongoDB as the data store for your next project? Would you like to compare the two databases? Cassandra and MongoDB are both “NoSQL” databases, but in practice they are very different. They have different strengths and value propositions, so any comparison has to be a nuanced one.

Let’s start with initial requirements. Neither of these databases replaces an RDBMS, nor are they “ACID” databases. So if you have a transactional workload where normalization and consistency are the primary requirements, neither of these databases will work for you. You are better off sticking with traditional relational databases like MySQL, PostgreSQL, Oracle, etc.

Now that we have relational databases out of the way, let’s consider the major differences between Cassandra and MongoDB that will help you make the decision. In this post, I am not going to discuss specific features but will point out some high-level strategic differences to help you make your choice.

1. Expressive Object Model

MongoDB supports a rich and expressive object model. Objects can have properties, and objects can be nested in one another to multiple levels. This model is very “object-oriented” and can easily represent any object structure in your domain. You can also index the property of any object at any level of the hierarchy – this is strikingly powerful! Cassandra, on the other hand, offers a fairly traditional table structure with rows and columns. Data is more structured, and each column has a specific type that is specified when the table is created.

Verdict: If your problem domain needs a rich data model then MongoDB is a better fit for you.
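
To make the idea of nested objects concrete, here is a minimal sketch in plain Python. It does not talk to MongoDB; the `order` document, its field names, and the `get_path` helper are all made up for illustration. It only shows how a dotted path such as `customer.address.city` (the dot-notation style MongoDB uses for nested fields) addresses a property several levels down.

```python
from typing import Any

# Hypothetical order document; every field name here is illustrative only.
order = {
    "order_id": 1001,
    "customer": {
        "name": "Ada",
        "address": {"city": "London", "country": "UK"},
    },
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def get_path(doc: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts.

    Returns None if any step along the path is missing or is not a dict.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


city = get_path(order, "customer.address.city")
print(city)
```

A flat table would need either a separate address table or a set of prefixed columns to hold the same information, which is the trade-off this section describes.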

2. Secondary Indexes

Secondary indexes are a first-class construct in MongoDB. You can index any property of an object stored in MongoDB, even if it is nested, which makes it really easy to query on those properties. Cassandra has only cursory support for secondary indexes, and they are limited to single columns and equality comparisons. If you are mostly going to be querying by the primary key then Cassandra will work well for you.

Verdict: If your application needs secondary indexes and needs flexibility in the query model then MongoDB is a better fit for you.
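
For readers new to the term, a secondary index is conceptually just a lookup from a non-key value back to the primary keys of the rows that hold it. The sketch below is a toy in plain Python, not how either database implements indexes; the `rows` table and the `build_index` helper are assumptions made for the example. It supports exactly the case described above: one column, equality lookups.

```python
from collections import defaultdict

# Toy table keyed by primary key; the contents are illustrative only.
rows = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Linus", "city": "Helsinki"},
    3: {"name": "Grace", "city": "London"},
}


def build_index(table, column):
    """Map each distinct value of `column` to the set of primary keys holding it."""
    index = defaultdict(set)
    for primary_key, row in table.items():
        index[row[column]].add(primary_key)
    return index


city_index = build_index(rows, "city")

# Equality lookup through the index instead of scanning every row.
london_keys = sorted(city_index["London"])
print(london_keys)
```

Range queries or multi-column predicates would need a different structure (for example a sorted index), which is where a richer index model starts to matter.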

3. High Availability

MongoDB supports a “single master” model. This means you have one primary (master) node and a number of secondary nodes. If the primary goes down, one of the secondaries is elected as the new primary. This process happens automatically but it takes time, usually 10-40 seconds. While the election is in progress, the replica set cannot take writes. This works for most applications but ultimately depends on your needs. Cassandra supports a “multiple master” model. The loss of a single node does not affect the ability of the cluster to take writes – so you can achieve 100% uptime for writes.

Verdict: If you need 100% uptime Cassandra is a better fit for you.
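
Whether a failover window matters depends on your write rate. The back-of-the-envelope sketch below multiplies an assumed write rate by the 10-40 second election window quoted above. The figure of 1,000 writes per second is a made-up example, not a measurement, and the function name is hypothetical.

```python
def writes_affected(writes_per_second: float, election_seconds: float) -> float:
    """Number of writes that arrive while no primary is available."""
    return writes_per_second * election_seconds


assumed_rate = 1_000  # writes per second; an illustrative assumption

best_case = writes_affected(assumed_rate, 10)   # short end of the quoted window
worst_case = writes_affected(assumed_rate, 40)  # long end of the quoted window
print(best_case, worst_case)
```

These writes are not necessarily lost; the application can retry or queue them. The number only tells you how much work has to be absorbed somewhere else while the election runs.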

4. Write Scalability

MongoDB with its “single master” model can take writes only on the primary. The secondary servers can only be used for reads. So if you have a three-node replica set, only the primary takes writes and the other two nodes are used only for reads. This greatly limits write scalability. You can deploy multiple shards, but still only 1/3 of your data nodes can take writes. Cassandra with its “multiple master” model can take writes on any server. Your write scalability is limited only by the number of servers you have in the cluster. The more servers you have in the cluster, the better it will scale.

Verdict: If write scalability is your thing, Cassandra is a better fit for you.
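
The “1/3 of your data nodes” figure follows directly from the layout described above: one primary per shard, with each shard being a three-node replica set. A small sketch of that arithmetic, with a hypothetical helper name and the three-node replica set as the stated assumption:

```python
from fractions import Fraction


def write_capable_fraction(shards: int, nodes_per_shard: int = 3) -> Fraction:
    """Fraction of data nodes that accept writes when each shard has one primary."""
    total_nodes = shards * nodes_per_shard
    primaries = shards  # exactly one primary per shard
    return Fraction(primaries, total_nodes)


one_shard = write_capable_fraction(1)    # 1 primary out of 3 nodes
four_shards = write_capable_fraction(4)  # 4 primaries out of 12 nodes
print(one_shard, four_shards)
```

Adding shards increases the absolute number of primaries, so total write capacity does grow; it is the ratio of write-capable nodes to total nodes that stays at one third.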

5. Query Language Support

Cassandra supports the CQL query language, which is very similar to SQL. If you already have a team of data analysts, they will be able to port over a majority of their SQL skills, which is very important to large organizations. However, CQL is not full-blown ANSI SQL; it has several limitations (no join support, no OR clauses, etc.). MongoDB at this point has no support for a SQL-like query language. Its queries are structured as JSON fragments.

Verdict: If you need query language support, Cassandra is the better fit for you.
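
To show what “queries as JSON fragments” looks like, here is a toy matcher in plain Python. The query shape uses MongoDB’s `$or` and `$gt` operator names, but the `matches` function is a simplified stand-in written for this example; it supports only equality, `$gt`, and top-level `$or`, and it is not MongoDB’s implementation. The `users` data is made up.

```python
# Illustrative data only.
users = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Linus", "city": "Helsinki", "age": 54},
    {"name": "Grace", "city": "Paris", "age": 29},
]

# Roughly: WHERE city = 'London' OR age > 40
query = {"$or": [{"city": "London"}, {"age": {"$gt": 40}}]}


def matches(doc, q):
    """Return True if `doc` satisfies every condition in the query fragment `q`."""
    for field, condition in q.items():
        if field == "$or":
            # At least one of the sub-queries must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict) and "$gt" in condition:
            value = doc.get(field)
            if value is None or not value > condition["$gt"]:
                return False
        elif doc.get(field) != condition:
            return False
    return True


selected = [u["name"] for u in users if matches(u, query)]
print(selected)
```

The query is data rather than a string, which makes it easy to build programmatically but less familiar to analysts who think in SQL.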

6. Performance Benchmarks

Let’s talk performance. At this point, you are probably expecting a performance benchmark comparison of the databases. I have deliberately not included performance benchmarks in the comparison. In any comparison, we have to make sure we are making an apples-to-apples comparison.

1. Database model – The database model/schema of the application being tested makes a big difference. Some schemas are well suited for MongoDB and some are well suited for Cassandra. So when comparing databases it is important to use a model that works reasonably well for both databases.
2. Load characteristics – The characteristics of the benchmark load are very important. For example, in write-heavy benchmarks, I would expect Cassandra to smoke MongoDB. However, in read-heavy benchmarks, MongoDB and Cassandra should be similar in performance.
3. Consistency requirements – This is a tricky one. You need to make sure that the read/write consistency requirements specified are identical in both databases and not biased towards one participant. Very often in ‘marketing’ benchmarks, the knobs are tuned to disadvantage the other side. So, pay close attention to the consistency settings.
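
As one example of why the consistency settings matter, consider a replicated store where each write must be acknowledged by W replicas and each read consults R replicas out of N. The usual rule of thumb is that a read is guaranteed to overlap the latest acknowledged write when R + W > N. The sketch below only encodes that arithmetic; the function names are hypothetical and it does not model any particular database’s settings.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return read_replicas + write_replicas > replication_factor


n = 3
q = quorum(n)

quorum_both = read_sees_latest_write(q, q, n)  # quorum reads and quorum writes
one_each = read_sees_latest_write(1, 1, n)     # fastest, weakest setting
print(q, quorum_both, one_each)
```

A benchmark that runs one database with single-replica acknowledgements and the other with majority acknowledgements is measuring the settings more than the databases.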

One last thing to keep in mind is that the benchmark load may or may not reflect the performance of your application. So in order for benchmarks to be useful, it is very important to find a benchmark load that reflects the performance characteristics of your application. Here are some benchmarks you might want to look at:
- NoSQL Performance Benchmarks: Cassandra vs. MongoDB vs. Couchbase vs. HBase

7. Ease of Use

If you had asked this question a couple of years ago, MongoDB would have been the hands-down winner. It’s a fairly simple task to get MongoDB up and running. In the last couple of years, however, Cassandra has made great strides in this aspect of the product. The adoption of CQL as the primary interface for Cassandra has taken this a step further: it makes Cassandra approachable for the legions of programmers who already know SQL.

Verdict: Both are fairly easy to use and ramp up.

8. Native Aggregation

MongoDB has a built-in aggregation framework for running an ETL pipeline that transforms the data stored in the database. This is great for small to medium jobs, but as your data processing needs become more complicated the aggregation framework becomes difficult to debug. Cassandra does not have a built-in aggregation framework. External tools like Hadoop and Spark are used for this.
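
The pipeline idea is easier to see with a small example. The sketch below chains two stages in plain Python, a filter followed by a group-and-sum, which is the general shape of an aggregation pipeline. It is a toy that runs in memory; the stage functions and the `orders` data are invented for this example and are not MongoDB’s API.

```python
from collections import defaultdict

# Illustrative data only.
orders = [
    {"customer": "ada", "status": "paid", "amount": 30},
    {"customer": "ada", "status": "paid", "amount": 12},
    {"customer": "linus", "status": "refunded", "amount": 50},
    {"customer": "linus", "status": "paid", "amount": 8},
]


def match_stage(docs, field, value):
    """Keep only the documents whose `field` equals `value`."""
    return [d for d in docs if d.get(field) == value]


def group_sum_stage(docs, key, amount_field):
    """Group documents by `key` and sum `amount_field` within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[amount_field]
    return dict(totals)


# Stage 1 filters, stage 2 aggregates the filtered output.
paid = match_stage(orders, "status", "paid")
totals = group_sum_stage(paid, "customer", "amount")
print(totals)
```

Each stage consumes the previous stage’s output, which is also why long pipelines get hard to debug: a wrong result at the end can come from any stage along the way.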

9. Schema-less Models

In MongoDB, you can choose not to enforce any schema on your documents. While this was the default in prior versions, newer versions give you the option to enforce a schema for your documents. Each document in MongoDB can have a different structure, and it is up to your application to interpret the data. While this is not relevant to most applications, in some cases the extra flexibility is important. Cassandra in the newer versions (with CQL as the default language) provides static typing. You need to define the type of every column upfront.
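
When the database does not enforce a schema, the checks tend to move into the application. The sketch below shows one minimal way that could look in Python: a declared set of column types and a validator that reports mismatches. The `SCHEMA` mapping, its column names, and the `validate` helper are assumptions for illustration, not a feature of either database.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"user_id": int, "name": str, "signup_year": int}


def validate(row, schema):
    """Return a list of human-readable problems; an empty list means the row conforms."""
    errors = []
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    for column in row:
        if column not in schema:
            errors.append(f"unknown column: {column}")
    return errors


good = validate({"user_id": 1, "name": "Ada", "signup_year": 2015}, SCHEMA)
bad = validate({"user_id": "1", "name": "Ada"}, SCHEMA)
print(good, bad)
```

With static typing in the database, this kind of check happens at write time for every client; without it, every application that writes to the collection has to carry its own copy.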

Hi. I was evaluating both DBs recently as well, and came up with more or less the same conclusion as you did. The only point I disagree on is 5. Even if Cassandra has CQL, it is fairly limited and cannot really be compared with SQL. For example, there is not even an OR operator.

http://degoes.net/ John A. De Goes

In addition, the Quasar open source project brings powerful SQL to MongoDB, and it’s leveraged by SlamData (among other applications).

https://scalegrid.io Dharshan

Fair point Alex. Post has been updated to note the limitations of CQL.

Nisar Adappadathil

Cassandra restricts a query to a partition. An OR operator would have to query different partitions, which is not recommended in Cassandra. While doing data modeling you have to partition your data so that querying is more efficient.

Kelly Stirman

Disclosure – I work for MongoDB. Here are some observations:

#2. If you are mostly querying on primary key, then *any* database will work for you.

#3. In MongoDB 3.2 and later, failures are detected and a new leader elected in under 2 seconds. The trade off for multi-master is that reads are slower and scale less effectively because the client must read from multiple nodes to ensure consistency.

#4. In MongoDB, you can stripe primaries and secondaries across all nodes so that all nodes are capable of serving reads and writes. The trade off is capacity in the event of a failure. The same trade off exists for Cassandra.

#5. MongoDB has a query language – MongoDB Query Language (https://docs.mongodb.com/manual/tutorial/query-documents/). I think your point is specific to SQL. MongoDB provides a Connector for BI that supports ANSI SQL, whereas Cassandra’s CQL is a variant of SQL so existing tools are not compatible.

#7. I still think MongoDB has a massive advantage in terms of ease of use. The Cassandra data model – while based on tables – is very different from an RDBMS. For example, there are no joins, and no secondary indexes. Both products require new skills in terms of modeling data. For MongoDB documents mostly look like the objects in your code, which is pretty natural and easy to understand. There are also far more drivers and frameworks compatible with MongoDB, as well as a wider range of tools that support the database.

#8. Both MongoDB and Cassandra work with Spark and Hadoop. These are heavy-weight tools with their own resources, skills, dependencies, security concerns, and other factors to consider. You can go very, very far with MongoDB’s aggregation framework while staying within the MongoDB ecosystem. There are no options to do this in Cassandra.

#9. Your chart is cut off.

https://scalegrid.io Dharshan

Thanks Kelly. In point #3 of your answer, is 2 seconds the average, best, or worst case time? It would be good to share more information on this.

SR-71

2 seconds is a killer. That would be over 8,000 lost writes in systems I have built in the past.

http://blog.kedare.net Mathieu Poussin

Your system should be built to buffer and allow for failover time in this case?

SR-71

Buffer? Nah, dead letter queue. Buffers create all sorts of headaches like duplicates and single points of failure.

Nikhil Nanjappa

I would love to use Mongo if that’s the case, but do we have any fact sheet on the 2 seconds of downtime?

Could you please touch upon maintenance and cost effectiveness aspect for different sizes and on-premise v/s cloud aws hosted solutions?

Danielle

Thanks for the feedback Raul, I will pass this request on to our devs to see how we can work this idea into the content!

ldmtwo

Do you have a link to where I can find the rest of the cut-off chart? Thanks. I would really like to understand the differences more clearly. Cassandra has very high throughput (confirmed by my team), but seems to be relatively weak on ease of use. The O’Reilly Data Science Salary Survey 2016 shows 10% usage (of survey correspondents) for MDB, but only 4% for Cassandra.

Thanks for your reply. I am looking forward to a cost metrics analysis for both NoSQL databases (MongoDB and Cassandra) based on sample data. How can I recommend the best database based on performance and cost analysis when the databases are deployed on premises? Right now we are not going for cloud. I have seen the cost metrics in AWS, but I am trying to figure out how to design a cost analysis. Considerations include throughput efficiency, nodes, clusters, CPU performance, and everything else from a hardware perspective. Thanks for reading my comment.

Carlos El Sueco

Very easy to understand and objective comparison. Great thanks to you for this valuable intro.

Aakash Sharma

What about multitenancy support? Also, if one needs to keep master data such as customer data, affiliates, etc. along with analytics information such as hits, pageviews, etc., would it be advisable to keep Cassandra as the only database? Having two databases will create an additional layer to replicate information. But keeping master data in Cassandra will not allow for complex query filters such as search by name, joining dates, etc. Any thoughts?

Vinod Jayakumar

This was pretty helpful. Thanks for putting this out neatly.

Anshul katta

shut the fuck up , i still use mysql , works better than these kiddy dbs

peridotventures

Please stop saying Cassandra doesn’t have secondary indices or aggregations. It has both; it didn’t use to, but it does now. Cassandra is opinionated: it requires that you think about your query cases, typing, and partitioning up front. You can decide whether that is a good design principle or not; I for one think it is. My understanding of Mongo is that it lets you defer thinking hard about these things until you have to, because stuff isn’t scaling anymore, at which point the problem is harder, but probably you are so successful that you can afford to re-engineer a bit. Mongo is schemaless to a degree, Cassandra is more strongly typed. The expressive object model can be done in either; I think that point is outdated or just wrong. With collections, maps, and user-defined types you can make very rich object models in Cassandra, with type safety. What I’ve seen with many Mongo designs is that people push type checking and validation into the API layer, which is fine, but I kinda laugh about it, because are you really schemaless now? It’s very similar to how people hate compilers and then end up writing a bazillion tests to replace what the compiler did for free. Feels like people just move the problem around and reinvent things a new way, but I digress. They are both useful tools; it depends on your use cases.

faris rayhan

Two points that help me very much is High Availability and Write Scalability. It is very contrast with another database feature. Cassandra is good for transaction