MongoDB vs. Clustrix Comparison, Part 1: Performance

MongoDB versus ClustrixDB

*Note: You can now get the code. As mentioned in the post, I populated both databases with 300M rows. I experimented with the client/host/thread ratio for each database to achieve peak throughput. I used MongoDB v1.6.5.

For the MongoDB read tests, I used the read-test.cpp harness. I didn’t have time to add proper getopt parsing to it, so I modified it by hand between test runs. But it’s very straightforward.

The C version of the MySQL/Clustrix harness is not included because, to save time, I used an internal Clustrix test harness framework. It doesn’t give Clustrix any advantage in the test, and it relies too much on our infrastructure to be of any value externally. You can still use the Python-based test harness — it just requires a lot of client CPU power.

Introduction

With all the recent buzz about NoSQL and non-relational databases, the marketing folks at Clustrix asked a question: Do we have the right solution for today’s market? It’s a fair question, especially since Clustrix is a fully featured RDBMS with a SQL interface. And we’ve all heard that SQL doesn’t scale, right?

So that brings us around to the next question: What do people actually want out of their database? Surely it’s not simply the absence of a SQL-based interface; if that were the case, Berkeley DB would be a lot more popular than, say, SQLite. Over the years, we’ve had many conversations with people about their database needs. Over and over, the following have come up as must-have features for any modern system:

Incrementally scalable performance

High availability and fault tolerance

Ease of overall system management

Interestingly enough, we never heard that SQL or the relational model was the root of their problems. It appears that the anti-SQL sentiment arose from this sort of false reasoning:

I have a SQL based RDBMS.

I can’t seem to scale my database beyond a single system.

Therefore SQL is the problem.

The NoSQL movement embraced this reasoning. Its proponents began to promote all sorts of “scalable” systems at the expense of venerable DBMS features like durability. And they kept going. What else don’t we need? Well, we don’t need consistency! Why? Because that’s really hard to do while keeping performance up. Slowly but surely, these systems came to claim a panacea for all of your scalable database needs, at the expense of cutting features we’ve come to expect from 40 years of database systems design.

Well, that’s just bullshit. There is absolutely nothing about SQL or the relational model preventing it from scaling out.

Over the next set of posts, I’m going to compare MongoDB and Clustrix using the above evaluation criteria: Scalable Performance, Availability and Fault Tolerance, and Ease of Use. I am going to start with Performance because no one believes that you can grow a relational database to Internet Scale. And to put the results into context, I chose to compare Clustrix to MongoDB because (1) it doesn’t support SQL, (2) it can transparently scale to multiple nodes, and (3) it seems to be the new poster child for NoSQL.

Performance

Conducting performance benchmarks is always challenging. First, you have to decide on a model workload. Next, you have to accurately simulate that workload. The system under test should be as close as possible to your production environment. The list gets long. In the end, no benchmark is going to be perfect; the best you can hope for is reasonable.

So I looked at some common themes in the workloads of some of our customers, and decided that I would simulate a basic use case: keeping metadata about a collection of 1 billion files. Whether you’re a cloud-based file storage provider or a photo sharing site, the use case is familiar. The test would use the appropriate access patterns for each database. Since MongoDB does not support joins, I’m not going to put it at a disadvantage by moving join logic into the application. Instead, I’m going to make full use of its native document-centric interface.
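To make the two data models concrete, here is a rough sketch of how the per-file metadata might look in each system. The field names, indexes, and types are my own invention for illustration, not the actual benchmark schema:

```python
# Sketch of the two data models for per-file metadata.
# Field names and indexes are illustrative, not the real test schema.

# MongoDB: one document per file; nested metadata, no joins needed.
sample_file_doc = {
    "user_id": 42187,
    "path": "/photos/2011/beach.jpg",
    "size_bytes": 1048576,
    "server_id": 17,
    "deleted": False,
    "updated": "2011-01-15T10:23:00Z",
}

# Clustrix (MySQL dialect): an equivalent relational table, with
# secondary indexes matching the read tests described later.
create_files_table = """
CREATE TABLE files (
    file_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL,
    path       VARCHAR(1024)   NOT NULL,
    size_bytes BIGINT UNSIGNED NOT NULL,
    server_id  INT UNSIGNED    NOT NULL,
    deleted    TINYINT(1)      NOT NULL DEFAULT 0,
    updated    DATETIME        NOT NULL,
    PRIMARY KEY (file_id),
    KEY by_user_updated (user_id, updated),
    KEY by_server_deleted (server_id, deleted)
);
"""
```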

Test 1: Loading the Initial Data Set

The test harness itself is a multi-host, multi-process, and multi-threaded Python application. Because of the GIL in Python, I designed the test harness to fork off multiple processes, with each process running some number of threads. It also turned out that I needed more than 10 client machines to saturate the cluster with reads using Python, so I rewrote the read tests using C++ for MongoDB and C for Clustrix.
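The process-plus-threads shape of the harness can be sketched as follows. This is a minimal illustration of the structure only — the real harness also coordinates across client hosts and issues actual database operations in the workers:

```python
# Minimal sketch: fork N processes, each running M worker threads,
# to sidestep Python's GIL for client-side load generation.
import multiprocessing
import threading

def worker(results, idx):
    # In the real harness this would issue inserts/queries against
    # the database; here we just record that the worker ran.
    results[idx] = 1

def run_process(n_threads, queue):
    # One forked process: spin up threads, report completions.
    results = [0] * n_threads
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    queue.put(sum(results))

def run_harness(n_procs, n_threads):
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=run_process,
                                     args=(n_threads, queue))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    total = sum(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return total  # total worker threads that completed

if __name__ == "__main__":
    print(run_harness(2, 4))
```

Forking real processes (rather than only threads) is what lets a single Python client box drive enough concurrent load, since each process gets its own interpreter and its own GIL.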

While populating the dataset into MongoDB, I kept running into a huge drop-off in performance at around 55-60M rows. A 10-node cluster has an aggregate of 320GB of RAM and 80 160GB SSD drives. That’s more than enough iron to handle that much data. As I dug in, I saw that 85% of the data had been distributed to a single node. MongoDB had split the data into multiple chunks, but its balancer could not (would not?) move them to other nodes. Once the database size exceeded that node’s available memory, everything went to shit: the box started thrashing pretty badly. It seems that under a constant high write load, MongoDB is unable to automatically redistribute data within the cluster.
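The skew is easy to spot by counting chunks per shard from MongoDB's `config.chunks` collection. The chunk documents below are fabricated to illustrate the kind of imbalance I saw; in practice you'd read them with a driver from the `config` database:

```python
# Sketch: spot shard imbalance by counting chunks per shard.
# The chunk documents here are fabricated for illustration.
from collections import Counter

def chunks_per_shard(chunk_docs):
    """Count how many chunks each shard owns."""
    return Counter(doc["shard"] for doc in chunk_docs)

# Hypothetical snapshot of config.chunks showing heavy skew:
chunks = (
    [{"shard": "rs0"} for _ in range(85)] +
    [{"shard": "rs1"} for _ in range(10)] +
    [{"shard": "rs2"} for _ in range(5)]
)

dist = chunks_per_shard(chunks)
print(dist.most_common(1))  # -> [('rs0', 85)]
```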

To get the test going, I pre-split the files collection into an even distribution. Without any load on the cluster, I watched MongoDB move the chunks onto the 10 replica sets for an even layout. Now I was finally getting somewhere.
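Pre-splitting amounts to computing evenly spaced boundaries over the shard key and then carving and moving the chunks by hand. A sketch of the idea, with an assumed shard key range and replica-set names (the actual key and values from my test may differ):

```python
# Sketch of pre-splitting: compute evenly spaced split points over
# an assumed integer shard key range. Names/ranges are illustrative.
def split_points(key_min, key_max, n_shards):
    """Boundaries dividing [key_min, key_max) into n_shards ranges."""
    step = (key_max - key_min) // n_shards
    return [key_min + step * i for i in range(1, n_shards)]

points = split_points(0, 1_000_000_000, 10)

# With a driver, each point p would then be applied via MongoDB's
# admin commands (not run here), e.g.:
#   admin.command("split", "bench.files", middle={"user_id": p})
#   admin.command("moveChunk", "bench.files",
#                 find={"user_id": p}, to="rs<i>")
print(points[0], points[-1])  # -> 100000000 900000000
```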

Immediately, I noticed that MongoDB had a highly variable write throughput. I was also surprised at how low the numbers were, which led me to discover Mongo’s concurrency control: a single mutex over the entire database instance. Furthermore, in tracking insert performance along with memory utilization, I could see that going beyond 300 million records would spill some of the data set to disk. While that’s a reasonable benchmark for a database, I decided to keep the data set memory resident for Mongo’s sake.

The drop-off on the Clustrix happened because not all of the load scripts finished at the same time. A couple of the client nodes were slower, so they took a bit longer to finish up their portion of the load.

For a system which eschews consistency and durability, the write performance of MongoDB looks atrocious. Initially, I thought that Mongo would completely trash Clustrix on write performance. The result was a complete surprise. Here’s why I think Clustrix did so much better:

And that’s with fully durable and fully consistent writes on the Clustrix side.

Test 2: Read Only

The read test consists of the following basic workloads:

get the 10 latest updated files for a specific user

count the number of deleted files on a given server id

I chose these queries because they are representative of the types of queries our example application would generate, and they are not simple point selects. Getting a distributed hash table working is easy. But DHTs tend to fall apart fairly quickly when queries start introducing ordering, examining multiple rows, or other non key-value lookups. In other words, real-world use.
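The two workloads, expressed in each database's native interface, might look roughly like this. Collection, table, and field names are illustrative stand-ins, not the actual benchmark schema:

```python
# Sketch of the two read tests in each interface.
# Names are illustrative, not the real benchmark schema.

# Test 1: ten most recently updated files for a specific user.
# With pymongo this would be roughly:
#   db.files.find({"user_id": 42}).sort("updated", -1).limit(10)
mongo_test1 = {
    "filter": {"user_id": 42},
    "sort": [("updated", -1)],
    "limit": 10,
}
sql_test1 = ("SELECT * FROM files WHERE user_id = 42 "
             "ORDER BY updated DESC LIMIT 10")

# Test 2: count the deleted files on a given server.
# With pymongo: db.files.count_documents({"server_id": 7, "deleted": True})
mongo_test2 = {"filter": {"server_id": 7, "deleted": True}}
sql_test2 = ("SELECT COUNT(*) FROM files "
             "WHERE server_id = 7 AND deleted = 1")
```

Note that test 1 needs a sorted, limited range scan and test 2 needs an aggregate over many rows — exactly the kinds of operations that stress a system beyond simple point lookups.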

So on a read-only test, MongoDB and Clustrix are within 1% of each other on test 1, with Clustrix slightly faster; MongoDB is 7% faster on test 2. I captured a profile of Clustrix during a test 1 run and saw that the execution engine dominates CPU time (as opposed to, say, SQL parsing or query planning). Looking at the profiles during test 2 runs on Clustrix, I saw a bunch of idle time in the system, so there’s room for optimization.

But real-world loads tend to be read/write, so let’s see how Mongo does when we add writes to the equation.

Test 3: Read/Write

My initial plan called for a combination of read-centric and write-centric loads. It seems that most web infrastructures are heavier on the reads than writes, but there are many exceptions. In Clustrix, we use Multi-Version Concurrency Control, which means that readers are never blocked by writers. We handle both read heavy and write heavy workloads equally well. Since MongoDB seems to do much better with reads than writes, I decided to stick to a read-centric workload.
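The readers-never-blocked property of MVCC can be illustrated with a toy version store. This is my own simplification to show the principle, not Clustrix's actual implementation:

```python
# Toy sketch of MVCC: each write creates a new version; a reader pins
# a snapshot point at the start and only sees versions committed
# before it, so writers never block readers.
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (txn_id, value)
        self.next_txn = 1

    def write(self, key, value):
        txn = self.next_txn
        self.next_txn += 1
        self.versions.setdefault(key, []).append((txn, value))

    def snapshot(self):
        """Return a read function pinned to the current commit point."""
        as_of = self.next_txn
        def read(key):
            visible = [v for t, v in self.versions.get(key, []) if t < as_of]
            return visible[-1] if visible else None
        return read

store = MVCCStore()
store.write("f1", "v1")
read = store.snapshot()        # reader starts its transaction here
store.write("f1", "v2")        # concurrent writer; reader unaffected
print(read("f1"))              # -> v1  (reader's consistent snapshot)
print(store.snapshot()("f1"))  # -> v2  (a new reader sees the new value)
```

Contrast this with a single global mutex, where that concurrent write would have stalled every in-flight read.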

The Clustrix test shows very little drop-off in read performance. On the Mongo side, I expected to see a drop-off in performance directly proportional to the amount of write load.

However, what I saw was mind-blowing: Mongo completely starved the readers! The following graph shows the query load on one of the 10 shards during the write portion of the test. I simply started a single write thread while letting the read test run. The writer was active for all of 60 seconds, and it took Mongo an additional 15 seconds to recover after the writer stopped.

[Graph: MongoDB]

[Graph: Clustrix]

The test 2 aggregate query is much more computationally expensive than an insert, so the read/write ratio for test 2 became very skewed. Note that Clustrix did not drop in read throughput at all.

Overall, you can see why every modern DBMS chooses to go with the MVCC model for concurrency control.

Conclusion

SQL and the relational model can clearly scale. The only place where MongoDB could compete with Clustrix was on pure read-only workloads, and that’s just not representative of real-world application loads.

Building a scalable distributed system is more about good architecture and solid engineering than about the choice of query language. Now that we have scale and performance out of the way, I’m going to review the other important aspects of a DBMS in my upcoming posts.