CockroachDB is 10x more scalable than Amazon Aurora for OLTP workloads

The three design principles of CockroachDB are correctness, stability, and performance. Having achieved our correctness and stability goals with CockroachDB 1.0 and 1.1, we focused heavily on performance with CockroachDB 2.0. For more information on which benchmarks matter, or to see a comparison between CockroachDB 1.1 and 2.0, you can read CockroachDB 2.0 Makes Significant Strides.

Today we are releasing a comprehensive whitepaper that demonstrates how CockroachDB achieves high OLTP performance of over 128,000 tpmC on a TPC-C dataset over 2 terabytes in size. That is over 10x the TPC-C throughput of Amazon Aurora, achieved in a 3x replicated deployment with single-digit-second recovery times and zero-downtime migrations and upgrades. This far surpasses a typical active-passive database deployment with manual failure recovery. CockroachDB achieves this under serializable isolation, unlike competing databases that sacrifice isolation for performance.

We’re very excited to talk about performance, and have taken care to make this more than just an announcement with vague numbers. In keeping with our open source philosophy, our whitepaper contains step-by-step instructions for reproducing and verifying all our performance claims, as well as context on our benchmarking philosophy and practices. We don’t want you to simply take our word for our performance numbers; we’d like to arm you with all the tools you need to check them for yourself! In this post, we cover some brief highlights, but do check out the whitepaper for more details and a fully reproducible test script.

Benchmarking Scenario

We chose an unofficial TPC-C run as our first benchmark. TPC-C specifies a workload that simulates the back-end data processing requirements of a large retailer. Unlike NoSQL benchmarks such as YCSB, which involve a single logical table of key-values, no transactions, and only two queries, TPC-C exercises a richer set of SQL semantics, including foreign keys, multiple indexes, transaction rollbacks, and joins. It also stresses the underlying storage capabilities of the database.

DISCLAIMER: this benchmark script was not validated and certified by the Transaction Processing Performance Council (TPC). The results obtained are not official TPC-C results, and they are not comparable with any official TPC-C results. Instead, we only compare our results to other unofficial TPC-C results published for competing products such as Amazon Aurora. We have also made one modification to the TPC-C specification: we run every transaction in serializable mode as opposed to degrading the isolation guarantees selectively. As we show in this benchmark, one can achieve high performance without compromising safety, maintainability, and correctness.

Benchmarking Setup

We benchmarked two scenarios: a 3-node cluster on the TPC-C 80GB (1,000 warehouse) dataset, and a 30-node cluster on the TPC-C 800GB (10,000 warehouse) dataset.

Small Clusters

We first set up a 3-node CockroachDB cluster on Google Compute Engine. Three nodes is the minimum for proper fault tolerance in CockroachDB and makes a good initial setup. We used three n1-highcpu-16 VMs and loaded them with 1,000 warehouses of data. This translates to 80GB of data replicated 3 ways (240GB before compression), which results in 200GB of data stored across the cluster after compression.

Large Clusters

We then set up a 30-node CockroachDB cluster on Google Compute Engine, using the same n1-highcpu-16 VMs, and loaded it with 10,000 warehouses of data (a 2TB replicated dataset).
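
The dataset sizes quoted for the two clusters can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the dataset grows linearly with the warehouse count and that the large cluster compresses as well as the small one. Neither assumption is stated in the post, and the function and constant names are made up for this example.

```python
# Figures quoted in the setup description above.
GB_PER_1000_WAREHOUSES = 80   # unreplicated TPC-C data per 1,000 warehouses
REPLICATION_FACTOR = 3        # every piece of data is stored three times
STORED_GB_3_NODE = 200        # reported on-disk size after compression


def logical_gb(warehouses: int) -> float:
    """Unreplicated dataset size, assuming linear growth with warehouses."""
    return warehouses * GB_PER_1000_WAREHOUSES / 1000


def replicated_gb(warehouses: int) -> float:
    """Raw size across all replicas, before compression."""
    return logical_gb(warehouses) * REPLICATION_FACTOR


# Stored-to-raw ratio implied by the 3-node cluster (200GB / 240GB).
COMPRESSION_RATIO = STORED_GB_3_NODE / replicated_gb(1000)


def estimated_stored_gb(warehouses: int) -> float:
    """Assumption: the same compression ratio applies at every scale."""
    return replicated_gb(warehouses) * COMPRESSION_RATIO


if __name__ == "__main__":
    for w in (1_000, 10_000):
        print(f"{w:>6} warehouses: {logical_gb(w):.0f}GB logical, "
              f"{replicated_gb(w):.0f}GB replicated, "
              f"~{estimated_stored_gb(w):.0f}GB stored")
```

Under these assumptions, the 10,000 warehouse estimate comes out at roughly 2,000GB stored, which lines up with the 2TB replicated dataset quoted for the 30-node cluster.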

| Database | CRDB 2.0 | Amazon Aurora RDS MySQL |
|---|---|---|
| Throughput (1,000 warehouses) | 12,819 tpmC | 12,582 tpmC |
| Median latency (1,000 warehouses) | 88.1ms | Not reported |
| 95th percentile latency (1,000 warehouses) | 151.0ms | Not reported |
| Hardware (1,000 warehouses) | 3x Google Compute Engine n1-highcpu-16 with attached Local SSDs | 2x r3.8XL (one read replica with automatic failover) |

| Database | CRDB 2.0 | Amazon Aurora RDS MySQL |
|---|---|---|
| Throughput (10,000 warehouses) | 128,587 tpmC | 9,406 tpmC |
| Median latency (10,000 warehouses) | 88ms | Not reported |
| 95th percentile latency (10,000 warehouses) | 176ms | Not reported |
| Hardware (10,000 warehouses) | 30x Google Compute Engine n1-highcpu-16 with attached Local SSDs | r3.8XL (MySQL RDS) |
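
Read together, the two CockroachDB throughput figures give a rough sense of how the cluster scales. The sketch below computes per-node throughput and a simple scaling ratio from the reported numbers. The helper names are hypothetical, and because the dataset grows 10x along with the node count, this measures scaling with a proportionally larger workload rather than speedup on a fixed one.

```python
# (nodes, tpmC) for the two CockroachDB runs reported in this post.
SMALL_CLUSTER = (3, 12_819)
LARGE_CLUSTER = (30, 128_587)


def per_node_tpmc(run: tuple) -> float:
    """Average throughput contributed by each node in a run."""
    nodes, tpmc = run
    return tpmc / nodes


def scaling_ratio(small: tuple, large: tuple) -> float:
    """Throughput growth divided by node-count growth.

    A value of 1.0 means throughput grew exactly in step with the
    number of nodes.
    """
    node_growth = large[0] / small[0]
    tpmc_growth = large[1] / small[1]
    return tpmc_growth / node_growth


if __name__ == "__main__":
    print(f"3-node cluster:  {per_node_tpmc(SMALL_CLUSTER):.0f} tpmC per node")
    print(f"30-node cluster: {per_node_tpmc(LARGE_CLUSTER):.0f} tpmC per node")
    print(f"scaling ratio:   {scaling_ratio(SMALL_CLUSTER, LARGE_CLUSTER):.3f}")
```

On the reported figures, per-node throughput stays close to 4,300 tpmC in both runs, which is what near-linear scaling looks like in this simple model.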

| Database | CRDB 2.0 | Amazon Aurora RDS MySQL |
|---|---|---|
| Availability | Can tolerate failure of any node. | Can tolerate failure of a master and automatically promote read replicas to master. |
| RPO | Zero (data is replicated 3x across the cluster) | Zero (underlying elastic block storage is 6x replicated) |
| Default transaction isolation level | Serializable | Repeatable read |

Takeaways

10x more throughput than Amazon Aurora
Our highest reported throughput of 128,587 tpmC (on 10,000 warehouses) is over 10x Aurora’s highest reported throughput of 12,582 tpmC (on 1,000 warehouses).
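
The headline ratio is easy to check from the figures in the tables. The sketch below recomputes it, along with a TPC-C "efficiency" figure. The efficiency calculation relies on an assumption that does not come from this post: that the TPC-C specification caps throughput at about 12.86 tpmC per warehouse. The function names are illustrative.

```python
# Reported throughput in tpmC, keyed by warehouse count (from the tables).
CRDB_TPMC = {1_000: 12_819, 10_000: 128_587}
AURORA_TPMC = {1_000: 12_582, 10_000: 9_406}

# Assumption (not from this post): the TPC-C keying and think times cap
# throughput at roughly 12.86 tpmC per warehouse.
MAX_TPMC_PER_WAREHOUSE = 12.86


def throughput_ratio(ours: float, theirs: float) -> float:
    """How many times higher one throughput figure is than another."""
    return ours / theirs


def efficiency(tpmc: float, warehouses: int) -> float:
    """Fraction of the assumed per-warehouse throughput ceiling achieved."""
    return tpmc / (warehouses * MAX_TPMC_PER_WAREHOUSE)


if __name__ == "__main__":
    best_crdb = max(CRDB_TPMC.values())
    best_aurora = max(AURORA_TPMC.values())
    print(f"best vs. best:     {throughput_ratio(best_crdb, best_aurora):.2f}x")
    print(f"10,000 warehouses: "
          f"{throughput_ratio(CRDB_TPMC[10_000], AURORA_TPMC[10_000]):.2f}x")
    for warehouses, tpmc in CRDB_TPMC.items():
        print(f"CRDB efficiency at {warehouses:,} warehouses: "
              f"{efficiency(tpmc, warehouses):.1%}")
```

Comparing best against best gives a ratio a little above 10x. Comparing the two systems at the same 10,000 warehouse scale gives a larger gap, since Aurora's reported throughput drops at that size.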

Performance under strong isolation
We achieve this performance in serializable mode, the strongest isolation level in the SQL standard. Aurora and other databases selectively degrade isolation for performance; our results show that this is not a choice you have to make. CockroachDB gives you both correctness and performance, without sacrificing one for the other.

Don’t just take our word for it: reproduce these numbers yourself!
We aren’t just publishing these numbers today. Complete step-by-step reproduction instructions are in our whitepaper. Our database and all our tooling are open source, so you can run these benchmarks yourself and then scrutinize the code to ensure we haven’t missed anything. And if we have, we’d greatly appreciate your bug report or your pull request!