Benchmarking Amazon Aurora

Amazon Web Services, Microsoft Azure and Google Cloud Platform are the clear leaders in the Cloud Service Provider (CSP) space. The competition between these players drives innovation and results in customers receiving more benefits and better business outcomes over older technologies. The database as a service space is a massive opportunity for customers to gain more in terms of performance, scalability, and high-availability without the overhead or long-term contracts from traditional solutions.

At 2nd Watch we see a lot of customers who want to utilize database as a service to get better performance without the overhead or having all the hassles of traditional database products. In that, we watch all three Cloud Service Providers very closely, their services and run benchmark comparisons on their products. In our opinion, Amazon Aurora is probably the best-performing database as a service platform that we’ve seen to date, while still being very cost-effective for the performance benefits.

Amazon Aurora is AWS’s fully-managed MySQL-compatible relational database service. Aurora was built to provide commercial-grade performance and availability, while being as easy to use and as cost-effective as an open source engine. Naturally, a lot of people are interested in Aurora’s performance, and a number of people have tried to benchmark it. Benchmarking accurately is not always easy to do, and it’s why you see different results from various benchmarks. We’ve helped many of our customers benchmark Aurora to understand how it would perform to meet their needs, and in many cases we have migrated, or are actively working on migrating, customers to Aurora.

Recently, Google benchmarked their own Cloud SQL data against Amazon Aurora. Since a number of our customers use Aurora, we were interested in digging in to take a deeper look at their findings.

In their blog post about this comparison, the blog claims that 1) Cloud SQL performs better at low thread counts and 2) customers do not use a lot of threads, so the results at higher thread counts do not matter. Our experience is that the latter claim is not true, especially as we look at our enterprise customers. So, we questioned our results from previous benchmarking. Therefore, we decided to benchmark things again to see if our results would fall in line with Google’s results. Below you can see an overlay of our data on top of their original graph. Since we do not have the original results from Google’s , an overlay is the best we can do for comparison.

To perform our own benchmarks, we used Terraform to stand up a VPC, our instances of Aurora, and the hosts from which we’d run the sysbench s, all within the same Availability Zone. We used the same settings and commands as outlined in the Google blog post to against our results. We shared our results with the Aurora team at AWS, and they confirmed that they saw very similar results when they ed Aurora with sysbench in a configuration similar to ours.

Full data from our s, R scripts, and automation to stand up the infrastructure can be found on github.

Here is a summary of our observations on this study:

1.The data for Aurora’s performance is incorrect

This data does not match the Aurora performance we see when we run the ourselves. Without any special tuning, we saw higher performance results for Aurora. Without the original data from the s Google conducted it is difficult to say why their results differ. All we can say is that, from our ing, Aurora does appear to be more performant and is clearly the database of choice, especially when dealing with higher thread counts.

2nd Watch overlay in purple:

Results from our s on Aurora:

Both studies agree that Aurora has a significant performance advantage above a certain thread count. Google’s shows the previous release of Aurora having an advantage at 16 threads or more (Cloud SQL starts higher but then drops below Aurora). Our shows the la version of Aurora having an advantage at 8 threads, with results between the two databases being fairly comparable at 4 or fewer threads.

2.The study makes a claim that customers don’t need many threads

Google is suggesting that you should only focus on the results with lower thread counts. Their blog post says that Cloud SQL “performed better than Aurora when active thread count is low, as is typical for many web applications.” The claim that low thread counts are typical for web applications (and thus what people should focus on) is inaccurate. There are a large number of AWS customers who run applications using Aurora, and it is very common for them to run with hundreds of threads (and many customers choose to run with thousands). As we mentioned previously, both our study and Google’s study show that Aurora has an advantage at higher thread counts.

There are a number of areas where Amazon improved on the performance of MySQL when developing Aurora. One of them was taking advantage of high thread counts to deliver better performance. With Aurora, customers with higher thread counts will typically see a large performance advantage over MySQL and Cloud SQL.

3.Aurora’s largest instance outperforms Cloud SQL’s largest instance by 3X

Google’s benchmark was done on an r3.4XL, which is only half the size of Aurora’s largest instance (the r3.8XL). We understand that this study used Cloud SQL’s largest machine and that they were trying to compare to a comparable Aurora instance size. But a customer is more likely to run a workload like the one in Google’s benchmark on Aurora’s largest machine, the r3.8XL, because the 8XL’s larger cache would improve performance. We ran the with Aurora’s r3.8XL instance, and the results were more than 3X higher than Cloud SQL’s performance. It was a struggle to fit Aurora’s r3.8XL performance on the same scale.

Aurora on r3.8xlarge (scale adjusted):

A few more things to consider

We’ve already pointed out Aurora’s performance advantage, which is evident in both studies. Aurora has a number of other advantages over Google Cloud SQL. One example is scalability. We showed earlier that Aurora’s largest instance exceeds the performance of Cloud SQL’s largest instance by 3X in this benchmark. Aurora can scale up to handle more storage (64 TB for Aurora vs 10 TB for Cloud SQL). Aurora supports up to 15 low-latency read replicas that have very low replica lag. Aurora replica lag is typically only a few milliseconds, whereas traditional binlog-based replication, which is used by MySQL and Cloud SQL, can result in replica lag on the order of seconds or even minutes. Aurora allows up to 15 replicas that can act as failover targets without impacting the performance of the primary instance, whereas Cloud SQL allows only one failover target. Another advantage is that Aurora has extremely fast crash recovery, as its log-structured storage system does not require the replay of redo logs after a database crash.

If you’re thinking about benchmarking Aurora yourself, there are two common errors to watch out for. First, make sure you put the database and the client in the same Availability Zone or you will pay a latency and throughput penalty. This is what customers actually do when they use Aurora in production. Second, make sure that the client instance driving traffic to Aurora has enhanced networking enabled. The benchmarking guide mentioned above has instructions for how to do that.

Making good use of benchmarking data is tricky. You’re trying to build or migrate your application and you want to understand how your new database is likely to behave, now and in the future. There’s no magic benchmark that will exactly match your production workload, but you need to make a decision anyway. What should you do?

Here’s some general advice we try to follow for our own projects as well.

Plan for success by ing at scale: The last thing you want to do is make design choices that hurt you when your application starts to really take off. That’s when you want to celebrate with your team and build the next great feature, not attempt an emergency database migration. Think about how your workload might change as your application grows (higher thread counts, more data, higher TPS, more tables…) and build that into your s.

Choose benchmarking s that align to your application: Do you expect your data set to fit in memory or will your database be disk-bound? Will your traffic pattern be spiky? Write-heavy? You can find a large number of benchmarks for any major technology. Try to identify a set of s that are as close as possible to your real-world application. Don’t just accept the first benchmarking study you find. If you can re-run your actual production workload as a , even better.

Know that your benchmarks are not the real world and plan for that: The best benchmarking study you run will be, at best, slightly wrong (not the same as the real world). Make a note of the assumptions you made in your and keep an eye on your database in production. Watch for signs that your production workload is moving into uned areas and adjust your s accordingly.

Get help: The best way to get expert advice is to see an expert. Cloud technology is complex and not something you should seek guidance on from partners who are “generalists.” To Aurora at your company, visit us at www.2ndwatch.com.