Make MySQL 8.0 Better Through Better Benchmarking

Like many in MySQL community, I’m very excited about what MySQL 8.0 offers. There are a lot of great features and architecture improvements. Also like many in the MySQL community, I would like to see MySQL 8.0 perform better. Better performance is what we always want (and expect) from new database software releases.

Rarely do performance improvements happen by accident – they require running benchmarks, finding bottlenecks and eliminating them. This is the area where I think things could use improvement.

If you come to the MySQL Keynote at Oracle OpenWorld, or if you go to MySQL Benchmarks Page, you find a very limited set of benchmarks. They mostly focus around sysbench performance, with large numbers of connections and large numbers of cores. I’m not convinced this effort is the best use of our time these days.

Don’t get me wrong: as one of the original designers of sysbench, it is a great and simple tool that helps spot many bottlenecks. I still use it to find performance issues. But it is only one tool, which is by no means provides full coverage of real-world MySQL workloads.

I agree with Mark Callaghan (see discussion here): we need to run more benchmarks using a wider set of circumstances, to ensure there are no regressions in new releases. This will help move MySQL performance forward for real users.

Here are some specific ideas on how I think we could benchmark MySQL 8.0 better:

Focus on production-recommended settings. Way too often we see benchmarks run with the doublewrite buffer disabled, InnoDB checksum disabled and no binary log (like in this benchmark). While they produce some excitingly high numbers, they have little practical value for real workloads. At very least I would very much like to see separate numbers for the “street legal car,” versus one designed to set a record on the salt flats.

Go beyond sysbench. Sysbench focuses on PK-only based access for very simple tables, and does not even do JOINs as part of its workload. I would like to see more benchmarks that have tables with many indexes, using secondary key lookups and joins, involving rows with many fields, and medium and large size blobs that are common bottlenecks. We also need more database features covered. Are foreign keys or triggers getting faster or slower? What about stored procedure execution performance? I would love to see these get covered. Mark Callaghan suggests LinkBench, which I think is a fine benchmark to add, but it shouldn’t be the only one.

Do more with sysbench. Sysbench could get more done and cover more important workloads. Workloads with data fitting in memory and not fitting in memory should be shown separately. Testing performance with large numbers of tables is also very important – many MySQL installations for SaaS applications run tens of thousands of tables (sometimes going into millions). I would also suggest running more injection benchmarks with sysbench, as they are more representative of the real world.

Look at latency, not just throughput. The benchmarks results we commonly see are focused on the throughput over a long period of time, without looking at the latency and how performance changes over time. Stalls and performance dips are well known in the MySQL space – especially the famous InnoDB checkpointing woes (though this issue has gotten a lot better) There are other cases and circumstances where stalls and slowdowns can happen.

Measure resource consumption. I very much like how Mark Callaghan shows the CPU usage and IO usage per transaction/operation, so we can get a better idea of efficiency.

Concurrency. Recently, the focus has been on very high concurrency in terms of connections and active connections, typically on very big iron (using as many as 72 cores). And as much as this is important to “future-proofing” MySQL as we get more and more cores per socket every year, it should not be the only focus. In fact, it is extremely rare for me to see sustained loads of more than 20-40 “threads running” for well-configured systems. With modern solutions like ProxySQL, you can restrict concurrency to the most optimal levels for your server through multiplexing. Not to mention the thread pool, which is available in MySQL Enterprise, Percona Server and MariaDB. I would like to see a much more focused benchmark at medium-to-low concurrency. The fact that single thread performance has gotten slower in every Major MySQL version is not a good thing. As MySQL currently runs a single query in a single thread, it impacts query latencies in many real-world situations.

Virtualization. We need more benchmarks in virtualized environments, as virtualization and the cloud are where most workloads are these days (by number). Yes, big iron and bare metal are where you get the best performance, but it’s not where most users are running MySQL. Whenever you are looking at full blown virtualization or containers, the performance profile can be substantially different from bare metal. Virtualized instances often have smaller CPU cores – getting the best performance with 8-16 virtual cores might be a more relevant data set for many than the performance with 100+ cores.

SSL and encryption. MySQL 5.7 was all about security. We’re supposed to be able to enable SSL easily, but was any work done on making it cheap? The benchmark Ernie Souhrada did a few years back showed a pretty high overhead (in MySQL 5.6). We need more focus on SSL performance, and getting it would allow more people to run MySQL with SSL. I would also love to see more benchmarks with encryption enabled, to understand better how much it costs to have your data encrypted “at rest,” and in what cases.

Protocol X and MySQL Doc Store. These were added after MySQL 5.7 GA, so it would be unfair to complain about the lack of benchmarks comparing the performance of those versus previous versions. But if Protocol X is the future, some benchmarks are in order. It would be great to have official numbers on the amount of overhead using MySQL Doc Store has compared to SQL (especially since we know that queries are converted to SQL for execution).

Replication benchmarks. There are a lot of great replication features in newer MySQL versions: statement/row/mixed, GTID or no GTID, chose multiple formats for row events, enable various forms of semi-sync replication, two ways of parallel replication and multi-source replication. Additionally, MySQL group replication is on the way. There seems to be very little comprehensive benchmarks for these features, however. We really need to understand how they scale and perform under various workloads.

Mixed workloads. Perhaps one of the biggest differences between benchmarks and real production environments is that in benchmarks the same workload often is used over and over, while in the real world there is a mix of “application workloads.” The real world also has additional tasks such as backups, reporting or running “online” ALTER TABLE operations. Practical performance is performance you can count on while also serving these types of background activities. Sometimes you can get a big surprise from the severity of impact from such background activities.

Compression benchmarks. There have been some InnoDB compression benchmarks (both for new and old methods), but they are a completely separate set of benchmarks that are hard to put in context with everything else. For example, do they scale well with high numbers of connections and large numbers of cores?

Long-running benchmarks. A lot of the benchmarks run are rather short. Many of the things that affect performance take time to accumulate: memory fragmentation on the process (or OS Kernel) side, disk fragmentation and database fragmentation. For a database that is expected to run many months without restarting, it would be great to see some benchmark runs that last several days/weeks to check long term stability, or if there is a regression or resource leak.

Complex queries. While MySQL is not an analytical database, it would still be possible to run complex queries with JOINs while the MySQL optimizer team provides constant improvements to the optimizer. It would be quite valuable to see how optimizer improvements affect query execution. We want to see how these improvements affect scalability with hardware and concurrency as well..

These are just some of ideas on what could be done. Of course, there are only so many things the performance engineering team can focus at the time: one can’t boil the ocean! My main suggestion is this: we have done enough deep optimizing of primary key lookups with sysbench on high concurrency and monster hardware, and it’s time to go wider. This ensures that MySQL doesn’t falter with poor performance on commonly run workloads. Benchmarks like these have much more practical value than beating one million primary key selects a second on a single server.

Related

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master’s Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.