Saturday, June 10, 2017

Summary of recent performance tests for MySQL 5.6, 5.7 and 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I have been reporting on low-concurrency performance regressions in MySQL for a few years and recently published many reports to compare MySQL 5.6.35, 5.7.17 and 8.0.1 using Intel NUC servers. This is tracked by bug 86215. My summary of the recent tests is:

The problem is not that there is a CPU regression from MySQL 5.6 to 5.7 to 8, as that is expected. The problem is that the regression is too large. In the worst case I have seen, MySQL 5.6 gets up to 2X more QPS than 5.7 and 8 using sysbench. In a typical case MySQL 5.6 gets 1.2X to 1.3X more QPS than 5.7 and 8.

Most of the slowdown is from MySQL 5.6 to 5.7 and less of the problem is from 5.7 to 8. I think this is good news. This is based on my results from the i5 NUC.

My team is committed to making this better. I hope that upstream is too. One day the big Percona Live presentation from upstream will include benchmark results for MySQL at low concurrency in addition to the results we always get for extremely high concurrency.

If you publish results from tests for N configurations, you will always get a request for testing one more configuration. So keep N small and save energy for the followup. But my hope is that we learn something from the tests that I did, rather than ask for more tests. A more clever person would run tests for N configurations, initially share results for N-2 and then when asked for a few more configs wait a day and share the final two. Alas, I am not clever.

2 comments:

Mark, what kind of (automated) testing and development processes do you think are missing (or not emphasized enough), such that regressions like these ship in major releases?

Do you think it is due to lack of a reproducible performance testing suite? Is it that full coverage is difficult to get, and the tests don't capture such use cases? Or is a full comprehensive suite really slow to run and hard to produce results for? Or are teams just not paying enough attention to such metrics (maybe there are too many metrics)?

I know that there are a lot of things to measure: different workloads, different configurations, different hardware setups, different metrics to track, etc.

To turn the question on its head, what should teams look at to make sure perf does not regress?

I know I'm asking for a lot of hypothetical things, but I would love to hear your thoughts, especially given your experience working on database technologies inside teams with more sophisticated and mature dev-ops/etc support.

I wish a lot of technologies that get chosen for (or marketed on) performance had public-facing dashboards, in things like Jira, for the various versions. I know a lot of teams have these sorts of things internally, and setting them up, maintaining them and making sense of them is a huge pain, but that sort of open-source or in-the-open development style would help.

I don't know the QA process for upstream -- for correctness and performance. The source is open, the development process is not. An in-memory sysbench workload would be sufficient for catching the CPU regressions that I have been reporting. With modern sysbench you can write new tests in Lua and we have been using that to spot regressions that we add by mistake to MyRocks.
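
A minimal sketch of the kind of check that could sit on top of such a workload: given QPS per test for an old and a new build, flag tests where the old build is faster by more than some ratio. The function names, the test names, the 1.1X threshold and the QPS numbers are all hypothetical and chosen for illustration; they are not measurements and not how upstream or MyRocks does it.

```python
def qps_ratio(baseline_qps: float, candidate_qps: float) -> float:
    """Return how many times more QPS the baseline gets than the candidate."""
    if baseline_qps <= 0 or candidate_qps <= 0:
        raise ValueError("QPS values must be positive")
    return baseline_qps / candidate_qps


def find_regressions(baseline: dict, candidate: dict, max_ratio: float = 1.1) -> dict:
    """Return {test_name: ratio} for tests where baseline QPS exceeds
    candidate QPS by more than max_ratio (an assumed threshold)."""
    regressions = {}
    for test, base_qps in baseline.items():
        if test not in candidate:
            continue  # only compare tests that ran on both builds
        ratio = qps_ratio(base_qps, candidate[test])
        if ratio > max_ratio:
            regressions[test] = round(ratio, 2)
    return regressions


if __name__ == "__main__":
    # Hypothetical QPS numbers for illustration only, not real results.
    old_build = {"point_select": 10000.0, "update_index": 4000.0}
    new_build = {"point_select": 8000.0, "update_index": 3900.0}
    print(find_regressions(old_build, new_build))
```

With these made-up inputs only the first test is flagged, because the old build gets 1.25X more QPS there while the second test is within the threshold. Picking the threshold is the hard part in practice, since it has to sit above run-to-run noise on the test hardware.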