HammerDB has increased dramatically in popularity and use and has been identified as the industry default for database benchmarking illustrating both the popularity of open source and TPC based benchmarks. Consequently with the wish to engage with the open source database benchmarking community the TPC approached HammerDB to collaborate on development and bring together both the communities around the TPC benchmarks and HammerDB. The first step towards this goal is the recently announced move of the HammerDB source code to the TPC-Council GitHub repository. The previous developer owned GitHub repository sm-shaw github repository has been deleted and the TPC-Council repository will be the source code repository for ongoing and future HammerDB development. Anyone wishing to contribute to the development of HammerDB should do so from the HammerDB TPC-Council GitHub repository. From the first release up to version 3.1 HammerDB has been released on the HammerDB sourceforge site and currently hosts the runnable binary downloads and support site. Over time downloads and support will also transition to the TPC-Council GitHub site. The HammerDB website hosts the latest documentation in docbook format and links to published benchmarks and will be maintained as the primary website. Downloads from the download page on this site will show the current release. To understand the difference between the downloadable binaries and the source code view the post on HammerDB Concepts and Architecture.

The license of HammerDB remains as GPLv3 and copyright to Steve Shaw the developer of HammerDB. For additional clarity Steve Shaw is an employee of Intel however HammerDB is not Intel software and was developed as an approved personal open source project where a stipulation of this approval was that there would be no association of the software with Intel or the developers status as an employee of the company.

In the recent MySQL 8.0.16 release there is a new variable for the InnoDB storage engine called innodb_spin_wait_pause_multiplier described as providing “greater control over the duration of spin-lock polling delays that occur when a thread waits to acquire a mutex or rw-lock” and “delays can be tuned more finely to account for differences in PAUSE instruction duration on different processor architectures”

This post aims to address what this about and whether you really need to be concerned about the difference in instructions on the different processor architectures for MySQL by testing the performance with HammerDB. Note that the main developer of HammerDB is Intel employee (#IAMINTEL) however HammerDB is a personal open source project and HammerDB has no optimization whatsoever for a database running on any particular architecture. Fortunately the HammerDB TPC-C/OLTP workload intentionally has a great deal of contention between threads and is therefore ideal for testing spin-locks.

So the PAUSE instruction is an optimization over NOP when a thread is waiting to acquire a lock and is particularly important in spin-locks on x86 CPUs for power and performance. However in the Skylake microarchitecture (you can see a list of CPUs here) the PAUSE instruction changed and in the documentation it says “the latency of the PAUSE instruction in prior generation microarchitectures is about 10 cycles, whereas in Skylake microarchitecture it has been extended to as many as 140 cycles.” and “as the PAUSE latency has been increased significantly, workloads that are sensitive to PAUSE latency will suffer some performance loss.” The impact on a database workload is not necessarily straightforward however as it depends on how much time that workload actually spends in spin-locks, fortunately as noted a HammerDB deliberately induces a great deal of contention so is at the top end of workloads that do.

Also it is interesting to note that the impact of this change was not observed in other databases or other MySQL storage engines such as MyRocks the only noticeable impact with HammerDB workloads occurred in MySQL with InnoDB where in the source code ut0ut.cc in the directory storage/innobase/ut the following section has UT_RELAX_CPU defined to call the PAUSE instruction on x86.

for (i = 0; i <delay * 50; i++) {
j += i;
UT_RELAX_CPU();
}

Note that this fixed value of 50 is multiplied by the parameter innodb_spin_wait_delay which has a default of 6 (but selects a value at random up to this value) so could be calling PAUSE up to 300 times. So innodb_spin_wait_delay has always been configurable in recent versions but now from MySQL innodb_spin_wait_pause_multiplier is configurable also rather than requiring modification of the source code to do so. However as noted many factors affect spin-locks including how much time you actually spend in locks in the first place and therefore the best way to really determine how much time you have to spend in fine-tuning spin-locks in the real world can be done through testing.

So to test I took a system with Skylake CPUs and all storage on a P4800X SSD.

and then ran the test leaving it unattended until it reported the message “TEST SEQUENCE COMPLETE”. In particular note that the parameter innodb_spin_wait_pause_multiplier remained at the default throughout.

what I found was when reducing innodb_spin_wait_pause_multiplier signficantly to a value such as 5 I could achieve marginally improved performance up to 40 virtual users 643423 NOPM at a value of 5 compared to 626630 NOPM at 50 however beyond this point performance was signficantly lower and at 64 virtual users was only 278909 NOPM. The optimal performance remained at the default.

Further testing involved doubling the number of warehouses with the same results, however not partitioning the schema did bring some benefits in reducing innodb_spin_wait_pause_multiplier to a lower value.

The key finding was remarkably consistent with that of spin-lock configuration on other databases in that in the vast majority of use cases you simply do not have to worry about modifying these parameters from the default regardless of the processor architecture. However it is noted that there are many factors that input into home much time is spent in spin-locks in the first place until these parameters come into play, examples such as the speed of the memory and the storage and configuration of the schema with partitioning as well as the operating system. In particular the number of concurrent users is going to be a significant factor.

Nevertheless if you have an application generating millions of transactions per minute and have the opportunity to test and optimize surely the availability of more parameters to fine tune a database to a particular application is welcome because with each release MySQL is continuing to push the boundaries of performance over previous releases.