Description

We migrated our application from MySQL 5.6.21 to MariaDB 10.1.16 to use Data at Rest Encryption (DARE). This caused major issues: the application started stalling for no apparent reason.

When we use DARE on InnoDB tables with the out-of-the-box file_key_management plugin, configured as described on this page https://mariadb.com/kb/en/mariadb/data-at-rest-encryption/, we see periodic high CPU usage (at regular intervals, i.e. every hour) that stalls the system and makes it unusable. The CPU spikes happen regularly even when there are no user or application connections to the server. Nothing in any log indicates what is happening, which made this very hard to debug.

We were using innodb_encryption_threads = 4, as indicated on the above page.

After extensive analysis, the following was discovered:

MariaDB starts the number of background threads specified in innodb_encryption_threads to perform two tasks: data scrubbing (i.e. removing deleted data) and re-encrypting data pages when the key changes.

The issue is that the background threads start periodically, at the interval defined by innodb-background-scrub-data-check-interval, even when scrubbing is turned off for both compressed and uncompressed data and no key has changed that would require re-encryption. They then use as much as 200% CPU on a 2-core system for 20+ minutes (depending on the data volume) while having nothing to do. This saturates the CPU and leaves the system unusable.

We suspect the following critical issues:
1. No check is made of whether scrubbing is enabled for compressed or uncompressed data before the threads are started.

Below is the out-of-the-box scrubbing configuration:

MariaDB [(none)]> show global variables like '%scrub%';
+---------------------------------------------+--------+
| Variable_name                               | Value  |
+---------------------------------------------+--------+
| innodb_background_scrub_data_check_interval | 3600   |
| innodb_background_scrub_data_compressed     | OFF    |
| innodb_background_scrub_data_interval       | 604800 |
| innodb_background_scrub_data_uncompressed   | OFF    |
| innodb_immediate_scrub_data_uncompressed    | OFF    |
| innodb_scrub_log                            | OFF    |
| innodb_scrub_log_speed                      | 256    |
+---------------------------------------------+--------+

2. No check is made of whether the encryption key has changed before the threads are started. Also, per the documentation, "This plugin does not support key rotation — all keys always have the version 1.", which is all the more reason not to start the encryption threads until a key change is detected.

3. The encryption/scrubbing threads behave like high-priority threads, i.e. they hog the CPU and stall the system. Background processes generally run on low-priority threads so that core DB functionality is not affected.

4. Neither the system tables nor the processlist show that the encryption threads are running or what their processing status is.
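For anyone trying to reproduce the points above, the statements below are a minimal sketch of how the relevant settings might be inspected from the client. The variable pattern and the information_schema table name are taken from the MariaDB documentation as we understand it, not from this report. Whether either output reflects the background activity described here is an assumption that has not been verified on 10.1.16.

```sql
-- List the encryption-related InnoDB settings (thread count, key rotation age, etc.).
SHOW GLOBAL VARIABLES LIKE 'innodb_encrypt%';

-- Assumption: this information_schema table is available on the running build.
-- It is documented as reporting per-tablespace key versions and rotation progress.
SELECT * FROM information_schema.INNODB_TABLESPACES_ENCRYPTION;

-- For comparison: the encryption threads do not appear here, as noted in item 4.
SHOW FULL PROCESSLIST;
```

If the table is present, comparing its contents before and during a CPU spike may help confirm whether any key rotation work is actually being done.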

A similar high CPU issue is reported in ticket MDEV-10368.

We temporarily solved the problem by setting innodb_encryption_threads = 0
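As a sketch of this workaround, the statements below set the value at runtime and then confirm it. This assumes innodb_encryption_threads can be changed dynamically on the running build; if it cannot, the same value would have to go in the server configuration file and take effect on restart.

```sql
-- Assumption: the variable is dynamic, so no restart is needed.
SET GLOBAL innodb_encryption_threads = 0;

-- Confirm the new value.
SHOW GLOBAL VARIABLES LIKE 'innodb_encryption_threads';
```

A runtime change made with SET GLOBAL is lost on restart, so the setting should also be added to the configuration file to keep the workaround in place.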

Issue Links

causes

MDEV-13639 Server crashes in prepare_inplace_alter_table_dict upon altering a table with discarded tablespace