Thursday, October 19, 2017

A crashing bug in MySQL: the CREATE TABLE of death (more fun with InnoDB Persistent Statistics)

I ended one of my last posts - Fun with InnoDB Persistent Statistics - with a cryptic sentence: there is more to say about this but I will stop here for now. What I did not share at the time is the existence of a crashing bug somehow related to what I found. But let's start with some context.
In Bug#86926, I found a way to put more than 64 characters in the field table_name of the mysql.innodb_table_stats table (which is a varchar(64)). This field uses the utf8 (utf8mb3) character set, so the internal representation of the data is using a length field of a single byte (64*3=192 which is smaller than 256). I wondered if I could have MySQL put a string longer than 255 bytes in that field, so I crafted a CREATE TABLE to try it, and that CREATE TABLE crashed all of below versions (it probably crashes many other versions, I only checked those):

MySQL 5.6.35, 5.7.17, 5.7.18 and 8.0.1,

MariaDB 10.0.29, 10.1.23 and 10.2.6,

Percona Server 5.6.36 and 5.7.18.

The scary thing is that one only needs CREATE TABLE privileges for crashing the database !

Note: this crash is not related to InnoDB Persistent Statistics (I will probably share the details in a later post). However, as this bug was found when I was stressing the implementation of InnoDB Persistent Statistics, the title more fun with InnoDB Persistent Statistics is still relevant.

I did not immediately talk about this because this kind of information is dangerous in the wrong hands, because I wanted to give the vendors a chance to fix the issue, and because I also wanted to give users a chance to upgrade. I (hopefully) responsibly reported this as a security bug to Oracle on July 4th (Bug#86934 which is private) and also sent the information to the MariaDB Corporation, to the MariaDB Foundation, and to Percona. In those reports, I wrote that I would share this information in three months or one month after a fix has been published. Both delays are expired.

Here is the progress on fixing this bug (I might update below after the post has been published):

I reported the bug on July 4th,

MariaDB 10.2.7 was released with a fix on July 12th,

MariaDB 5.5.57 was released with a fix on July 19th,

MariaDB 10.0.32 was released with a fix on August 7th,

MariaDB 10.1.26 was released with a fix on August 10th,

MySQL 5.5.58, 5.6.38 and 5.7.20 were released with a fix on October 16th (Bug #26390632 in the release notes),

...

I am not yet publishing the CREATE TABLE of death as I want to let people upgrade. I might do it in the future.

And last, so people searching Google can get to this post, below is the crash report in the MySQL 5.7.17 error log when running a CREATE TABLE of death:

InnoDB: Failing assertion: norm_len < FN_REFLEN - 1
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
14:45:53 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=805306368
read_buffer_size=131072
max_used_connections=5
max_threads=3000
thread_count=4
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4282463 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f9dc42ddbd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9e7507be70 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xef584b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7af3e1]
/lib64/libpthread.so.0(+0xf370)[0x7fae5a21b370]
/lib64/libc.so.6(gsignal+0x37)[0x7fae58c0e1d7]
/lib64/libc.so.6(abort+0x148)[0x7fae58c0f8c8]
/usr/sbin/mysqld[0x77fe7a]
/usr/sbin/mysqld[0xf292fe]
/usr/sbin/mysqld(_ZN19create_table_info_t20prepare_create_tableEPKc+0x22)[0xf2bc02]
/usr/sbin/mysqld(_ZN11ha_innopart6createEPKcP5TABLEP24st_ha_create_information+0x3d3)[0xf4dcf3]
/usr/sbin/mysqld(_Z15ha_create_tableP3THDPKcS2_S2_P24st_ha_create_informationbb+0x2ab)[0x80059b]
/usr/sbin/mysqld(_Z16rea_create_tableP3THDPKcS2_S2_P24st_ha_create_informationR4ListI12Create_fieldEjP6st_keyP7handlerb+0x14c)[0xd788bc]
/usr/sbin/mysqld[0xd2c446]
/usr/sbin/mysqld(_Z26mysql_create_table_no_lockP3THDPKcS2_P24st_ha_create_informationP10Alter_infojPb+0x114)[0xd2cb54]
/usr/sbin/mysqld(_Z18mysql_create_tableP3THDP10TABLE_LISTP24st_ha_create_informationP10Alter_info+0x8f)[0xd2cc6f]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x5313)[0xcc72f3]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3b5)[0xcc8975]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa7a)[0xcc946a]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xccadff]
/usr/sbin/mysqld(handle_connection+0x288)[0xd8a228]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x1269f04]
/lib64/libpthread.so.0(+0x7dc5)[0x7fae5a213dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fae58cd076d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f9dc430c4a0): CREATE TABLE test_jfg.[...]
Connection ID (thread ID): 5958
Status: NOT_KILLED

5 comments:

In MySQL and MariaDB, FN_REFLEN is 512 bytes. There is also a FN_REF_LEN_SE, defined as 4000 bytes.

The 512 bytes is definitely not enough to store a maximum-length "filename-safe" schema and table name. In the filename-safe encoding that was introduced in MySQL 5.1, a character may expand to up to 5 ASCII characters. Because the maximum length of schema and table name are 64 characters each, that alone would yield the maximum length of 2*5*64 bytes (ASCII characters). For example, a $ character would expand to the 5 bytes (ASCII characters) @0024.

If you add a separator character '/' between the encoded schema and table name, then we already are at 641 bytes. It is not yet the maximum limit, because you can have partition and subpartition names too.

The stack trace that you posted seems to involve partitioning. I guess the worst-case length should be somewhere around 4*5*64 bytes, if the partition and subpartition names can be up to 64 characters each. Add sizeof("/" "#P#" "#SP#" ".MYD") for the separators, a file name suffix, and the terminating NUL character.

So, I guess that it could make sense to extend FN_REFLEN from 512 to 1293 bytes to allow the creation of maximum-length partition file names. However, file systems would likely remain a bottleneck: Linux extfs only allows path name components to be up to 255 bytes. With table#P#partition#SP#subpartition.MYD the maximum possible length achievable from SQL ought to be 3*5*64+3+4+4 = 971 bytes. Perhaps the file name pattern should rather be table/partition/subpartition.MYD? But even then, the maximum path component length would be 5*64+4 = 324 bytes which is more than extfs can handle.

In the mysql.innodb_index_stats or mysql.innodb_table_stats tables, the table name is encoded in UTF-8 instead of the filename-safe encoding. For partitioned tables, the InnoDB "table name" would not only include the table name, but also the partition and subpartition names and some separators. So, obviously the maximum length will be insufficient.

However, it looks like the assertion failure is unrelated to InnoDB persistent statistics. The assertion was introduced into InnoDB in MySQL 5.6.6 by https://github.com/mysql/mysql-server/commit/e9255a22ef16d612a8076bc0b34002bc5a784627

I remember filing an Oracle-internal bug that involved creating a table with a schema and table name consisting of almost 64 $ characters (fewer so that it would not exceed the 255-byte limit of extfs). It broke something outside InnoDB. I guess I should have tried to find a file system that allows longer path component names, so that the FN_REFLEN limit could have been reached. Or simply use innodb_file_per_table=0 to bypass this file system limitation?

I always found the design of persistent statistics in InnoDB 5.6 unsatisfactory. In MariaDB, at some point in the future, I would like to make InnoDB data files self-contained and remove the InnoDB data dictionary. Basically, have crash-safe, transactional .frm files (or equivalent) for storing everything that was specified by the user at CREATE TABLE, and store everything else (including the locations of secondary indexes and the persistent statistics) inside the .ibd file. This would of course involve adding some user interface for reading and updating the persistent statistics, if the user wants to tune query plans by modifying the statistics.

Note: Marko's comment was only published it on November 28 as I believe it was giving too much information on the CREATE TABLE of death (it is the comment I was referring to in my comment from October 20).

Thank you for not publishing the syntax/how to. It is an interesting point,many do not police or limit table creations. The micro services generation are enhanced by this ability to create on demand. This could take many production systems down.

You are right that restricting the right for table creation is a way to mitigate this problem.

However, to be perfectly clear, I intend to eventually (in 4 to 12 weeks) publish a full explanation of the CREATE TABLE of death, including how to crash a vulnerable MySQL/MariaDB. I believe that this type of information should not be kept private for too long, especially that some people already know the CREATE TABLE of death and many people are able to figure-out enough about is with the information that is already out there.