Blog of (former?) MySQL Entomologist

Sunday, May 31, 2020

I often have to deal with customers claiming there is a memory leak in MySQL affecting their production servers. To prove the leak is real and show where it happens usually running with tcmalloc and relying to it's heap profiler, or even more intrusive, running under Valgrind Massif is needed. This is normal when the test case demonstrating the leak is isolated and we prove a leak for some bug report. But hardly anyone in production in a normal situation would agree to do this, as performance impact is very serious.

So, recently I decided to check if dynamic tracing (along the lines of this page by Brendan Gregg, with Flame Graphs to show suspicious code path) is a more reasonable alternative. As I have to work with different Linux versions, often as old as CentOS 6, I can not rely on bcc tools for this. I mentioned more than once that in internal discussion that adding dynamic probe for malloc() (and other memory allocating functions, if you prefer) with perf may already be a good start and give some data to study. So, time to try this approach myself (so that I am ready to suggest it to customers with all the details at hand) and continue my series of posts on dynamic tracing. It can not be all the time about bugs, readers needs solutions as well :)

For this post I decided to work with latest and greatest MySQL 8.0.20 that I have built from source on my good old almost irrelevant Ubuntu 16.04 netbook:

I do not see tcmalloc or jemalloc in the output above, but libc is linked dynamically. So, malloc() implementation would likely come from it (/lib/x86_64-linux-gnu/libc.so.6 binary, to be specific). Let me add the probe (see this older post for more details, some details about probes syntax presented here are also useful):

One of the first things I want to be able to find out is how many bytes are allocated by malloc(). In case of libc malloc() on this Ubuntu I could just use the fist and only argument name (you'd be surprised what it really is, try to find out!) or just $params (see man perf-probe for details). But in a general case (like with jemalloc on the same system) I should better refer to it based on the way it is passed - via %DI (not short form, NOT %EAI) register on x86 systems. This is why you see size=%di:s64 in the probe definition. Note also that to get real value as integer I had to use s64 type (for signed 64-bit integer, even though I;d expect the argument to be unsigned). By default and as unsigned we get hex value, and one day I'd like to be able to count also bytes allocated, not just calls.

I wanted to add return probe for the same function, in a hope to store (one day) the pointer and then maybe check (with other probe) if it is ever explicitly freed. But in my case it is not possible to get the return value:

openxs@ao756:~$ sudo perf probe -x /lib/x86_64-linux-gnu/libc.so.6 malloc_ret='malloc%return $retval'Failed to find "__libc_malloc%return", because __libc_malloc is an inlined function and has no return point.Added new event: probe_libc:malloc_ret (on malloc%return in /lib/x86_64-linux-gnu/libc-2.23.so with $retval)

You can now use it in all perf tools, such as:

perf record -e probe_libc:malloc_ret -aR sleep 1

I am yet to check the reason, but according to the above the function is inlined and thus does not have a return point. Sounds weird. Note that for jemalloc I was able to add similar probe successfully and get the pointer value traced.So, what do I get when I try to trace this probe in perf? Let me show. If I run some stupid test surely allocating memory in one shell:

I tried to trace all calls to malloc() from glibc, for all processes, with stack traces. As a result I've got 179 MB of raw data in perf.data over just 10 seconds, with default frequency of sampling! This is too much for any pracrtical purpose, so make sure to set smaller frequency (like -F99) or performance impact may be too seriois. With newer kernels and eBPF this impact may be notably reduced.

I've got both allocation size reported (and may get a good use for it one day), and stack traces, showing where the allocation is coming from! This is exactly what developers need to check if it is really a possible leak.

Now what I can I do with the above? I can use perf report (sudo perf report > /tmp/report.txt) and get the call graphs in text:

For this test I've included allocations for all the processes and had to use stackcollapse-perf.pl script, as stack traces are coming in perf format.

The result (actually a part of it, just for the mysqld process, copy/pasted as .png static file) may look as follows:

Sample malloc() calls flame graph build from one of tests.

Now you can see where most malloc() calls happen, and this is the first step in checking where leaks may come from if we really suspect them. Quite a lot of fun for a simple probe added with perf, on any Linux system younger than 10 years!

Next time I am going to apply this approach to some real life use case that looks like a leak and see if anything useful, not just fun, can be derived from it. For now I know it's doable and I am sure about the steps (documented here and more in my background notes) to trace malloc() with perf dynamic probe.

Saturday, May 30, 2020

In my previous post in this series I've commented on some interesting MySQL bug reports that were added during the second half of April. Time to move on to bugs reported in May, 2020, as we are quickly approaching MySQL Bug #100000 soon and I want to write a separate post for this great achievement :)

Here is the list:

Bug #99432 - "Improving memory barrier during rseg allocation". Nice contribution by my former colleague in Percona, Krunal Bauskar, who now works on making MySQL better for ARM processors. According to his findings, the use of a relaxed memory model improves performance on ARM by up to 2%. See also yet another bug report with a contribution that matters for ARM, Bug #99556 - "Avoid sequentially consistent atomics for atomic counters" (contributed by Sergey Glushchenko from Percona).

Bug #99444 - "New HASH JOIN order problem". One should not expect and rely on any specific order unless explicit ORDER BY is used, so formally this report by Gabor Aron is "Not a Bug". I put it into this list as several other community members helped him a lot in understanding why results with HASH_JOIN optimization in newer versions are still valid and what are the ways to get the results with the desired ordering efficiently. Guilhem Bichot, for instance, suggested two different ways, using window function and lateral table. Useful reading in any case!

Bug #99458 - "i_s_fts_index_cache_fill_one_index() is not protect by the lock". Looks like even crashes are possible as a result, based on comments. Nice finding by Haiqing Sun.

Bug #99459 - "SQL run with GROUP_MIN_MAX may infinite loop and never return". After some discussion around the validity and severity of bug reports where test case involved adding DEBUG_SYNC() to show the problem in a predictable way, this great bug report by Ze Yang was verified. All MySQL GA versions are affected, including 8.0.20! As a side note, I'd prefer NOT to read such discussions any more. They are wasting time of all parties involved.

Bug #99499 - "Incorrect result when constant equailty expression is used in LEFT JOIN condition". This bug that affects MySQL 5.7.x only (it was fixed in MySQL 8.0.17+ and in 5.6 code was different) was reported by Marcos Albe from Percona.

Bug #99504 - "Generated column incorrect on INSERT when based on column w/ expression DEFAULT". Several problems are highlighted in the complex enough test case submitted by Brad Lanier.

Bug #99582 - "Reduce logging of new doublewrite buffer initialization which is confusing". 180 lines or so are added when --log-error-verbosity is set to 3. As a workaround one can add:

log-error-suppression-list="MY-011950"

to the [mysqld] section of the .cnf file. This problem was reported by Simon Mudd. Make sure to read all comments.

Bug #99591 - "Option --tc-heuristic-recover documentation wrong, missing details". In reality it does not work with more than one XA-capable engine installed. I wish fine manual documents the reality, not the good intentions of the past. This documentation request was added by Sami Ahlroos.

Bug #99593 - "Performance issues in 8.0.20". It seems to be yet another TempTable engine problem that caused regression comparing to MySQL 5.7. At least this:

SET GLOBAL internal_tmp_mem_storage_engine=MEMORY;

is a workaround. The bug (a duplicate of internal Bug #30562964) was reported by billy noah and is fixed in upcoming MySQL 8.0.21.

Bug #99601 - "Broken Performance using EXIST function, increasing execution time each loop". This regression bug (without tag, but who cares...) in MySQL 8.0 was reported by Ronny Görner and minimal test case demonstrating that the problem is actually with function call was contributed by
Shane Bester.

Bug #99643 - "innobase_commit_by_xid/innobase_rollback_by_xid is not thread safe". This bug was reported by Zhai Weixiang, who had also suggested the fix in the code.

Bug #99717 - "Performance regression of parallel count". Great bug report with code analysis and ready to use MTR test case from Ze Yang. Sunny Bains already confirmed that the problematic code is going to be removed.

To summarize:

I am happy to see Oracle engineers explaining community bug reporters the reasons and possible solutions for the problems they hit that are not actually caused by any bug in MySQL. I tried to do this as well, whenever possible, while working on MySQL bugs...

We can still find speculations that if the bug is repeatable only by adding DEBUG_SYNC() or similar debug lines, then it can not be verified or gets lower severity... IMHO this is nonsense, as there are many high severity verified real bug reports where this method is used to demonstrate the problem clearly. Just stop it!

Sunday, May 24, 2020

There are many things to write about MySQL this week. It turned 25 years old, to begin with! Quite successful the first ever Percona Live ONLINE 24 hour conference also happened this week, and I've presented my talk there...

But this is a blog of former MySQL Entomologist, so when I have nothing ready to share about something exciting or immediately useful, I write about MySQL bugs. This is the case today as well. I need more time to think about MySQL history and write down the details to complement my presentation etc. Previous post in this series appeared a month ago, so let me present a yet another list of InnoDB, optimizer, replication and few other bugs reported by MySQL Community users since April 18, 2020 and before May, 2020:

Bug #99295 - "InnoDB in-memory estimate is wrong after an index is created". As Øystein Grøvlen found out, even though the entire newly added covering index is in the buffer pool, the buf_stat_per_index_t::get() function in MySQL 8 estimates just 1 page is in memory, and as a result the index is not used.

Bug #99326 - "undo truncation might still not be crash safe". After nice explanations of what may happen from
Kevin Lewis:

"After and internal discussion with Sunny Bains, I think I understand the concern better. Let's assume that a redo log is so large that it contains redo entries for all 512 Space IDs of an undo tablespace that is being truncated too often. In other words, even though each truncate removes old pages from the buffer pool and flushes newly created pages, it does not actually cause a checkpoint for each truncation like it did in 5.7. So the redo log can possibly contain records for more than 512 space IDs.

There is a worklog tested and pushed to the 8.0.21 release branch that fixes this highly unlikely possibility.

As part of WL#11819, we keep a count of the number of truncations that have happened between checkpoints. So if there is more than (512 / 8) truncations between checkpoints, then no more truncations can happen on that undo space until the next checkpoint happens."

this (regression vs 5.7) bug report by Zanye Zjy was closed as "Not a Bug". This is a totally wrong status and handling for such a case.

Bug #99354 - "Nondeterministic stored function returns incorrect results". This interesting bug was reported by Jacob Chafik. If anyone cares, it is still repeatable on 8.0.20 (and not repeatable on MariaDB 10.4.x):

Wrong results should not be produced for logically equivalent queries, even if plans are different. Unfortunately this regression (since 5.7.5+) bug had not got the "regression" tag.

Bug #99359 - "Order by in group_concat with prepare statement returns unexpected results". Two executions of the same statement with the user variable having the same value assigned should either both work or fail with error message, it's a matter of consistency. This is not the case in a situation described in the bug report from Feng Liyuan.

Bug #99363 - "Innodb_data_pending_fsyncs shows incorrect value". In this bug report SeongUck Lee clearly shown that the problem happens and gave some hints based on source code review what could cause this regression in 8.0 comparing to 5.7. Moreover, after adding debug assertion he managed to show how it is hit and what wrong values are produced, in gdb. Still this report for now ended up as "Not a Bug" (until a test case is presented, reportedly. What a shame!

Bug #99377 - "Assertion `thd->get_transaction()->is_empty(Transaction_ctx::STMT)' failed in ". Surely dropping and re-creating the mysql.general_log table is a corner case, but even in debug builds it should end up with some proper error message, not just assertion failure. This bug was reported by
Roel Van de Paar.

Bug #99398 - "Data in wrong row on left join". This regression bug in MySQL 8.0.20 reported by Soner Sayakci is already fixed in upcoming MySQL 8.0.21. Good news.

Bug #99412 - "Threads_running becomes scalability bottleneck on multi-node NUMA topologies". Nice bug report from Sergey Glushchenko, with a patch contributed. perf annotate was used to demonstrate the problem. Good to know Percona engineers keep working to make MySQL perform better!

Bug #99413 - "Constant propagation get the wrong result when mix with the different collations". Again, this bug report from Wj Huang was closed as 'Not a Bug" based mostly on the results from 8.0.21+ that is not yet released. This is either a "Duplicate" if the exact fix that helped in 8.0.21+ can be isolated, or "Can't repeat", but setting it to "Not a Bug" seems wrong to me.

Bug #99414 - "Where exists performance 8.0.16 onwards". This optimizer regression bug in 8.0.16+ (comparing to 5.7) was reported by Jonathan Taylor. See also a comment by Øystein Grøvlen about the tool available at https://github.com/ogrovlen/opttrace to pre-process and get a condensed trace for the join optimizer, that helps to understand what's going on.

Let me stop for now. Many more interesting bugs were reported in May, so stay tuned!

Happy Birthday, MySQL and Sakila!

To summarize:

Now (regression!) bugs are sometimes closed as "Not a Bug" after explanation what the problem is and statement that there is a fix in a version not yet released. Not as a "Duplicate", but "Not a Bug", after confirming the problem. This is a new level of wrong bugs handling!

Same with a bug where bug reporter is still working on a repeatable test case, after showing the problem happened with some evidence, "Not a Bug".

Oracle does a good job recently in making sure MySQL Server bugs are not hanging around without reaction and are processed fast, but looks like this is partially achieved by lowering standards of bugs processing that were established over last 15+ years. Closing reports by all means as soon as possible as "Not a Bug" is not a way to go. This is sad.

Saturday, April 25, 2020

Time for the next blog post about MySQL bugs! The previous one covered some bugs I considered interesting in March, this one will be about community bug reports that I've picked up since March 26, 2020. I'd better review bug fixed in MySQL 5.7.30 instead of this, but it is still not released, even though we know it must get some important security fixes based on "Oracle Critical Patch Update Advisory - April 2020" recently published.

Bug #99082 - "Problem with replication: XA transaction, temp tables and row based binlog". As noted by Libor Laichmann, creating temporary tables inside XA transaction leads to broken replication in MySQL 5.7.29, with somewhat misleading error message. This is unfortunate. We do not see any evidence, yet, if 5.6 and 8.0 branches are similarly affected.

Bug #99100 - "GROUP BY will throw table is full when temptable memory allocation exceed limit". More fun with TempTable storage engine that I blamed previously. This bug was reported by Dean Zhou, who had performed detailed analysis in gdb and suggested a fix. It took some efforts for the bug reporter to get it "Verified" as a regression bug in MySQL 8.0.19 (without a "regression" tag, surely).

Bug #99174 - "Prepare transaction won't rollback if server crash after online ddl prepare stage". This bug was reported by Zhang Xiaojian. Additional debugging code (one DBUG_EXECUTE_IF statement) was added to the source to demonstrate the problem easily and it caused some questionable arguments of a kind:

"First of all, changing our server code in order to cause a bug, can not be considered a repeatable test case."

But the bug reporter provided more details and clear test case, and had not agreed with the above. The bug was soon "Verified", even though no attempts to check (or explain) if 5.7 may be similarly affected were made in public, so we still have to wonder if this is a regression.

"When aggregate_thread_status is called for other threads, it's possible for that thread to have exited and freed the THD between the time we check that the thread was valid, until the time we call get_thd_status_var."

Ironically, he had to add some conditional my_sleep() call to server code to get a repeatable test case, and this was NOT a problem for a good Oracle engineer to verify the bug immediately. There are still bugs in my beloved Performance Schema. Who could imagine that?

Bug #99200 - "CREATE USER get stuck on SHOW PROCESSLIST and ps.threads on slave". So, slave may disclose some sensitive information. As demonstrated by Marcelo Altmann, if a query had been rewritten by the parser because it contains sensitive information, it won't be cleaned up when slave's SQL thread applies it, making it visible in SHOW PROCESSLIST and performance_schema.threads. Both 5.7.29 and 8.0.19 are affected. Bug reporter had contributed fixes for both versions. See also another bug in Performance Schema that he reported and contributed a fix for, Bug #99204 - "performance_schema threads table PROCESSLIST_INFO incorrect behaviour".

Bug #99205 - "xa prepare write binlog while execute failed". Then XA PREPARE from the binary log is executed on slave and... we are in troubles. The bug was reported by Phoenix Zhang. Unfortunately it is still not clear from this verified bug report if MySQL 5.6 and 5.7 are similarly affected (probably they are).

Bug #99206 - "lock_wait_timeout is taking twice time while adding new partition in table". Nice finding by Lalit Choudhary. MySQL 8.0.19 is not affected by this bug.

Bug #99244 - "bad performance in hash join because join with no join condition between tables". This optimizer bug (wrong join order when new hash join optimization is used) was reported by Chunyang Xu. But note also a simple test case and perf outputs contributed later by Shane Bester.

Bug #99257 - "Inconsistent output and wrong ORDER BY Sorting for query with LIMIT". Yet another 5.7-only optimizer bug found by Lalit Choudhary. My quick test presented in a comment shows that MySQL 5.6.27 produced correct results, so this is a regression bug.

Bug #99273 - "Item_ref in Having Clause points to invalid outer table field". This may lead to wrong results for simple enough queries. This regression bug that affects both MySQL 5.7.29 and 8.0.19 was reported by Shanshan Ying.

Bug #99286 - "Concurrent update cause crash in row_search_mvcc". This great bug report with code analysis, fix suggested and a test case with detailed instructions was created by Zkong Kong. It was promptly verified, but I still miss any documented attempt to check (by running the test case or by code analysis if the bug applies only to 5.7.29 or MySQL 8.0.x is also potentially affected. For now I'll try to remember this assertion line:

Saturday, April 18, 2020

Recently Monty explained that with MariaDB in a general case (backup, proper shutdown, storage engines incompatibilities, Galera, async replication, and maybe few bugs and corner cases aside) it should be possible to easily and directly upgrade from one major version to the other, skipping any number of intermediate major versions in between.

Specifically, the idea was the there is no need to run mysql_upgrade from each intermediate release (10.0, 10.1, 10.2, 10.3 and 10.4) if your goal is to eventually upgrade from MariaDB (or MySQL) 5.5 to MariaDB 10.4, and that mysql_upgrade is designed and coded in a way to take all possible differences for the version it is coming from into account. It supposedly includes and does all intermediate changes needed for all versions since the first release of the tool, and skips errors (that may appear only if some specific change was already done and is not strictly needed). He added a (new) KB article explaining this also.

This was a kind of surprise for me, as until now MySQL manual, for example, clearly says:

"Upgrade that skips versions is not supported. For example,
upgrading directly from MySQL 5.5 to 5.7 is not
supported."

As a support engineer I always highlighted the need to upgrade step by step, one major version at a time, and check how the entire setup and, most importantly, applications work after each step. I prefer to know from the very beginning in what exact version the problem appeared during upgrade. The problem may be of any kind, and of them corrupted data and different execution plan for some important query are probably the worst if we do not find them early...

But without such checks and in cases when Galera or async replication is not involved and we do not know for sure that there is incompatible change introduced that is NOT fixed by mysql_upgrade, upgrading step by step may, indeed, be considered mostly a waste of time.

I do not trust statements, even if they come from Monty himself, without checking them. So I tried to search MariaDB and MySQL bugs databases for still active bugs that happens when major version is skipped (but mysql_upgrade from the target final version is properly executed and returned no errors). I found no such bugs so far.

Next steps before I agree with the idea of skipping major versions as a valid and supported way for some cases (when proper applications testing is not possible in the process for whatever reason and no replication and HA expectations are directly involved) are to try such upgrades for my testing instances and check what mysql_upgrade really does.

Today, for the purpose of this blog post, I tried (again, as I did such things in the past) to upgrade MySQL 5.5.54 I have at hand among numerous other versions directly to MariaDB 10.4.13 (current from GitHub), and check what happens. This is a very basic test and my hope, honestly, was that it may show some problem, as mysql.user implementation details, for example, are very different.

So, I started with initializing new 5.5 data directory (to clean up 10.4 databases I had there based on previous upgrade tests with more real data):

As you can see, I added some data and checked that there are no mysql.innodb% tables (that store InnoDB persistent statistics) in 5.5.54. Then I proceeded to shutdown (without setting innodb_fast_shutdown = 0 explicitly, a mistake that could cost me a lot of troubles if the server was under load) and started 10.4.13 with the same datadir. As simple as that:

openxs@ao756:~/dbs/maria10.4$ bin/mysql -uroot test --socket=/tmp/mariadb.sockReading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor. Commands end with ; or \g.Your MariaDB connection id is 8Server version: 10.4.13-MariaDB-log Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> show tables from mysql like 'innodb%';Empty set (0,001 sec)

Note that I just started 10.4 with default settings (and general log enables, so that I know what SQL statements were executed), and there is no obvious problem to start. I can even read the data from the table, even though InnoDB persistent statistics tables do not exist.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select * from t1;ERROR 1046 (3D000): No database selectedMariaDB [(none)]> use testReading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -A

I see just notes, no errors or warnings above after the restart. It means I can continue with applications testing and consider upgrade successful so far. Do NOT assume you are done with upgrade until you see such a clean startup messages log!

Connection 8 above was my user session. Starting from 9, we see queries executed by mysql_upgrade (and maybe other utilities it may call, more on them in the next part.. We see thjat it gets the datadir and current serverr version, and in all cases by default it disables writing changes to the binary log or replicating them to other Galera nodes, if any. Let me continue:

So, one way ot the other, with direct SQL or with prepared statements etc, either by changing structure multiple times or by adding missing tables we should end up with the content MariaDB 10.4 really needed.

That is, we check every table in every other database for upgrade, so some finalr steps and eventually execute FLUSH PRIVILEGES.

Theoretically we can continue to work then, as we did in connection 24, but the only reasonable action when mysql_upgrade was executed without errors would be to shut down and restart, as we actually did.

After restart we should have 10.4 up, running and ready to use, with all new features, colum,ns and tables avaiable and existing data upgraded to 10.4 "expectations".

Sunset of the step by step major MariaDB major versions upgrade era...

To summarize:

Running mysql_upgrade after any major version upgrade is a must. MariaDB may work for years without that, but you may end up with all kinds of troubles, including bad performance for queries due to missing/non-usable statistcis etc. Do not ignore related messages in the error log!

I was not able to find any still active bug reports about simple cases when direct upgrade skipping some major version(s) fails while the same upgrade done step by step does not fail.

Missing tables etc are really created at later stages of the process, no matter what was the version we upgrade from. In SQL code at least I see no checks for what the older version was. Looks like SQL is executed no matter if we get the error, and we just continue to the next statement that is going to fix the problem, if any. Something similar happens to applying binary log DML events from the earlier position than needed - if we ignore errors and apply them all, eventually we should end up in consistent state.

The SQL code executed, with checks etc, is complex enough. mysql_upgrade tries to deal with many details and fix many things, so one can not exclude bugs in the process withouth extensive testing.

It seems to be safe to run mysql_upgrade multiple times even if it was already run and has nothing to do. Some waste of time and no other harm expected. SQL is written that way, with all that IF NOT EXISTS and other checks.

Assuming that CHECK TABLE ... FOR UPGRADE from MariaDB 10.4 is able to find and fix all problems in tables originating from MariaDB or MySQL version x.y.z so that the result is usable, the process of direct upgrade skipping intermediate major versions may even work well, as Monty stated, with a set of assumptions and restrictions presented above. One still has to check the code and bugs to be sure about that part.