Re: Dramatic mysqld slowness when running compiles in parallel

Hi,
Sorry to keep harping on this but the problem still exists in today's current.
In addition to the tests from a single processor Thinkpad T42
(see http://mail-index.netbsd.org/current-users/2008/07/17/msg003575.html)
I have now reproduced the problem on a two processor Dell PowerEdge 2650. On
the latter, with both processors, the issue much less severe but still very
noticeable if I increase the load. When I disable one processor the issue
shows up immediately.
With the NetBSD 5 branch approaching, it seems to me this issue would be worth
tracking down.
To reproduce the problem, install mysq5-server, create a simple test table as
described in my initial report, place ~200 (identical) insert statements into
the file 'sql-file' and run
time mysql test < sql-file
Repeat the test a few times while compiling, e.g., /usr/pkgsrc/shells/bash and
notice the slowdown by factors of one hundred and more.
Some additional observations since my last set of tests:
(1) I used time(1) to time mysqld while running the tests. The only
difference is that the real time is much larger in the "slow" case. User and
sys time is the same for both slow and fast execution. This is supported by
profiling results for mysqld where I see no difference between the slow and
the fast case.
(2) Here is another "ktruss -T -tA mysqld" segment from a particularly slow
insert (I have sorted the output by the time column):
15638 11 mysqld 1223051925.586724189 _lwp_kill(0xa, 0xe) called
15638 11 mysqld 1223051925.586729218 _lwp_kill(0xa, 0xe) = 0
15638 10 mysqld 1223051925.586734805 resume kernel
15638 10 mysqld 1223051925.586737320 __sigtimedwait(0xb6fffd88,
0xb6fffcc4, 0) = 0
15638 10 mysqld 1223051925.586741231 __sigprocmask14(0x3, 0x8570edc,
0xb6fffd30) called
15638 10 mysqld 1223051925.586742907 __sigprocmask14(0x3, 0x8570edc,
0xb6fffd30) = 0
15638 10 mysqld 1223051925.586745701 _lwp_park(0, 0, 0x8570eb8,
0x8570eb8) called
15638 10 mysqld 1223051925.586748494 stop kernel
(*** notice the large time gap here)
15638 7 mysqld 1223051926.126986804 resume kernel
15638 7 mysqld 1223051926.126990157 select(0, 0, 0, 0, 0xb87ff864) = 0
15638 7 mysqld 1223051926.126999935 gettimeofday(0xb87ff860, 0) called
15638 7 mysqld 1223051926.127002449 gettimeofday(0xb87ff860, 0) = 0
15638 7 mysqld 1223051926.127244943 select(0, 0, 0, 0, 0xb87ff864)
called
15638 7 mysqld 1223051926.127250251 stop kernel
15638 8 mysqld 1223051926.127255839 resume kernel
15638 8 mysqld 1223051926.127257794 select(0, 0, 0, 0, 0xb85fff64) = 0
15638 8 mysqld 1223051926.127262823 gettimeofday(0xb85fff60, 0) called
15638 8 mysqld 1223051926.127265617 gettimeofday(0xb85fff60, 0) = 0
15638 8 mysqld 1223051926.127813184 select(0, 0, 0, 0, 0xb85fff64)
called
15638 8 mysqld 1223051926.127815139 stop kernel
(*** and here)
15638 11 mysqld 1223051926.547139687 _lwp_unpark(0xa, 0x8570eb8) called
15638 11 mysqld 1223051926.547146392 _lwp_unpark(0xa, 0x8570eb8) = 0
The same part from a fast insert looks like:
4792 11 mysqld 1223050783.748332054 _lwp_kill(0xa, 0xe) called
4792 11 mysqld 1223050783.748335127 _lwp_kill(0xa, 0xe) = 0
4792 10 mysqld 1223050783.748340714 resume kernel
4792 10 mysqld 1223050783.748343229 __sigtimedwait(0xb6bffd88,
0xb6bffcc4, 0) = 0
4792 10 mysqld 1223050783.748347420 __sigprocmask14(0x3, 0x8570edc,
0xb6bffd30) called
4792 10 mysqld 1223050783.748349096 __sigprocmask14(0x3, 0x8570edc,
0xb6bffd30) = 0
4792 10 mysqld 1223050783.748351889 _lwp_park(0, 0, 0x8570eb8,
0x8570eb8) called
4792 10 mysqld 1223050783.748354124 stop kernel
(*** this is where the two time gaps occur in the slow case -- no gap here!)
4792 11 mysqld 1223050783.748895545 _lwp_unpark(0xa, 0x8570eb8) called
4792 11 mysqld 1223050783.748898060 _lwp_unpark(0xa, 0x8570eb8) = 0
My plan is to look at revivesa to see if it behaves similarly (but if somebody
beats me to it I would not mind).
I'd appreciate any suggestions, comments, and other help with this.
Thanks,
Sverre
PS The _lwp_unpark calls appear to come from the libpthread functions
pthread__mutex_unlock_slow or pthread__mutex_wakeup. I tried to trace the
call stack by recompiling libpthread with -g, attaching gdb to the mysqld
process, and setting breakpoints at the two functions. When I did that,
however, gdb would lock up after a couple of continues and I was not able to
get meaningful results.