Re: [PATCH v8 74/74] cputlb: queue async flush jobs without the BQL
From: Emilio G. Cota @ 2020-05-20 4:46 UTC
To: Robert Foley
Cc: Richard Henderson, Alex Bennée, QEMU Developers, Peter Puhov
On Mon, May 18, 2020 at 09:46:36 -0400, Robert Foley wrote:
> We re-ran the numbers with the latest re-based series.
>
> We used an aarch64 Ubuntu VM image. The host CPU for the TCG runs was an
> Intel(R) Xeon(R) Silver 4114 @ 2.20GHz: 2 sockets, 10 cores/socket,
> 20 threads/socket, i.e. 40 hardware threads in total.
>
> For the bare-hardware and KVM tests (first chart), the host CPU was a
> HiSilicon 1620 @ 2600 MHz: 2 sockets, 64 cores/socket, 128 cores in total.
>
> First, we ran a test of building the kernel in the VM.
> We did not see any major improvements or regressions.
> Below is the speedup (versus a single thread) of building the kernel on
> bare hardware, under KVM, and under QEMU TCG (both the baseline and the
> cpu-locks series).
>
>
> Speedup vs a single thread for kernel build
>
> 40 +----------------------------------------------------------------------+
> | + + + + + + ** |
> | bare hardwar********* |
> | kvm ####### |
> 35 |-+ baseline $$$$$$$-|
> | *cpu lock %%%%%%% |
> | *** |
> | ** |
> 30 |-+ *** +-|
> | *** |
> | *** |
> | ** |
> 25 |-+ *** +-|
> | *** |
> | ** |
> | ** |
> 20 |-+ ** +-|
> | ** ######### |
> | ** ################ |
> | ** ########## |
> | ** ### |
> 15 |-+ * #### +-|
> | ** ### |
> | * ### |
> | * ### |
> 10 |-+ **### +-|
> | *## |
> | ## $$$$$$$$$$$$$$$$ |
> | #$$$$$%%%%%%%%%%%%%%%%%%%% |
> 5 |-+ $%%%%%% %%%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$% +-|
> | %% % |
> | %% |
> |% + + + + + + |
> 0 +----------------------------------------------------------------------+
> 0 10 20 30 40 50 60 70
> Guest vCPUs
>
>
> After seeing these results and the scaling limits inherent in the build
> itself, we decided to run a test that might show the scaling improvements
> more clearly.
Thanks for doing these tests. I know from experience that benchmarking
is hard and incredibly time-consuming, so please do not be discouraged by
my comments below.
A couple of points:
1. I am not familiar with aarch64 KVM, but I'd expect it to scale almost
like the native run. Are you assigning enough RAM to the guest? It can
also help to run the kernel build in a ramfs in the guest (see the
sketch after these points).
2. The build itself does not seem to impose a scaling limit, since it
scales very well when run natively (although per thread I presume aarch64
TCG is still slower than native, even when TCG runs on a faster x86
machine). The limit here is probably aarch64 TCG itself. In particular,
last time I checked, aarch64 TCG had room for improvement,
scalability-wise, in its handling of interrupts and some TLB operations;
this likely explains why we see no benefit from per-CPU locks, i.e. the
bottleneck is elsewhere. This can be confirmed with the sync profiler.
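To make both suggestions concrete, here is a minimal sketch; the mount
point, the tmpfs size and the monitor socket path are assumptions, so
adjust to your setup:

# In the guest: build in RAM to take guest disk I/O out of the picture.
mkdir -p /mnt/ram
mount -t tmpfs -o size=16G tmpfs /mnt/ram
cp -r ~/linux /mnt/ram/linux
cd /mnt/ram/linux && time make -j "$(nproc)" vmlinux >/dev/null

# On the host: sample the sync profiler through the HMP monitor,
# assuming QEMU was started with
#   -monitor unix:/tmp/qemu-mon,server,nowait
echo 'sync-profile on'   | socat - unix-connect:/tmp/qemu-mon
# ... let the build run for a while, then dump the profile:
echo 'info sync-profile' | socat - unix-connect:/tmp/qemu-mon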
IIRC I originally used ppc64 for this test because ppc64 TCG does not
have any other big scalability bottlenecks. I just checked, but
unfortunately I can't find the ppc64 image I used :( What I can offer
is the script I used to run these benchmarks; see it appended below.
Thanks,
Emilio
---
#!/bin/bash
set -eu
# path to host files
MYHOME=/local/home/cota/src

# QEMU installation prefix (one subdirectory per TAG; see the loop at the bottom)
QEMU_INST_PATH=$MYHOME/qemu-inst

# guest image
IMG=$MYHOME/qemu/img/ppc64/ubuntu.qcow2

ARCH=ppc64
COMMON_ARGS="-M pseries -nodefaults \
    -hda $IMG -nographic -serial stdio \
    -net nic -net user,hostfwd=tcp::2222-:22 \
    -m 48G"
# path to this script's directory, where .txt output will be copied
# from the guest.
QELT=$MYHOME/qelt
HOST_PATH=$QELT/fig/kcomp
# The guest must be able to SSH to the HOST without entering a password.
# The way I set this up is to have a passwordless SSH key in the guest's
# root user, and then copy that key's public key to the host.
# I used the root user because the guest runs on bootup (as root) a
# script that scp's run-guest.sh (see below) from the host, then executes it.
# This is done via a tiny script in the guest invoked from systemd once
# boot-up has completed.
HOST=foo@bar.edu
# This is a script on the host that computes an appropriate cpumask so
# that we use cores in the same socket where possible.
# See https://github.com/cota/cputopology-perl
CPUTOPO=$MYHOME/cputopology-perl
# For each run we create this file that then the guest will SCP
# and execute. It is a quick and dirty way of passing arguments to the guest.
create_file () {
    TAG=$1
    CORES=$2
    NAME=$ARCH.$TAG-$CORES.txt

    echo '#!/bin/bash' > run-guest.sh
    echo 'cp -r /home/cota/linux-4.18-rc7 /tmp2/linux' >> run-guest.sh
    echo "cd /tmp2/linux" >> run-guest.sh
    echo "{ time make -j $CORES vmlinux >/dev/null; } 2>>/home/cota/$NAME" >> run-guest.sh
    # Output with execution time is then scp'ed to the host.
    echo "ssh $HOST 'cat >> $HOST_PATH/$NAME' < /home/cota/$NAME" >> run-guest.sh
    echo "poweroff" >> run-guest.sh
}
# Adjust THREADS here, and also the TAGs, each of which points to a
# different QEMU installation.
for THREADS in 64 32 16; do
    for TAG in cpu-exclusive-work cputlb-no-bql per-cpu-lock cpu-has-work baseline; do
        QEMU=$QEMU_INST_PATH/$TAG/bin/qemu-system-$ARCH
        CPUMASK=$($CPUTOPO/list.pl --policy=compact-smt $THREADS)

        create_file $TAG $THREADS
        time taskset -c $CPUMASK $QEMU $COMMON_ARGS -smp $THREADS
    done
done
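
For completeness, the guest-side bootstrap described in the comments
above could be set up roughly as follows; the unit name and the paths
are illustrative, not necessarily what I actually used:

# One-time guest setup: a oneshot systemd unit that, at boot, fetches
# run-guest.sh from the host and executes it as root.
cat > /etc/systemd/system/run-benchmark.service <<'EOF'
[Unit]
Description=Fetch and run benchmark script from the host
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/fetch-and-run.sh

[Install]
WantedBy=multi-user.target
EOF

cat > /usr/local/bin/fetch-and-run.sh <<'EOF'
#!/bin/bash
set -eu
# Uses the passwordless key mentioned above; HOST as in the script.
scp foo@bar.edu:run-guest.sh /root/run-guest.sh
chmod +x /root/run-guest.sh
/root/run-guest.sh
EOF
chmod +x /usr/local/bin/fetch-and-run.sh
systemctl enable run-benchmark.service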

Re: [PATCH v8 74/74] cputlb: queue async flush jobs without the BQL
From: Robert Foley @ 2020-05-20 15:01 UTC
To: Emilio G. Cota
Cc: Richard Henderson, Alex Bennée, QEMU Developers, Peter Puhov
On Wed, 20 May 2020 at 00:46, Emilio G. Cota <cota@braap.org> wrote:
>
> On Mon, May 18, 2020 at 09:46:36 -0400, Robert Foley wrote:
>
> Thanks for doing these tests. I know from experience that benchmarking
> is hard and incredibly time-consuming, so please do not be discouraged by
> my comments below.
>
Hi,
Thanks for all the comments, and for including the script!
These are all very helpful.
We will work to replicate these results using a PPC VM,
and will re-post them here.
Thanks & Regards,
-Rob

Re: [PATCH v8 00/74] per-CPU locks
From: Alex Bennée @ 2020-03-27 9:39 UTC
To: Aleksandar Markovic
Cc: peter.puhov, Richard Henderson, Robert Foley, QEMU Developers
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes:
> On Thu, 26 Mar 2020 at 21:37, Robert Foley <robert.foley@linaro.org> wrote:
>>
>> V7: https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg00786.html
>>
>> This is a continuation of the series created by Emilio Cota.
>> We are picking up this patch set with the goal to apply
>> any fixes or updates needed to get this accepted.
>>
>
> Thanks for this work, Robert.
>
> However, I just hope you don't intend to request integrating the series
> in 5.0. The right timing for such a wide-reaching patch series is at the
> beginning of a dev cycle, not at the end of the (5.0) cycle, IMHO.
It's not marked for 5.0 - I don't think all patch activity on the list
has to stop during softfreeze. I don't think there is any danger of it
getting merged and early visibility has already generated useful
feedback and discussion.
--
Alex Bennée

Re: [PATCH v8 00/74] per-CPU locks
From: Robert Foley @ 2020-03-27 17:21 UTC
To: Aleksandar Markovic
Cc: Richard Henderson, Alex Bennée, QEMU Developers, peter.puhov
On Fri, 27 Mar 2020 at 06:24, Aleksandar Markovic
<aleksandar.qemu.devel@gmail.com> wrote:
> >> >>
> >> >> This is a continuation of the series created by Emilio Cota.
> >> >> We are picking up this patch set with the goal to apply
> >> >> any fixes or updates needed to get this accepted.
> >> >>
> >> >
> >> > Thanks for this work, Robert.
> >> >
> >> > However, I just hope you don't intend to request integrating the series
> >> > in 5.0. The right timing for such a wide-reaching patch series is at the
> >> > beginning of a dev cycle, not at the end of the (5.0) cycle, IMHO.
> >>
> >> It's not marked for 5.0 - I don't think all patch activity on the list
> >> has to stop during softfreeze. I don't think there is any danger of it
> >> getting merged and early visibility has already generated useful
> >> feedback and discussion.
> >
> >
> > OK, nobody ever said we can
>
> Obviously, I meant here "cannot", not "can". Everybody is allowed to do any experimentation and evaluation of the series at any time - of course. :)
We do not have any particular release in mind, but we are certainly not
targeting this one. It is a good point, though, that we probably should
have mentioned this in our cover letter, given the timing in the current
release cycle. We'll remember that in the future. :) We're just looking
to get this out there now, gather feedback, and hopefully advance the
series to the point where it can be included in a future release.
Thanks & Regards,
-Rob
>
> > examine, discuss and test the series, but I still think that this series arrives too late to be considered for 5.0.
> >
> > Aleksandar
> >
> >
> >>
> >> --
> >> Alex Bennée

Re: [PATCH v8 00/74] per-CPU locks
From: Emilio G. Cota @ 2020-03-27 5:14 UTC
To: Robert Foley
Cc: Peter Maydell, richard.henderson, qemu-devel, peter.puhov,
Paolo Bonzini, alex.bennee
(Apologies if I missed some Cc's; I was not Cc'ed in patch 0
so I'm blindly crafting a reply.)
On Thu, Mar 26, 2020 at 15:30:43 -0400, Robert Foley wrote:
> This is a continuation of the series created by Emilio Cota.
> We are picking up this patch set with the goal to apply
> any fixes or updates needed to get this accepted.
Thanks for picking this up!
> Listed below are the changes for this version of the patch,
> aside from the merge related changes.
>
> Changes for V8:
> - Fixed issue where in rr mode we could destroy the BQL twice.
I remember doing little to no testing in record-replay mode, so
there should be more bugs hiding in there :-)
> - Found/fixed bug that had been hit in testing previously during
> the last consideration of this patch.
> We reproduced the issue hit in the qtest: bios-tables-test.
> The issue was introduced by dropping the BQL, and found us
> (very rarely) missing the condition variable wakeup in
> qemu_tcg_rr_cpu_thread_fn().
Aah, this one:
https://patchwork.kernel.org/patch/10838149/#22516931
How did you identify the problem? Was it code inspection or using a tool
like rr? I remember this being hard to reproduce reliably.
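(For reference, the rr workflow I have in mind is the stock
record/replay cycle; the paths here depend on your tree, and the qtest
environment variable is the usual one:

QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 \
    rr record tests/qtest/bios-tables-test
rr replay   # deterministically replays the recorded run under gdb

The hard part, of course, is getting the bug to trigger while
recording.)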
On a related note, I've done some work to get QEMU-system to work
under thread sanitizer, since tsan now supports our longjmp-based
coroutines (hurrah!). My idea was to integrate tsan in QEMU (i.e.
bring tsan warnings to 0) before (re)trying to merge the
per-CPU lock patchset; this would minimize the potential for
regressions, which from my personal viewpoint seems like a reasonable
thing to do especially now that I have little time to work on QEMU.
If there's interest in doing the tsan work first, then I'd be
happy to send to the list as soon as this weekend the changes that
I have so far [1].
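In the meantime, a generic way to experiment is plain clang TSan
instrumentation; these are standard tsan flags, nothing here is
specific to my branch:

./configure --cc=clang --cxx=clang++ \
    --extra-cflags="-fsanitize=thread -fno-omit-frame-pointer -g" \
    --extra-ldflags="-fsanitize=thread"
make -j"$(nproc)"
# A suppressions file keeps known-benign reports out of the way:
TSAN_OPTIONS="suppressions=$PWD/tsan.supp" \
    x86_64-softmmu/qemu-system-x86_64 ...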
Thanks,
Emilio
[1] WIP branch: https://github.com/cota/qemu/commits/tsan

Re: [PATCH v8 00/74] per-CPU locks
From: Philippe Mathieu-Daudé @ 2020-03-27 10:59 UTC
To: Emilio G. Cota, Robert Foley, Marc-André Lureau, Stefan Hajnoczi
Cc: Peter Maydell, richard.henderson, qemu-devel, peter.puhov,
Paolo Bonzini, alex.bennee
On 3/27/20 6:14 AM, Emilio G. Cota wrote:
> [...]
> On a related note, I've done some work to get QEMU-system to work
> under thread sanitizer, since tsan now supports our longjmp-based
> coroutines (hurrah!). My idea was to integrate tsan in QEMU (i.e.
> bring tsan warnings to 0) before (re)trying to merge the
> per-CPU lock patchset; this would minimize the potential for
> regressions, which from my personal viewpoint seems like a reasonable
> thing to do especially now that I have little time to work on QEMU.
>
> If there's interest in doing the tsan work first, then I'd be
> happy to send to the list as soon as this weekend the changes that
> I have so far [1].
I'm pretty sure Marc-André is interested (and also Stefan maybe), so
Cc'ing them.
> [1] WIP branch: https://github.com/cota/qemu/commits/tsan

Re: [PATCH v8 00/74] per-CPU locks
From: Stefan Hajnoczi @ 2020-03-30 8:57 UTC
To: Philippe Mathieu-Daudé
Cc: Peter Maydell, Robert Foley, Paolo Bonzini, richard.henderson,
qemu-devel, Emilio G. Cota, peter.puhov, Marc-André Lureau,
alex.bennee
On Fri, Mar 27, 2020 at 11:59:37AM +0100, Philippe Mathieu-Daudé wrote:
> On 3/27/20 6:14 AM, Emilio G. Cota wrote:
> > [...]
> > On a related note, I've done some work to get QEMU-system to work
> > under thread sanitizer, since tsan now supports our longjmp-based
> > coroutines (hurrah!). My idea was to integrate tsan in QEMU (i.e.
> > bring tsan warnings to 0) before (re)trying to merge the
> > per-CPU lock patchset; this would minimize the potential for
> > regressions, which from my personal viewpoint seems like a reasonable
> > thing to do especially now that I have little time to work on QEMU.
> >
> > If there's interest in doing the tsan work first, then I'd be
> > happy to send to the list as soon as this weekend the changes that
> > I have so far [1].
>
> I'm pretty sure Marc-André is interested (and also Stefan maybe), so Cc'ing
> them.
Yes, please! tsan would be another good tool to have.
Stefan

Re: [PATCH v8 00/74] per-CPU locks
From: Alex Bennée @ 2020-03-27 18:23 UTC
To: Emilio G. Cota
Cc: Peter Maydell, Robert Foley, richard.henderson, qemu-devel,
Paolo Bonzini, peter.puhov
Emilio G. Cota <cota@braap.org> writes:
> (Apologies if I missed some Cc's; I was not Cc'ed in patch 0
> so I'm blindly crafting a reply.)
>
<snip>
>> Changes for V8:
>> - Fixed issue where in rr mode we could destroy the BQL twice.
>
> I remember doing little to no testing in record-replay mode, so
> there should be more bugs hiding in there :-)
Well, we have two (very simple) rr tests in check-tcg now, so there is
that ;-)
> On a related note, I've done some work to get QEMU-system to work
> under thread sanitizer, since tsan now supports our longjmp-based
> coroutines (hurrah!).
When did that go in? Will I need a new toolchain and -ltsan to make it
work?
> My idea was to integrate tsan in QEMU (i.e.
> bring tsan warnings to 0) before (re)trying to merge the
> per-CPU lock patchset; this would minimize the potential for
> regressions, which from my personal viewpoint seems like a reasonable
> thing to do especially now that I have little time to work on QEMU.
>
> If there's interest in doing the tsan work first, then I'd be
> happy to send to the list as soon as this weekend the changes that
> I have so far [1].
Please do - I've been cleaning up some of the other things the sanitizer
has found and it certainly won't hurt to get thread sanitizer working
again.
--
Alex Bennée

Re: [PATCH v8 00/74] per-CPU locks
From: Robert Foley @ 2020-03-27 18:30 UTC
To: Emilio G. Cota
Cc: Peter Maydell, Richard Henderson, QEMU Developers, Peter Puhov,
Paolo Bonzini, Alex Bennée
On Fri, 27 Mar 2020 at 01:14, Emilio G. Cota <cota@braap.org> wrote:
>
> (Apologies if I missed some Cc's; I was not Cc'ed in patch 0
> so I'm blindly crafting a reply.)
Sorry, I forgot to include you in patch 0, my bad. I will be sure to
include you in the future.
> On Thu, Mar 26, 2020 at 15:30:43 -0400, Robert Foley wrote:
> > This is a continuation of the series created by Emilio Cota.
> > We are picking up this patch set with the goal to apply
> > any fixes or updates needed to get this accepted.
>
> Thanks for picking this up!
>
> > Listed below are the changes for this version of the patch,
> > aside from the merge related changes.
> >
> > Changes for V8:
> > - Fixed issue where in rr mode we could destroy the BQL twice.
>
> I remember doing little to no testing in record-replay mode, so
> there should be more bugs hiding in there :-)
Thanks for the tip! We will give record-replay some extra testing to
hopefully shake some things out. :)
>
> > - Found/fixed bug that had been hit in testing previously during
> > the last consideration of this patch.
> > We reproduced the issue hit in the qtest: bios-tables-test.
> > The issue was introduced by dropping the BQL, and found us
> > (very rarely) missing the condition variable wakeup in
> > qemu_tcg_rr_cpu_thread_fn().
>
> Aah, this one:
> https://patchwork.kernel.org/patch/10838149/#22516931
> How did you identify the problem? Was it code inspection or using a tool
> like rr? I remember this being hard to reproduce reliably.
Same here, it was hard to reproduce. I did try to use rr on some
shorter runs, but had no luck there.
We ran it overnight on one of our ARM servers, and it eventually
reproduced after about 12 hours of looping across all the
bios-tables-test(s) (no rr). We never got it to reproduce on an x86
server. It was fairly consistent on the same ARM host: it always
reproduced within 8-12 hours or so, and we were able to reproduce it
several times. Roughly what we ran is sketched below.
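A minimal version of the loop, for reference; the binary and test paths
are from memory, so adjust to the tree:

i=0
while QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 \
      tests/qtest/bios-tables-test; do
    i=$((i+1))
done
echo "failed after $i clean iterations"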
Thanks & Regards,
-Rob

Re: [PATCH v8 00/74] per-CPU locks
From: Alex Bennée @ 2020-05-12 16:29 UTC
To: Robert Foley
Cc: peter.puhov, richard.henderson, qemu-devel
Robert Foley <robert.foley@linaro.org> writes:
> V7: https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg00786.html
>
> This is a continuation of the series created by Emilio Cota.
> We are picking up this patch set with the goal to apply
> any fixes or updates needed to get this accepted.
>
> Quoting an earlier patch in the series:
> "For context, the goal of this series is to substitute the BQL for the
> per-CPU locks in many places, notably the execution loop in cpus.c.
> This leads to better scalability for MTTCG, since CPUs don't have
> to acquire a contended global lock (the BQL) every time they
> stop executing code.
> See the last commit for some performance numbers."
Aside from some minor comments, I think this series is pretty good to go,
and I would favour early merging so we get plenty of time to shake out
any bugs.
I've been hammering this with my looped build test and everything seems
pretty stable. So for me have a:
Tested-by: Alex Bennée <alex.bennee@linaro.org>
for the series.
--
Alex Bennée