On 09/30/2011 01:31 PM, Jan Kiszka wrote:
> This is conceptually cleaner and will allow us to drop the nographic> timer. Moreover, it will be mandatory to fully exploit future per-device> coalesced MMIO rings.>
Appears to break winxp installation - the guest enters an infinite
bitblt loop. Trying to find out why.

On 2011-10-18 16:08, Avi Kivity wrote:
> On 10/18/2011 04:05 PM, Jan Kiszka wrote:>> On 2011-10-18 16:00, Avi Kivity wrote:>>> On 09/30/2011 01:31 PM, Jan Kiszka wrote:>>>> This is conceptually cleaner and will allow us to drop the nographic>>>> timer. Moreover, it will be mandatory to fully exploit future per-device>>>> coalesced MMIO rings.>>>>>>>>>> Appears to break winxp installation - the guest enters an infinite>>> bitblt loop. Trying to find out why.>>>> Hmm, maybe there are side effects in certain modes that actually>> disallow coalescing.>>> > That's true for sure, but flushing the buffer should never be wrong.
Err, you mean we are not flushing "too often"? I was under the
impression winxp is missing our periodic flushes. Do things work again
when you do not flush at all?
Jan

On 10/18/2011 04:10 PM, Jan Kiszka wrote:
> On 2011-10-18 16:08, Avi Kivity wrote:> > On 10/18/2011 04:05 PM, Jan Kiszka wrote:> >> On 2011-10-18 16:00, Avi Kivity wrote:> >>> On 09/30/2011 01:31 PM, Jan Kiszka wrote:> >>>> This is conceptually cleaner and will allow us to drop the nographic> >>>> timer. Moreover, it will be mandatory to fully exploit future per-device> >>>> coalesced MMIO rings.> >>>>> >>>> >>> Appears to break winxp installation - the guest enters an infinite> >>> bitblt loop. Trying to find out why.> >>> >> Hmm, maybe there are side effects in certain modes that actually> >> disallow coalescing.> >>> > > > That's true for sure, but flushing the buffer should never be wrong.>> Err, you mean we are not flushing "too often"?
No, I don't know what the exact problem is. What I mean is that an
extra flush should never hurt; a missing flush degrades the user
experience but shouldn't cause the infinite loops I'm seeing.
> I was under the> impression winxp is missing our periodic flushes. Do things work again> when you do not flush at all?
This takes a while to reproduce, let me talk to gdb for a bit.

On 10/18/2011 04:30 PM, Avi Kivity wrote:
> This takes a while to reproduce, let me talk to gdb for a bit.>
a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does
a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which
does vga_update_display(), which calls
qemu_flush_coalesced_mmio_buffer(), which is not reentrant.
It's easy to make qemu_flush_coalesced_mmio_buffer reentrant:
if (s->coalesced_flush_in_progress) {
return;
}
it isn't very pretty and is also a lie. Other ideas?
I'll probably commit this soon to avoid the regression, to be replaced
by a better fix when we find it.

On 2011-10-18 18:40, Avi Kivity wrote:
> On 10/18/2011 04:30 PM, Avi Kivity wrote:>> This takes a while to reproduce, let me talk to gdb for a bit.>>> > a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does> a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which
Why does it have to do vga_hw_update? Why can't it set some flag for the
next requested screen update or so? Just thinking, haven't looked at the
code yet.
Do you think that only cirrus is affected by this pattern?
> does vga_update_display(), which calls> qemu_flush_coalesced_mmio_buffer(), which is not reentrant.> > It's easy to make qemu_flush_coalesced_mmio_buffer reentrant:> > if (s->coalesced_flush_in_progress) {> return;> }> > it isn't very pretty and is also a lie. Other ideas?> > I'll probably commit this soon to avoid the regression, to be replaced> by a better fix when we find it.>
Agreed. Unless we can avoid that recursion at devices level, there is
likely no alternative.
Jan

On 10/18/2011 06:49 PM, Jan Kiszka wrote:
> On 2011-10-18 18:40, Avi Kivity wrote:> > On 10/18/2011 04:30 PM, Avi Kivity wrote:> >> This takes a while to reproduce, let me talk to gdb for a bit.> >>> > > > a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does> > a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which>> Why does it have to do vga_hw_update? Why can't it set some flag for the> next requested screen update or so? Just thinking, haven't looked at the> code yet.
Maybe it's a remnant from the days where it asked the host hardware to
do the blt.
> Do you think that only cirrus is affected by this pattern?
It's also possible for hotunplug:
- hotunplug
- unregister coalesced regions
- flush mmios
- call back into same device

On Tue, Oct 18, 2011 at 06:49:57PM +0200, Jan Kiszka wrote:
> On 2011-10-18 18:40, Avi Kivity wrote:> > On 10/18/2011 04:30 PM, Avi Kivity wrote:> >> This takes a while to reproduce, let me talk to gdb for a bit.> >>> > > > a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does> > a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which> > Why does it have to do vga_hw_update? Why can't it set some flag for the> next requested screen update or so? Just thinking, haven't looked at the> code yet.
bottomhalf?
> > Do you think that only cirrus is affected by this pattern?> > > does vga_update_display(), which calls> > qemu_flush_coalesced_mmio_buffer(), which is not reentrant.> > > > It's easy to make qemu_flush_coalesced_mmio_buffer reentrant:> > > > if (s->coalesced_flush_in_progress) {> > return;> > }> > > > it isn't very pretty and is also a lie. Other ideas?> > > > I'll probably commit this soon to avoid the regression, to be replaced> > by a better fix when we find it.> > > > Agreed. Unless we can avoid that recursion at devices level, there is> likely no alternative.> > Jan> > -- > Siemens AG, Corporate Technology, CT T DE IT 1> Corporate Competence Center Embedded Linux>

On 2011-10-18 19:34, Avi Kivity wrote:
> On 10/18/2011 06:49 PM, Jan Kiszka wrote:>> On 2011-10-18 18:40, Avi Kivity wrote:>>> On 10/18/2011 04:30 PM, Avi Kivity wrote:>>>> This takes a while to reproduce, let me talk to gdb for a bit.>>>>>>>>>> a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does>>> a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which>>>> Why does it have to do vga_hw_update? Why can't it set some flag for the>> next requested screen update or so? Just thinking, haven't looked at the>> code yet.> > Maybe it's a remnant from the days where it asked the host hardware to> do the blt.
If it's no longer needed - drop it? Already for other reasons like
efficiency.
> >> Do you think that only cirrus is affected by this pattern?> > It's also possible for hotunplug:> > - hotunplug> - unregister coalesced regions> - flush mmios> - call back into same device
Which device triggers hotunplug via a coalesced mmio region?
Anyway, if we want to avoid other surprises like that, better make
kvm_flush_coalesced_mmio_buffer reentrance-safe. If we think that this
remains an odd scenario, issue a warning to the console that some device
may require fixing.
Jan

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 10/18/2011 09:50 PM, Jan Kiszka wrote:
> On 2011-10-18 19:34, Avi Kivity wrote:> > On 10/18/2011 06:49 PM, Jan Kiszka wrote:> >> On 2011-10-18 18:40, Avi Kivity wrote:> >>> On 10/18/2011 04:30 PM, Avi Kivity wrote:> >>>> This takes a while to reproduce, let me talk to gdb for a bit.> >>>>> >>>> >>> a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does> >>> a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which> >>> >> Why does it have to do vga_hw_update? Why can't it set some flag for the> >> next requested screen update or so? Just thinking, haven't looked at the> >> code yet.> >> > Maybe it's a remnant from the days where it asked the host hardware to> > do the blt.>> If it's no longer needed - drop it? Already for other reasons like> efficiency.
I think it actually is needed - it calls qemu_console_copy() to do the
copy. Which incidentally means the the coalesced flush, had it worked,
would be a bug: it would bring pending mmio writes in front of a
currently executing bitblt. I don't think we can regard my hack as a
fix for that. Maybe we need to revert the original patch. Or make sure
the flush only happens from the display thread.
>>> >> >> Do you think that only cirrus is affected by this pattern?> >> > It's also possible for hotunplug:> >> > - hotunplug> > - unregister coalesced regions> > - flush mmios> > - call back into same device>> Which device triggers hotunplug via a coalesced mmio region?
Er, none.
> Anyway, if we want to avoid other surprises like that, better make> kvm_flush_coalesced_mmio_buffer reentrance-safe. If we think that this> remains an odd scenario, issue a warning to the console that some device> may require fixing.>
- --
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBAgAGBQJOnpKTAAoJEI7yEDeUysxlWjkP/3KubuwPQXYV6Fh8EjLXYSNG
ClxiqOvGIy8lHqdpZON2iuv5RiFcReeUZlFVXhRcjJ28tSb2ZYh8qxb/mRE82U8p
1sPS+kEH8k+lvzF38LjUc9XwWdjLLPiXrm3xWX/3uGvotbMKezS+2WGqc5hN7l8U
bc1pKLXQJ8YTIPq1seMv+ncVUcS3yaYSHMagkwjnQ4MDo8mhyPadfkkyv2BBLMM6
lTI9w5QvZ6JoWAHy/7KEEqyzs06ssfXOc/rSQLcHXn0S4CyVsmuGu3B/grHNqKIp
8OE0gZgDcA00E+YvXdOUGPMJ+XHEn/BCH2vrRf1TAlWq+zRPwXTrY3SZbauuBV/+
x0991xE2fMPF3o03OybwqnbQFrqtPhhYKXu9Dt/mBeITOkcvEVJjnO6ooCeqirmA
GrA26ls0I5+oZxfC3qmwJnO/WmGhZCEBS4YAav/4o+t1Ae2bjj/A23kcW/W+RUW7
5I2WTcbhe/+zqtalJg68F//8PSWmCnF4njxdHR3sRyhkzuDS1Ue99bGi0eugkYrM
71q0r17SnOK8Mo7tPn4FP0eeSEpTnxzS5vScf60IV0p3wIzgTPqeDgs+73v6AzO1
Yu3efGmh2q+UFVgUOhkDiFkPoQeaUSnpNEhpwBDGctGjqMyKpildkZB6DFY/+hle
9+Id+qSPeQNzoO62bfGi
=n5PD
-----END PGP SIGNATURE-----

On 2011-10-19 11:04, Avi Kivity wrote:
> > On 10/18/2011 09:50 PM, Jan Kiszka wrote:>> On 2011-10-18 19:34, Avi Kivity wrote:>>> On 10/18/2011 06:49 PM, Jan Kiszka wrote:>>>> On 2011-10-18 18:40, Avi Kivity wrote:>>>>> On 10/18/2011 04:30 PM, Avi Kivity wrote:>>>>>> This takes a while to reproduce, let me talk to gdb for a bit.>>>>>>>>>>>>>>>> a vcpu exit causes kvm_flush_coalesced_mmio_buffer() to run, which does>>>>> a bitblt, which is cirrus_do_copy(), which goes to vga_hw_update, which>>>>>>>> Why does it have to do vga_hw_update? Why can't it set some flag for the>>>> next requested screen update or so? Just thinking, haven't looked at the>>>> code yet.>>>>>> Maybe it's a remnant from the days where it asked the host hardware to>>> do the blt.> >> If it's no longer needed - drop it? Already for other reasons like>> efficiency.> > I think it actually is needed - it calls qemu_console_copy() to do the> copy. Which incidentally means the the coalesced flush, had it worked,> would be a bug: it would bring pending mmio writes in front of a> currently executing bitblt. I don't think we can regard my hack as a> fix for that. Maybe we need to revert the original patch. Or make sure> the flush only happens from the display thread.
I hope we can avoid the old scheme as it hurts when trying to make
progress /wrt scalability. Will have a look if we can avoid the
recursion in some reasonable way at device level here.
Jan