Comments

From: David Miller <davem@davemloft.net>
Date: Wed, 29 Apr 2009 15:04:08 -0700 (PDT)
> But we should be working around that properly. This check in> sun4v_irq_eoi() and sun4v_virq_eoi():> > if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))> return;> > Should prevent any problems due to that bug.
This is the commit that test comes from. I assume you guys
would check the commit logs for something like this, but just
in case...
commit 5a606b72a4309a656cd1a19ad137dc5557c4b8ea
Author: David S. Miller <davem@sunset.davemloft.net>
Date: Mon Jul 9 22:40:36 2007 -0700
[SPARC64]: Do not ACK an INO if it is disabled or inprogress.
This is also a partial workaround for a bug in the LDOM firmware which
double-transmits RX inos during high load. Without this, such an
event causes the kernel to loop forever in the interrupt call chain
ACK'ing but never actually running the IRQ handler (and thus clearing
the interrupt condition in the device).
There is still a bad potential effect when double INOs occur,
not covered by this changeset. Namely, if the INO is already on
the per-cpu INO vector list, we still blindly re-insert it and
thus we can end up losing interrupts already linked in after
it.
We could deal with that by traversing the list before insertion,
but that's too expensive for this edge case.
Signed-off-by: David S. Miller <davem@davemloft.net>
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

From: Chris Torek <chris.torek@windriver.com>
Date: Wed, 29 Apr 2009 18:06:57 -0600
> Yes, this one is in. I did (and still do) suspect a hypervisor> bug. I think in this case the problem is that this test is not> sufficent for the "two vnets on one vswitch" case, though,> because this results in the desc->status not having> IRQ_INPROGRESS set on the "other" vnet that gets a double> interrupt.
You can know if it's this hypervisor bug by simply updating
your hypervisor to the latest version available.
> I'm also wondering now if it has something to do with the code we> added that attempts to redistribute interrupts, so that in the case> of a double interrupt, the second one goes to a different CPU> than the CPU already handling the first one. Although we> should still see IRQ_INPROGRESS then...
If you're making local changes on that scale, it's irresponsible
to make mention of them when reporting a bug you want this
list to look at :-(
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html