On Thu, 29 Mar 2012 11:53:02 -0500, Alan Cox <alan.l.cox at gmail.com> wrote:
> Not so long ago, VMware implemented a clever scheme for reducing the
> overhead of virtualized interrupts that must be delivered by at least
> some (if not all) of their emulated storage controllers:
>
> http://static.usenix.org/events/atc11/tech/techAbstracts.html#Ahmad
>
> Perhaps, there is a bad interaction between this scheme and FreeBSD's mpt
> driver.
>
> Alan

If we assume mpt is the culprit, how can I go about diagnosing this more
accurately? Is there something I should be looking for in vmstat -i? Too
many interrupts? Not enough? A rate that is too high or too low? Or is this
something that is much harder to track down because we're dealing with
emulated hardware?
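For what it's worth, one rough way to get a number out of vmstat -i is to
take two snapshots a known interval apart and diff the per-source totals,
since (as far as I know) the "rate" column is averaged since boot and would
hide a burst or a stall. This is only a sketch: it assumes the FreeBSD
layout where the last two columns of each line are the total count and the
rate, and the helper names and the "mpt" filter are made up for illustration.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map interrupt source name -> cumulative count from `vmstat -i` output.

    Assumes each data line ends with two columns: total count and rate.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, separator lines and blank lines.
        if len(fields) < 3 or not fields[-2].isdigit():
            continue
        name = " ".join(fields[:-2])  # e.g. "irq17: mpt0"
        if name.lower() == "total":
            continue  # skip the summary line
        counts[name] = int(fields[-2])
    return counts


def interrupt_rates(before, after, interval_s):
    """Interrupts per second for each source between two snapshots."""
    return {
        name: (after[name] - before.get(name, 0)) / interval_s
        for name in after
    }


def snapshot():
    """Run `vmstat -i` once and parse its output."""
    out = subprocess.run(
        ["vmstat", "-i"], capture_output=True, text=True, check=True
    ).stdout
    return parse_vmstat_i(out)


def sample(interval_s=10.0, match="mpt"):
    """Measure the rate of every source whose name contains `match`."""
    first = snapshot()
    time.sleep(interval_s)
    second = snapshot()
    rates = interrupt_rates(first, second, interval_s)
    return {name: rate for name, rate in rates.items() if match in name}


if __name__ == "__main__":
    for name, rate in sorted(sample().items()):
        print(f"{name}: {rate:.1f}/s")
```

Running that periodically on a healthy box and on one that is about to
wedge would at least show whether the mpt interrupt rate drops to zero or
goes through the roof beforehand.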
If any BSD devs are interested in access to our environment, I think we
could arrange it. I might even be able to get authorization to give you an
account on the most crash-prone server, which doesn't have any sensitive
customer data on it. At this point we'd even be willing to pay someone to
look at a server in this state, just so we (and hopefully others) can
benefit, and hopefully we end up with a more reliable FreeBSD-on-VMware
for everyone.
I know Doug mentioned running newer OS versions, and that is definitely
tempting, but because the problem isn't 100% reproducible on demand, it's
hard to prove an upgrade fixes it without waiting 6 months. We're fighting
internally here between "trust 9.0 to fix it" and "jump back to 7.4 because
we KNOW it doesn't happen there". Having someone look at this and say "oh,
yes, that's a deficiency in mpt that appears to be fixed in the newer driver
that was MFC'd to 8-STABLE and that you'll find in 8.3-RELEASE and
9.0-RELEASE" would be more comforting.
Thanks to everyone for their time on this!