Re: [libvirt] libvirt + xen 3.2.1 oddities

From: "Daniel P. Berrange" <berrange redhat com>

To: Guido Günther <agx sigxcpu org>

Cc: libvir-list redhat com

Subject: Re: [libvirt] libvirt + xen 3.2.1 oddities

Date: Tue, 25 Nov 2008 11:39:57 +0000

On Fri, Nov 21, 2008 at 11:13:04PM +0100, Guido Günther wrote:
> Hi,
> I just ran across these oddities when using a bit more libvirt+xen:
>
> 1.) virsh setmaxmem:
>
> On a running domain:
> # virsh setmaxmem domain 256000
> completes but virsh dumpxml as well as the config.sxp still shows the
> old amount of memory. Looks as if the set_maxmem hypercall simply gets
> ignored. xm mem-max works as expected. Smells like a bug in the ioctl?
The setmaxmem API is not performance critical, so it sounds like we
should first try setting it via XenD, and use Hypervisor as the
fallback instead.
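
A minimal sketch of that ordering, assuming each backend exposes a "set max memory" call that returns 0 on success and -1 on failure. The type and function names here (set_max_mem_fn, set_max_memory, the fake_* stand-ins) are hypothetical and are not the actual libvirt driver API; the point is only that XenD is tried first and the hypervisor is used when XenD fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend signature: 0 on success, -1 on failure. */
typedef int (*set_max_mem_fn)(int domid, unsigned long kib);

/* Try each backend in order and stop at the first one that succeeds. */
int set_max_memory(const set_max_mem_fn *backends, size_t n,
                   int domid, unsigned long kib)
{
    assert(backends != NULL);
    for (size_t i = 0; i < n; i++) {
        if (backends[i] != NULL && backends[i](domid, kib) == 0)
            return 0;
    }
    return -1; /* no backend could apply the change */
}

/* Stand-ins for the real drivers, for illustration only. */
int xend_available = 1;
int xend_calls = 0;
int hv_calls = 0;

int fake_xend_set_max_mem(int domid, unsigned long kib)
{
    (void)domid; (void)kib;
    xend_calls++;
    return xend_available ? 0 : -1;
}

int fake_hv_set_max_mem(int domid, unsigned long kib)
{
    (void)domid; (void)kib;
    hv_calls++;
    return 0;
}

/* XenD first, hypervisor as the fallback. */
const set_max_mem_fn xend_first[] = {
    fake_xend_set_max_mem,
    fake_hv_set_max_mem,
};
```

With this order the hypervisor path is only reached when XenD reports a failure, which is acceptable here because the call is not performance critical.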
> 2.) virsh list:
>
> Sometimes (didn't find a pattern yet) when shutting down a running
> domain and restarting it I'm seeing:
>
> Id Name State
> ----------------------------------
> 0 Domain-0 running
> 2 foo idle
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> 7 bar idle
>
> Note that the number of errors corresponds to the number of
> shutdowns. virXen_getdomaininfolist returns 7 in the above case.
> virDomainLookupByID later on fails for these "additional" domains.
This is basically a XenD bug. What's happening is that the domain
has been shut down, and got most of the way through cleanup, as far
as the hypervisor is concerned. But something is still hanging around
keeping the domain from being completely terminated. In this case
XenD takes the dubious approach of just pretending the domain does
not exist. So libvirt sees it exists in the hypervisor, but when
asking XenD for more data, it gets that error. This really really
sucks.
There's not really much we can do about it when XenD is just plain
lying about what exists. We explicitly don't ask XenD for the list
of domain IDs because it is incredibly slow, hence we use the HV.
The only idea I can think of is to ask XenStore for the list of
domain IDs. This is still dramatically faster than asking XenD,
but not quite as fast as the Hypervisor.
>
> 3.) virsh list: Duplicate domains:
>
> Id Name State
> ----------------------------------
> 0 Domain-0 running
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> 14 bar no state
> libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
> 16 bar idle
>
> Domain 14 can't be shut down (xm list only lists domain 16).
>
> Could be a similar problem as the above.
Yeah, this is almost certainly just another example of XenD not properly
cleaning up / destroying domains. If you still have a machine which
shows this behaviour, then I'd recommend trying this change to our Xen
impl:
In xen_unified.c, find the method xenUnifiedListDomains and make it first
call xenStoreListDomains() and then fall back to trying HV & XenD drivers.
If we're lucky this will help....
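
Roughly, the ordering would look like the sketch below. This is not the real xen_unified.c code: list_domains_fn, list_domains_with_fallback and the fake_* sources are hypothetical stand-ins, and the sketch assumes each source fills an ID array and returns the count, or -1 on failure. The IDs mirror the report above (14 stale, 16 live); whether XenStore really omits the stale domain is exactly the assumption to be tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source signature: fill ids[], return the count or -1. */
typedef int (*list_domains_fn)(int *ids, int maxids);

/* Ask each source in order; the first one that answers wins. */
int list_domains_with_fallback(const list_domains_fn *sources, size_t n,
                               int *ids, int maxids)
{
    assert(ids != NULL && maxids >= 0);
    for (size_t i = 0; i < n; i++) {
        int ret = sources[i](ids, maxids);
        if (ret >= 0)
            return ret;
    }
    return -1; /* every source failed */
}

/* Copy at most maxids entries from src into ids and return the count. */
int copy_ids(const int *src, int n, int *ids, int maxids)
{
    int count = n < maxids ? n : maxids;
    for (int i = 0; i < count; i++)
        ids[i] = src[i];
    return count;
}

/* Stand-ins for the real drivers, for illustration only. */
int xenstore_up = 1;

int fake_xenstore_list(int *ids, int maxids)
{
    static const int live[] = { 0, 16 };          /* assumed: no stale entry */
    return xenstore_up ? copy_ids(live, 2, ids, maxids) : -1;
}

int fake_hv_list(int *ids, int maxids)
{
    static const int all[] = { 0, 14, 16 };       /* includes stale domain 14 */
    return copy_ids(all, 3, ids, maxids);
}

int fake_xend_list(int *ids, int maxids)
{
    (void)ids; (void)maxids;
    return -1;                                    /* last resort, unused here */
}

/* XenStore first, then HV, then XenD. */
const list_domains_fn xenstore_first[] = {
    fake_xenstore_list,
    fake_hv_list,
    fake_xend_list,
};
```

If XenStore is unavailable the hypervisor list is used as before, so in the worst case the behaviour is unchanged.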
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|