Pallinux

26/02/13

Thanks to the efforts of the oVirt devs it is now possible to virtualize an oVirt node, adding another virtualization layer.

Obviously the first step is to install oVirt Engine 3.2 on Fedora 18 through the official oVirt repositories.

Then we add a Fedora 18 node – actually two in our case – with a properly configured vdsm daemon, plus the vdsm-hook-nestedvt.noarch package, which is the one that actually makes this little black magic trick work. Don't forget to enable KVM nested virtualization itself by putting "options kvm-intel nested=1" inside the /etc/modprobe.d/kvm-intel.conf file on the node (that's for an Intel arch, of course). Finally, have the node join the engine.
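The node-side tweak can be sketched like this; the config path is the one mentioned above, and the verification step works on any shell (the kvm_amd module is the AMD counterpart, an assumption for non-Intel hosts):

```shell
# Run as root on the physical node. Write the option mentioned above,
# then reload the module (with no VMs running) for it to take effect:
#
#   echo "options kvm-intel nested=1" > /etc/modprobe.d/kvm-intel.conf
#   modprobe -r kvm_intel && modprobe kvm_intel
#
# Verification needs no root at all:
nested=/sys/module/kvm_intel/parameters/nested
if [ -r "$nested" ]; then
  cat "$nested"    # prints Y (or 1) when nesting is enabled
else
  echo "kvm_intel not loaded (AMD hosts: check kvm_amd instead)"
fi
```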

We're now ready to virtualize a shiny new node inside a V.M. which, in turn, is virtualization capable. We'll let it join the very same engine.

Here are the main packages involved in this experiment:

Kernel: 3.7.9-201.fc18.x86_64
Libvirt: libvirt-0.10.2.3-1.fc18
Vdsm: vdsm-4.10.3-8.fc18
Vdsm nested-virt hook: vdsm-hook-nestedvt-4.10.3-8.fc18.noarch
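On an RPM-based node the exact versions can be collected with a quick query; the package names below are the ones listed above:

```shell
# Query the RPM database for each package involved; fall back to a note
# on systems where rpm is unavailable or a package is missing.
for pkg in kernel libvirt vdsm vdsm-hook-nestedvt; do
  rpm -q "$pkg" 2>/dev/null || echo "$pkg: not installed (or non-RPM system)"
done
```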

This is just a demo of the kind of power nested virtualization unlocks.

12/12/12

During the last couple of weeks I took some time to try out the next step in virtualization tech: nested virtualization. My target was to add to our research infrastructure a server capable of managing multiple layers of virtualization, with the aim of learning something about the current status of KVM development regarding this particular feature, while checking out the relative performance levels and finding out whether managing such a host and its virtual siblings through our own oVirt implementation was a viable option.

What are we talking about?

Nested virtualization is, as the name implies, a feature that gives a virtual host the ability to run a hypervisor, rendering it capable of hosting one or more virtual machines on its own. To put it in a simple, straightforward way: this is achieved by giving the virtual host direct access to the CPU virtualization extensions (svm, vmx) of its parent physical host. In the future this will be especially interesting to those running IaaS and public virtualization infrastructures. It brings another abstraction layer that improves V.M. isolation and management and, most of all, the ability to run two different hypervisors, one on top of the other. This implies that nested virtualization will make it easier to migrate virtual assets, which are mostly hypervisor dependent, while making it easier to manage large-scale virtualization infrastructures. Just imagine hosting hundreds of virtual machines for a few customers. While zoning is possible, the number of V.M.s will still be overwhelming. What if we had a single V.M. hosting a customer's whole virtual infrastructure? What if we could port all their V.M. assets, which could be based on a different hypervisor, onto a single, manageable virtual entity? This could potentially improve legacy support too. It's not all sunshine and roses of course, as more abstraction usually brings more complex, convoluted technology – yet the potential is surely appealing. V.M. nesting is still in its teenage years: while it can be found in most commercial (e.g. VMware) and open source solutions (e.g. KVM, Xen), it's still widely considered an unstable feature.
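A quick, generic way to see whether these extensions made it into a guest is to look at the CPU flags from inside it; this sketch assumes a Linux guest with the usual /proc/cpuinfo layout:

```shell
# Inside a guest (or any host): count vmx/svm flags in /proc/cpuinfo.
# A non-zero count means the virtualization extensions are visible,
# so this machine can itself run a KVM hypervisor.
count=$(grep -E -c 'vmx|svm' /proc/cpuinfo 2>/dev/null || true)
if [ "${count:-0}" -gt 0 ]; then
  echo "vmx/svm visible: nested hypervisor possible"
else
  echo "no vmx/svm flag: virtualization extensions not exposed"
fi
```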

The trials

CentOS 6.3
Given that the standard distro in our infrastructure is CentOS, I decided to take a first dive into CentOS 6.3. Little did I know that bad news was in store. I found out that the latest kernel version was 2.6.32 and the available KVM module is behind in development. As it turned out, the KVM module was barely capable of enabling nested virtualization. While the vmx extension was accessible on the level-one virtual guest, which means I was able to successfully install and start libvirtd, the whole "virtual stack" got stuck as soon as I tried to run the level-two guest. I tried disabling virtIO and all the non-critical features (usb, sound devices, etc.), but it was useless. Since CentOS is supposed to be the tried and true distribution, and solutions outside the vanilla distro were not in my book (say – no custom kernel), I decided to leave this path and move one step forward anyway.

oVirt node 2.5.5
Our virtualization infrastructure is oVirt based. Since oVirt Node, the oVirt distribution for host nodes, is Fedora based, its kernel is pretty much up to the latest versions. I went into this with some optimism, knowing that the latest kernel releases could give me a decent chance at running a nested KVM stack. Then I learned, the hard way, that things weren't so easy. The issue lies in the way oVirt creates its virtual guests: I couldn't find a way to tell oVirt that it should instruct qemu to enable the vmx extension in the emulated CPU features. Moreover, the largest slice of an oVirt Node filesystem is non-persistent, which means that extra effort is required to maintain changes applied to the system. oVirt apparently does not store a template of the virtual guest, neither in its database nor in the filesystem (e.g. libvirt stores an xml template for its virtual guests), so I couldn't find a way to manually enforce such a feature. I began asking around the community for some advice and, courtesy of a couple of kind oVirt devs, discovered that there's a patch for VDSM (Virtual Desktop and Server Manager) that implements a hook letting virtual guests running on a physical node with KVM nesting enabled expose Intel's vmx feature. Once again, time was running short and the effort required crippled my chances.

Fedora 16
With just a couple of days left I decided to go for a less fiddly approach and installed Fedora 16 on my physical host, since it was promptly waiting on our pxe server. Everything went pretty smoothly this time. Once the host was installed and up to date, and the KVM module was told to enable nesting, I created a CentOS L1 guest in no time. I proceeded to set up the nested L2 guest, paying attention to setting the disk bus to IDE, since I knew that virtIO was not supposed to work in a nested configuration. The second-level guest came to life easily, but I noticed a substantial overhead on the L1 guest during disk operations. I shut down the L2 V.M. and changed the disk bus to virtIO and, much to my amazement, the V.M. happily booted while the overhead was reduced. At this stage, I was pretty curious about nesting different hypervisors, so I set up a CentOS 5.8 Xen guest, but it refused to boot into Xen, freezing with a kernel panic message. Next item on the list was Hyper-V – the Windows Server 2012 machine installed correctly but, just like its open source counterpart, it froze during the boot sequence.
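For reference, an L2 guest along these lines could be created from inside the L1 guest with virt-install; every name, path and size here is a placeholder, and the exact flag spelling varies between virt-install releases (older ones use --ram, newer ones --memory):

```shell
# Hypothetical invocation, run inside the L1 guest (all values are placeholders).
# bus=ide was the cautious first choice; switching to bus=virtio later
# worked fine in the nested setup and reduced the disk overhead.
virt-install \
  --name l2-guest \
  --ram 1024 \
  --vcpus 1 \
  --disk path=/var/lib/libvirt/images/l2-guest.img,size=8,bus=virtio \
  --cdrom /var/tmp/centos-minimal.iso \
  --graphics vnc
```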

Conclusions
KVM nested virtualization is basically working, yet not stable enough to be brought under the spotlight. I encountered even more issues while trying to run more than one vmx-enabled L1 guest. Yet KVM-on-KVM nesting worked properly and, given the constant development and the minimal overhead, it really is promising. Concerning my previously set target, I'll recompile VDSM and try to convert my Fedora server into an oVirt node, something I'll attempt as soon as I find more spare time. In the meantime, thanks for reading and feel free to share your experiences here.

Quick update:

I've successfully deployed a nested configuration using Fedora 18 Beta. I have three L2 guests running inside an L1 guest and everything is looking pretty good at the moment. I'm still fighting with VDSM to convert my host into an oVirt node; I hope to have more good news soon.

20/06/12

Supporting Linux is important to NVIDIA, and we understand that there are people who are as passionate about Linux as an open source platform as we are passionate about delivering an awesome GPU experience.
Recently, there have been some questions raised about our lack of support for our Optimus notebook technology. When we launched our Optimus notebook technology, it was with support for Windows 7 only. The open source community rallied to work around this with support from the Bumblebee Open Source Project http://bumblebee-project.org/. And as a result, we've recently made Installer and readme changes in our R295 drivers that were designed to make interaction with Bumblebee easier.
While we understand that some people would prefer us to provide detailed documentation on all of our GPU internals, or be more active in Linux kernel community development discussions, we have made a decision to support Linux on our GPUs by leveraging NVIDIA common code, rather than the Linux common infrastructure. While this may not please everyone, it does allow us to provide the most consistent GPU experience to our customers, regardless of platform or operating system.
As a result:
1) Linux end users benefit from same-day support for new GPUs, OpenGL version and extension parity between NVIDIA Windows and NVIDIA Linux support, and OpenGL performance parity between NVIDIA Windows and NVIDIA Linux.
2) We support a wide variety of GPUs on Linux, including our latest GeForce, Quadro, and Tesla-class GPUs, for both desktop and notebook platforms. Our drivers for these platforms are updated regularly, with seven updates released so far this year for Linux alone. The latest Linux drivers can be downloaded from www.nvidia.com/object/unix.html.
3) We are a very active participant in the ARM Linux kernel. For the latest 3.4 ARM kernel – the next-gen kernel to be used on future Linux, Android, and Chrome distributions – NVIDIA ranks second in terms of total lines changed and fourth in terms of number of changesets for all employers or organizations.
At the end of the day, providing a consistent GPU experience across multiple platforms for all of our customers continues to be one of our key goals.