How to Migrate an Instance with Zero Downtime: OpenStack Live Migration with KVM Hypervisor and NFS Shared Storage

Editor’s note: We will be talking briefly about live migration in the What’s New in OpenStack Havana webcast next week, but Damian had such a great explanation of how to actually do it that we wanted to put it out here so you can see it in action.

Live migration is the movement of a live instance from one compute node to another. A feature hugely sought after by cloud administrators, it is used primarily to achieve zero downtime during cloud maintenance, and it can also help with performance, since live instances can be moved from a heavily loaded compute node to a less loaded one.

Live migration has to be planned for at the initial stage of designing an OpenStack deployment. Some things to take into consideration are as follows:

At the moment, not all hypervisors support live migration in OpenStack; therefore, it's best to check the HypervisorSupportMatrix to see whether your hypervisor supports it. KVM, QEMU, XenServer/XCP, and Hyper-V are among the currently supported hypervisors.

In a typical OpenStack deployment, every compute node manages its instances locally in a dedicated directory (for example, /var/lib/nova/instances/), but for live migration this directory has to be in a centralized location and shared across all of the compute nodes. Hence, a shared file system or block storage is an essential requirement for enabling live migration. A shared file system such as GlusterFS or NFS needs to be properly configured and running before live migration can be performed. SAN storage protocols such as Fibre Channel (FC) and iSCSI can also be used for shared storage.

As for file permissions on the centralized storage, you must ensure that the UID and GID of the Compute (nova) user are the same on the controller node and on all of the compute nodes (the assumption here is that the shared storage resides on the controller node). Likewise, the UID and GID of libvirt-qemu must be the same on all compute nodes.

It's important to specify vncserver_listen=0.0.0.0 so that the VNC server can accept connections from all of the compute nodes regardless of where the instances are running. If this is not set, accessing migrated instances through VNC can be a problem, because the destination compute node's IP address does not match that of the source compute node.
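For reference, this is a single line in nova.conf on each compute node; a minimal sketch, assuming the standard packaged file location:

# /etc/nova/nova.conf (on each compute node)
# Listen on all interfaces so VNC consoles remain reachable
# after an instance moves to a different host.
vncserver_listen=0.0.0.0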

The following instructions enable live migration on an OpenStack multinode deployment using the KVM hypervisor running Ubuntu 12.04 LTS with NFS shared storage. This tutorial assumes that a working multinode deployment has already been configured using a deployment tool such as Mirantis Fuel. The lab used for this tutorial consists of a cloud controller node, a network node utilizing neutron networking, and two compute nodes.

Please note that this tutorial does not cover the security aspects of live migration. You will have to research that area yourself, so do not take this tutorial as production ready from a security standpoint.

This tutorial is presented in two parts: first, the procedure for implementing NFS shared storage, and then a demo of live migration.

Part 1: Implementing NFS shared storage

The cloud controller node is the NFS server. The aim is to share /var/lib/nova/instances across all of the compute nodes in your OpenStack cluster. This directory contains the libvirt KVM file-based disk images for the instances hosted on each compute node. If you are not running your cloud in a shared storage environment, this directory will be unique to each compute node. Note that if you already have instances running in your cloud before configuring live migration, you need to take precautions so that the existing instances are not overwritten.

On the NFS server/controller node, take the following steps:

Install the NFS server.

root@vmcon-mn:~# apt-get install nfs-kernel-server

IDMAPD provides functionality to the NFSv4 kernel client and server by translating user and group IDs to names, and vice versa. Edit /etc/default/nfs-kernel-server and set the indicated option to yes. This file must be the same on both the client and the NFS server.
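The export and mount steps are sketched below, assuming a standard NFSv4 setup; the option name NEED_IDMAPD, the compute network 192.168.122.0/24, and the compute node hostname vmcom1-mn are assumptions for illustration (on some Ubuntu releases NEED_IDMAPD lives in /etc/default/nfs-common instead):

# /etc/default/nfs-kernel-server (and the matching file on the clients)
NEED_IDMAPD=yes    # assumed option name; enables NFSv4 ID-to-name mapping

# /etc/exports on the controller: export the instances directory
# to the compute network (192.168.122.0/24 is a placeholder)
/var/lib/nova/instances 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)

root@vmcon-mn:~# chmod o+x /var/lib/nova/instances
root@vmcon-mn:~# service nfs-kernel-server restart

On each compute node (the NFS client), install the client packages, mount the share at the same path, and verify the mount:

root@vmcom1-mn:~# apt-get install nfs-common
# /etc/fstab (vmcon-mn stands in for the controller's hostname)
vmcon-mn:/var/lib/nova/instances /var/lib/nova/instances nfs4 defaults 0 0
root@vmcom1-mn:~# mount -a -v
root@vmcom1-mn:~# df -k    # the last line of the output should show the NFS mount, e.g.:
vmcon-mn:/var/lib/nova/instances  ...  /var/lib/nova/instances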

Ensure that the last line of the output above is as indicated. This line shows that /var/lib/nova/instances is correctly exported from the NFS server and mounted on the compute node. If this line is missing, your NFS share is not working properly, and you need to fix it before you proceed.

Update the libvirt configuration by modifying /etc/libvirt/libvirtd.conf. To see all of the available options, please see the libvirtd configurations.
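The exact edits are not reproduced here; a minimal sketch of the changes commonly made for KVM live migration on this release follows. Note that it opens an unauthenticated TCP listener, which is one reason this setup is not production ready from a security standpoint:

# /etc/libvirt/libvirtd.conf
listen_tls = 0      # disable the TLS listener
listen_tcp = 1      # accept plain TCP connections from the other compute nodes
auth_tcp = "none"   # no authentication; see the security caveat above

# /etc/default/libvirt-bin: start the daemon with a listener (-l)
libvirtd_opts="-d -l"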

Restart libvirt. After executing the commands below, ensure that libvirt has successfully restarted:

$ stop libvirt-bin && start libvirt-bin
$ ps -ef | grep libvirt

Miscellaneous configurations

You may skip the steps below if live migration was designed in from the start and the basic requirements stated in the introduction are therefore already in place. These steps ensure that the nova UID and GID are the same on the controller node and on all of the compute nodes, and that the libvirt-qemu UID and GID are the same on all compute nodes. This involves manually changing the UIDs and GIDs so that they are uniform across the controller and compute nodes.

The steps are as follows:

On the controller node, check the nova user's UID and GID, and then set the same values on all of the compute nodes:
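A sketch of how this might look; the numeric IDs below are illustrative, and vmcom1-mn stands in for a compute node:

root@vmcon-mn:~# id nova
uid=106(nova) gid=107(nova) groups=107(nova)

root@vmcom1-mn:~# id nova
uid=108(nova) gid=113(nova) groups=113(nova)
root@vmcom1-mn:~# usermod -u 106 nova     # align with the controller's UID
root@vmcom1-mn:~# groupmod -g 107 nova    # align with the controller's GID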

Repeat the same for libvirt-qemu but keep in mind that the controller node does not have this user because the controller node does not run a hypervisor. Ensure that all of the compute nodes have the same UID and GID for user libvirt-qemu.
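For example, taking one compute node's values as the reference (again, the numbers are illustrative):

root@vmcom1-mn:~# id libvirt-qemu
uid=104(libvirt-qemu) gid=112(kvm) groups=112(kvm)
# On each remaining compute node, align the UID (and the group, if it differs):
root@vmcom2-mn:~# usermod -u 104 libvirt-qemu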

Since we have changed the UIDs and GIDs of the nova and libvirt-qemu users, we need to ensure that this is reflected across all of the files owned by these users. We achieve this in the next step.

Stop the nova-api and libvirt-bin services on the compute node, then change all of the files owned by the nova user and the nova group to the new UID and GID, respectively. For example:
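A sketch, assuming the old values from the example above (UID 108, GID 113); run this on each compute node whose IDs changed:

root@vmcom1-mn:~# stop nova-api
root@vmcom1-mn:~# stop libvirt-bin
# 108 and 113 are the OLD nova UID and GID from before the change
root@vmcom1-mn:~# find / -uid 108 -exec chown nova {} \;
root@vmcom1-mn:~# find / -gid 113 -exec chgrp nova {} \;
root@vmcom1-mn:~# start nova-api
root@vmcom1-mn:~# start libvirt-bin

With the shared storage and permissions in place, the migration itself (the Part 2 demo referred to above) comes down to finding the instance's current host and asking Nova to move it. A sketch using the Grizzly-era CLI, where the instance ID and host name are placeholders:

$ nova show <instance-id> | grep OS-EXT-SRV-ATTR:host      # current compute node
$ nova live-migration <instance-id> <target-compute-host>
$ nova show <instance-id> | grep OS-EXT-SRV-ATTR:host      # should now report the target host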

Conclusion

Live migration is an indispensable feature for achieving zero downtime during OpenStack cloud maintenance, when some compute nodes need to be shut down. The steps above, implementing shared storage and then migrating a live instance, were followed to get live migration working on an OpenStack Grizzly cloud running Ubuntu 12.04, using NFS shared storage.
