
What is Ganeti?

Ganeti is a not-so-thin layer that sits alongside a hypervisor to
facilitate the management of your virtual machines. It helps you move
virtual machine instances from one node to another, create an instance
with DRBD replication on another node, live-migrate it between the two
nodes and do basically everything you can expect from a
robust platform.

Historically, Ganeti started within Google to manage the business infrastructure (print servers, LDAP, accounting, etc.). I looked at it because it was possible to install it from source, without a patched kernel, on Debian Squeeze. The quality of the code, documentation, development process and discussions on the news group finally convinced me to try it. Here is the way I have set up the system with KVM.

This document must be read together with the Ganeti installation documentation. Some of the steps described in the official Ganeti installation documentation are not described here.

For reference, the software versions are:

Debian Squeeze for the nodes and the guests.

Ganeti 2.4.2 installed from source.

Terminology

node, server: The non virtualized server.

virtual machine, VM, instance: The virtualized operating system.

Everything is running Debian Squeeze in 64bit, so many terms are taken from the Debian way to name things.

Network Topology

Your cluster will run on your own network, so this configuration will need to be adapted to fit your requirements. In this case, the servers are hosted with OVH and have:

A single physical network interface eth0 with a fixed public IP address.

A tagged VLAN interface eth0.2186. On this interface, two networks are available: a private network 192.168.0.0/16 and a public network 178.33.145.128/26. The public network is a RIPE block.

The goal is to have for each VM:

A private IP address on the private network.

A public IP address on the RIPE block.

The ability to connect to all the other VMs on each network.

The private IP address is used for the infrastructure and the public IP for outside communication. The VLAN works across the 3 datacenters of OVH.

Base System Setup

Each server has at least 12GB of RAM and two hard drives (750GB or
1.5TB). They are all basically the same. It is important to have a
homogeneous set of servers to get more predictable performance. It
is also very important to set them up the same way. These notes are
very manual; scripting things with Fabric is recommended.

The base setup and Ganeti must be performed on all the nodes. As each
node can become master, you need the software on each node.

The partitions are pretty simple. The base OS is on a 25GB software
RAID1 partition and each drive gets a 12GB swap partition, for a total
of 36GB of virtual memory.

The rest of each drive is used for a big LVM volume group named xenvg. Ganeti originally supported only Xen, which is why the default names often use xen. On this node there is 2.6 TiB of raw storage available for the VMs. No RAID is used, so if you create a VM without DRBD replication, you have a single point of failure. See the replication and backup strategies below.

After the storage setup, the network needs to be set up too. Ganeti supports both routed and bridged networking; bridged networking is used here.

You need to be sure to have the right packages to support the bridge and VLAN.

apt-get install vlan netcat fping tcpdump netmask bridge-utils

The setup is pretty simple: each node gets the dedicated address assigned by the provider on eth0 and a private IP address on eth0.2186 (replace 2186 with your own VLAN or maybe your own NIC, for example eth1).

DRBD Configuration

Follow the recommendations from the official
documentation. In particular, be sure that the usermode helper is
just /bin/true, that is, your /etc/modules file has a line
with:

drbd minor_count=128 usermode_helper=/bin/true
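A quick way to check this on every node is a small shell test. This is only a minimal sketch: the function name drbd_modules_line_ok is made up for illustration, and it inspects the file only, not the parameters of an already loaded module.

```shell
# Succeeds when the given modules file (default: /etc/modules) loads drbd
# with the usermode helper set to /bin/true.
drbd_modules_line_ok() {
    grep -Eqs '^drbd[[:space:]].*usermode_helper=/bin/true([[:space:]]|$)' \
        "${1:-/etc/modules}"
}

if drbd_modules_line_ok; then
    echo "drbd line looks fine"
else
    echo "drbd line missing or without usermode_helper=/bin/true"
fi
```

Running it through your automation tool on all the nodes is an easy way to spot a node that was set up differently.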

Installing the Operating System Support Packages

To be able to install instances you need to have an Operating System
installation script. You need the scripts on all the nodes (maybe
using Puppet to manage them) as the creation of an instance is done
directly on the target node.

The easiest way to go was to use the ganeti-instance-image package and the ganeti-instance-debootstrap package. These packages are only loosely tied to the Ganeti release number, so at the moment it is possible to install them directly from the provided Debian packages.

Symbolic links are needed because Ganeti looks for the OS
definitions in /srv/ganeti/os.
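The links could be created along these lines. This is a sketch under assumptions: /usr/share/ganeti/os is where I would expect the Debian packages to install the image and debootstrap definitions (check with dpkg -L), and link_os_definitions is a hypothetical helper name.

```shell
# Assumed location of the packaged OS definitions; verify with
# "dpkg -L ganeti-instance-image" before relying on it.
PKG_OS_DIR=${PKG_OS_DIR:-/usr/share/ganeti/os}
# Directory where Ganeti looks for OS definitions (from the text above).
GANETI_OS_DIR=${GANETI_OS_DIR:-/srv/ganeti/os}

# Link each packaged OS definition into the Ganeti search directory.
link_os_definitions() {
    mkdir -p "$GANETI_OS_DIR"
    for os in image debootstrap; do
        ln -sfn "$PKG_OS_DIR/$os" "$GANETI_OS_DIR/$os"
    done
}

# As root on each node:
#   link_os_definitions
```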

Repeat all this setup for each node. Now, you know why automation
is needed. For example, use 192.168.0.2 as the IP of the secondary
node.

Startup of the Cluster and the First Secondary Node

It is extremely simple. First, define the IP address of your cluster. In my case I selected 192.168.1.1 with the name clust1.ceondo.net. So, in the /etc/hosts of each node, I add:

192.168.1.1 clust1.ceondo.net

Then on the first node run:

gnt-cluster init clust1.ceondo.net

Ok, so what do you have now?

A single node in a Ganeti cluster with IP 192.168.0.1.

The single node has the primary (master) role at the cluster level. As it is the master, the IP of the cluster, 192.168.1.1, is added to the xen-br0 bridge. This is done automatically by Ganeti; you must not do it yourself. This IP must be on the same subnet as your bridge, because only the IP is added and it reuses the network information of the bridge.

The next step is of course to add another node to the cluster. If you
do not have a DNS server for your private network, simply add the node
IP to your hosts file. For example:

192.168.0.2 node2.ceondo.net

Now on the master, run:

gnt-node add node2.ceondo.net

Doing so will add the node to the cluster and update the ssh
configuration of the node to ensure communication between the nodes.
You can now get information about your nodes with gnt-node list (in
my case, 3 instances are running on these nodes).

If you have set up a third node, you can add it too... but now, you are
maybe more interested in creating your first instance.

Instance Creation from an Installation Image

The simplest way to create a first instance is to boot an
installation CD with KVM and VNC. All the operations are run on the master. If you want to create the instance on the secondary node, you need to download the ISO file on the secondary node.

-s 10g: it will have a single disk of 10GB (the partitions in the disk are up to you).

-o image+default: it will use the image os with the default variant.

-n node1.ceondo.net: it will be created on node1.

--no-start --no-install: after the addition, we do not start it and we do not run the image+default os installation scripts.

-H kvm:vnc_bind_address=127.0.0.1: we inform the hypervisor that we want VNC bound to localhost. You can put 0.0.0.0 to bind on all the interfaces if you do not want to use a ssh tunnel, but this is not really secure, and the default Gnome VNC viewer (Remote Desktop Viewer) supports ssh tunneling very easily.

vm116.ceondo.net: this is the name of the instance. The name must resolve. If you do not have a DNS server, put it in the node hosts file.
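Put together, the options above give a command along these lines. It is a sketch, not necessarily the exact command I ran: -t plain (a plain LVM volume, without DRBD) is an assumption, and the run helper is a hypothetical wrapper that only prints the command unless GNT_EXEC=1 is set.

```shell
# Print commands by default; set GNT_EXEC=1 on the master node to execute them.
run() { if [ "${GNT_EXEC:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# -t plain is an assumption; the other options are the ones described above.
run gnt-instance add -t plain -s 10g -o image+default \
    -n node1.ceondo.net --no-start --no-install \
    -H kvm:vnc_bind_address=127.0.0.1 \
    vm116.ceondo.net
```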

If you run this command, it basically just adds the instance to the cluster on node1. Now, it is time to boot and install the instance. First, we need to be sure that KVM will boot with the kernel from the CD, and we do not want a serial console.

When starting with the -H option, it means that for this boot and
this boot only, KVM will use these parameters. It also means that if
you restart the instance, it will not have the cdrom, which is what
we want.
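A sketch of this step, using the KVM hypervisor parameters kernel_path, serial_console, boot_order and cdrom_image_path. The ISO path is hypothetical, and the run helper is an illustrative wrapper that only prints the commands unless GNT_EXEC=1 is set.

```shell
# Print commands by default; set GNT_EXEC=1 on the master node to execute them.
run() { if [ "${GNT_EXEC:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Hypothetical ISO location on the node that hosts the instance.
ISO=/srv/iso/debian-6.0-amd64-netinst.iso

# Saved in the instance configuration: no kernel from the node, no serial console.
run gnt-instance modify -H kernel_path=,serial_console=false vm116.ceondo.net

# Temporary parameters for this boot only: boot from the CD image.
run gnt-instance start -H boot_order=cdrom,cdrom_image_path="$ISO" vm116.ceondo.net
```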

After you run this command, run:

gnt-instance info

At the top of the output, you will find the VNC IP and port. So
just connect your VNC client, for example to 127.0.0.1:11001, and use
the host (in my case provided by OVH) root@ns12345.ovh.net as SSH
tunnel. You can now start the installation.
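If your VNC viewer does not handle SSH tunnels itself, you can open the tunnel by hand. A minimal sketch, assuming the port 11001 and the host name from the example above:

```shell
VNC_PORT=11001              # port reported by gnt-instance info
NODE=root@ns12345.ovh.net   # node hosting the instance

# -N: do not run a remote command; -L: forward the local port to
# 127.0.0.1:$VNC_PORT as seen from the node.
TUNNEL="ssh -N -L ${VNC_PORT}:127.0.0.1:${VNC_PORT} ${NODE}"
echo "$TUNNEL"
```

Run the printed command in a terminal, keep it open, and point your VNC viewer to 127.0.0.1 on the same port.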

To be able to clone and reuse this instance as template for new
instances, the partitions can only be ext3/ext4 or swap and the order
of the disks in the partition table must be either:

/dev/$disk1 /boot
/dev/$disk2 swap
/dev/$disk3 /

or

/dev/$disk1 /boot
/dev/$disk2 /

I prefer to run without swap and, possibly, with a careful overcommit of the memory at the node level. Red Hat provides some good background information about it. The /boot partition is needed because the kernel used is not the kernel from the node.

Run the installation as usual. You will have to define the network
connection manually. In my case, this means providing the RIPE block
netmask and gateway information:

IP of the VM: 178.33.145.152

Netmask: 255.255.255.192

Gateway: 178.33.145.190
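Before typing these values into the installer, it does not hurt to check that the VM address and the gateway really sit in the same /26. The following is a small sketch in plain shell arithmetic; the function names are made up for illustration.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when both addresses are in the same network for the given netmask.
same_network() {
    local a b m
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

if same_network 178.33.145.152 178.33.145.190 255.255.255.192; then
    echo "VM and gateway are on the same network"
fi
```

With the netmask 255.255.255.192, both 152 and 190 fall in the 178.33.145.128 network, which matches the RIPE block given at the beginning.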

As it is easy, I first set the DNS server to Google's: 8.8.8.8. A private DNS server is available on the private network, but the CD installation does not offer the ability to configure two IP addresses directly.

So, everything is fine, you can finish the installation (do not forget to install SSH!) and then restart the instance without VNC:
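A minimal sketch of that restart, reusing the modify call from the Performance Tuning section to switch VNC off. A plain start also drops the temporary cdrom parameters. The run helper is a hypothetical wrapper that only prints the commands unless GNT_EXEC=1 is set.

```shell
# Print commands by default; set GNT_EXEC=1 on the master node to execute them.
run() { if [ "${GNT_EXEC:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

run gnt-instance shutdown vm116.ceondo.net
# An empty vnc_bind_address disables VNC for the next starts.
run gnt-instance modify -H vnc_bind_address= vm116.ceondo.net
# No -H here: the instance boots from its own disk.
run gnt-instance start vm116.ceondo.net
```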

Then, from your personal computer, you should be able to ssh into your instance:

$ ssh yourlogin@vm116.ceondo.net

Customize, clean, make this instance a base for mass deployment. It
will be the template used by the image os definition. The image
template will take care of changing the IP/hostname etc. for you and
even the RAM and disk size.

Instance Creation with ganeti-instance-image

So, everything is nice under the Sun, you have your instance running,
but now you want to start a new instance. Better not to have to go
through the CD install each time. The image OS definition is doing
just that. You can create a template out of a running instance and
reuse it to deploy as many times as you need.

First, shut down the instance to have the disks in a consistent state for the dump:

gnt-instance shutdown vm116.ceondo.net

Now, we create the default image OS definition. When creating a new instance, it means we will pass the -o image+default option. You can create many variants, but pay attention: if you have too many of them, it will quickly become a nightmare to manage them. So, our default will be:
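The settings could look like the following sketch. The variable names are the ones I know from the defaults file of ganeti-instance-image, and the values are assumptions derived from the image file names and paths used on this page, so check them against the file shipped with your package.

```shell
# Sketch of the image OS defaults (the file is sourced as shell).
# All values below are assumptions to adapt to your setup.
SWAP=no                                  # instances run without swap
IMAGE_NAME="debian-6.0"                  # first part of the dump file names
ARCH="x86_64"                            # second part of the dump file names
IMAGE_TYPE="dump"                        # images are filesystem dumps
IMAGE_DIR="/srv/ganeti/instance-image"   # where the dumps are stored
```

With these values the OS scripts would look for ${IMAGE_NAME}-${ARCH}-root.dump and ${IMAGE_NAME}-${ARCH}-boot.dump in IMAGE_DIR.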

You can either put them in /etc/default/ganeti-instance-image as I
do (this makes sane defaults for all the variants) or directly in
the default variant definition in
/etc/ganeti/instance-image/variants/default.conf. After you update
the file, do not forget to sync it on all the cluster nodes. Again,
Ganeti has some tools to do it:

gnt-cluster copyfile /etc/default/ganeti-instance-image

or

gnt-cluster copyfile /etc/ganeti/instance-image/variants/default.conf

It is now time to make the dump of the first instance to reuse it as
template.

Now, you have the files debian-6.0-x86_64-root.dump and debian-6.0-x86_64-boot.dump in your /srv/ganeti/instance-image folder. You need to sync this folder on all your nodes to create an instance from this template. You can also have a small NFS share, mount it as /srv/ganeti/instance-image and that way you do not have to sync. This is up to you. My provider OVH has some managed NAS which fits this requirement perfectly.

Time to create a new instance based on this template. As you might
expect, it will be vm117.ceondo.net. As we do not want it to
have the same IP address, we need to define the customization of the
instance in the OS installation scripts. To do that, you need to
define the network of your instance and its IP address. As you will
reuse the network information many times, it receives its own
definition:

Done, the new instance is available and you can start to play with it. Note that you are not tied to the 15GB disk: you can pick a different size. You can also change the RAM size. Even better, you can use DRBD instead of a plain LVM volume, just pass -t drbd as disk template.

Now, if you haven't done it yet, do not forget to add the init and crontab files of Ganeti.

Private and Public Networks

If you are using EC2, you are used to getting two network
interfaces for each instance, one with a private address and one with
a public address. Ganeti is extremely flexible and allows you to
start up an instance with two network interfaces or add a new network
interface to an instance:

This adds a new NIC nic.1 with a new random MAC address. The default parameters come from the cluster wide parameters. So, if your hardware node has two bridges, one on the public network, xen-br0, and one on the private network, xen-br1, you would add a NIC on the private network by running:

The new NIC is not using the cluster wide default but the specified
bridge. This provides a lot of flexibility in managing your instance
networking. As this is bridged networking, you have to do the
traditional network configuration at the instance level.
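The two cases could look like this with the --net option of gnt-instance modify. It is a sketch: the instance name is only an example, and the run helper is a hypothetical wrapper that prints the commands unless GNT_EXEC=1 is set.

```shell
# Print commands by default; set GNT_EXEC=1 on the master node to execute them.
run() { if [ "${GNT_EXEC:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# New NIC with the cluster wide defaults and a random MAC address.
run gnt-instance modify --net add vm117.ceondo.net

# New NIC attached explicitly to the private bridge.
run gnt-instance modify --net add:link=xen-br1 vm117.ceondo.net
```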

To create right from the start an instance with two network cards
based on an image, you could run:
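A sketch of such a creation, assuming a plain disk of 15GB and the two bridges named above. The instance name vm118.ceondo.net is hypothetical, as is the run helper, which only prints the command unless GNT_EXEC=1 is set.

```shell
# Print commands by default; set GNT_EXEC=1 on the master node to execute them.
run() { if [ "${GNT_EXEC:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# NIC 0 on the public bridge, NIC 1 on the private bridge.
run gnt-instance add -t plain -s 15g -o image+default \
    -n node1.ceondo.net \
    --net 0:link=xen-br0 --net 1:link=xen-br1 \
    vm118.ceondo.net
```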

What About Security?

After setting up a new system, running nmap is a good idea. You will figure out that the remote API binds on all the interfaces of your master node. This is not so good, but it can be changed. As the cluster IP in this case is on the private network, it can be used. 127.0.0.1 is also an option:
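As far as I know, this is done through the daemon arguments in /etc/default/ganeti, with the -b option of the remote API daemon. Take the following as a sketch and check the file name and the option against your installed version (ganeti-rapi --help).

```shell
# Sketch for /etc/default/ganeti (sourced as shell by the init script).
# Bind the remote API daemon to the cluster IP on the private network;
# use 127.0.0.1 instead to restrict it to the local host.
RAPI_ARGS="-b 192.168.1.1"
```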

Do not forget to have it on all your nodes. Pay attention that the
remote API daemon is binding on the cluster IP and the noded, confd
daemons on the IP of the node.

High Availability

Ganeti does not provide HA. It is like Amazon EC2: you can create an instance, perform backup and restore and, better than EC2, move an instance to another node without downtime, but no automatic failover system is provided.

The only provided automation is the watcher running from
cron. If an instance is down in error state, it will try to start
it. Nothing more, but nothing prevents you from building HA on top of Ganeti or from having HA at your application level and not at the instance level (this is what I prefer).

Replication and Backup Strategies

Replication

For real time replication you can use DRBD: just create your instance with the -t drbd template and Ganeti will take care of all the DRBD details. Please remember that replication is not backup. If you replicate corrupted data, you have nothing left. If you drop your database in your replicated instance, you have nothing left.

Again, replication is not backup. This is why Google
still uses tapes to perform backups! Céondo's approach, which
is not necessarily the best but which fits the way our software is
designed, is:

Backup

Once you have replication, you can do backups. If your replication is well designed, you can stop the replication for the time it takes to perform a backup.

Ganeti provides an easy way to backup a stopped instance and restore it:

gnt-backup export <instance>
gnt-backup import <instance>

This can be a convenient way to increase the disk size of an instance as you can change the disk size at import time. The problem is of course that you need your instance to be down. To limit downtime, you can do an LVM snapshot and/or try to limit the size of your instances.

The backup destination can be on a NAS in another data center to do
point in time recovery. Once you push a backup file on the NAS, chmod
it as 0444 to prevent accidents.
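The chmod step is easy to forget, so it is worth wrapping in a small helper. A minimal sketch, where protect_backups is a made-up function name and the NAS mount point in the comment is hypothetical:

```shell
# Make every regular file under a backup directory read-only (0444).
protect_backups() {
    find "$1" -type f -exec chmod 0444 {} +
}

# After pushing a backup, for example:
#   protect_backups /mnt/nas/ganeti-backups
```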

Oh, backups are of no use if you do not test them. This is hard,
it means that you need a special environment to restore and test
without affecting your production system.

Performance Tuning

If you do not require very specific CPU features, you can pass the -cpu host flag to KVM.

If you do not need it, you should disable VNC. In our case, it was
eating 6% of a CPU all the time.

gnt-instance modify -H vnc_bind_address= <instance>

Solving Problems

Ganeti is very nice, not only because it works well, but also because when things are not going well, a lot of diagnostic tools are available to figure out what is going on. The first thing to do is checking your instance configuration: