Chapter 1. Introduction

Rackspace has developed Rackspace Private Cloud Software, a fast, free, and easy way to download and install a Rackspace Private Cloud powered by OpenStack in any data center. Rackspace Private Cloud Software is suitable for anyone who wants to install a stable, tested, and supportable OpenStack private cloud, and can be used for all scenarios from initial evaluations to production deployments.

Chapter 2. Backup and Recovery with backup-manager

After you have installed Rackspace Private Cloud Software, there are no backups configured for your cluster. If you want to back up your cluster, this document provides guidance on performing a backup.

Note that this document does not attempt to cover broader problems of instance failure and recovery, and it does not address backup policy questions, such as appropriate retention policies and rotations.

The following directories contain crucial components for backup:

/var/lib/glance/images

/var/lib/chef/backups

/etc/mysql

You must also back up the following MySQL databases:

mysql

nova

glance

dash

keystone

Ubuntu's backup-manager

Ubuntu includes a simple tool called backup-manager. Any backup tool compatible with Ubuntu 12.04 can be used, but backup-manager is relatively simple to use for users already familiar with UNIX and bash scripts.

The following procedure will configure backup-manager to use the script that you will create to back up the controller node.

Log into the controller node and switch to root access with sudo -i. You will need root access for all of the procedures in this chapter.

Install backup-manager:

$ apt-get install -y backup-manager

You will be prompted to provide a directory in which to store the backup-manager archives. You may accept the default of /var/archives.

Choose root as the owner user of the repository.

Choose root as the owner group of the repository.

Designate the following directories for backup:

/var/lib/chef/backups

/var/lib/glance/images

/etc/mysql

In the /etc/backup-manager.conf file, edit the variables to match the settings that you chose during the installation prompts:
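The exact values depend on your deployment, but a plausible sketch, using backup-manager's standard variable names and the directories and databases listed earlier, looks like the following (the MySQL password is a placeholder):

```shell
export BM_REPOSITORY_ROOT="/var/archives"
export BM_REPOSITORY_USER="root"
export BM_REPOSITORY_GROUP="root"
export BM_ARCHIVE_METHOD="tarball mysql"
export BM_TARBALL_DIRECTORIES="/var/lib/chef/backups /var/lib/glance/images /etc/mysql"
export BM_MYSQL_DATABASES="mysql nova glance dash keystone"
export BM_MYSQL_ADMINLOGIN="root"
export BM_MYSQL_ADMINPASS="your_mysql_root_password"
```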

You can now adjust the backup-manager configuration to suit your retention policy and upload requirements. For example, you can create a simple data redundancy plan by uploading the backup to a secondary server that is accessible via SSH. Refer to the backup-manager documentation for more information about configuring the backup.

By default, backup-manager automatically executes nightly. You can also generate a backup manually.
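To run a backup manually, execute backup-manager directly as root; it reads /etc/backup-manager.conf and writes its archives to the configured repository:

```
$ backup-manager
```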

Note: Because all image files are backed up, these backups can be quite large. Ensure that you have enough space.

Backing Up the Controller Node

In a Rackspace Private Cloud installation, the controller node houses all the configuration information for the cluster, all OpenStack databases, and all images. To back up the configuration data, follow this procedure.

Create a script file in /usr/local/bin/chef-backup.sh and include the following:
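The original script contents are specific to your deployment; the following is a minimal sketch that archives the local chef configuration into the backup directory. The list of source directories is an assumption, so adjust it to match the data your chef server actually stores:

```shell
#!/bin/bash
# Minimal sketch of /usr/local/bin/chef-backup.sh -- adapt to your deployment.
BACKUP_DIR="${BACKUP_DIR:-/var/lib/chef/backups}"
TIMESTAMP="$(date +%Y%m%d%H%M%S)"
mkdir -p "$BACKUP_DIR" 2>/dev/null

# Archive the chef configuration directories that exist on this node.
SOURCES=""
for dir in /etc/chef /var/lib/chef; do
    [ -d "$dir" ] && SOURCES="$SOURCES $dir"
done

if [ -n "$SOURCES" ]; then
    # Exclude any directory named "backups" so the archive does not include itself.
    tar czf "$BACKUP_DIR/chef-backup-$TIMESTAMP.tar.gz" --exclude="backups" $SOURCES
else
    echo "no chef data found; nothing to back up" >&2
fi
```

Make the script executable with chmod +x /usr/local/bin/chef-backup.sh so that backup-manager or cron can invoke it.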

This script will place the configuration data for your cluster in the directory that you specify in the BACKUP_DIR environment variable. By default, it will choose /var/lib/chef/backups.

Verify that the script has run by checking the backup directory for a new archive.

If it has completed successfully, you can now run backup-manager to back up your controller node to your desired backup destination, or wait for backup-manager to execute automatically.

You can also set up a cron job to schedule backup-manager. Run the command crontab -u root -e and enter a cron job specifier. The following sample specifier would run backup-manager every night at midnight.

@midnight /usr/bin/backup-manager

Backing Up the Compute Node

The only information unique to the compute nodes is the disks for running instances. The instances can be backed up with the OpenStack snapshot tools in the Horizon dashboard and in the nova-compute API. You can also install backup tools inside the instances themselves.
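For example, with the nova command-line client installed and your credentials loaded, you can snapshot a running instance and confirm that the image was created (the instance and snapshot names here are illustrative):

```
$ nova image-create my_instance my_instance_snapshot
$ nova image-list
```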

Recovery

For the recovery process, you will reinstall the components with the Rackspace Private Cloud Software ISO. Before you begin, ensure that you have the correct networking information:

The IP addresses that you want to assign to each controller and compute node. This can be an IPv4 address in the format xxx.xxx.xxx.xxx or a CIDR range, and it must be able to access the internet.

Network subnet mask.

Network default gateway. This address is usually a xxx.xxx.xxx.1 address.

The server host name. You may be able to define this yourself, or you may need to contact your network administrator for the name.

Fully-qualified domain name for the host.

The address for the nova fixed network, in CIDR format. Instances created in the OpenStack cluster will have IP addresses in this range.

Optional DMZ network address. This address is also in CIDR format. Specifying a DMZ enables network traffic between instances and resources outside of the nova fixed network without network address translation. For example, if the nova fixed network is 10.1.0.0/16 and you specify a DMZ of 10.2.0.0/16, any devices or hosts in that range will be able to communicate with the instances on the nova fixed network.

A password for an admin OpenStack user.

A password for a non-admin OpenStack user, as well as a username if you do not want to use the default of demo.

A full real name, username, and password for an operating system user.

Recovering the Controller Node

Use the ISO to re-install the controller node.

Log into the controller node and switch to root access with sudo -i.

Restore all backed up files to their appropriate locations.

Use the following script to restore the chef server contents after /var/lib/chef/backups is restored:
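The restore script is deployment-specific; as a sketch, assuming backup archives named chef-backup-<timestamp>.tar.gz (adjust the pattern to however your backups are named), it could extract the most recent archive back into place:

```shell
#!/bin/bash
# Sketch of a chef restore helper -- the archive naming is an assumption.
BACKUP_DIR="${BACKUP_DIR:-/var/lib/chef/backups}"

# Pick the most recently modified archive, if any exist.
LATEST="$(ls -1t "$BACKUP_DIR"/chef-backup-*.tar.gz 2>/dev/null | head -n 1)"

if [ -n "$LATEST" ]; then
    # tar stripped the leading / at backup time, so extract relative to /.
    tar xzf "$LATEST" -C /
    echo "restored $LATEST"
else
    echo "no backup archives found in $BACKUP_DIR" >&2
fi
```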

Restart MySQL. Do NOT run chef-client before MySQL has been restarted. Doing so could cause data loss.

Run chef-client on the controller node.

Delete the client certificate on all compute nodes (located in /etc/chef/client.pem).

Rerun chef-client on all compute nodes.

Recovering the Compute Node

Before restoring compute nodes, you must remove existing compute node data from the controller node.

Log into the controller node and switch to root access with sudo -i.

Execute the following command to remove compute node data:

$ knife client delete name_of_compute_node

Use the ISO to re-install the compute node.

You can now re-create the instances. Note that when a compute node fails, all instance data is lost, so you must restore the instance data from configuration management, other backup recovery methods, or deployment of snapshots. IP addresses on instances will not be stable, so some reconfiguration may be necessary.

Chapter 3. Advanced Controller Node Backup and Recovery

Rackspace Private Cloud Software installs all "control plane" services on the controller node, including but not limited to:

All API services

The MySQL database that OpenStack uses to maintain information about the state of the clusters

All images uploaded to Glance

You can reduce recovery times and potential data loss windows by configuring a standby server for the controller node. This does not constitute a true "high-availability" solution, but increases application layer resilience.

In this chapter, the main controller node will be referred to as the "active node", and the backup controller node as the "standby node". The configuration process includes the following stages:

Configuring the standby node

Capturing the Chef state

Synchronizing the image state

Configuring MySQL replication

In the event of failure, follow the steps in the Failover procedure to switch to the standby node.

Configure the Standby Node

Install Rackspace Private Cloud software on the device that you want to use as a standby node.

Boot the ISO on the standby node.

After the ISO has launched and loaded, accept the EULA statement.

Select Controller.

Enter the NIC address. If you have more than one, you must designate one as public and one as private.

When prompted, enter the node IP address, subnet mask, gateway, name server, and host name. Use the same host name as that of the active controller node.

Enter the address for the nova fixed network.

If you want to configure a DMZ network, enter the DMZ address and the DMZ gateway address. Be sure that you have at least two NICs on the server.

Enter a password for the admin user. You will use this admin username and password to access the API and the dashboard.

For the additional non-admin user, accept the default username demo or enter your own, and provide a password at the prompt. This user will not have admin privileges, but will be able to perform basic OpenStack functions, such as creating instances from images. Creating the user will also automatically create a project (also known as a tenant) for this user.

Enter the real name, user name, and password for the operating system user account. For example, the user Jane Doe would enter the following information:

Full name for the new user: Jane Doe

Username for your account: jdoe

Password: mysecurepassword

At this point, it will take approximately 5-10 minutes for the Ubuntu operating system installation to complete.

If you have a proxy, enter the proxy URL at the prompt in the format http://proxy_ip_address:proxy_ip_port. If you do not have a proxy, press Enter to leave the proxy information blank and skip this step.

At this point, the installation process will run for approximately 30 minutes without the need for user intervention. The device will reboot during the installation process. You will see a screen with the Rackspace Private Cloud logo, followed by a screen that displays a progress bar; you can use Ctrl+Alt+F2 to toggle between the progress bar screen and a Linux TTY screen (Ctrl+Alt+Fn+F2 on a Mac). You can follow the log during installation by switching to the correct TTY screen and viewing the log in /var/log/post-install.log.

After the installation is complete, you can view the install log by logging into the operating system with the username and password that you configured in Step 9. The log is stored in /var/log/post-install.log.

Capturing Chef State

Whenever a node is added to your cloud, you should back up the configuration data.

Create a script file in /usr/local/bin/chef-backup.sh and include the following:

This script will place the configuration data for your cluster in the directory that you specify in the BACKUP_DIR environment variable. By default, it will choose /var/lib/chef/backups. Ensure that this script has executed. If it has done so successfully, you may now run backup-manager to back up your controller node to your desired backup destination.

Synchronizing the Image State

The most robust mechanism for protecting your images is to configure an OpenStack Storage (Swift) cluster and configure Glance to store images in the cluster. If this is not a viable option, you can use rsync to copy the images to your standby node.

This section describes the rsync method in detail. For information about using OpenStack Storage, refer to Rackspace Private Cloud Software OpenStack Storage Installation.

Connect to the active node via ssh and use sudo -i to switch to root access.

Verify that rsync is installed. If it is not, install it with apt-get install rsync.

Use cat to obtain the contents of the root user's public key. If the key does not yet exist, generate it with ssh-keygen first.

$ cat /root/.ssh/id_rsa.pub

The command will output the public key in a string similar to the following:
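A typical way to authorize this key on the standby node and verify the first transfer, assuming root SSH access and the standard authorized_keys location, is the following (standby_node_IP is a placeholder):

```
$ ssh root@standby_node_IP "mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys" < /root/.ssh/id_rsa.pub
$ rsync -az /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images/
```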

If the transfer completes without errors, synchronization between the active node and the standby node is working. You will now need to create a cron job to automate the synchronization. The initial transfer may be slow because every image must be copied, but because rsync copies only new or changed images, subsequent synchronization runs will complete more quickly.

In the following procedure, rsync is configured to copy chef backup information in addition to the Glance images, and to run every five minutes.

On the standby node, create a chef backups directory.

$ mkdir -p /var/lib/chef/backups

On the active node, create a script file in /usr/local/bin/rsync_job.sh and include the following, replacing standby_node_IP with the standby node's IP address.
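A sketch of the script, copying the Glance image and chef backup directories named earlier in this chapter; the BatchMode and timeout SSH options are defensive additions so that a cron run fails fast instead of hanging:

```shell
#!/bin/bash
# Sketch of /usr/local/bin/rsync_job.sh -- replace standby_node_IP below.
STANDBY="standby_node_IP"
RSYNC_SSH="ssh -o BatchMode=yes -o ConnectTimeout=5"

# Skip quietly if the standby host cannot be resolved.
if getent hosts "$STANDBY" >/dev/null 2>&1; then
    # Copy Glance images and chef backup data to the standby node.
    rsync -az -e "$RSYNC_SSH" /var/lib/glance/images/ "root@$STANDBY:/var/lib/glance/images/"
    rsync -az -e "$RSYNC_SSH" /var/lib/chef/backups/ "root@$STANDBY:/var/lib/chef/backups/"
else
    echo "standby host $STANDBY is not reachable; skipping sync" >&2
fi
```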

On the active node, edit the access permissions to ensure that the script is executable.

$ chmod +rx /usr/local/bin/rsync_job.sh

On the active node, set up the cron job with the command crontab -u root -e. Enter the following cron job specifier to make the rsync job run every five minutes:

*/5 * * * * /usr/local/bin/rsync_job.sh

In this configuration, images deleted on the active node are not deleted on the standby node. This means that you can retrieve an image from the standby node if it is accidentally deleted from the active node, but if you are frequently adding and deleting images, the undeleted images can take up a lot of disk space. To automatically remove images from the standby node when they are deleted from the active node, add a --delete flag to the rsync images command in the script file:
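With that change, the images line in the script would look like the following (standby_node_IP is a placeholder):

```
rsync -az --delete /var/lib/glance/images/ root@standby_node_IP:/var/lib/glance/images/
```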

Configuring MySQL Replication

On the standby node, create a configuration file at /etc/mysql/conf.d/replication.cnf and include the following content:

[mysqld]
log-bin=mysql-bin
server-id=2

Run restart mysql on the standby node. Leave the ssh session to the standby node open.
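Binary logging must also be enabled on the active node, with a distinct server-id, and the replication account (the user repl referenced in the later steps) must exist there. A sketch of the active node's /etc/mysql/conf.d/replication.cnf:

```
[mysqld]
log-bin=mysql-bin
server-id=1
```

After restarting MySQL on the active node, create the replication account at the mysql prompt with GRANT REPLICATION SLAVE ON *.* TO 'repl'@'standby_node_IP' IDENTIFIED BY 'standby_password'; (both values are placeholders).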

Create two new ssh sessions to the active node.

In the first ssh session on the active node, run mysql. At the prompt, enter the command FLUSH TABLES WITH READ LOCK;. Leave the session open at the mysql prompt.

mysql> FLUSH TABLES WITH READ LOCK;
mysql>

In the second ssh session on the active node, run the following command:

$ mysqldump --all-databases --master-data > dbdump.db

When this command has completed, switch back to the first ssh session and enter UNLOCK TABLES; at the mysql prompt. Then exit mysql.

mysql> UNLOCK TABLES;
mysql> exit

Return to the second ssh session and issue the following command to transfer the dbdump.db file to the standby node, replacing standby_node_IP with the standby node's IP address.

$ scp dbdump.db root@standby_node_IP:

On the standby node, run the following commands.

$ grep 'CHANGE MASTER TO MASTER_LOG' dbdump.db

This command will return a statement that includes the filename of the master log file and a master log position.
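The statement is similar to the following; the log file name and position values shown here are illustrative:

```
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;
```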

Run the following set of commands to initiate the replication process. The standby_password is the one used for the user repl in step 5, and the grep_log_file and grep_position variables are the filename and position returned by the grep command in step 13.
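On the standby node, this amounts to loading the database dump and then pointing replication at the active node. The following is a sketch; active_node_IP, standby_password, grep_log_file, and grep_position are placeholders for your own values:

```
$ mysql < dbdump.db
$ mysql
mysql> CHANGE MASTER TO
    MASTER_HOST='active_node_IP',
    MASTER_USER='repl',
    MASTER_PASSWORD='standby_password',
    MASTER_LOG_FILE='grep_log_file',
    MASTER_LOG_POS=grep_position;
mysql> START SLAVE;
```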

Failover

Assign the IP address of the formerly active node to the standby node. At this point, all services should begin working.

Open the /etc/hosts file in a text editor and comment out the line that binds the hostname to the standby node IP address. For example, in a configuration where the standby node's IP address is 172.16.137.10 and the hostname is standby.myhost.com, the line would look like this:

# 172.16.137.10 standby.myhost.com

Run chef-client on the node.

Delete the client.pem file from all the nodes in the cluster and run chef-client on all compute nodes.