Mac, Linux and things in between

RAID is not backup

One thing that a number of people overlook is that RAID is NOT a backup solution. That’s right, say it with me, “RAID is not a backup solution.” RAID is at most a data availability solution and nothing more. With that in mind it is always good to have a backup. This post will concentrate on creating a backup script and routine for my Linux server as well as my Mac. Since the Mac is UNIX based I can back it up in much the same way as I do my Linux server.

Please keep in mind that this method assumes a lot of things and should not be considered for a business environment where security is more important. My method involves creating a public/private ssh key without a passphrase and worse, the root password for my MySQL server is also coded into this script. Since this is all DIY, you can expand whatever you do to be a bit more secure.

From one of my earlier posts you’ll note that I recently lost a RAID array and everything on it. This happened because the type of array I used was built for speed, not data redundancy. Since I knew the risks of using that type of RAID, I had a backup of my most important files. I’ve since replaced that RAID set and, despite using a mirrored set, where if one drive fails the other will continue, I am still making sure I have a backup of all my important files. The reason is that while this kind of RAID provides redundancy, it doesn’t provide any way to restore files that are mistakenly deleted, nor does it help if the RAID set dies in such a way that it takes the data with it. While that is unlikely to happen, it’s still better to be prepared.

There are a number of backup solutions available but the best one is the one that you use AND the one that works automatically. I say this because manual backup procedures tend to be forgotten about at the worst times. While there are a number of existing backup solutions available for Linux and Mac I always like to roll my own using tools like rsync or mysqldump. This allows me to quickly access backed up data in the event I have to restore something.

In my Linux server I have a 500GB mirrored drive set plus another 200GB drive. The 200GB drive is where my backups go. This works for me because the 500GB RAID set contains a lot of non-essential data that can be recreated, such as DVD rips, downloaded programs and video that is being edited, so I don’t bother to back up everything.

In previous posts I’ve made it apparent that I’m a command line guy and my backup solution is no different. My backups rely on a simple rsync command repeated a couple of times, once for each area I want to back up. In order to do this, I’ve created a simplistic bash script. That’s part of the beauty of a DIY solution: it can be as simple or as elaborate as you want, though I’d argue that once your DIY solution gets too elaborate it’s best to use a pre-built solution. I also have a Mac in the house and I’d like to back that up at the same time. Through the use of wake-on-lan and ssh I can accomplish this as well.

There are a few places where I replaced some sensitive data with ***. You can see from this script that I’m using rsync to back up some data from my home partition as well as all of my html files from /var/www/html. I also create backup files for all of the databases in MySQL and dump my subversion repository.

BASH
Before I can really talk about how the script works, I should back up and explain a little bit about how bash scripts, or any scripts, run at all on a Linux or Unix system. When you’re at the command prompt of most Linux systems, you are usually in what is known as bash. Bash is actually a process running on the system that simply calls other processes. Sure, it has a few built-in features that I take advantage of, but for the most part its job is to load a program and allow it to run.

Bash can either load a binary file and execute it, or it can take a file that contains a series of commands. Such a file is called a script, and it becomes a bash script by placing the special “sha-bang” at the beginning, followed by the interpreter to use. #! is the sha-bang and /bin/bash is the interpreter. You can learn more about bash by clicking on over to the Linux Documentation Project’s article on advanced bash-scripting.
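As a minimal sketch of what such a file looks like (this is not my actual backup script, and the BACKUP_ROOT variable is a name I made up for illustration), the whole idea fits in a few lines:

```shell
#!/bin/bash
# The line above is the sha-bang: it names the interpreter that runs this file.
# Everything below is an ordinary command, run from top to bottom.

BACKUP_ROOT=/backup    # hypothetical variable, used here only for display
echo "Backup run starting, destination is $BACKUP_ROOT"
```

Saved under a name such as backup.sh (again, just an example name), the file is made executable with chmod +x backup.sh and then run as ./backup.sh.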

RSYNC
Rsync is a program that is used to synchronize two directories, either on the same machine or over the network. In my script I end up employing both methods to back up files on my Linux machine as well as files from my Mac system.

In my script I use one of the most basic of rsync commands with a simple set of options, ‘rsync -av’. This is because the designers of rsync know what most people are probably trying to achieve, a simple synchronization of the files between two locations. The -a actually tells rsync to do a number of things, which are all defined in rsync’s man page. To access the man page for rsync, type ‘man rsync’.

The next option, -v (-av is the same as saying -a -v), tells rsync to be more verbose about what it is doing. Since I will later be using cron to run my backup, the extra output will make for a nice email synopsis of how the backup went.

The next options deal with what you want to copy and then to where you want to copy the files. When the exact same command is run in the future, the destination is simply updated with the files that have changed. At this point you may be realizing that my backup solution doesn’t allow for daily incrementals or the ability to access a changed file from a backup that occurred more than one backup ago. If this is a concern for you, you may want to modify how your script works or consider a different command line tool such as rdiff-backup.
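To make the source and destination arguments concrete, here is a rough sketch of a helper that wraps the rsync -av call. The function name backup_dir, the DRY_RUN switch and the /backup/html destination are my own assumptions for illustration and are not taken from the script itself.

```shell
#!/bin/bash
# Hypothetical helper: sync one directory to the backup drive.
# With DRY_RUN=1 it only prints the command it would run.
backup_dir() {
    local src=$1
    local dest=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rsync -av $src $dest"
    else
        # -a preserves permissions, times and so on; -v lists what was copied
        rsync -av "$src" "$dest"
    fi
}

# Example call (destination path is an assumption):
#   backup_dir /var/www/html/ /backup/html/
```

The trailing slash on the source matters to rsync: /var/www/html/ copies the contents of the directory, while /var/www/html would create an html directory inside the destination.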

RSYNC OVER A NETWORK
To me the more interesting aspect of rsync is its ability to synchronize files either from a remote system or to a remote system. While rsync does provide its own method for talking to other machines by running an rsync process as a daemon (that is, in the background), I prefer to use ssh as the means to connect to the remote host.

In my script you’ll see the line:

rsync -av -e 'ssh -l ruedu' ruedu@192.168.1.99: /backup/Mac/ruedu

This code tells rsync to communicate to the remote host (192.168.1.99) using ssh. Since ssh is an interactive thing and I plan to ultimately use my backup script as part of an automated nightly task I need to ensure that ssh isn't going to ask for a password. To accomplish this I used an RSA key for authentication.

SSH key based authentication is generally used so that you as the user don't have to type in a password for each system you might connect to. In its correct usage, you generate a key using ssh-keygen, apply a passphrase to the key and then copy the public key file to any remote hosts you wish to log in to. To use the key, you use ssh-agent. The agent loads your private key, asks for the passphrase and then sits in the background ready to provide authentication for future ssh sessions.

This of course doesn't solve my problem, since someone still has to type the passphrase once whenever the agent is started, so instead I use a public/private key set without a passphrase at all. This allows me to connect to a remote machine without any sort of interaction. It should be noted that you should ONLY do this on a small trusted network. I consider my home network to be both small and trusted.

To set up a passphrase-free key, simply type ssh-keygen and press enter at each prompt. Some tutorials will tell you to use a DSA key rather than the default RSA, but considering that I'm about to create a key without a passphrase, the quality of the key is rather moot. Here is what will happen:

[root@drue #]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
19:5e:86:2a:61:01:11:c0:c1:75:c4:7f:e7:a1:b6:c5 root@drue
[root@drue #]$

This command will create two new files in your .ssh directory. This directory is located in your home directory at $HOME (cd + enter will take you to your home directory). The id_rsa.pub file is the one that you copy to another host. Never give out your id_rsa file, as this would allow anyone else to log into whatever remote computer has a copy of the id_rsa.pub file.
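If you would rather not press enter through the prompts, the same kind of key can probably be generated in one step. This is only a sketch: the helper name make_backup_key is made up, and I am assuming the stock OpenSSH ssh-keygen with its -t, -N, -f and -q options.

```shell
#!/bin/bash
# Hypothetical helper: create a passphrase-free RSA key pair at the given path.
make_backup_key() {
    local keyfile=$1
    # -t rsa  : key type
    # -N ''   : empty passphrase, so nothing is asked later
    # -f FILE : where to write the private key (FILE.pub gets the public key)
    # -q      : quiet, no fingerprint art on the terminal
    ssh-keygen -q -t rsa -N '' -f "$keyfile"
}

# Example call, writing to the default location:
#   make_backup_key "$HOME/.ssh/id_rsa"
```

ssh-keygen will stop and ask before overwriting an existing key file, so this is best run only when no key exists at that path yet.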

In my scenario, I want to copy the new id_rsa.pub file to my Mac. Here is the command I use to do so.

scp .ssh/id_rsa.pub ruedu@192.168.1.99:

SCP is a command that comes with the ssh package and stands for secure copy. This command will require your usual SSH password to finish. Once the file has been copied, ssh again to the remote system and do the following:

cat id_rsa.pub >> .ssh/authorized_keys

This will append the contents of your id_rsa.pub file to the .ssh/authorized_keys file. This file tells the remote system whether the key the local system is offering is valid. In this case, when I connect to my Mac using ssh from my Linux server, my Linux server will offer the key stored in the id_rsa file in my home directory. Since the Mac has the matching public key in its .ssh/authorized_keys file, the connection is allowed to happen. Here is what connecting to the Mac looks like when using the key based authentication.

In the above, you see that there is no request for a password; it is simply able to connect because of the private/public key we created earlier.
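The step of adding the public key on the remote side can be sketched as a small helper. The name install_pubkey is hypothetical, and the mkdir, chmod and duplicate check go beyond the single cat command I actually ran; they are there because ssh can refuse keys when the directory or file is writable by others, and because running the append twice would otherwise add the key twice.

```shell
#!/bin/bash
# Hypothetical helper: append a public key to an authorized_keys file,
# creating the .ssh directory with restrictive permissions if needed.
install_pubkey() {
    local pubfile=$1
    local sshdir=$2
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$sshdir/authorized_keys"
    # Append only if this exact key line is not already in the file
    if ! grep -qxF -e "$(cat "$pubfile")" "$sshdir/authorized_keys"; then
        cat "$pubfile" >> "$sshdir/authorized_keys"
    fi
    chmod 600 "$sshdir/authorized_keys"
}

# Example call on the remote machine, after the scp step:
#   install_pubkey "$HOME/id_rsa.pub" "$HOME/.ssh"
```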

MYSQL BACKUP
Near the bottom of the script is a very funky looking section. This section is where I employ some of the more programming-like features of Bash. Again, you can read more about these features by heading over to the Linux Documentation Project.

What I use is called a FOR loop. It loops over a set or range of items and performs the same action over and over until that set of items has been gone through.

This one line command actually strings together three different programs and utilizes pipes to do so. Pipes are beyond the scope of this article, but understand that they simply pass the output from the program on the left side of the pipe to the command on the right side. The end result of the command above is a space separated listing of all the databases in MySQL. I use this command because the databases available in MySQL could change and I didn't want to modify my backup script each time I created a new database. I could have done the following instead.

for I in database1 database2 database3

In this case the effect is the same: the for loop will perform the code between do and done for each word (or number) that appears after the keyword 'in'. Each time through the code, the variable I takes whatever value is next in line. The first time through, the value of $I will be database1. Notice that when a variable is being set in Bash, there is no $ sign, but when you wish to use it (expand it) you place $ in front of the variable name.
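A small loop that only prints what it sees makes the set-versus-expand rule easy to try out. The COUNT variable is my own addition for illustration; nothing here touches MySQL.

```shell
#!/bin/bash
COUNT=0                       # setting a variable: no $ sign
for I in database1 database2 database3
do
    COUNT=$((COUNT + 1))      # arithmetic expansion, then set again without $
    # using the variables: $ in front of each name
    echo "pass $COUNT: I is $I, dump would go to /backup/mysql/$I.sql"
done
```

After the loop finishes, I still holds the last word in the list and COUNT holds the number of passes.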

The meat of the for loop is the code inside, in my example that code is:

mysqldump $I -p*** > /backup/mysql/$I.sql

The first time through the loop, the $I variable will be replaced with database1 (based on my simplified for loop) so the command will look like:

mysqldump database1 -p*** > /backup/mysql/database1.sql

This line will be repeated for each item that appears after the keyword 'in'.
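To show how a pipe can feed the loop, here is a sketch that uses a stand-in for the real database query. The show_databases function and its sample output (blog, wiki) are made up; on a live server the first stage would be something along the lines of mysql -p*** -e 'show databases', which is an assumption on my part and not necessarily the exact command from my script. The loop only prints the mysqldump lines instead of running them.

```shell
#!/bin/bash
# Stand-in for the real query; the database names are invented sample data.
show_databases() {
    printf 'Database\nblog\nwiki\n'
}

# Three stages joined by pipes: produce the listing, drop the header line,
# then turn the newlines into spaces to get a space separated list.
DBS=$(show_databases | grep -v '^Database$' | tr '\n' ' ')

for I in $DBS
do
    echo "mysqldump $I -p*** > /backup/mysql/$I.sql"
done
```

Because the list is built at run time, a newly created database is picked up on the next run without editing the script.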

Ok, the last line in my backup script is pretty specific to me and I doubt most people would need this. I'll discuss it for completeness anyway.

svnadmin dump /var/www/svn > /backup/svn.bak

This command simply dumps my subversion repository to the backup drive. You can learn more about subversion at http://subversion.tigris.org/. Subversion is a replacement for CVS, which is a version control system.

In a nutshell, here is what this section of the script is performing. The first step is to get a listing of all the available databases in MySQL. Despite how it may look, the very first thing that happens