Background

There are two types o' data chunks in a Drupal installation: (1) files, and (2) the database.

Files

A typical Drupal site has thousands of files. Let's see how they are arranged, starting at the directory that contains the Drupal root. We'll call this the "project directory."

<Note>

The project directory is not the one that contains index.php, install.php, the sites directory, and such. That's the Drupal root. The project contains the Drupal root. The project directory contains the directory that contains index.php, install.php, the sites directory, and such.

Och aye, I have it noo, laddie.

</Note>

The project directory contains not only the Drupal root, but also other files that are part of the project, like the private file system, shell scripts, and other things.

The project directory is somewhere in the *nix file tree. Maybe it's at /home/abandonhope/, or /var/www/vhosts/. For our purposes, it doesn't matter where the project directory is.

Restore files in the main Drupal tree (under public_html above, that Drupal rooty thing), as well as files in the directory used by the private file system (drupal_private above).

Wise Bison says:

There are more files in a Drupal project, Horatio, than are in the Drupal root.

Database

A MySQL database is a set of files as well. However, you don't need to mess with them directly. Command line tools like mysqldump simplify database backup and restore.

Your user account

A user account has attributes like user name, password, groups, etc. The groups are particularly important, because of the way they interact with file permissions.

The Apache Web server runs under a user account. Not yours, but a system account, like nobody, www-data or apache. Let's assume it's apache.

Users belong to groups. So, apache belongs to one or more *nix groups. Groups are like roles in Drupal. Sysadmins put users in groups, so they can work with the same set of files.

Each file belongs to both a user and a group. When Drupal uploads a file to sites/default/files, it's actually the Web server that's doing the work, so the file is owned by the user apache. Which group does the OS set as the group owner of the file? apache's primary group. So if the user apache's primary group is psacln, then the uploaded file's group owner is psacln.

<Note>

psacln is a group created by Plesk, a Web control panel like cPanel.

</Note>

By default, when users upload files, Drupal sets their group permissions to read/write. From Drupal's includes/file.inc:

$mode = variable_get('file_chmod_file', 0664);
...
chmod($uri, $mode)

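That 664 is an octal bit mask. The owner and the group can each read and write the file, and everyone else can only read it. The part we care about is group read and group write.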
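To see what that mode looks like on disk, here's a small sketch that applies 664 to a scratch file and prints the resulting permission string. The scratch directory and file name are made up for the illustration; they are not Drupal paths.

```shell
# Scratch file in a temporary directory (hypothetical name, not a Drupal path).
demo_dir="$(mktemp -d)"
touch "$demo_dir/upload.txt"

# 664 = owner read+write (6), group read+write (6), everyone else read only (4).
chmod 664 "$demo_dir/upload.txt"

# The first ten characters of ls -l are the file type plus the permission bits.
perms="$(ls -l "$demo_dir/upload.txt" | cut -c1-10)"
echo "$perms"
```

You should see -rw-rw-r--. The second rw- is the group part.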

When your restoration script runs, it runs under a user account, like lisabeth. lisabeth has a primary group. Here's the important thing:

If lisabeth and apache have the same primary group, they can read and write each other's files (in sites/default/files, at least).

So, your scripts will work best if your user and apache are in the same group. If you have root access, you can make sure of that, with a command like:

usermod -g psacln lisabeth
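Before changing anything, it's worth checking whether the two accounts already share a primary group. Here's a minimal sketch using id -gn, which prints a user's primary group name. The function name same_primary_group is made up, and apache is just the example account from above; substitute whatever account your Web server runs under.

```shell
# Succeeds (exit status 0) when both users have the same primary group.
# Returns 2 if either user cannot be looked up.
same_primary_group() {
  group_a="$(id -gn "$1")" || return 2
  group_b="$(id -gn "$2")" || return 2
  [ "$group_a" = "$group_b" ]
}

# Compare your own account with the Web server's account.
me="$(id -un)"
if same_primary_group "$me" apache 2>/dev/null; then
  echo "$me and apache share a primary group"
else
  echo "$me and apache do not share a primary group (or apache does not exist here)"
fi
```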

<Note>

There are other ways of setting up permissions and groups so that everything works, but this mammal doesn't know what they are. Can somebody explain? Remember, there's sweet sweet karma in it. Plus, you get to ride around on a centaur for a day.

</Note>

Besides groups, the second user account attribute we care about is the home directory. When you log in as lisabeth, you are taken to lisabeth's home directory. On *nix, this is typically /home/lisabeth. Commands you type are executed in that directory, unless you specify otherwise.

You can refer to your home directory with the tilde (~). So...

ls ~

... shows the files in your home directory, while ...

cd ~

... makes your home directory the current directory.

Let's assume that your home directory is the same as the project directory. That's a common arrangement. If there's a directory called public_html or www in your home directory, it's likely that your home directory and project directory are the same. (Unless you have multiple Drupal sites on the same account.)

Procedure overview

We'll need two scripts:

Create a snapshot of the files and database

Put the site offline.

Make a snapshot of the files in the main Drupal tree.

Make a snapshot of the files in the private file system.

Make a snapshot of the database.

Put the site online.

Restore from the snapshot

Put the site offline.

Erase the files in the main Drupal tree.

Restore the files in the main Drupal tree from a snapshot.

Erase the files in the private file system.

Restore the files in the private file system from a snapshot.

Restore the database from a snapshot.

Put the site online.

Then you can:

Set up cron to run the shell script.

Celebrate your achievement.

Get another Drupal tattoo.

Watch the musical episode from season 6 of Buffy.

Play with your dog.

Ride about in your Drupal chariot, waving to the users.

A directory for everything related to site restoration

Let's make a new directory for the site restoration function, and move the data files into it. The scripts will live there, as well.

The directory will be a child of the project directory. Recall that the project directory contains all of the files needed for your Drupal site, including the Drupal tree itself, and the private file system. Here is the project directory again:

<Note>

Often, the directory isn't public_html. Mayhap you have a subdomain called demo, mapped to the directory demo, and put a demo site there. Use whatever directory is appropriate for your project.

</Note>

Run this command in the directory containing public_html, or whatever your Drupal site root is.

You can check the results. To see one screen at a time:

tar -tvf drupal_tree_snapshot.tgz | more

Notice that "public_html" is included in the path of each file. We need to know that, to extract everything correctly.

Press the space bar to go to the next page. Press q when you get bored.

If you want to check, say, whether the hidden file .htaccess is in the archive, try this:

tar -tvf drupal_tree_snapshot.tgz | grep .htaccess

The | (pipe) character separates *nix commands. It sends the output of the first command into the second. grep is an oft used text pattern matching utility. It will show just the lines matching the pattern.
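To try the pipe-into-grep idea without touching a real site, here's a sketch that builds a tiny stand-in tree, archives it, and searches the listing. The scratch directory and file contents are made up. One small difference from the command above: -F tells grep to treat .htaccess as literal text rather than a pattern, where a dot would match any character.

```shell
# Build a tiny stand-in for a Drupal tree in a scratch directory.
work="$(mktemp -d)"
mkdir -p "$work/public_html/sites/default/files"
echo "<?php" > "$work/public_html/index.php"
echo "# rules" > "$work/public_html/.htaccess"

# -C switches to $work first, so paths in the archive start with public_html.
tar -cpzf "$work/drupal_tree_snapshot.tgz" -C "$work" public_html

# List the archive and keep only the lines that mention .htaccess.
matches="$(tar -tzf "$work/drupal_tree_snapshot.tgz" | grep -F .htaccess)"
echo "$matches"
```

The one matching line is public_html/.htaccess, so the hidden file made it into the archive.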

<Note>

BTW, this command...

tree -ap

... will show you a directory tree. The a switch means "all" (like hidden files), and p means "show permissions."

This...

tree -ap > tree.txt

... will put the output into the file tree.txt. Print it out, and casually show it to your boss, to prove how geeky you are.

The tree utility isn't installed by default on all *nixen. For CentOS and similar, enter...

</Note>

"mysqldump" means what you think: dump the database. The command outputs the MySQL commands needed to recreate the database, which is exactly what we want. > drupal_db_snapshot.sql means to store the dump in the file drupal_db_snapshot.sql. If you forget this bit, the SQL commands will flash by on your screen. Flashy flash, hello, goodbye.

If you peek inside drupal_db_snapshot.sql, you'll see commands like this:

DROP TABLE IF EXISTS `actions`;

This is why we don't have to erase the old data in the database before running drupal_db_snapshot.sql. The DROP TABLE commands in drupal_db_snapshot.sql will erase the existing data for us. W00tful!
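If you'd rather check than trust, you can count the DROP TABLE and CREATE TABLE lines in the dump. Here's a sketch that does it on a made-up two-table file standing in for a real dump; on your site, point the two grep commands at your own drupal_db_snapshot.sql instead.

```shell
work="$(mktemp -d)"

# Stand-in for a real dump: two tables, each preceded by DROP TABLE IF EXISTS.
cat > "$work/drupal_db_snapshot.sql" <<'EOF'
DROP TABLE IF EXISTS `actions`;
CREATE TABLE `actions` (`aid` varchar(255) NOT NULL);
DROP TABLE IF EXISTS `node`;
CREATE TABLE `node` (`nid` int unsigned NOT NULL);
EOF

# grep -c prints the number of matching lines.
drops="$(grep -c '^DROP TABLE IF EXISTS' "$work/drupal_db_snapshot.sql")"
creates="$(grep -c '^CREATE TABLE' "$work/drupal_db_snapshot.sql")"
echo "drops=$drops creates=$creates"
```

When the two counts match, every table the dump creates gets dropped first.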

Now you have the third data file you need to regenerate your site: the database. Move it into the right directory:

mv drupal_db_snapshot.sql restore_site/

Onward!

A script for making the snapshot

You'll restore your site from three files:

Archive file of the Drupal tree

Archive file of the private files directory (if you have one)

Database dump

It's a good idea to automate the creation of those files. If you want to change your snapshot, then you simply rerun the creation script.

Put the commands for making the files into a file called, for example, take_snapshots.sh:
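Here's a minimal sketch of what such a script might contain. It rests on some assumptions: the function name take_snapshots and the file name drupal_private_snapshot.tgz are made up, mysqldump is assumed to find its credentials on its own (in ~/.my.cnf, say), and the steps that put the site offline and online are left out. The demo at the bottom runs against a scratch tree with an empty database name, so the database step is skipped.

```shell
#!/bin/sh
# Usage: take_snapshots PROJECT_DIR [DB_NAME]
# Writes the snapshot files into PROJECT_DIR/restore_site.
take_snapshots() {
  project_dir="$1"
  db_name="$2"
  snap_dir="$project_dir/restore_site"
  mkdir -p "$snap_dir" || return 1

  # Drupal tree. -C makes the archived paths start with public_html.
  tar -cpzf "$snap_dir/drupal_tree_snapshot.tgz" \
    -C "$project_dir" public_html || return 1

  # Private file system, if the project has one.
  if [ -d "$project_dir/drupal_private" ]; then
    tar -cpzf "$snap_dir/drupal_private_snapshot.tgz" \
      -C "$project_dir" drupal_private || return 1
  fi

  # Database dump, skipped when no database name is given.
  if [ -n "$db_name" ]; then
    mysqldump "$db_name" > "$snap_dir/drupal_db_snapshot.sql" || return 1
  fi

  # Show what was made, with human-readable sizes.
  ls -lh "$snap_dir"
}

# Demo on a scratch project, so nothing real is touched.
work="$(mktemp -d)"
mkdir -p "$work/public_html" "$work/drupal_private"
echo "<?php" > "$work/public_html/index.php"
echo "secret" > "$work/drupal_private/report.txt"
take_snapshots "$work" ""
```

On a real project, you'd call it with your project directory and database name. The ls -lh at the end lets you eyeball the sizes.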

The database snapshot is about 5.5M, which seems about right for my project. The private files are about 1/4M. Not much. Again, about right. The Drupal tree is about 43.5M. Lots o' stuff. About right.

So, we have a script that will take a snappy snap snapshot of our site.

The restore shell script

Now to write the *nix script that cron will call every so often, to restore the snapshot.

Here's what the script should do:

Put the site offline.

Erase the files in the main Drupal tree.

Restore the files in the main Drupal tree from a snapshot.

Erase the files in the private file system.

Restore the files in the private file system from a snapshot.

Restore the database from a snapshot.

Put the site online.
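Those seven steps can be sketched as a script skeleton. Treat it as a starting point under stated assumptions, not the finished article: it assumes Drupal 7 with Drush (where the maintenance_mode variable takes the site offline), a mysql client that finds its own credentials, a hypothetical database name drupal_db, a hypothetical archive name drupal_private_snapshot.tgz, and paths without spaces. By default it only prints each command, so you can check the order before letting it loose.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints the commands. DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"
PROJECT_DIR="${PROJECT_DIR:-$HOME}"     # assumption: home = project directory
SNAP_DIR="$PROJECT_DIR/restore_site"
DB_NAME="${DB_NAME:-drupal_db}"         # hypothetical database name

# Print a command line. Execute it only when DRY_RUN is 0,
# and stop the whole script if it fails.
run() {
  echo "+ $1"
  if [ "$DRY_RUN" = "0" ]; then
    sh -c "$1" || exit 1
  fi
}

restore_site() {
  # 1. Put the site offline.
  run "drush -r $PROJECT_DIR/public_html vset maintenance_mode 1"
  # 2 and 3. Erase the Drupal tree, then restore it from the snapshot.
  run "rm -rf $PROJECT_DIR/public_html"
  run "tar -xpzf $SNAP_DIR/drupal_tree_snapshot.tgz -C $PROJECT_DIR"
  # 4 and 5. Same again for the private file system.
  run "rm -rf $PROJECT_DIR/drupal_private"
  run "tar -xpzf $SNAP_DIR/drupal_private_snapshot.tgz -C $PROJECT_DIR"
  # 6. Restore the database.
  run "mysql $DB_NAME < $SNAP_DIR/drupal_db_snapshot.sql"
  # 7. Put the site online.
  run "drush -r $PROJECT_DIR/public_html vset maintenance_mode 0"
}

restore_site
```

Run it as is to see the plan. Only set DRY_RUN=0 once the printed commands point at the right directories, because the rm -rf lines mean business.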

You may be asking yourself: "Self, why erase the existing files? Why not just write over them?" Because users may have uploaded new files while messing with your site. You need to erase everything in sites/default/files and in the private files directory, to make sure you kill those uploaded files.
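You can watch the difference on a scratch tree. This sketch takes a snapshot, fakes a user upload, and then restores twice: once by extracting over the existing files, and once by erasing first. All the file names are made up for the demo.

```shell
work="$(mktemp -d)"
files="$work/public_html/sites/default/files"
mkdir -p "$files"
echo "original" > "$files/logo.txt"
tar -cpzf "$work/snapshot.tgz" -C "$work" public_html

# A visitor uploads something after the snapshot was taken.
echo "junk" > "$files/uploaded.txt"

# Attempt 1: extract over the existing tree.
tar -xpzf "$work/snapshot.tgz" -C "$work"
if [ -e "$files/uploaded.txt" ]; then
  after_overwrite="still there"
else
  after_overwrite="gone"
fi

# Attempt 2: erase the tree first, then extract.
rm -rf "$work/public_html"
tar -xpzf "$work/snapshot.tgz" -C "$work"
if [ -e "$files/uploaded.txt" ]; then
  after_erase="still there"
else
  after_erase="gone"
fi

echo "overwrite only: $after_overwrite"
echo "erase, then extract: $after_erase"
```

Extracting only writes the files that are in the archive. It never removes extras, so the upload survives the first attempt and disappears only after the erase.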

It has a hidden file: .ghost. There are files and directories with different permissions, like settings.php. Drupal uses hidden files and files with various permissions, so having some in the fake tree will make for a better test.

Set the fake project directory as your current directory. E.g.:

cd ~/demotest

Now, make an archive of public_html using a command we used above:

tar -cvpzf drupal_tree_snapshot.tgz public_html

You should now have a file called drupal_tree_snapshot.tgz. Move it into your script directory:

mv drupal_tree_snapshot.tgz restore_site/

Frosty! If you use the private file system, add some fake files to drupal_private, and tar that as well:

Back up your site

Drush to the rescue! Again.

drush ard -v -r ~/public_html

ard: tell drush to create an archive dump

v: verbose. Lets you watch.

r: tells Drush which Drupal installation to back up.

Drush creates an archive file with the files and the database in it. Drush will tell you where it put the archive file. You can check it, to make sure it contains what you expect. For example, the command...

tar -tvzf ard_file_name.tar.gz | grep sql

... will show you the names of all of the files in the archive that have "sql" in their names. (Of course, replace "ard_file_name" with the name of the file that Drush created on your system.) Because of the v (verbose) option, you should also see the size of each file.

One of the files should be an export of your database, with the extension .sql. It will probably be larger than most of the other files.

Try it

OK - are you ready? Do or die time. Let's run the final script.

<Note>

Being a Nervous Nellie, I made extra backups with my site's control panel, before proceeding. Yes, I know, I'm not bold. You know the saying:

There are old Webbers,

And there are bold Webbers,

But there are no old, bold Webbers.

(Adapted from Some Mothers Do 'Ave 'Em. The learning-to-fly episode.)

</Note>

Make a few changes to your site, so you'll know whether the script works. Add some files, change some content, whate'er takes your fancy.

Now select the user account used to run the process. This will normally be the same one you used to do the work above.

Add a task:

Now you need to tell cron two things:

The command to execute.

When to execute it.

The command is the one that runs the w00ty script you wrote:

~/restore_site/restore_site.sh

What about the when? Usually, you want to restore the site every couple of hours. Type */2 in the hours field, to run the script every two hours.
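If you edit a crontab by hand instead of using a control panel, the same schedule is one line. The sketch below spells out which hours */2 matches, then prints an equivalent crontab entry. One thing to watch: the minutes field needs a fixed value such as 0, because a * there would run the script every minute of every second hour. The entry is an illustration, not something your panel necessarily generates.

```shell
# Hours matched by */2 in cron's hour field: every hour divisible by 2.
hours=""
h=0
while [ "$h" -le 23 ]; do
  if [ $((h % 2)) -eq 0 ]; then
    hours="$hours $h"
  fi
  h=$((h + 1))
done
echo "runs at hours:$hours"

# Crontab fields: minute, hour, day of month, month, day of week, command.
# Single quotes keep the shell from expanding the * characters here.
cron_line='0 */2 * * * ~/restore_site/restore_site.sh'
echo "$cron_line"
```

So the script runs at midnight, 2:00, 4:00 and so on, twelve times a day.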

How do you know this works? You can get an email sent to you when the task runs. It's worth doing this for a day or so, so you can make sure that it all works. Then you can turn the email off. Unless you like getting lots of emails.

You geek, you!

Get all this working, and treat yourself to a new mechanical pencil.

It helps users to show a timer, so they know when the next reset happens. That will be in a future post.