Unfortunately, not all system files have an include feature. Of
these, crontab is the most significant. In this case you can still
store the configuration data under /proj but you will need to cut
and paste it into the appropriate file as part of your recovery procedure.

Regardless of how you incorporate the information, you need some way of
distinguishing your changes from the default configuration. My preference is
to be explicit and to bracket any changes with comment lines like these:

## Craic modification BEGIN
[...]
## Craic modification END

This also gives me tags that I can look for, should I want to uninstall the
changes or check whether they are already in place.
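Checking for the tags is a one-line grep. A minimal sketch, using the tag text from the example above (the helper name and the throwaway demonstration file are mine):

```shell
#!/bin/sh
# Report, via exit status, whether the tagged block is present in a file.
has_craic_block() {
    grep -q '## Craic modification BEGIN' "$1"
}

# Demonstration on a throwaway file rather than a real system file:
f=$(mktemp)
has_craic_block "$f" && echo "present" || echo "absent"   # absent
printf '## Craic modification BEGIN\nfoo\n## Craic modification END\n' >> "$f"
has_craic_block "$f" && echo "present" || echo "absent"   # present
rm -f "$f"
```

An install script calls this to decide whether to skip, and an uninstall script calls it to decide whether there is anything to remove.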

You should store all of the included files on the /proj
partition. You could create a configuration directory for each application but
my preference is for a single directory for all the application settings. In
the examples above, this is /proj/linux_config. Within that I have
subdirectories for httpd, samba, mysql,
and so on. Bringing all the configuration data for all my applications together
allows me to manage their interactions more readily than would separate
directories. Additionally, I can refer to that single directory in the recovery
plan, which is reassuring for our friends in IT.

The Installation Script

The disaster recovery plan people really, really want installation scripts
for your applications. These scripts should wrap up all the messy details and make the
recovery process look much more like installing the Windows applications they are used
to.

By implementing the ideas given above, we can now give them what they want.
All the script needs to do is insert the include directives, or blocks of code,
into the system configuration files and restart the appropriate daemons. Make
sure that your script tests whether the changes are already in place, and that
it creates a backup copy of the system file before touching it. Here is an
example block of code from an install script, written in bash,
which inserts an include directive into the Apache configuration file.
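A minimal sketch of such a block follows. The function name, the Include path under /proj/linux_config, and the backup suffix are my assumptions; adjust them for your distribution:

```shell
#!/bin/sh
# install_apache_include FILE INCLUDE_LINE
# Appends a tagged Include block to FILE unless it is already there,
# backing the file up first.
install_apache_include() {
    config="$1"
    include_line="$2"

    # Do nothing if our tagged block is already in place
    if grep -q '## Craic modification BEGIN' "$config"; then
        echo "already installed"
        return 0
    fi

    # Back up the system file before touching it
    cp -p "$config" "$config.pre_craic"

    # Append the bracketed block
    cat >> "$config" <<EOF
## Craic modification BEGIN
$include_line
## Craic modification END
EOF
    echo "installed"
}
```

In the real script you would follow a successful install with a daemon restart, e.g. apachectl graceful, so the change takes effect.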

Using comments to delimit each block makes it easy to excise the
modification as part of an uninstall script. This is extremely useful during
the testing phase of your disaster recovery plan.
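The matching uninstall step can be a single sed range delete between the two tags. A sketch, again with a function name of my own choosing:

```shell
#!/bin/sh
# remove_craic_block FILE
# Excises the tagged modification block from FILE, in place.
remove_craic_block() {
    config="$1"
    tmp="$config.tmp.$$"

    # Delete every line from the BEGIN tag through the END tag
    sed '/^## Craic modification BEGIN/,/^## Craic modification END/d' \
        "$config" > "$tmp" && mv "$tmp" "$config"
}
```

Because the tags bracket the entire modification, the system file is returned to its original state without the script needing to know what the block contained.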

Why not just create an RPM or other package for each application? That would
be great but it involves quite a bit of effort for an application that will
only ever be installed at a single site. To my mind, a simple shell script is
at the right level of complexity. Should you choose to go the extra mile and
create an RPM installation package, then the ideas described here should serve
as a good foundation.

Documentation

We all strive for good documentation but, if we are honest, we don't usually
do a very good job of it, especially when our applications are still under
active development. So how much documentation is enough in the context of
disaster recovery?

Our friends in corporate IT want to know what an application does, where the
software and data for it reside, what other software it depends on, and exactly what
to do to rebuild it. They want this in a form they can print out and put in a
binder on the shelf in the computer room, along with a copy in off-site
storage.

I need something different. I want to see ReadMe files
scattered throughout the directories that tell me what the files represent,
what the scripts do, and what other parts of the application and system they
interact with. These make up the informal map that another developer can use to
decipher my work if I get hit by a bus. Perhaps more importantly, they will
refresh my memory three years from now when I need to make some changes.

Don't be shy about what you put in the ReadMe files. If part of
the application is a complete hack that depends on an obsolete Perl module, or
if you know it will crash next time we have a leap year, then say so. You don't
need to tell the world about the gory details but you do need to capture that
information in a form that you or another developer can access. In the midst of
a real disaster recovery, those notes can make all the difference to someone
trying to fix things.

Please remember, ALWAYS put a date and a name next to your comments. A
problem that required a huge workaround last year might well have been fixed in
the current release of the operating system. This is a widespread problem with
Linux HOWTOs and web sites. A date helps me assess whether the information is
still relevant and a name gives me someone to contact if I need more
information.

If you can create and maintain this level of documentation as you develop
the application then it is not too much effort to rework it into the form that
the good people in corporate IT are looking for.

Use DNS Aliases for Multiple Applications

If you have several applications and multiple servers then you should
consider setting up Apache virtual hosts for each of them along with DNS
aliases that relate each application to the physical host.

For example, let us say I have two machines (server1 and server2) with the
application app1 on server1 and app2 on server2. The
default way to access the start page for each application would be to use the
URLs http://server1/app1 and http://server2/app2.

If server1 blows up then I either need to replace it or move the
app1 application to server2. But then all my users will have to
update their bookmarks to point to the new machine. The better alternative is
to create the hosts app1 and app2 as DNS aliases that
point to server1 and server2 respectively. In the Apache config for each server
I create virtual hosts for ALL of these applications, in essence replicating
their configuration even though the application itself may not be present.
Users now access the applications as http://app1 and
http://app2. The host that each alias resolves to dictates which
machine the user is directed to.
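As a sketch, the Apache side of this consists of name-based virtual host entries replicated identically on both servers; the DocumentRoot paths here are my assumptions:

```
# Defined on server1 AND server2, even though each application
# actually lives on only one of them
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName app1
    DocumentRoot /proj/app1/htdocs
</VirtualHost>

<VirtualHost *:80>
    ServerName app2
    DocumentRoot /proj/app2/htdocs
</VirtualHost>
```

On the DNS side, app1 and app2 would typically be CNAME records pointing at server1 and server2 respectively, so moving an application amounts to repointing a single record.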

If I need to move either application to another machine I simply install the
software, set up the virtual host, and then change the DNS alias to point to the
new machine. All the existing bookmarks and links continue to work. Users are
none the wiser to the change in venue.

Mirrored Servers

Live replication of applications and data is a great way to ensure that your
applications are available. Rsync lets
you maintain duplicate copies of directories on different machines, with
regular updates. The directory layout I've discussed here fits in perfectly
with rsync's abilities. MySQL replication can take that one step further with
live mirroring of the contents of a database to another machine. The setup is
more involved than rsync but it can be well worth the effort.
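For example, a single crontab entry is enough to keep a warm copy of the /proj partition on a standby machine; the hostname and schedule here are my assumptions:

```
# Mirror /proj from server1 to this standby machine every night at 2 a.m.
# -a preserves permissions and timestamps; --delete keeps the copy exact.
# The trailing slash on the source copies the contents of /proj,
# not the directory itself.
0 2 * * * rsync -a --delete server1:/proj/ /proj/
```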

Be careful not to confuse high availability with disaster recovery. Mirrored
servers will do a great job at replicating your data, good or bad. Replication
can easily result in two corrupt databases instead of one. Things are different
with the high-end commercial databases that you'll find in the banks, but
that's not what we're dealing with here. On our level, replication is great but
nothing beats having a tape on a shelf in off-site storage.

Final Thoughts

Disaster recovery planning should be just as important to developers as it
is to corporate IT. While the cultural differences between "us" and "them" can
be frustrating, we need to address their needs head-on if our style of
application is going to find a place in their world.

By designing our apps with disaster recovery in mind right from the start,
we become an ally of corporate IT rather than a thorn in their side. A little
bit of forethought pays big dividends.

Robert Jones
runs Craic Computing, a
small bioinformatics company in Seattle that provides advanced
software and data analysis services to the biotechnology industry.
He was a bench molecular biologist for many years before
programming got the better of him.