Disaster Recovery

Something has gone wrong. That's all you
know. Staring at your blank or garbage-ridden screen, the only
thing you can think is “Now what do I do?” Even if this has not
happened to you yet, there is a good chance it will. For all of
Linux's power, it is still rather easy for a new, or even
experienced, user to make a mistake and mess something up.

With some advance preparation, this kind of situation won't
leave you stranded. Make sure you know how to track down a problem,
have a bootable disk, and have a set of rescue disks, configured
for your particular setup.

Your first step is tracking down the problem. Do you get to
the `Uncompressing Linux...' message? If not, your problem is with
the boot disk or LILO. Having a spare boot disk should allow you to
boot your system, and then you can reconfigure LILO or make a new
boot disk.

While Linux is booting, do you get past the partition check?
If so, your hard drives are probably fine with Linux. I had a hard
drive once that made Linux hang when it tried to find the
partitions. The drive didn't work in any other system I tested, so
the drive was bad.

Also, if you get past the partition check, then the kernel is
not your problem. After the partition checks are done, root is
mounted and then /etc/inittab is read. As you may or may not
recall, /etc/inittab is used by the init program to start login
processes and to run your /etc/rc files, which mount your
partitions and start your network, among other things.

Once the inittab is read, it goes to the corresponding
startup file (“rc file”) for mounting additional filesystems,
starting network services, and other startup services. If you see
your filesystems being mounted, that means that some of your rc
files are being started.

Finally, make sure that your network services are starting if
you want them started on your system. This is one of the final
parts to the startup sequence.
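
The checks above boil down to a small lookup: note the last stage you saw on the screen, and that tells you where to look first. Here is a rough sketch of that reasoning as a shell function. The function name and the stage keywords are my own invention for illustration, and the "kernel or hard drive" entry is my reading of the partition-check discussion rather than a hard rule.

```shell
#!/bin/sh
# boot_suspect STAGE: print the likely culprit, given the last boot stage reached.
# Stage keywords (hypothetical, chosen for this sketch):
#   nothing     no "Uncompressing Linux..." message appeared
#   uncompress  message appeared, but the boot hung at or before the partition check
#   partitions  partition check passed, but no filesystems were mounted
#   mounts      filesystems were mounted, but something later is missing
boot_suspect() {
    case $1 in
        nothing)    echo "boot disk or LILO" ;;
        uncompress) echo "kernel or hard drive" ;;
        partitions) echo "/etc/inittab or rc files" ;;
        mounts)     echo "network or other late startup services" ;;
        *)          echo "unknown stage: $1" >&2; return 1 ;;
    esac
}
```

For example, boot_suspect nothing prints "boot disk or LILO", which is your cue to reach for the spare boot disk.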

Now, what do you do if you know you have a problem? Before
you get into a jam, make sure you have backups. If things get too
bad you can always re-initialize your partition and restore from an
old backup. Also make sure to have backups handy of your /etc
directory.
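
One way to keep a copy of /etc handy is a dated tar archive. The following is only a sketch: it assumes your tar understands the z (gzip) option, and the function name and archive naming scheme are made up for this example.

```shell
#!/bin/sh
# backup_dir SRC DEST: archive directory SRC into DEST as NAME-YYYYMMDD.tar.gz
# and print the archive's path. Hypothetical helper, not part of any distribution.
backup_dir() {
    src=$1
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest" || return 1
    # -C switches to the parent directory first, so the archive stores
    # relative paths (etc/inittab) rather than absolute ones (/etc/inittab).
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}
```

Running backup_dir /etc /root/backups as root would leave an archive such as /root/backups/etc-YYYYMMDD.tar.gz. Use tar -tzf on that file to list what it holds before you trust it.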

One good idea is to get a copy of the rescue disks available
through FTP. These disks will allow you to boot Linux from a pair
of floppies and access most of your partitions. This way, even if
you can't boot because of a bad /etc/inittab file, you can still
boot Linux and get access to the bad file, then fix it.

Some of these rescue disks come completely ready-made, so
that you can use the rescue disks very easily. The disadvantage to
these sets is that they may use an older kernel, may not have some
pieces that you need (SCSI support, for example), and may not have
the set of programs that you want to see in a rescue disk.

There are other sets of rescue disks where you specify which
programs you want to include. They also use the current version of
the kernel that you are using. The drawbacks to these are that you
need to know what you are doing and they take a bit more work than
simply getting a pre-built rescue disk. Two such packages are SAR
(Search and Rescue) and rescue. Each of these packages is small, as
they both use programs that are already on your system.

If you have two floppy drives, you can go through the rescue
disk(s) and decide which programs you'd like to add, such as
your favorite editor. Usually one disk can contain all the programs
you'd need in the event of a disaster, but having two disks chock
full of utilities will be even better. Here's how:

First, put a floppy in your second drive. I have a 5.25 HD
drive as my second floppy, so I'll use that in my examples.

The fdformat program is used
to low-level format a floppy. Its syntax is:

fdformat <device>

where <device> is the name and type of drive you're
using. For example, I have a high density 5.25" drive as drive 2,
so my <device> would be /dev/fd1h1200. A high density 3.5"
would be /dev/fd1H1440.

Now you put a filesystem on it. Use the same filesystem that
you are using on the root partition of your system. In my case,
that would be the Second Extended Filesystem (ext2). So, let's put
a filesystem on my floppy:

mke2fs -c /dev/fd1h1200

Replace the /dev/fd1h1200 with /dev/fd1H1440 if your second
drive is a 3.5" high density drive.
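
Since the device name is the only thing that changes between drive types, the format and mke2fs steps can be sketched as a pair of small shell functions. The function names are hypothetical. The second one only prints the commands rather than running them, so you can check the device name before doing anything destructive as root.

```shell
#!/bin/sh
# floppy_device DRIVE SIZE: print the device node for a high-density floppy drive.
# DRIVE is 0 for the first drive or 1 for the second; SIZE is 5.25 or 3.5.
floppy_device() {
    case $2 in
        5.25) echo "/dev/fd${1}h1200" ;;
        3.5)  echo "/dev/fd${1}H1440" ;;
        *)    echo "unsupported drive size: $2" >&2; return 1 ;;
    esac
}

# prepare_floppy_cmds DRIVE SIZE: print (do not run) the low-level format
# and filesystem creation commands for that drive.
prepare_floppy_cmds() {
    dev=$(floppy_device "$1" "$2") || return 1
    echo "fdformat $dev"
    echo "mke2fs -c $dev"
}
```

For my second drive, prepare_floppy_cmds 1 5.25 prints the same two commands used in the examples above.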

Now you should have a filesystem on a disk. Mount it on an
unused directory. The /mnt directory is usually used for this. If
/mnt does not exist on your system, do

mkdir /mnt

then do

mount -t ext2 /dev/fd1 /mnt

Your disk will now be mounted on /mnt. At this point, start
copying over whatever programs you want. Make sure of two
things:

Make sure that the shared libraries on the rescue
disk will work with the programs that you put on the disk.

Make sure that you copy over all the files you
need. Some editors have configuration files or help files you may
need.
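
If you would rather carry the libraries along with the program than rely on the ones already on the rescue disk, something like the following sketch can help. It assumes an ldd that prints full paths for the libraries it finds (older versions may print only the library name and version, in which case nothing extra is copied), and it assumes those paths contain no spaces. The function name and the bin/lib layout are my own choices.

```shell
#!/bin/sh
# copy_with_libs PROG ROOT: copy PROG into ROOT/bin and every shared library
# that ldd reports with an absolute path into ROOT/lib.
# Hypothetical helper; ROOT would be the mounted floppy, e.g. /mnt.
copy_with_libs() {
    prog=$1
    root=$2
    mkdir -p "$root/bin" "$root/lib" || return 1
    cp "$prog" "$root/bin/" || return 1
    # Keep only the fields of ldd's output that look like absolute paths.
    libs=$(ldd "$prog" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }')
    for lib in $libs; do
        cp "$lib" "$root/lib/" || return 1
    done
}
```

For example, copy_with_libs /bin/write /mnt. Configuration and help files still have to be copied by hand, since ldd knows nothing about them.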

If you are using a rescue disk such as SAR or rescue, you
won't need to worry about libraries and you can skip ahead a few
paragraphs. Or you can read on and get a better idea of how
shared libraries work.

The idea behind shared libraries is that many common C
functions get included in one file in a common location. This saves
a lot of space as those common functions no longer need to be
duplicated in each program binary. The drawback is that it is a
tiny bit slower because now two files have to be loaded instead of
one. For the toss-up between speed and size, I'll take the size,
especially on a floppy with very limited space.

Another small problem with shared libraries is that programs
compiled to use a new library won't work if the only library that
is available is an older one. For example, a program compiled to
use version 4.4 of the libraries won't work if the only set of
libraries available is version 4.3. You'll wind up getting an error
message about incompatible libraries. If this happens, get a new
copy of the libraries or recompile the program to use an older
library.

[Ed. Note: this is not strictly true. With modern libraries,
the user will get a message, but the program will still try to run
if all the necessary symbols are there. For instance, I'm running
some binaries compiled under libc 4.5.8 which run fine with my libc
4.4.4, other than giving an error message. I don't know if you want
to deal with this or not; probably not.]

To check what versions of libraries the programs are looking
for, use the ldd command:

ldd <program>

This reports the version of the libraries that the program
was compiled against. For me, ldd /bin/write
returns:

libc.so.4 (DLL Jump 4.4pl1)

If the files in the /lib directory are libc.so.4.4.1 or
above, it will be fine to put the `write' command on your disk. If
the library needed is newer than the library on the rescue disk,
then you would need to find an older version of the program and put
that on the floppy. For example, if the library on the rescue disk
was libc.so.4.3.1, I'd need to find an older version of write to
put on the disk, or else put libc.so.4.4.1 on the disk.
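
The comparison described above (is the library on the disk at least as new as the one the program wants?) can be sketched in plain shell. This follows the rule of thumb in the text and handles only purely numeric dotted versions such as 4.4.1. The function names are hypothetical.

```shell
#!/bin/sh
# lib_version FILE: strip everything up to ".so." from a library file name,
# e.g. libc.so.4.4.1 becomes 4.4.1.
lib_version() {
    echo "${1#*.so.}"
}

# version_ge HAVE NEED: succeed if dotted version HAVE is at least NEED.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        # Drop the field just read; a version with no fields left becomes empty.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        [ "${fa:-0}" -gt "${fb:-0}" ] && return 0
        [ "${fa:-0}" -lt "${fb:-0}" ] && return 1
    done
    return 0
}
```

With the numbers from the example, version_ge "$(lib_version libc.so.4.3.1)" 4.4.1 fails, so write would need either an older build or a newer library on the disk.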

You don't need to put just executables on this disk. A copy
of gzip and a bunch of HOWTO files can come in quite handy as well.
Here's a list of suggested files, all available through FTP or on
many BBSs. Some of these files may be on the rescue disk you have.
Make sure.

Take any of these editors. I find that ed is small and
compact, but not much fun with heavy editing or large files. For
you, joe may be worth the extra 98k it takes up. If you are
unfamiliar with joe or ed, you can use vi, which is a standard
program on just about all UNIX systems:

joe    editor    133k
vi     editor    101k
ed     editor     35k
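
With floppy space this tight, it helps to add up sizes before you start copying. Here is a minimal sketch; the function name is made up, and the capacity you pass in is the raw disk size, so the real figure will be somewhat lower once the filesystem takes its share.

```shell
#!/bin/sh
# kb_left CAPACITY SIZE...: print how many kilobytes remain after subtracting
# each SIZE (in k) from CAPACITY (in k). A negative result means it won't fit.
kb_left() {
    left=$1
    shift
    for size in "$@"; do
        left=$((left - size))
    done
    echo "$left"
}
```

Putting all three editors on a 1200k disk, kb_left 1200 133 101 35 prints 931, so there is plenty of room left for gzip and a few HOWTO files.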

One more thing you'll want on-hand is a list of all of the
cards that are in your machine, the IRQs that they use, and whether
they are used by Linux or not. Sometimes a problem can be an
incorrectly configured kernel or card.

If you keep these disks set aside and updated often, you'll
be ready for anything that might happen.

Tip of the month: When you hit the backspace, do you see /'s
followed by the character you just backspaced over? Don't you hate
it, too? It reminds me of reading The Unix Programming
Environment. Get a new copy of agetty and this should
cure the problem. A copy distributed with some Slackware releases
had this problem.