Standard disclaimer: Use the information that follows at your own risk. If you screw up a system, don’t blame it on me...

NOTE: A more up to date version of this doc is available on the net. Search
for the phrase “when good disks go bad” and you should find an HP white paper
that contains IA-64 information as well as the PA-RISC information below.
This doc is kept primarily for the LVM commands:

Despite all the effort various HP disk divisions put into making highly
reliable disk drives, occasionally a disk mechanism goes bad and needs to be
replaced. After the old disk is replaced with a new one (retaining the
hardware address of the original to avoid confusion), the data needs to be
restored to that disk from a backup. Before LVM, in the simple world of hard
sectioning, this meant checking your notes for the original sectioning
scheme, then recreating file systems as necessary on those sections, and
restoring the data from the backup.

With LVM, the picture is slightly more complicated. That one disk could have
various pieces of several logical volumes on it. The layout of those logical
volumes (LVs or lvols) must first be restored, and the data for each of those
LVs restored from backup.

This document endeavors to provide a step by step guide to replacing a
faulty LVM disk, outlining the general sequence of commands required to
perform the task. It is divided into 4 chapters with 1 appendix as follows:

At the start of each of the chapters there is an example system
configuration, which is referred to in the examples for each of the steps.
Use these example systems to understand the process, but please be careful
to issue the required commands relevant to the system actually being
recovered.

Of particular importance is Appendix A, which outlines the steps that must
be performed PRIOR to the disk failing. Please make sure that you read this
section carefully, and implement the required procedures as soon as possible.
Your system recovery may rely on these steps. It is recommended that you
familiarize yourself with the procedures outlined in this document prior to
the time of ever needing them so that you understand fully the steps involved.

If you have any questions about the recovery process, please contact your
local Hewlett-Packard Customer Response Center for assistance.

The scenario for this chapter is that the disk at hardware address 52.4.0
has a head crash, and as a result it is unusable. The steps below outline a
method that can be used to recover from this state.

[Step 1.1]

Have the engineer replace the faulty disk, and then boot the system in
single user mode. This ensures that a minimum of processes will be running
while we work on the system. To boot into single user mode, boot from the
primary boot path, and interact with IPL. At the ISL> prompt, find out what
the original boot string was:

ISL> lsautofl

This will return a string such as:

hpux disc3(52.6.0;0)/stand/vmunix
OR
hpux (;0)/stand/vmunix

The output here is dependent on the type of system you have. Once you have
this information, simply add the string -is after hpux, and this will
boot the system into single user mode. For our example:

ISL> hpux -is (52.6.0;0)/stand/vmunix

[Step 1.2]

Restore the LVM configuration/headers onto the new disk from your backup of
the LVM configuration:

# vgcfgrestore -n <volume group name> /dev/rdsk/c0tXd0

where X is the Logical unit number of the disk that has been replaced. For
our example:

# vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t2d0

NOTE: You must have performed the command vgcfgbackup to save off the
headers prior to the disk failure ( Refer to Appendix A ).

[Step 1.3]

Reactivate the volume group (VG) so that the new disk can be attached,
since it wasn’t configured in at boot time:

# vgchange -a y <volume group name>

For our example, the volume group vg00 will already be activated, but it
will not know of the replaced disk. Therefore, this step is still required
so that LVM will now know that the disk is again available:

# vgchange -a y /dev/vg00

The file systems normally mounted on lvol5 and lvol6 will not have been
mounted, since we booted in single user mode. This lets us activate the VG
without any concern that a process might try to access those lvols once they
are activated on the new drive. vg00 will be able to access them, but they
are currently empty of data and file systems until we restore a backup to them.

NOTE: vg00 is always activated at boot time because it holds the root,
primary swap, and dump partitions. However, other VGs will not have been
activated yet. This is not a problem, as the vgchange command will work on
these in the same way. In the case of vg00, it would initially have been
activated with c0t2d0 in an unknown state. vgchange tells vg00 to look
again at c0t2d0, which is now in a known state.

[Step 1.4]

Determine which logical volumes spanned onto that disk. You only need to
recreate and restore data for the volumes that actually touched that disk.
Other LVs in the volume group are still OK.

# pvdisplay -v /dev/dsk/c0tXd0

will show a listing of all the extents on disk lu X, and to what logical
volume they belong. This listing is fairly long, so you might want to pipe
it to more or send it to a file. For our example:
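
# pvdisplay -v /dev/dsk/c0t2d0 | more    ( piped through more, as the listing is long )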

From this we can see that logical volumes /dev/vg00/lvol5 and
/dev/vg00/lvol6 have physical extents on this disk, but /dev/vg00/lvol1
through /dev/vg00/lvol4 don’t. So, we will need to recreate and restore
lvol5 and lvol6 only.

NOTE: Even though lvol5 was also in part on another disk drive, it will
need to be treated as if the entire lvol was lost, not just the part on
c0t2d0.

[Step 1.5]

Now, restore the data from your backup onto the replacement disk for the
logical volumes identified in step 1.4. For raw volumes, you can simply
restore the full raw volume using the utility that was used to create your
backup. For file systems, you will need to recreate the file systems first.
For our example:

For HFS:

# newfs -L /dev/vg00/rlvol5
# newfs -L /dev/vg00/rlvol6

For JFS:

# newfs -F vxfs /dev/vg00/rlvol5
# newfs -F vxfs /dev/vg00/rlvol6

Note that we use the raw logical volume device file for the newfs command.
For file systems that had non-default configurations, please consult the man
page of newfs for the correct options. Then, mount the file system under the
mount point that it previously occupied. Once this is done, simply restore
the data for that file system from your full backups.
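
For instance, if lvol5 had held a file system mounted at /home and the backup
was made with fbackup to tape device /dev/rmt/0m ( both names are assumptions
for illustration; use the values from your own records ), the sequence might
look something like:

# mount /dev/vg00/lvol5 /home
# frecover -x -o -f /dev/rmt/0m -i /home

Here -o forces existing files to be overwritten and -i limits the recovery to
the named directory.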

As we are in single user mode, there is no need for concern that a process
or user will try to access the file system prior to the restore operation’s
completion.

Note: you will need to have recorded how your file systems were originally
created in order to perform this step. The only critical feature of this
step is that the file system be at least as large as before the disk
failure. You can change other file system parameters, such as those used to
tune the file system’s performance.

For the file system case, there is no need to worry about data on the disk
(c0t1d0) that was newer than the data on the tape. The newfs command wiped out
all data on lvol5. For raw volume access, you may have to specify your
restore utility's overwrite option to guarantee bringing the volume back to
a known state.

[Step 1.6]

The final step in the recovery process is to reboot the system. When the
system restarts, the recovery process will be complete.

# cd /
# shutdown -r

If you have any questions or problems with the above steps, please contact
your local Hewlett-Packard Customer Response Center.

Mirroring introduces an interesting twist to the recovery process. Because
LVM keeps a map of “stale” extents on each disk, it is only aware of
individual extents which are in need of update, and it does not map this to
entire disks. This makes for quick mirror recovery in the case that a disk
has temporarily lost connection with the host, or has lost power. In
addition, it can greatly speed up the recovery time in the instance of a
failed disk.
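
One quick way to see which extents LVM currently considers stale is to check
the mirror status with lvdisplay ( a sketch; substitute the logical volume you
are interested in ):

# lvdisplay -v /dev/vg00/lvolN | grep -i stale

Any extents listed as stale will be resynchronized when the mirror recovery
takes place.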

Example configuration:

Volume group /dev/vg00 contains the three disks, with the logical volume
configuration as shown:

[Step 2.1]

Shut down the system, have the customer engineer replace the faulty disk, and
then boot the system. You can boot the system into either single or
multiuser mode, depending on whether you need to provide access to your
users while the recovery procedure is being performed. For file systems that
have mirror copies on the replaced disk, the file system can be used by the
users during recovery. For file systems that didn’t have a mirror, and that
resided on the replaced disk, you will have to deny access to the users.
Do this by unmounting the relevant file systems ( they will possibly not be
mounted as part of the boot up sequence anyway ).
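
For example, if such a file system did come up mounted at /opt ( a hypothetical
mount point for illustration ), you could deny access to it with:

# umount /opt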

[Step 2.2]

Restore the LVM configuration/headers onto the replaced disk from your
backup of the LVM configuration:

# vgcfgrestore -n <volume group name> /dev/rdsk/c0tXd0

where X is the Logical unit number of the disk that has been replaced.
For our example:

# vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t2d0

NOTE: You must have performed the command vgcfgbackup to save off the
headers prior to the disk failure ( Refer to Appendix A ).

[Step 2.3]

Reactivate the volume group so that the new disk can be attached, since it
wasn’t configured in at boot time. This will also resync any mirrors that
resided on the faulty disk.

# vgchange -a y <volume group name>

For our example, the volume group vg00 will already be activated, but it
will not know of the replaced disk. Therefore, this step is still required
so that LVM will now know that the disk is again available and the resync
will occur:

# vgchange -a y /dev/vg00

[Step 2.4]

For any file systems on the faulty disk that didn’t have mirror copies, you
will have to rebuild the file systems and restore the data. Follow the steps
1.4 and 1.5 in chapter 1 for guidance here.

[Step 2.5]

If you booted your system into single-user mode in step 2.1, reboot your
system now and allow it to boot into multiuser mode. If you were already in
multi-user mode, then no further action is required.

With the failure of the boot disk, you will have lost the information on
disk that is required to be able to boot the system. So, you will need to
install at least a minimal system onto the boot disk, and then restore your
original kernel and operating system files from your backup.

Example configuration:

Volume group /dev/vg00 contains the three disks, with the logical volume
configuration as shown:

The scenario for this chapter is that the disk at hardware address 52.6.0
has a head crash, and as a result is unusable. Our example is a worst case
example where /usr ( lvol3 ) spans over from the faulty disk to another one.
lvol3 has 300 Mb on the disk at 52.6.0, and another 100 Mb on the disk at
52.5.0. Our root logical volume was installed with 104 Mb (lvol1), and swap
(lvol2) of 48 Mb. The aim of the steps below is to allow a recovery in which
you will have to recover the data for logical volumes 1, 2 and 3, but not
for any of the other logical volumes in the volume group. They all reside on
disks that are still OK, so we shouldn’t have to touch their data.

COPYUTIL USERS:

If you used COPYUTIL to create a disk image, you will not have to do the
following steps, except for Step 3.6 to restore your most recent data.
Additionally, if you have added a file system/lvol since using COPYUTIL, you
will also need to do Step 3.12 and Step 3.13.

Instead, you can restore your disk image. For instructions on how to do
this, refer to the “Support Media User’s Guide”, part number 92453-90010.

[Step 3.1]

Determine the original LVM configuration you had on your system. In
particular, you need to know:

size and layout of all logical volumes in the root volume group

extent size used for the root volume group

All of this information should have been recorded when the system was
configured. Refer to Appendix A of this document for recommended
configuration backup procedures. Without this information, the
following steps are not possible (a full reinstall is required for the root
volume group).
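
If you are unsure whether this information was recorded, the commands that
would have produced it ( run while the system was still healthy, as part of
the Appendix A procedures ) are along these lines:

# vgdisplay -v /dev/vg00        ( extent size, plus the size of each lvol )
# lvdisplay /dev/vg00/lvol1     ( detail for an individual logical volume )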

[Step 3.2]

Have the customer engineer replace the faulty disk, and then perform an
install according to chapter 2 of the “Installing HP-UX 10.01 and Updating
from HP-UX 10.0 to HP-UX 10.01” manual, part number B2355-90078. When
performing the creation of the volume group, make sure that you specify the
SAME sized logical volumes as you originally had on your system. If you had
any logical volumes that spanned from the root disk across to other disks in
the volume group, simply create the logical volume the size it has on the
root disk. For our example, we will have to install with the following
parameters:

    Use                     Size      Logical volume

    Root logical volume     104 Mb    lvol1
    Primary swap             48 Mb    lvol2
    /usr                    300 Mb    lvol3

Note here that /usr on our original system is actually 400 Mb, but we are
creating it here with only 300 Mb, which is the size of the logical volume
on the root disk.

Note: Only include the boot disk in the LVM configuration when you are doing
the install. If you include any other disks, you risk overwriting valid data.

[Step 3.3]

When you reach the update stage of the install, you do not need to load all
the filesets that you had on your original system. You will restore these from
your full backups at a later stage (Step 3.6). Recommended products to be
loaded are:

Other filesets may be automatically selected due to dependencies between
filesets.

[Step 3.4]

Once the install has completed, the system will have booted in multiuser
mode. To proceed with the recovery, shut the system down and reboot into
single user mode, ensuring the minimum of system activity during the
recovery. Refer to step 1.1 in chapter 1 for instructions on how to do this.

[Step 3.5]

The next step is to install any patches that repair problems in your restore
utilities ( for example, frecover patches, or a patch for SAM if you use
SAM to do your backups/recoveries ). These patches should be kept on
separate, well marked tapes so as to make their installation smooth. Perform
these installs as per the instructions provided with the patches. The reason
for doing this is to ensure that you are able to cleanly restore your files.
You will not need to install any other patches at this stage, as they will
be recovered as part of the next step.

[Step 3.6]

Now, restore your original root file system from your full backups. At this
stage, you will only need to restore the files on the root file system, not
any other file systems (e.g. /usr ), as they will be restored at a later
stage. First, restore your /etc/passwd and /etc/group files from your
special backup, and then proceed with the restore of the other files from
your backup. For example, if you use fbackup to back your system up, the
recovery command could look like:
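
A possible form ( this is a sketch: the tape device /dev/rmt/0m is an
assumption, and -e is used here to exclude /usr, which will be restored at a
later stage; adjust the include/exclude paths to match your own layout ):

# frecover -x -o -f /dev/rmt/0m -i / -e /usr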

Note that you should specify OVERWRITE, as you need all the files on the
backup to REPLACE the files on disk.

[Step 3.7]

At this stage, you are still booted from your newly installed kernel, but
have your original one restored to disk. So, you should reboot, allowing the
original kernel to be loaded into memory. At this stage, you should boot in
maintenance mode, so that the LVM configuration can be recovered. To boot in
maintenance mode, follow the same procedure as outlined in Step 1.1,
shutting down with the shutdown command, and then interrupting the boot.
However, this time insert the string -lm ( instead of the -is ).

For our example:

ISL> hpux -lm (52.6.0;0)/stand/vmunix

[Step 3.8]

Restore the LVM configuration/headers onto the replacement disk from your
backup of the LVM configuration:

# vgcfgrestore -n /dev/<vgname> /dev/rdsk/c0tXd0

where X is the Logical unit number of the disk that has been replaced.

For example:

# vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t0d0

NOTE: You must have performed the command vgcfgbackup to save off the
headers prior to the disk failure.

[Step 3.9]

Reactivate the root volume group so that the LVM configuration can be
brought into effect.

# vgchange -a y /dev/<vgname>

For our example:

# vgchange -a y /dev/vg00

[Step 3.10]

Update the BDRA ( Boot Data Reserved Area ) of the boot disk to ensure your
system will be able to boot:

# lvlnboot -R /dev/<vgname>

For our example:

# lvlnboot -R /dev/vg00

[Step 3.11]

Shut down the system, and reboot in single user mode. Refer to step 1.1 from
chapter 1 for directions. We need to do this because we are currently in
maintenance mode.

[Step 3.12]

Determine which logical volumes resided on the faulty disk that we haven’t
restored back to their original state. The root logical volume (lvol1) and
primary swap (lvol2) should be correct as you created them the correct size
at install time. However, other file systems will need to be recreated and
then the data must be restored. To determine which file systems need further
work:

# pvdisplay -v /dev/dsk/c0tXd0

will show a listing of all the extents on disk lu X, and to what logical
volume they belong. This listing is fairly long, so you might want to pipe
it to more or send it to a file. This listing will show you all the logical
volumes that reside on that disk. You will need to rebuild and restore all
the logical volumes other than the first two logical volumes (lvol1 and
lvol2).

We can see in our example that we will need to reconstruct lvol3 ( /usr ).
One thing to note is that when we did the vgcfgrestore in step 3.8, we told
LVM that lvol3 is again 400 Mb (as per the original configuration) and yet
the file system will still think it is 300 Mb. This will need to be remedied.
However, it is not a problem at this time: a file system does not need to be
as large as the lvol it resides in.
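
A quick way to see the mismatch, using the example names:

# lvdisplay /dev/vg00/lvol3     ( LV Size should again show the full 400 Mb )
# bdf /usr                      ( if /usr is mounted; the file system itself still reports roughly 300 Mb )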

Note: Make sure that you don’t touch any logical volumes that do NOT have
any extents on the replaced disk. These logical volumes should still be
whole, so do not need recovering.

[Step 3.13]

Now, restore the data from your backup onto the replacement disk for the
logical volumes identified in step 3.12. For raw volumes, you can simply
restore the full raw volume using the utility that was used to create your
backup. For file systems, you will need to recreate the file systems first.

Note that we use the raw logical volume device file for the newfs command.
For file systems that had non-default configurations, please consult the man
page of newfs for the correct options. Then, mount the file system under the
mount point that it previously occupied. Once this is done, restore the data
for that file system from your full backups.
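
For our example, the commands might look like the following ( assuming /usr is
a JFS file system; use the HFS form of newfs from step 1.5 if not, and unmount
/usr first if it happens to be mounted ):

# umount /usr
# newfs -F vxfs /dev/vg00/rlvol3
# mount /dev/vg00/lvol3 /usr

Then restore the contents of /usr from your full backup.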

[Step 3.14]

To play it safe, shut down your system and reboot into multiuser mode to make
sure that everything is correctly restored. At this stage, your system is
fully recovered.

After having to implement the procedures that were in the original
Chapter 4: replacing a boot disk WITH mirroring, I found a problem.
Working with the HP Response Center, I found that the directions are in error.
The correct procedure is as follows:

If possible, remove ( vgreduce ) the failed mirror disk from the root volume
group by:

lvreduce'ing each mirrored logical volume so that it uses only the
functioning boot disk.

Once all the mirrors are broken, vgreduce vg00 ${bad_disk}
( see the sketch after this list )

Boot to single user mode as described in step 4.1

Execute the vgcfgrestore as described in step 4.2

Execute the mkboot commands as described in step 4.2.
NOTE: DO NOT execute the pvcreate command
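
A sketch of the mirror-removal items above, using hypothetical names ( here
c0t5d0 is the failed mirror disk and lvol1 through lvol3 are the mirrored
logical volumes; substitute your own devices ):

# lvreduce -m 0 /dev/vg00/lvol1 /dev/dsk/c0t5d0
# lvreduce -m 0 /dev/vg00/lvol2 /dev/dsk/c0t5d0
# lvreduce -m 0 /dev/vg00/lvol3 /dev/dsk/c0t5d0
# vgreduce /dev/vg00 /dev/dsk/c0t5d0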

[Step 4.1]

Shut down the system, have the customer engineer replace the faulty disk, and
then boot the system in single user mode from the alternate boot disk. If
you only have two disks in the root volume group, then you will need to
override quorum as you boot.
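
To override quorum, add the -lq option to the boot string at the ISL prompt
( a sketch using the generic form from step 1.1; boot from the alternate boot
path, not the replaced disk ):

ISL> hpux -is -lq (;0)/stand/vmunix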

[Step 4.2]

Restore the LVM configuration/headers onto the replaced disk from your
backup of the LVM configuration:

# vgcfgrestore -n <volume group name> /dev/rdsk/c0tXd0

where X is the Logical unit number of the disk that has been replaced. For
our example:

# vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t0d0

NOTE: You must have performed the command vgcfgbackup to save off the
headers prior to the disk failure ( Refer to Appendix A ).

[Step 4.3]

Reactivate the volume group so that the new disk can be attached, since it
wasn’t configured in at boot time. This will also resync any mirrors that
resided on the faulty disk.

# vgchange -a y <volume group name>

For our example, the volume group vg00 will already be activated, but it
will not know of the replaced disk. Therefore, this step is still required
so that LVM will now know that the disk is again available and the resync
will occur:

# vgchange -a y /dev/vg00

[Step 4.4]

If you have any logical volumes that resided on the faulty disk that were
NOT mirrored, you will need to recreate them as per steps 1.4 and 1.5 from
chapter 1.

[Step 4.5]

Bring the system up into multiuser mode, allowing users back onto the system.
To do this, find out your normal run level ( first line of the /etc/inittab
file ), and then perform a telinit command:
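
For example ( assuming your initdefault entry specifies run level 3; use
whatever level your own inittab shows ):

# grep initdefault /etc/inittab
# telinit 3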

These are procedures to ensure that the system’s data and configuration are
recoverable in the event of a system failure.

Load any patches for LVM.

Regularly back up your entire system.

Without a valid backup, you run a real chance of losing some or all of your
data. Ensure that you back up ALL of your important data, including the
operating system directories such as:

/
/usr
/dev
/etc and so on.

In addition, regularly test that your backups are working by restoring a
test file randomly from your backups. It is risky to assume that your backup
is working because it is not logging any errors. Many backup utilities have
the capability to do some sort of validation of the backup media. For
example, fbackup has the -N option that can allow you to check for
discrepancies between backup indices and what is actually on the tape. Refer
to the fbackup(1M) man page for more information.

If possible, also use COPYUTIL.

Back up the important files separately.

Take an extra copy of the very important files, preferably to another system
as well as to another tape. This will speed up recovery in the event of a
system crash. The files that should be backed up are:

/etc/passwd
/etc/group
/etc/lvmtab
/etc/lvmconf/*
/etc/fstab

There are many other important files on your system that you may wish to
back up separately. The files listed above are required to ensure a smooth
system recovery.

Regularly print out the configuration of your system.

The configuration details stored on the system may not be accessible during
a recovery. A printed copy is an invaluable reference. We recommend printing
the configuration details once a week and every time a change is made. One
thing to note is that some of the commands outlined below create large
amounts of output. An alternative to printing them is to write the
information to a file and then store the file off to tape. This allows
quick recovery of the information when needed. You could include this
configuration file with the backup in step 3.

The easiest way to save the configuration is to set up a cron job to run
regularly, so that even if you don’t remember to do it, the system will.
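
A minimal sketch of such a crontab entry ( the schedule, output file, and the
single vgdisplay command are assumptions; extend it with the other commands
recommended in this appendix ):

0 2 * * 0 /usr/sbin/vgdisplay -v > /var/adm/lvm.config 2>&1

This example writes the volume group configuration to /var/adm/lvm.config at
2am every Sunday.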

As an alternative, an intelligent script can be written that will detect any
changes in the configuration and only print out those changes. An example
script is included at the end of this appendix.

Back up the LVM configuration after every change.

The “vgcfgbackup” command copies the LVM headers from the system area of the
disk to a disk file, which by default resides in the /etc/lvmconf directory.
Once this information is in a disk file it can be stored to tape during
backups of the file system. The information in this file allows you to
replace the LVM headers on the disk in the event of the disk being replaced,
or if your LVM configuration becomes corrupted.

It is very important that you make these configuration backups whenever you
make a change to any part of the LVM configuration.

This is another task that should be done on a regular basis, whether you
have made changes or not. It can be done with a cron job, just prior to the
time of a normal backup. The command to use is:

/etc/vgcfgbackup /dev/vgXX ( for every volume group )
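
A simple way to cover every volume group from a script or cron job ( a sketch;
it assumes all of your volume group directories match /dev/vg* ):

for vg in /dev/vg*
do
    vgcfgbackup $vg
done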

Update the boot structures after every change to the root volume group.

This task is only required if you are using LVM on your boot disk. Whenever
you make changes to the root volume group, which is usually named /dev/vg00,
you MUST update the Boot Data Reserved Area (BDRA) on the boot disk. To do
this, issue the following command:
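
# lvlnboot -R /dev/vg00

( as in step 3.10; substitute your root volume group name if it is not vg00 )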