Recovering data from a damaged hard drive partition

For the past two days, I've been trying to recover data from a damaged hard drive. The damage happened when a squirrel running along the above-ground power lines shorted the circuit and caused an explosion. This of course stopped the computer (and the entire neighborhood, for that matter), and as occasionally happens in this type of circumstance, the drive did not come through unharmed. The partition with business data was no longer recognized - er, well, we could tell it was there, but couldn't mount it.

Lesson number one - make sure you have your data backed up.
Lesson number two - how to recover data in this type of circumstance (when a backup is not available).

It IS possible to recover almost the full drive in a situation like this. The problem is usually a few bad blocks, which happen to be in a bad area of the drive (i.e. the blocks contain partition or filesystem information, rather than just plain data). Here's the process I usually follow:

Do NOT run any of the filesystem repair utilities. The problem in this case is not the file system; it's the physical media. Repair tools write to the drive, so trying to repair the file system might cause even more damage.

Find someplace to store the contents of the partition in question. This would be uncompressed space. So if the partition is 100 GB, you will need 100 GB of storage space available to you, and NOT on the damaged drive. This step usually involves hooking up another hard drive, and making sure it's working and mounted.
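Before starting, it is worth checking that the target really has enough room. The following is only a sketch: has_room is a helper name I made up, and /dev/sdb1 and /mnt/rescue in the commented example are placeholders for the damaged partition and the healthy drive.

```shell
#!/bin/sh
# has_room BYTES DIR: succeed if the filesystem holding DIR has at least BYTES free.
has_room() {
    need_kb=$(( ($1 + 1023) / 1024 ))
    # df -P prints one line per filesystem; column 4 is the free space in KB.
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$need_kb" ]
}

# Example (as root). The device and directory names are placeholders:
# has_room "$(blockdev --getsize64 /dev/sdb1)" /mnt/rescue && echo "enough room"
```

blockdev --getsize64 reports the size of the partition in bytes, which is the size the uncompressed image will be.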

Create a byte image of the damaged partition. There are three tools that I know of that can do this (without exploring specialized recovery systems).

dd - this is usually available with a default install of Linux.

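ddrescue - GNU ddrescue is a standalone copying tool rather than a wrapper around dd. The easiest explanation I can think of is that it does dd's job but carries on past read errors, and it can keep a map file so an interrupted run can be resumed.

dd_rescue - this is a separate program with a very similar name (notice the underscore difference) and a similar purpose: it also ignores read errors and continues processing.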

Of these, I prefer dd_rescue, which is available from FreshMeat. I find it relatively straightforward to use, and I don't need to mess with arcane parameters to make it do what is needed. That said, dd can also ignore errors if you tell it to (the conv=noerror option).
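For reference, here is roughly what the imaging step looks like with plain dd. Treat it as a sketch rather than the exact commands I ran: image_with_dd is a made-up helper name, and /dev/sdb1 and /mnt/rescue are placeholders.

```shell
#!/bin/sh
# image_with_dd SOURCE IMAGE: copy SOURCE to IMAGE, continuing past read errors.
image_with_dd() {
    # conv=noerror : keep going after a read error instead of stopping
    # conv=sync    : pad short or failed reads with zeros up to bs,
    #                so the data that follows stays at the right offset
    # bs=4096      : small blocks, so one bad spot costs at most 4 KB of image
    dd if="$1" of="$2" bs=4096 conv=noerror,sync
}

# Example (as root). The device and path names are placeholders:
# image_with_dd /dev/sdb1 /mnt/rescue/sdb1.img
#
# Rough equivalents with the other two tools, if they are installed:
# dd_rescue /dev/sdb1 /mnt/rescue/sdb1.img
# ddrescue /dev/sdb1 /mnt/rescue/sdb1.img /mnt/rescue/sdb1.map
```

Because of conv=sync, the image is padded with zeros up to a multiple of the block size if the source does not end on a 4 KB boundary.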

The first snag is that you can run out of drive space if you have misjudged the storage requirements. If you inadvertently put the image file on the root partition and it fills up, the box may misbehave or crash until you can free up that space somehow.

The second snag is the file size. Some systems limit the maximum size of a file to 2 GB. You can fix this with the ulimit command, or by modifying the /etc/security/limits.conf file (or /etc/limits on some systems). However, sometimes the system is already set up to avoid this, and you still get errors complaining about the file size. This happened to me when I was creating a large file on a Samba share (same as a Windows share) and on an NFS share. Apparently Samba and NFS need to be compiled with large file support; the simpler way out is to create the image on a local drive.
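A quick way to see whether the per-process file size limit will get in the way is to ask the shell. This sketch assumes bash, where ulimit -f reports the limit in 1024-byte blocks (other shells may use 512-byte blocks); file_limit_ok is a made-up helper name and the 100 GB figure is only the example size used earlier.

```shell
#!/bin/bash
# file_limit_ok BYTES: succeed if the shell's file size limit allows a file of BYTES.
file_limit_ok() {
    limit=$(ulimit -f)    # "unlimited", or a count of 1024-byte blocks in bash
    [ "$limit" = "unlimited" ] && return 0
    [ $(( limit * 1024 )) -ge "$1" ]
}

# Check for a 100 GB image before starting the copy.
file_limit_ok $(( 100 * 1024 * 1024 * 1024 )) \
    || echo "file size limit too low; try: ulimit -f unlimited"
```

This only covers the ulimit side of the problem. It says nothing about whether a Samba or NFS share on the other end can handle large files.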

Mounting the Image

If you are not using Logical Volume Manager (LVM), then you can mount the easy way:

mount -t ext3 -o loop /the/image/file /the/mount/point

/the/mount/point is simply an existing empty directory. If you need to specify the filesystem type, make sure you replace the ext3 with whatever is appropriate for the damaged partition - ext2, ext3, reiserfs, etc.

At that point you can change into the /the/mount/point directory, and access the files as if the drive was operating fine.
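Since mount needs root, I find it useful to print the exact command and look it over before running it. The sketch below does only that. loop_mount_cmd is a made-up helper name, and the ro option is my own addition so that nothing gets written to the image.

```shell
#!/bin/sh
# loop_mount_cmd IMAGE MOUNTPOINT [FSTYPE]: print the mount command for review.
loop_mount_cmd() {
    # loop : let mount attach the image file to a free loop device itself
    # ro   : read-only, so the image is not modified (an assumption, not required)
    printf 'mount -t %s -o loop,ro %s %s\n' "${3:-ext3}" "$1" "$2"
}

loop_mount_cmd /the/image/file /the/mount/point
# prints: mount -t ext3 -o loop,ro /the/image/file /the/mount/point
```

If a read-only mount is refused (a journaling filesystem may want to replay its journal first), mounting a second copy of the image read-write is one way around it.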

With LVM

If the partition you are trying to recover data from is an LVM partition, then the process is different. First, you need to have the LVM management tools installed, which may mean a change to the kernel. I would also recommend disconnecting the suspect drive before continuing - just to avoid any inadvertent changes to it.

losetup -a - this tells what loop devices are already in use. We need to find a loop device that is NOT in use. (i.e. a number that is not listed).

losetup /dev/loop2 /the/image/file - this will treat the image file as a loop device, and assign the loop device number (in this sample, it would be loop device 2). Change the number to something that is not already in use - e.g. /dev/loop5.

vgscan - the LVM system will scan the devices, including the loop devices, for volumes and report what is there. We want to make sure the volume on the image file is seen.

vgchange -a y - this will activate the volume(s).

vgdisplay -v - this will list the known volumes. We are interested in the "LV NAME" value for the suspect partition/volume. We use this value in the mount command.

mount -t ext3 /dev/VolGroup00/LogVol01 /the/mount/point - this will mount the volume at the given mount point.

At that point you can change into the /the/mount/point directory, and access the files as if the drive was operating fine. The mounting process is the only part that is different for LVM.
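Put together, the LVM sequence above looks something like the following sketch. The names are the same placeholders used in the steps, and run is a small helper I added so that the script only prints the commands unless DRY_RUN is set to 0 (the real commands need root).

```shell
#!/bin/sh
# Sketch of the LVM sequence. Defaults are the placeholder names from the text.
IMAGE=${IMAGE:-/the/image/file}
LOOPDEV=${LOOPDEV:-/dev/loop2}
LV=${LV:-/dev/VolGroup00/LogVol01}
MOUNTPOINT=${MOUNTPOINT:-/the/mount/point}

# run CMD...: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

lvm_mount_image() {
    run losetup "$LOOPDEV" "$IMAGE" &&    # attach the image to the loop device
    run vgscan &&                         # let LVM find the volume group on it
    run vgchange -a y &&                  # activate the volume group(s)
    run mount -t ext3 "$LV" "$MOUNTPOINT"
}

lvm_mount_image

# To undo it afterwards, in reverse order:
# umount /the/mount/point
# vgchange -a n VolGroup00
# losetup -d /dev/loop2
```

Instead of picking the loop device number by hand, losetup -f prints the first unused loop device.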

Copy the files

Once the image has been mounted, we can run any standard commands on the files. I usually tar the desired files onto another partition (on a different hard drive than the damaged one). Be aware, though, that some of the files may be corrupt, and those are unrecoverable at this point (short of sending the drive to a professional data recovery company).
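The tar step can be as simple as the sketch below. archive_tree is a made-up helper name and /mnt/rescue is a placeholder for a directory on a healthy drive. Keeping tar's complaints in a log file gives a starting list of files worth checking afterwards.

```shell
#!/bin/sh
# archive_tree SRC_DIR TARFILE LOGFILE: tar everything under SRC_DIR,
# saving anything tar complains about in LOGFILE.
archive_tree() {
    # -C changes into SRC_DIR first, so paths in the archive are relative.
    tar -cf "$2" -C "$1" . 2>"$3"
}

# Example. The target paths are placeholders:
# archive_tree /the/mount/point /mnt/rescue/recovered.tar /mnt/rescue/tar-errors.log
```

An empty log does not prove the files are intact. Blocks that could not be read from the drive may simply have been filled in when the image was made, so the damage shows up as bad content rather than as a read error.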

Conclusion

The first defense against a damaged drive is a good backup. But sometimes the backup routine fails, or maybe doesn't even exist. In these cases, it is still possible to get some of the data from a damaged drive. I've used the steps above a few times now and have been pleasantly surprised (and very lucky) to recover as much as I did. I say lucky, because in one case my archived email was corrupted, but the rest of the drive was fine. There can be no guarantee where the damaged blocks will be, or what files they were part of.

If your drive is still running, and the partitions are recognized but may be damaged, then there is still some hope of recovering the data from the drive.