
How a Corrupted USB Drive Was Saved by GNU/Linux

My friend's brother had a 512MB Lexar Media Jumpdrive Pro USB drive that
became corrupted after being used with Windows 2000. His IT department was
able to recover some, but not all, of the file contents, and none of the
file names. On his own, he tried several recovery utilities, but all failed.
Using a typical Linux distro--in this case SuSE 8.0--however, it wasn't
hard to recover almost all of the data from the drive along with the
filenames and to burn a CD-ROM of the contents.

USB Drive Ruined by Windows

Here's what I heard about the data loss:

Date: Sun, 1 Aug 2004 17:06:03 -0700
Subject: USB
... My USB drive is a
Lexar Media USB Jumpdrive Pro 2.0 (512 MB). I was working
on it in a computer with Windows 2000 and logged off before
ejecting the drive. Next time when I tried to use it,
it showed up as a Removable drive rather than the usual
Lexar Media drive and when I tried to open it, it said the
drive was not formatted; and under Properties, 0 bytes free
and used space and file system "RAW"
According to Lexar tech support, there is a bug with
Windows 2000 (that MS never bothered to fix) and can corrupt
the drive when it is removed without proper eject. They
recommend EasyRecovery Pro for data recovery which did
allow me to recover some files (> 500) using their RAW data
recovery program (all other tool failed because usually
said "no recognizable file on disc"). Unfortunately,
all the file names are lost and some files are gone.

The big question was "can Linux read the drive?" A Web search for
"linux usb jumpdrive pro" gave me hope that my kernel, 2.4.18 on SuSE
8.0, would recognize the drive in question. So, as root, I typed:

# tail -f /var/log/messages

and plugged the drive into a USB socket. Here's what appeared (I
removed "Aug 5 01:32:15 linux kernel:" from each line below):

The boot sector has a reasonable-looking partition table with
one entry. The entry begins at offset 0x1be with the two bytes 80 01. Your
favorite search engine can give you other information about
the partition table, but I note two things here. First, the
entry uses LBA32 format: starting logical sector 0x3f, length 0xf45b1.
Now, 0xf45b1 is 1000881 decimal. That plus 63 (0x3f) is 1000944.
The difference between the drive's total of 1001952 sectors and this 1000944
is 1008, that is, 63*16. I guess this has something to do with cylinder boundaries.
The second thing of note is the byte at 0x1c2, with value 06;
this is the partition type. What does 06 mean?
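The arithmetic above can be checked with a few lines of code. Below is a minimal Python sketch that decodes one partition-table entry. It assumes the standard MBR layout: the entry starts at 0x1be, the type is at byte 4 of the entry, and the LBA start and length are little-endian 32-bit values at bytes 8 and 12. The table of type codes is short and not exhaustive; 0x06 is the conventional code for a FAT16 partition of 32MB or more. The demo builds a synthetic sector from the values quoted above rather than reading the drive. On a live system you could instead read the first 512 bytes of /dev/sda as root.

```python
import struct

# A few common MBR partition type codes (far from exhaustive).
PARTITION_TYPES = {
    0x01: "FAT12",
    0x04: "FAT16 (< 32MB)",
    0x06: "FAT16 (>= 32MB)",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x0E: "FAT16 (LBA)",
}


def parse_partition_entry(mbr: bytes, index: int = 0) -> dict:
    """Decode one 16-byte entry of the partition table that starts at 0x1be."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR: missing 55 aa signature at 0x1fe")
    off = 0x1BE + 16 * index
    entry = mbr[off:off + 16]
    ptype = entry[4]  # byte 0x1c2 for the first entry
    # LBA32 fields: starting sector and length, little-endian 32-bit values
    start, length = struct.unpack_from("<II", entry, 8)
    return {
        "boot_flag": entry[0],  # 0x80 = bootable
        "type": ptype,
        "type_name": PARTITION_TYPES.get(ptype, "unknown"),
        "start": start,
        "length": length,
        "end": start + length,  # first sector past the partition
    }


# Synthetic sector holding the values quoted in the text.
mbr = bytearray(512)
mbr[0x1BE:0x1C0] = b"\x80\x01"
mbr[0x1C2] = 0x06
struct.pack_into("<II", mbr, 0x1C6, 0x3F, 0xF45B1)
mbr[510:512] = b"\x55\xaa"

p = parse_partition_entry(bytes(mbr))
total_sectors = 1001952  # whole drive, in 512-byte sectors
print(p["type_name"], p["end"], total_sectors - p["end"])
```

With these inputs the entry ends at sector 1000944, leaving 1008 (63*16) sectors unused at the end of the drive.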

Now, if I had been watching carefully, I would have known from
the line sda: sda1 in /var/log/messages that the
partition table was okay and contained only one entry.

Finding the FATs

When I actually started looking, however, I wasn't really sure
whether this was FAT16 or FAT32. The drive's capacity of 512MB
suggested it could be either. I also somehow had the impression
that a partition of this type could contain a FAT32 filesystem.
As I continued to look through the filesystem, I noticed this:

On a side note, I recently discovered the hard way that CMD |
less doesn't always do what you want if the output of CMD is very long.
It was okay to use here, but that isn't always the case; this probably is
system-dependent. If you have enough space on your hard drive, it may pay
to write the output to a file first, like this:

# od -Ax -w8 -tx1 -tc /tmp/r1 > /tmp/r2; less /tmp/r2

or

# hexdump -C /tmp/r1 > /tmp/r2; less /tmp/r2

So this looks like the start of a directory. Immediately above
that area, though, I saw this:

That looked like an allocation chain with 16-bit entries.
If these had taken the form 31 dd 00 00 32 dd 00
00 rather than 31 dd 32 dd, I
might have thought I was looking at FAT32.
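The entry-width reasoning can also be checked mechanically. This Python sketch reads the same bytes two ways, as little-endian 16-bit values and as 32-bit values. It then tests whether consecutive entries count upward by one. Each FAT entry holds the number of the next cluster, so an unfragmented file's chain looks exactly like that. The sample bytes are the 31 dd 32 dd ... pattern from the dump, and the helper names are made up for illustration.

```python
import struct


def fat_entries(raw: bytes, width: int) -> list[int]:
    """Split raw FAT bytes into little-endian entries of 2 (FAT16) or 4 (FAT32) bytes."""
    fmt = {2: "<H", 4: "<I"}[width]
    usable = len(raw) - len(raw) % width  # drop a trailing partial entry
    return [v for (v,) in struct.iter_unpack(fmt, raw[:usable])]


def looks_like_contiguous_chain(entries: list[int]) -> bool:
    """True if each entry is one more than the previous, as for an unfragmented file."""
    return len(entries) > 1 and all(b == a + 1 for a, b in zip(entries, entries[1:]))


sample = bytes.fromhex("31dd32dd33dd34dd")  # the pattern seen in the dump
as16 = fat_entries(sample, 2)
as32 = fat_entries(sample, 4)
print([hex(v) for v in as16], looks_like_contiguous_chain(as16))
print([hex(v) for v in as32], looks_like_contiguous_chain(as32))
```

Read as 16-bit entries, the bytes form the chain 0xdd31, 0xdd32, 0xdd33, 0xdd34. Read as 32-bit entries, they become 0xdd32dd31 and 0xdd34dd33, which do not.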

I had heard somewhere that there are typically two FATs, one right
after the other. I told less(1) to search backward for another line
resembling the line at 0x42460 by typing ?31 dd 32 dd 33
dd. In response, less(1) showed me this:

Luckily, this looked okay too. In fact, FAT#2 might be completely okay
even though the first 40KB or so of FAT#1 had been corrupted.
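Both steps are easy to script: finding the twin of a FAT line, and judging how much of FAT#1 is damaged. This Python sketch first searches an image for every occurrence of the byte string typed into less. It then compares the two FAT copies block by block. It assumes the copies start at 0x8000 and 0x26a00 and are 0x1ea00 bytes long, the figures used later in this article. The demo image is synthetic: a FAT in which every cluster points to the next is stored twice, and the first 40KB of FAT#1 is then overwritten. It therefore only illustrates the method. With that layout, the search finds the string at 0x42460 and again at 0x23a60, exactly 0x1ea00 bytes earlier.

```python
import struct

PATTERN = bytes.fromhex("31dd32dd33dd")  # the string typed at less's ?-prompt
FAT_SIZE = 0x1EA00                       # assumed FAT length = spacing between copies
FAT1_OFF, FAT2_OFF = 0x8000, 0x26A00     # assumed starting offsets of the two copies


def find_all(image: bytes, pattern: bytes) -> list[int]:
    """Every offset at which pattern occurs in image (overlaps included)."""
    hits, pos = [], image.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = image.find(pattern, pos + 1)
    return hits


def diff_ranges(a: bytes, b: bytes, block: int = 512) -> list[tuple[int, int]]:
    """Merged (start, end) byte ranges where a and b differ, at block granularity."""
    ranges: list[tuple[int, int]] = []
    n = min(len(a), len(b))
    for off in range(0, n, block):
        end = min(off + block, n)
        if a[off:end] != b[off:end]:
            if ranges and ranges[-1][1] == off:
                ranges[-1] = (ranges[-1][0], end)  # extend the previous range
            else:
                ranges.append((off, end))
    return ranges


# Synthetic stand-in for /tmp/r1: a FAT16 table in which entry i holds i + 1,
# stored at both offsets, then the first 40KB of FAT#1 overwritten with 0xff.
good_fat = b"".join(struct.pack("<H", i + 1) for i in range(FAT_SIZE // 2))
buf = bytearray(FAT2_OFF + FAT_SIZE)
buf[FAT1_OFF:FAT1_OFF + FAT_SIZE] = good_fat
buf[FAT2_OFF:FAT2_OFF + FAT_SIZE] = good_fat
buf[FAT1_OFF:FAT1_OFF + 0xA000] = b"\xff" * 0xA000
image = bytes(buf)
# With the real image you would instead use: image = open("/tmp/r1", "rb").read()

hits = find_all(image, PATTERN)
print([hex(h) for h in hits], hex(hits[1] - hits[0]))

fat1 = image[FAT1_OFF:FAT1_OFF + FAT_SIZE]
fat2 = image[FAT2_OFF:FAT2_OFF + FAT_SIZE]
print([(hex(s), hex(e)) for s, e in diff_ranges(fat1, fat2)])
```

A single differing range at the start of the table is what you would expect if only the beginning of FAT#1 was clobbered. If the differences were scattered, it would be less clear which copy to trust.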

Repair Attempt #1

All of this has been interesting, but the point of this exercise
was to repair the filesystem and read the data. So I now turned to my
friend fsck for the repair work, in particular fsck.msdos, also known as
dosfsck(8). I attached the filesystem image to a spare loop device and
ran the check:

# losetup /dev/loop2 /tmp/r1
# fsck.msdos /dev/loop2

But according to fsck.msdos(8), the "disk" claimed to have something
near 165 FATs, whereas fsck.msdos supports only two. Apparently, some
filesystem parameters were severely corrupted.
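The FAT count that fsck.msdos complained about lives in the boot sector's BIOS Parameter Block (BPB), as a single byte at offset 0x10. This Python sketch decodes the standard FAT12/FAT16 BPB fields and flags a few values a checker would reject. The demo sector is made up. Its fields hold plausible values for this drive (64 reserved sectors, 245-sector FATs, 16 sectors per cluster), except that the FAT count is set to 165 to mimic the symptom. It is not a copy of the real damaged sector.

```python
import struct


def parse_bpb(sector: bytes) -> dict:
    """Decode the main BIOS Parameter Block fields of a FAT12/FAT16 boot sector."""
    (bytes_per_sector, sectors_per_cluster, reserved, num_fats,
     root_entries, total16, media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", sector, 0x0B)
    hidden, total32 = struct.unpack_from("<II", sector, 0x1C)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved": reserved,
        "num_fats": num_fats,                 # the single byte at offset 0x10
        "root_entries": root_entries,
        "total_sectors": total16 or total32,  # 16-bit field is 0 on larger volumes
        "media": media,
        "sectors_per_fat": sectors_per_fat,
        "hidden": hidden,
        "label": sector[0x2B:0x36].decode("ascii", "replace").rstrip(),
    }


def sanity_problems(bpb: dict) -> list[str]:
    """List values that a checker such as fsck.msdos would balk at."""
    problems = []
    if bpb["bytes_per_sector"] not in (512, 1024, 2048, 4096):
        problems.append(f"unexpected sector size {bpb['bytes_per_sector']}")
    spc = bpb["sectors_per_cluster"]
    if spc == 0 or spc & (spc - 1):
        problems.append(f"sectors per cluster {spc} is not a power of two")
    if bpb["num_fats"] != 2:
        problems.append(f"{bpb['num_fats']} FATs (fsck.msdos supports only 2)")
    return problems


# Synthetic boot sector: plausible values for this drive, except a FAT count of 165.
sec = bytearray(512)
struct.pack_into("<HBHBHHBH", sec, 0x0B, 512, 16, 64, 165, 512, 0, 0xF8, 245)
struct.pack_into("<II", sec, 0x1C, 63, 1000881)
sec[0x2B:0x36] = b"LEXAR MEDIA"
bpb = parse_bpb(bytes(sec))
print(bpb["num_fats"], sanity_problems(bpb))
```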

Shortcut to Filesystem Repair

I started looking at the source code for mkfs.msdos, also known as
mkdosfs(8), but then came up with a better idea. What if I could create
a filesystem with the FAT parameters arranged so that the FATs and the
directory in this new filesystem were in the same place where the FATs
and directory were in the disk image I already had? The bytes that read
LEXAR MEDIA probably were the volume name. Maybe, by giving the right
parameters to mkfs.msdos(8), I could create a filesystem image in which
the first FAT started at offset 0x08000, the second FAT at 0x26a00 and
the volume label at 0x45400.
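The three target offsets are tied together by a few boot-sector parameters. In FAT12/16, the first FAT follows the reserved sectors, the FATs sit back to back, and the root directory comes right after the last FAT. This Python sketch computes that layout and checks it against the targets. Take 64 reserved sectors (0x8000 / 512) and two FATs of 245 sectors each ((0x26a00 - 0x8000) / 512). With those values, the root directory lands at exactly 0x45400. The 512-entry root directory is an assumption and affects only the data-area offset.

```python
SECTOR = 512


def fat_layout(reserved: int, num_fats: int, sectors_per_fat: int,
               root_entries: int = 512) -> dict:
    """Byte offsets of each FAT, the root directory and the data area (FAT12/16)."""
    fats = [(reserved + i * sectors_per_fat) * SECTOR for i in range(num_fats)]
    root = (reserved + num_fats * sectors_per_fat) * SECTOR
    root_sectors = (root_entries * 32 + SECTOR - 1) // SECTOR  # 32 bytes per entry
    return {"fats": fats, "root": root, "data": root + root_sectors * SECTOR}


# Work backward from the offsets observed in the image.
reserved = 0x8000 // SECTOR            # sectors before FAT#1
spf = (0x26A00 - 0x8000) // SECTOR     # sectors per FAT
layout = fat_layout(reserved, 2, spf)
print(reserved, spf, [hex(x) for x in layout["fats"]], hex(layout["root"]))
```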

Therefore, I specified -f 2 for two FATs and
-n mkfs__msdos (a string I could find easily) for the volume name.
This way I could tell where the volume name landed.

How about the other parameters? I saw above that the FATs
were 0x1ea00 bytes apart; if the new ones ended up the wrong distance
from each other, I could tweak -F and maybe -s. I found online that
for a filesystem of this size, the clusters would be 8192 bytes;
in other words, there would be sixteen 512-byte sectors per cluster.
The cluster is the file allocation unit described by the FAT. Hence, it
would be -s 16.
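A quick calculation is consistent with -s 16 and with the 16-bit FAT entries seen earlier. This Python sketch counts the data clusters and then applies the cluster-count thresholds from Microsoft's FAT specification: fewer than 4085 clusters means FAT12, and fewer than 65525 means FAT16. Its inputs are assumptions:

- the 1000881-sector partition from the partition table
- 64 reserved sectors and two FATs of 245 sectors each (the 0x1ea00-byte spacing divided by 512)
- a 512-entry root directory

```python
def cluster_count(total_sectors: int, reserved: int, num_fats: int,
                  sectors_per_fat: int, root_entries: int,
                  sectors_per_cluster: int, sector: int = 512) -> int:
    """Whole data clusters left after the reserved area, FATs and root directory."""
    root_sectors = (root_entries * 32 + sector - 1) // sector
    data_sectors = total_sectors - reserved - num_fats * sectors_per_fat - root_sectors
    return data_sectors // sectors_per_cluster


def fat_type(clusters: int) -> str:
    """FAT variant implied by the cluster count (thresholds from Microsoft's FAT spec)."""
    if clusters < 4085:
        return "FAT12"
    if clusters < 65525:
        return "FAT16"
    return "FAT32"


# Assumed inputs: see the lead-in above; -s 16 gives 8192-byte clusters.
n = cluster_count(1000881, 64, 2, 245, 512, 16)
entries_per_fat = 245 * 512 // 2  # 16-bit entries that fit in one FAT
# Entries 0 and 1 are reserved, so the FAT must hold n + 2 entries.
print(n, fat_type(n), entries_per_fat - 2 >= n)
```

About 62,500 clusters of 8192 bytes is comfortably inside the FAT16 range. The 245-sector FATs have room for all of them.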

As for where to create the filesystem, it wouldn't do to put it
on the USB drive. Instead, I created a file the same size as
the drive image but filled with zeroes:

# dd if=/dev/zero of=/tmp/r2x bs=512 count=1001952

After creating the filesystem, I figured I'd mount it
and create a file. The file would have enough data in it
that we could see a reasonable allocation chain. To accomplish this,
I wrote a script, which I called b.sh:

My plan was to try running this script with different parameters
until I got it right. 0x8000 is 32KB. In 512-byte sectors, that's 64.
Because the first FAT started at 0x8000, I decided to try
-R 64, like this:

I didn't check the directory size, but apparently it was okay as
well; more on that below.

Grafting Filesystems

I now had a boot sector that would tell fsck.msdos to expect
the FATs and the root directory at all the right places. So what if
I created a filesystem image where the first sector was that one,
but all the rest of the sectors contained data from the USB drive?
Then, fsck.msdos would read the boot sector; I'd tell it to use FAT#2
to repair everything; and we'd see how it turned out.
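The graft needs only the first 512 bytes of one image followed by everything after those bytes in the other. Here is a minimal Python sketch. It assumes, as in the text, that /tmp/r2x holds the artificial filesystem and /tmp/r1 holds the drive image.

```python
import shutil

SECTOR = 512


def graft(boot_image: str, data_image: str, out_path: str) -> None:
    """Write sector 0 of boot_image, then sectors 1..end of data_image, to out_path."""
    with open(boot_image, "rb") as boot_src, open(data_image, "rb") as data_src, \
            open(out_path, "wb") as out:
        boot = boot_src.read(SECTOR)
        if len(boot) != SECTOR:
            raise ValueError(f"{boot_image} is shorter than one sector")
        out.write(boot)                             # the sane, freshly made boot sector
        data_src.seek(SECTOR)                       # skip the drive's damaged boot sector
        shutil.copyfileobj(data_src, out, 1 << 20)  # FATs, directory, data: 1MB at a time
```

For example, graft("/tmp/r2x", "/tmp/r1", "/tmp/r3") writes the combined image to /tmp/r3, a hypothetical output name. That file could then be attached with losetup and checked with fsck.msdos as before. A pair of dd commands would do the same job: one copying a single 512-byte block, the other using skip=1 seek=1.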

Repair Attempt #2

To summarize exactly what fixed the USB device:

Step 1: create a filesystem image of the right size, with
FATs and the directory in the right places:

Note: A good result from ls -lR showed that I was lucky
in one other way: I didn't know whether the boot sector had a good value for
the size of the root directory (the -r parameter to mkfs.msdos). I simply
used the default, and it turned out fine.

Burning CDs

At this point, I decided I had better burn a CD. I burn and read CDs all the
time on Linux, but I rarely burn CDs to be read by Windows. Again I did a
Web search (for "linux burn CD windows" or something like that), and a page
from IBM's developerWorks site turned up. So I tried this:

I wasn't 100% sure that Windows would like this CD, but fortunately I
have Windows 95 under Win4Lin. Its sole purpose for me is to run Quicken
and TurboTax, but I fired it up and pointed Windows Explorer at the just-burned CD-ROM.
Explorer loved it. I used gimp(1) to capture a screenshot and
e-mailed the image to my friend's brother--he was ecstatic.

Line 1 tells execve(2) that this script is to be run by the
shell; I've become accustomed to bash, the Bourne-again shell.

Line 2 simply explains line 3: the parameters you type after
b.sh are added to the mkfs.msdos command line.

Lines 4-6 establish /dev/loop2 as the block device whose contents
are in the filesystem image kept in /tmp/r2x. Line 4 unmounts the
artificial filesystem if it was mounted; this is done because we're about
to make some changes to it. Lines 5-6 make sure that /dev/loop2
is connected to /tmp/r2x and only to /tmp/r2x.

Line 7 creates an artificial filesystem image with whatever additional
parameters the user gave (remember $ARGS from line 3?).

Line 8 mounts the filesystem onto /tmp/r2d. Line 9 creates
a file of about 24KB (three clusters), so I have a filename
to look for at the beginning of the directory.

Line 10 then unmounts the artificial filesystem image, so the
kernel won't see inconsistencies when I modify /tmp/r2x directly.
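The line-by-line description above is enough for a rough Python equivalent of b.sh. It is a hypothetical reconstruction, not the original script. It builds the same sequence of commands and, by default, only prints them. The paths /dev/loop2, /tmp/r2x and /tmp/r2d come from the text. The following are assumptions:

- the name testfile
- the dd invocation used for the 24KB file
- the exact split of lines 5-6 into a detach followed by an attach

Actually running the commands requires root and calling run() with dry_run=False.

```python
import shlex
import subprocess
import sys

LOOP, IMAGE, MNT = "/dev/loop2", "/tmp/r2x", "/tmp/r2d"


def b_sh_commands(mkfs_args: list[str]) -> list[list[str]]:
    """The command sequence described for b.sh (hypothetical reconstruction)."""
    return [
        ["umount", MNT],                   # line 4: unmount if mounted
        ["losetup", "-d", LOOP],           # lines 5-6: detach /dev/loop2,
        ["losetup", LOOP, IMAGE],          # then attach it to /tmp/r2x only
        ["mkfs.msdos", *mkfs_args, LOOP],  # line 7: the user's extra parameters
        ["mount", LOOP, MNT],              # line 8
        # line 9: a 24KB file (three 8KB clusters); the name "testfile" is made up
        ["dd", "if=/dev/zero", f"of={MNT}/testfile", "bs=8192", "count=3"],
        ["umount", MNT],                   # line 10
    ]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it when dry_run is False (needs root)."""
    for i, cmd in enumerate(cmds):
        print(shlex.join(cmd))
        if not dry_run:
            # The first umount and losetup -d fail harmlessly if nothing is set up yet,
            # so only the later commands are required to succeed.
            subprocess.run(cmd, check=i >= 2)


if __name__ == "__main__":
    run(b_sh_commands(sys.argv[1:]))
```

Called as, say, python3 b.py -f 2 -n mkfs__msdos -s 16 -R 64 (b.py being a hypothetical file name), it prints the seven commands with those options spliced into the mkfs.msdos line.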