Database Scrambled. No Backups. What Now?

My "Never Again" moment occurred back in 1984. I was a second-level support specialist for Digital Equipment Corp. In those days, computers were the size of refrigerators, PCs were toys that high school kids played with late at night, and Al Gore hadn’t yet invented the Internet.

There was a DEC VAX system on an upper floor of an old building in Anderson, Ind., that handled inventory tracking for a large electronics manufacturer. This company supplied ignition parts and other components for one major customer that operated automobile assembly lines all over the world. Without these components, the assembly lines would grind to a halt.

One day, the aging hard disks inside this VAX system suddenly started to spray random garbage all over the place. This was really bad for the MUMPS database that handled the company’s newly developed "Just In Time" shipping process.

Even worse, MUMPS was a scripting language and a database all rolled into one. The code and database were together inside a single giant file, and it was this file that was corrupted. MUMPS files had an index that pointed to all the pieces of all the scripts and data. The index was destroyed. So not only was the database lost, the programs with the logic to manipulate the database were also gone.

Naturally, the company turned to its backups. Funny thing, though -- nobody did backups on this system -- ever. The operations group thought the developers handled backups. The developers thought operations did the backups. Only when it was too late did the two groups actually talk to each other and find out that nobody had done backups. The one and only copy of the program and the database that a global automobile manufacturer depended on was stuck inside a corrupted MUMPS file in a VAX system on the third floor of an old building in a small town in Indiana. And now it was gone.

The manufacturer sent its workers home that day, and assembly lines all over the world that needed those parts started to shut down. My phone rang, and soon after I found myself at O’Hare airport waiting for a flight. I didn’t even have a chance to grab a change of underwear.

Once I arrived, the plan was simple: Get the hardware fixed, figure out how the MUMPS file was corrupted in the first place, recover anything and everything possible, and get the company back on its feet -- fast.

I put together some macro assembly language routines to read the raw disk blocks in an effort to determine which sectors held good data and which were corrupt. The development team quickly wrote some code to recover bits and pieces of the database and application. After nearly two days of nonstop work, the development team had recovered about 80 percent of what was lost. While the database was down, the plant shipped blindly, guessing how many units of each part each customer needed. The development team steadily recovered the rest of the database over the next several days. To my knowledge, the electronics plant only sent home one shift for one afternoon. No auto assembly lines shut down and the incident never became public.

I was there the day the world economy almost crashed because of a poorly maintained computer that nobody took seriously. I was witness to a heroic recovery effort. And I learned how to wash underwear using a bar of soap in a hotel sink.