Disaster Recovery and Other Sacrifices

If you want a subject that will evoke deep passion,
often mixed with disgust, from a group of DBAs, disaster recovery is the
one. It is the one subject where, no matter how much we've attempted to
secure our environment, we rarely feel our butts are not hanging out
there.

I’ve observed a consistent flow of articles, conversations
and email discussions on the subject, and it is apparent that rarely is the
business as aware as the technical specialists (aka the DBAs) of just how
vulnerable their environments are. Rarely
are budget dollars allotted to the task of ensuring that systems have the
proper disaster recovery hardware and software in place, or that testing is performed on
a regular basis.

It’s easy for the business to see the value in the
production systems. They create revenue,
and their value is equal to the dollars they produce. Development and test systems are more difficult for
the business to understand, but most times they can be justified the first time
production is undermined because development or testing was performed in one
of them… :)

Then we get into backup and recovery. How many backups are squeezed out by 24x7 shops,
where the only thing the business sees is the impact to revenue of having to
allocate resources to backing up production and placing it on disk or tape that
offers no value to that revenue?
Yes, the DBAs and technical management argue, “All it takes is one loss
of production and you will be thankful that we have that backup…” Until that day comes, many businesses rely on
the robust nature of Oracle, the hardware it resides on, and the technical
expertise of the folks they’ve hired to keep it running, and never have to
rely on those backups. DBAs commonly fight for time
to allocate to testing, for hardware to test recoveries on, and to explain to the
business why it’s important. The
business again looks on this as time that could be better allocated to building
faster systems to create more revenue, and again as an impact to what the business is
there for: creating revenue.
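For what it’s worth, the testing argument need not start with a full restore onto spare hardware. As a minimal sketch (this assumes Oracle’s RMAN and an existing backup; channel and catalog settings will vary by site), a validation-only run reads the backup pieces and confirms they are restorable, without writing a single live datafile:

```
# Reads the backup sets and checks they would restore cleanly,
# without actually restoring anything over the live database.
RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;

# Scans the live datafiles for block corruption; despite the name,
# BACKUP VALIDATE produces no backup output.
RMAN> BACKUP VALIDATE DATABASE ARCHIVELOG ALL;
```

It is not a substitute for a real restore test on separate hardware, but it costs only I/O and is an easier “yes” to get from the business.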

The next level is disaster recovery. All DBAs know this is the final
gauntlet. We pray to the DBA gods, hoping
for a technical manager with the gift to motivate, sell, and help the business
understand why having standbys of production databases, to keep revenue
flowing if primary production goes down, is important. We are willing to sacrifice small animals in
the name of a secondary data center for disaster recovery testing. We want to know how long the business will be
down if the unthinkable does happen, and whether all the documentation on what
it takes to create production will actually work when we try to recreate it. We would much rather have that answer ready
before the unthinkable does happen and upper management is sitting
in front of us asking, “So HOW long are we down for???”
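On the “how long are we down” question: with a physical standby in place, the answer at least becomes measurable. A hedged sketch, assuming an Oracle Data Guard standby (these view and statistic names are standard, but availability depends on version and configuration):

```
-- Run on the standby: how far behind the primary are redo
-- transport and redo apply?
SELECT name, value
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```

If nobody can quote those numbers, the honest answer to upper management’s question is “we don’t know” — which is exactly the argument for the testing budget.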

This is not rare; it is all too often
the norm for most DBAs in the business world.
IT managers, network administrators, and database administrators
battle day in and day out, not just for what they need to meet the growing
demands of the business, but for what the business needs to survive a
disaster. Our job is not just to
provide you with production, but to provide you with the ability to sustain your
business when the unthinkable happens.

The one thing I’ll add is that even when you get permission to run the backups, set up the standbys and devise the disaster recovery plan…Will the business let you test it?
After all, if you have not tested it, you might as well just wear a rabbit’s foot around your neck instead (and think how “lucky” that was for the rabbit).

I love that moment when, as you discuss with the business manager why you need to test the recovery, they ask if there is any risk to the live system (like, say, the backup software recovering the system to the actual live location). You say “potentially, yes” and they go pale. “We can’t afford that risk!” Well, oddly enough, by not testing the very thing that is supposed to save your skin in a disaster, your ass is already hanging out there, at risk of disaster. Now, will you let me test the recovery on that little dev system over there first, and next week we can do live?