Wednesday, January 23. 2008

For all but the extremely lucky in the IT world, there comes a point when you realise that your backup procedures are somewhat inadequate. Backups are often not taken into serious consideration by management, and with the modern more-with-less attitude to IT departments, nobody really has the time to maintain backup procedures.

I had my wake-up call yesterday, mid-afternoon, when we suffered a series of unexpected and unlikely failures that ultimately combined to cause serious file-system corruption on the disk that houses an important database's master database file (.MDF). With the database gone, but the previous night's backup and the transaction logs (.LDF) intact, recovering the database to the point of the incident wasn't particularly easy for someone who's not a DBA.

My personal wake-up call happened to me in 1995, when a Huntsman spider decided it would be a good idea to crawl inside my workstation, probably trying to find a nice warm place to rest. When discovering my computer had died, I found its carcass trapped underneath the motherboard where it had managed to conduct enough electricity to fry the motherboard, the CPU, graphics card, and most importantly my 260MB Western Digital Caviar hard drive which contained all of my data. Needless to say, I started instituting personal backups from then on.

Of the many corporate environments I've seen, backups are all too neglected, seen with casual regard by management as an expensive system that gives little or no benefit to daily business. Management types like to see systems doing something, otherwise it doesn't seem to justify the cost. When the day comes that a restoration of data from a backup is required, management then asks why proper backups weren't in place.

All too often a disaster strikes that prompts an upgrade of the backup facilities, but soon after when there's too much data to be backed up, corners are cut that may seem wise but can have dangerous repercussions.

Often misunderstood is the transactional nature of SQL server. Transactions are not only your friend from an application side, but also from a data recovery standpoint. The transaction log (.LDF) is often misunderstood and considered fairly worthless next to the master database file (.MDF). This is understandable, since the master database file contains the database's actual data, and the transaction log essentially contains a list of all actions performed against the database.

The transaction log in fact provides the ability to not only reverse changes to the database that were recently made (roll-back), but recover the database when an older version of the master database file exists but newer transactions have been entered since that point and exist within the transaction log (roll-forward).

So important is this file that the transaction log will continue to grow until it is securely backed-up completely, at which point older transactions can then be “released” from the log. Of course, these database operations will still be migrated into the master database file, but the system retains them to permit a roll-forward should disaster strike.

Due to storage space constraints, the original backup policy for our database was a nightly backup of the database file, followed by its transaction log. Before my time at the company, the transaction log backup was only performed to stop the transaction log from growing!

The problem that we experienced was a disk failure that destroyed our master database file, but left the transaction log file intact. Given that this was after half a day's worth of business, and the most recent backup was from the previous night, the obvious solution was to recover the backup and roll forward the transactions from the remain transaction log file.

The problem with this is that SQL server only permits you to recover operations from database files and transaction logs that have been backed-up (.BAK and .TRN respectively), and not the live files themselves. After much googling and discussion, it was discovered that this was an impossible task; Impossible, at least, without an external tool.

Many forums described the use of a product called ApexSQLLog, which has the ability to open a live transaction log (.LDF) file and generate a replay of events as an .SQL file we could import into the query analyser.

Unfortunately the product, despite its claims and its price tag, turned out to be extremely buggy, being incapable of operating on a 12GB transaction log file and even incapable of filtering based on operation timestamps. Their support team was fantastic, providing patches of the software to try, until it hit the end of the business day, at which point they all disappeared and became unreachable.

ApexSQL Log should have helped, and while their support team convinced me that this was the best way to recover our database, I gave up around 4:30am (the day after the recovery) and started working on my own theories.

“If the database was online,” I thought, “then I would be able to back up the transaction log file and restore the database”. After much trial and error, and MSDN forums telling me it was impossible to do this, I succeeded in restoring the database, losing around ten minutes of uncommitted transactions.

If this has happened to you, you have SQL Server 2005, and your database is still listed in the SQL Server Management Studio, be it online or offline, I cannot stress this more: Try to perform a backup of your transaction logs immediately using the NO_TRUNCATE option! Don't hesitate, just do it, it will save you many hours of stress. If it's not still listed but you have a recent backup, then follow my steps below:

Recover your database from the last backup you have (.BAK and .TRN). Since you've lost your database, restore the last backup you have into the management studio. We will be using this to trick the system slightly.

Take this newly restored database offline, if you did not elect to do this during the restoration process.

With the database offline, replace the actual transaction log file (.LDF) from the database you restored in step 1 with the most recent copy of the transaction log file that survived the disaster. The server will no longer allow you to bring this database online as the files will not match each other precisely.

Execute the following SQL code, changing dbname with your database's name, and filename.trn with the location where you would like to backup the most recent transaction logs, but make sure you do not overwrite the transaction log backup from your most recent backup set!

BACKUP LOG dbname TO DISK="filename.trn"WITH NO_TRUNCATE

The use of the NO_TRUNCATE flag is important, as it will prohibit the SQL server from attempting to modify the master database file.

Remove database this database by deleting it within the SQL Server Management Studio.

Recover the original database from the master database file backup (.BAK), using the RESTORE WITH NORECOVERY option. This option allows you to recover other files.

Continue with the restoration by restoring the original transaction log file backup (.TRN) from your recent backup set, again making sure you use the RESTORE WITH NORECOVERY option.

Finally, recover the .TRN file you created in step 4, and restore this with recovery so that the database becomes operational again.

If everything went well, your database should be recovered up until the moment of the disaster, with the exception of any transactions that were not completed and committed to the database.

Our new backup policy for this database includes a full backup of the master database and transaction logs every day, and a complete backup of the transaction logs every thirty minutes. Should we have another disaster like this, we will only loose at most thirty minutes of work without using the trick above.

It is now very much burnt into my head that the first action with any database recovery is to attempt a full backup of the database, no matter what state it's in, and failing that attempting a backup of the transaction logs with the NO_TRUNCATE flag.

This article may not be ground-breaking, but since I found it impossible to find information on this sort of recovery, I hope this helps someone out there suffering from the same problem that kept me working all night at the office.

I need to thank the Finance Department for their appreciation of my overnight work, as they presented me with a “Chocotelegram” and a “Message on a Bottle” (of red wine), both of which came as a complete surprise! Thanks guys!

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.