SQL Server Backups: When More is Less

For most reluctant-DBAs there’s a certain degree of mysticism or ‘unknown’ about the SQL Server backups that they get ‘roped’ into managing. Moreover, for the uninitiated, there are a number of things about how SQL Server works that can seem counter-intuitive. Such as auto-closing and auto-shrinking databases – things that may seem logical to Systems Admins who are tasked with ‘optimizing’ resources but which, in reality, are easily some of the worst things you can do to your SQL Server databases.

SQL Server and Disaster Recovery

Just as counter-intuitively: while the data in a SQL Server database may be all sorts of mission critical, more backups of that data may actually JEOPORDIZE your ability to recover a database after a disaster – because MORE backups of SQL Server databases may, in fact, result in less coverage and protection.

For example, one thing that I see fairly regularly is that reluctant DBAs who are handed ownership of SQL Server databases typically just don’t get the kinds of training or background needed to help them demystify SQL Server Backups.

So, what commonly happens is they’ll spend a bit of time with their ‘new responsibility’ and commonly use SQL Server Management Studio to create a Maintenance Plan that (among other things) executes FULL backups at let’s say… 1AM every night.

Then, what commonly happens is plenty of time passes with no problems, incidents, or issues. And the sense of urgency to figure out how these backups work gets pushed into the background – by more immediate problems and day-to-day needs.

Then, commonly, these same reluctant DBAs commonly make a critical mistake a bit later on which is while reviewing their normal system-level backups for some Well-Known-3rdParty-Backup-Solution(TM) they decide “Hey, this 3rd party solution that has saved my bacon with file-server or AD or Exchange backups in the past actually supports native SQL Server backups; I’m going to go ahead and provide EXTRA coverage and protection for my SQL Server by having this tool back it up every morning at oh… 5AM – just in case – and as an EXTRA bit of coverage in case my normal SQL Server backups don’t work.”

When More is Actually Less

Additional time passes without incident or issue – and it’s easy for reluctant DBAs to forget they’ve actually got MULTIPLE tools creating FULL backups every day. And since they don’t realize that this is a disaster waiting to happen, it’s only natural that they eventually forget about their ‘more is less’ problem.

And while this may not seem like a huge problem because the log-chain gets restarted by that 5AM trusted-3rdParty-Tool’s backup, the reality is that a reluctant DBA dealing with a full-blown disaster might EASILY forget that their 5AM ‘extra’ backup is even occurring at all. Because, in their mind, it’s a ‘bonus’ backup or contingency that provides them with ‘extra’ coverage – NOT something that is actually, actively and regularly ‘breaking’ their backup chain each and every day.

Worst case scenario: After trying, multiple times, to unsuccessfully apply transaction logs after 5AM, they may just give up and tell end users “Sorry, there’s something wrong with the backups, and I can only recover the database to what it was like at about 4AM this morning – all work since that point is lost.”

Best case scenario: Even if a reluctant DBA in this position is able to remember their 5AM (third-party) backups, they’ll typically only do it after banging their heads a few times into error messages when trying to apply log files taken after 5AM. (And, to the uninitiated, these error messages have to seem pretty cryptic. I’ve been doing this for years now and commonly feel that I have to take off my socks and count-out on my toes what these errors are saying in order to make heads or tails of them when I break the log-chain order myself.) Accordingly, even in a best-case scenario it’s incredibly likely that DBAs have spent too much time figuring this problem out at the wrong place and time – thus dealing a serious blow to overall recovery time.

Preaching to the Choir

The biggest problem, here, of course, is that I’m preaching to the choir. All DBAs with a fundamental understanding of how SQL Server backups work know NOT to even RISK having multiple processes running around making FULL backups because they don’t want to address the additional complexity of having to mentally juggle multiple, potential, log backup chains.

Therefore, best practices for backups are to avoid ‘extra’ backups in the mix (which is the entire reason why the COPY_ONLY backup was provided) – for fear of ‘reducing’ backup coverage or protection (or for fear of CONFUSING coverage to the point where you encounter more down time or risk decreased coverage).

All of that said, it’s one thing for DBAs to KNOW these details. It’s another thing entirely to know whether or not you might be susceptible to this kind of problem. Because another version of this ‘More is Less’ problem is where savvy DBAs understand this problem implicitly and don’t PERSONALLY implement dual backups – but where they may have missed that someone in IT/Operations decided to recently point a 3rdParty backup solution at their databases as a means of providing ‘extra’ coverage in a well-meaning gesture.

As such, the only way to know if your backups are ‘safe’ from the ‘More is Less’ problem is to regularly test and validate your backups by means of regularly recovering them. Something I regularly ‘harp upon’ whenever I talk about backups – and which I blogged about in my post on Dilbert, Billiards, the US Navy, and SQL Server Backups.