In Recovery...

This is a post I’ve been meaning to do for a while – detailing some of the problems you can run into when using database snapshots.

Reverting to a snapshot

Many people use snapshots as a way of protecting against mistakes during a complicated set of changes to a database – you can just revert to the snapshot if something goes wrong. However, reverting to a snapshot has a little-known (but documented) problem. Let’s see. I’m going to create a database plus a snapshot of it and then start taking backups.

Msg 4214, Level 16, State 1, Line 1
BACKUP LOG cannot be performed because there is no current database backup.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.

Reverting to a snapshot rebuilds the transaction log and breaks the log backup chain. The only thing I can do is take a full or differential backup and then start taking log backups again. So – I lost the ability to do point-in-time restores in the period from the last log backup to the time when I reverted to the snapshot.

So how can I tell when the database was reverted? Books Online documents that the restorehistory table in msdb should have an entry with restore_type = ‘R’. Let’s try:

Nope – nothing. That functionality seems to be broken. The only way I could find to trace reverting was in the error log:

2008-01-30 11:09:21.73 spid53 Reverting database 'SnapshotTest' to the point in time of database snapshot 'ST_Snap' with split point LSN 26000000013800001 (0x0000001a:0000008a:0001). This is an informational message only. No user action is required.
2008-01-30 11:09:21.74 spid53 Starting up database 'SnapshotTest'.
2008-01-30 11:09:21.74 spid53 The database 'SnapshotTest' is marked RESTORING and is in a state that does not allow recovery to be run.
2008-01-30 11:09:21.85 spid53 Starting up database 'SnapshotTest'.
2008-01-30 11:09:21.87 spid53 Starting up database 'ST_Snap'.

Note that the snapshot database is still there and continues to work.

Running out of space in a snapshot

The next problem that can happen is if you create a database snapshot on a volume that doesn’t have much disk space. In that case you can actually run out of space and the snapshot will go suspect and be unusable until it’s dropped and recreated. Let’s see what happens:

-- Create snapshot of large database on a volume with not much space
SELECT * FROM [SalesDB].[sys].[database_files];
GO
CREATE DATABASE [SalesDB_Snap] ON
(NAME = N'SalesDBData', FILENAME = N'D:\sqlskills\SalesDB_snap.snp')
AS SNAPSHOT OF [SalesDB];
GO
ALTER INDEX ALL ON [SalesDB].[dbo].[Sales] REBUILD;
GO

Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'D:\sqlskills\SalesDB_snap.snp' failed due to lack of disk space.

The query that causes the sparse file to run out of space doesn’t fail, but the user running the query will see this error. Note that it doesn’t say the snapshot has gone suspect. If you try to use it then it will tell you, plus there info written to the error log:

One thing that isn’t common knowledge is that DBCC CHECKDB uses database snapshots as it’s mechanism for running online in SQL Server 2005. You can’t control where the snapshot is created (it’s actually created in the same location as the files comprising the database being checked – see my previous post on the DBCC CHECKDB stages here for more info) but you can create your own snapshot and check that. Anyway – it’s possible to run out of space in the snapshot while DBCC CHECKDB is running. I setup a situation with a large database on a drive with only 1MB of space left, started a large index rebuild in the database and then tried to run DBCC CHECKDB. Let’s see what happened:

The index rebuild statement:

ALTER INDEX ALL ON [SalesDB2].[dbo].[Sales] REBUILD;
GO

Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'd:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20' failed due to lack of disk space.

So some poor unsuspecting user gets this error and has no idea why. For DBAs its a bit obscure too. The syntax is describing an alternate stream on the existing data file and the snapshot in question can’t be accessed at all.

The DBCC CHECKDB statement:

DBCC CHECKDB (N'salesdb2') WITH NO_INFOMSGS;
GO

Msg 1823, Level 16, State 1, Line 1
A database snapshot cannot be created because it failed to start.
Msg 1823, Level 16, State 2, Line 1
A database snapshot cannot be created because it failed to start.
Msg 7928, Level 16, State 1, Line 1
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 5128, Level 17, State 2, Line 1
Write to sparse file 'd:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20' failed due to lack of disk space.
Msg 3313, Level 21, State 2, Line 1
During redoing of a logged operation in database 'salesdb2', an error occurred at log record ID (1628:252:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.

Wow – major amounts of errors – and it all boils down to running out of space.

7 Responses to Database snapshots – when things go wrong

Nice info. I have posted something similar about reverting to a snapshot and what happens with the transaction log (blogpost "Transaction log files after reverting from snapshot"). I suspect however that my scenario is not documented and can lead to other undesired situations.

Well, nice thing. That’s… impressive. By the way about the situation when you run out of space. Recently I ran a report to get some information about disk quota on my storage backup center server and realized that I am in big problems. I have no space left on my server! By the way, it’s possible to request quota information using Win32_QuotaSetting class but in fact I get such reports automatically using built-in reporting functionality in my Desktop Authority management tool from Scriptlogic. Okay but I am talking about SQL so let’s get further although we’ll address Scriptlogic once more a bit later. So I just remembered the ole’ days when I’ve been using Stacker and all the stuff to compress the drive. I use NTFS compression for small databases that size usually falls into the range of 1 to 8 gigabytes but I avoid using it with larger databases due to a well-known problem with delayed writes errors that occur to a large file compressed with NTFS because of err… fragmentation. Okay so I was sitting there in the room beating my brains about what I can do while I decided to browse Scriptlogic’s site to see if they had something for managing SQL. They had to have something because Desktop Authority’s config and reps are running on SQL. I didn’t know if they offer something because the offer a very broad set of server and desktop management products – what is quite understandable for the leader – so that I usually don’t dive into the product page. I know if I find a tool there – and I know I will – I will be wanting to purchase it and my boss is very stingy. And what do you know! They had a tool called Litespeed allowing to backup database and compress them simultaneously without you needing to use some additional compression tools. Well, it took me a second to grab it from their site and ten minutes later I was sitting thinking about which level of compression would be better to choose for my 95 gig DB. Finally I stopped my guess by selecting something from the middle from the available range of compression settings. I think I’ve chosen 5 or so, I don’t remember as now I basically use 8 or 10 when defining compression as I know it will boost compression ratio and at the same time it will not affect my performance. Well, I set it to compress the backup and in a few hours I had my backup on the remote drive compared to that time of about a day I expected it to complete with backing that 95 GB database though my slow 100 Mbits connection. Unfortunally, when you work with 100 Mbits in a working environment loaded with may other network-bound operations your 100 Mbits may effectively become a 10 Mbit connection. Add here the low effectiveness of Ethernet protocol . Should I say I was really surprised to see my job had been completed nightly when I expected it to be still working when I returned back to my office the other day. I had my database archive stored on a remote server and it saved me around 75% of space from the original database.
Yours,
Carl.

We tried several ways to resolve this issue like DBCC commands, Offline and Online but did not resolve the issue. Finally we detached the database after taking tail log backup and attached back with new log file. After that everything looks Good

Hi,
I need one help. I am going to reduce downtime during our DBChange Installation. Currently we are use to take a diff backup followed by a transaction log backup and then applying the DB changed (DDl/DML). Its a replicated (Transaction replication) OLTP environment.
Can I use DB snapshot instead of using backup! What are the other things that I need to consider before going to implement this framework in our product.
Regards
Abhijit Dey