When doing a database restore, what actions is the engine taking after it gets past the 100%?

This was a question my admired Greg Low posted on the SQL Server MCMs discussion group a few days back, while he was restoring a 2TB database and noticed that the percent_complete column from sys.dm_exec_requests for the session from which the RESTORE statement was running, reached 100% and thirty minutes later, the restore hadn’t finished executing and was still indicating it was 100% completed.

To investigate what was occurring and be able to explain it to others, I created a very simple repro to diagnose what was not included in what the restore task estimates to be the 100% of work it has to complete and how the progress gets reported. The test scenario consisted of the following steps:

1) From one session in SSMS (or any other SQL client tool for that matter), I execute the following script and leave the transaction open for now…

create database Greg

go

use Greg

go

create table t1 (c1 char(8000))

go

set nocount on

go

insert into t1 values (REPLICATE('X', 8000))

go 1000

go

begin tran

insert into t1 values (REPLICATE('X', 8000))

go 1000

delete t1

go

set nocount off

go

--use master

--go

--rollback

2) From a second session, I execute the BACKUP DATABASE statement like this:

backup database Greg to disk = 'c:\bak\Greg.bak' with format

3) Once the backup completes, I rollback the transaction which I left open in the first session and then restore the database on top of itself, from the same session from which I dumped the database:

use master

go

restore database Greg from disk = 'c:\bak\Greg.bak' with replace

Now, having trace flags 3605 & 3004 globally enabled before I execute the RESTORE statement, this is what I see in the ERRORLOG during the execution of the command. Following information, I copied from my ERROLOG and annotated (and highlighted) those points which are critical to understand it:

2014-06-03 22:03:34.660 spid51 Restore: Transferring data to Greg <-- Just at this point is where the amount of work to complete is estimated. So, up to this point, any query on sys.dm_exec_requests for session_id = 51, would show percent_complete as 0 (zero percent.) From this point, the copy data machinery reads data from the pipeline and processes that data. That data initially comes from the MTF_MSDA descriptor block which encapsulates the data file stream in the backupset.)

2014-06-03 22:03:34.890 spid51 Restore: LogZero complete <-- In my example, at this point 42.40605% is completed. Having consumed the data stream and zero initialized the log file, it processes the data read from the MTF_MSTL descriptor block which encapsulates the transaction log stream in the backupset.

2014-06-03 22:03:35.290 spid51 SetTargetRestoreAge: 0 <-- At this point, it is already 100% completed. So it took 1,020ms to get to a hundred percent completion from, roughly, the beginning of the RESTORE statement execution.

2014-06-03 22:03:38.930 spid51 RestoreDatabase: Finished <-- At this point, 3,640ms have elapsed since the completed percent was reporting 100%. And it only took 1,020ms to get from 0% to 100%. That is, in this super simple example over 75% of the total time elapsed, the progress tracking was reporting it had completed already.

Mileage would vary on every case, but now you can find for yourself from the output reported to ERRORLOG by RESTORE statements, when those two trace flags are enabled, those two milestones where 1) the total amount of work to complete is estimated (Restore: Transferring data to <Your_Database>), and 2) the point at which the percentage of completed work has already reached 100% (SetTargetRestoreAge: #)

With that information, you can typically figure it out by yourself, where exactly is it that the remaining time to bring the database fully restored is being spent.

You may happen to be as curious as David Barbarin is, and consider the previous response partially complete unless you are explained what activities are included in the PostRestoreContainerFixups and PostRestoreReplicationFixup operations… :-)

If that’s the case, I can tell you that PostRestoreContainerFixups resolves the post-restore configuration of sysaltfiles/sysdbfiles, while PostRestoreReplicationFixup invokes the sp (dbo.sp_restoredbreplication) which fixes up replication status as appropriate post-restore, and the sp (dbo.sp_cdc_restoredb) which cleanses up the CDC related metadata.

You may also wonder which, from all the phases logged there, do not occur when the database is restored using the WITH NO_RECOVERY clause. And the following screenshot tries to address just that case.

Hope it’s clear and helps you understand that, at least in the case of RESTOREs, the value expressed through the percent_complete column is just based on a rough estimation from a very specific phase of the restore process which is the copy data phase, during which data file and transaction log streams are read from the backupset and processed (ie, laid down over the storage specified.)

I have meet a few of problems like this, a table have a billion rows without patition, one index on this table casue so many dirty pages in buffer pool by inserting, whatever I do backup log or restore database or the other operation like 'ALTER DATABASE', it take a long time.