Inside Recovery Models

Being able to define a recovery model is one of my favorite features of SQL Server 2000, and it's one of the first new features I wrote about. When I wrote my June 2000 column, "Database Recovery Models," the product was still in beta, but I felt strongly that even early adopters needed to know about this capability before upgrading their SQL Servers.

Because the recovery model you choose is still important, and because I've learned a few new things over the past 3 years, I decided it was time to revisit this topic. First, I want to review the differences between the models and make sure that you periodically reevaluate which model is right for you. Second, I want to point out some new things I've learned and show you some tricks I used to verify the new information.

The primary decision a SQL Server administrator needs to make regarding recovery models is whether to choose Full or Bulk-Logged. The third choice, Simple Recovery, is appropriate only for some of the system databases and for test or training databases that you never need to back up and can rebuild easily when needed.

The trade-offs between the Full and Bulk-Logged Recovery models revolve around six special SQL Server operations: SELECT INTO, bulk copy program (bcp) and some Data Transformation Services (DTS) operations that use bcp, BULK INSERT, CREATE INDEX, WRITETEXT, and UPDATETEXT. If you never use any of these bulk operations, there's no difference between the Full and Bulk-Logged models, and it doesn't matter which one your database is set to. If you do perform any of these operations, the trade-offs involve the operation speed, the transaction log size, the size of the transaction log backup, and the precision to which you can restore the database. With the Full model, the transaction log grows larger and the operation runs slower when you execute any of these bulk operations. With the Bulk-Logged model, the log stays smaller, but when you back up the log, the backup contains all the changed data (not just the logged record), so you might need a lot more disk space. Finally, with the Bulk-Logged model, if you perform any bulk operations, you can't recover to a particular time within a log backup; you must apply the entire log during a restore.

Because the database recovery options in SQL Server 2000 allow enough logging for recovery to always be possible, your SELECT INTO and bulk copy scripts will never fail. You'll always be able to perform these bulk operations in any of the models. In addition, in the Full and Bulk-Logged models, you can make log backups no matter what operations you've done, and your backup scripts won't break. This column isn't the place for a full analysis of each model, but let's briefly review the main differences between them.

Choose Wisely

The Full Recovery model guarantees the least risk of losing any work in the case of a damaged data file. If a database is in this model, all operations will be fully logged, which means that SQL Server will write every row inserted through a bcp or BULK INSERT operation in its entirety to the transaction log. The transaction log contains every index row that a CREATE INDEX operation generates, and SQL Server writes to the log the full text or image field an application inserts or updates through WRITETEXT or UPDATETEXT.

In the Bulk-Logged model, when you execute a bulk operation, SQL Server logs the fact that the operation occurred and also information about which extents in the database files the operation affected. Bulk operations can happen much faster and the log is smaller than in the Full Recovery model. However, when you perform one of the bulk operations in this model, SQL Server uses a global bitmap with one bit corresponding to each extent in a file to keep track of which extents the operation affected. Each bit that has a value of 1 denotes an extent that a bulk operation affected since the last time the log was backed up. During log backups, SQL Server examines the bitmap and writes all the affected extents to the backup device along with the log records. The backup is much larger than in the Full model and usually takes longer. In addition, when you're restoring transactions from log backups made in the Bulk-Logged model after a bulk operation runs, you can't have any point-in-time recovery; you must restore the entire log backup.

The Simple Recovery model allows for fast bulk operations and the simplest backup and restore strategy. SQL Server can reuse sections of the log as soon as all the transactions they contain are committed or rolled back because they're no longer needed for recovery. Thus, only full database backups and differential backups are allowed in this model. You'll get an error if you try to back up the log in the Simple Recovery model.

The Latest Research

While I was researching for my book Inside SQL Server 2000, one of the SQL Server developers at Microsoft told me that the logging performed in the Simple model is identical to the logging performed in the Full model. That is, SQL Server logs every row of data inserted during SELECT INTO, bcp, and BULK INSERT operations and every index row generated during CREATE INDEX operations. I was told that the big difference between Full and Simple Recovery models is that in the Simple model, the log is constantly being recycled, analogous to using the truncate log on checkpoint option in previous releases of SQL Server. This new logging behavior is very different from the simplest logging in older versions, which never log CREATE INDEX operations; SQL Server records only the fact that it has built an index. Older versions also offer the option SELECT INTO/BULKCOPY. With this option enabled, SQL Server can perform SELECT INTO and fast bulkcopy operations but records only allocation information and the fact that it carried out the operation.

The simple tests that I ran seemed to verify my understanding that the Full and Simple models performed similar logging. However, I was really just verifying that a lot of activity was happening in the log during a bulk operation even in the Simple model. Because my tests showed a lot of activity, I assumed that they proved my theory.

To look at log rows, I used the undocumented table-valued function fn_dblog(). I included the function as part of Listing 1's script, which shows the type of test I ran to prove my assumption. I won't explain exactly what this function returns; all my script does is display a row count that represents the number of log records. In the beta version of SQL Server 2000, where I originally ran my tests, the number of rows in the logs in the Simple and Full Recovery models was almost the same. In SQL Server 2000 Service Pack 3 (SP3), the log has more rows in the Full Recovery model, but the Simple model still has more rows than I expected. However, my original tests didn't compare the number of log records generated in the Simple model with the number of log records generated in the Bulk-Logged Recovery model—I completely ignored the log size in the Bulk-Logged model. I realized this last point when SQL Server MVP Dan Guzman sent me a script that he used to determine logging behavior.

Guzman's script didn't use any undocumented commands; it merely created the database MyDatabase with a small, fixed-size log, then ran the same SELECT INTO query in all three recovery models. Only in the Full model did SQL Server generate an error saying that the log had filled; neither the Simple nor the Bulk-Logged model generated that error. Listing 2 shows Guzman's tests, which are almost identical to those in Listing 1, except that Guzman's database log file has a fixed maximum size instead of using the default unlimited size as the script in Listing 1 allows. Guzman's script also tests all three models, not just the two models that I tested. It seems pretty clear from the results of Guzman's scripts that the Full model is logging a lot more information than the Bulk-Logged and Simple models are.

You can run more tests of your own. You can add a third test to Listing 1 that includes the Bulk-Logged model. Besides looking at the number of rows in the log after each SELECT INTO operation, you can use the following command to look at the log size:

DBCC sqlperf(logspace)

Although you can now ascertain that the Simple model doesn't log as much information as the Full model, that doesn't mean you should be complacent about log size if you're running a database in the Simple model. The log grows proportionally to the size of the transactions you're running, and log cleaning won't happen automatically until the oldest open transaction completes. So if you have a very large or long-running transaction that overlaps hundreds or thousands of other transactions, the size of your transaction log can still become a problem.

Another lesson to be garnered is to not be complacent about things you think you know, especially when you notice behavior that doesn't seem to support your understanding or when you hear from people whose opinions you respect that things aren't the way they seem. Running some quick tests when doubts start creeping in can not only clear up many misunderstandings, it can also give you additional practice in investigating SQL Server behaviors. And the next time you have a question about how SQL Server works, you can figure it out for yourself.