Friday, May 15, 2015

Change Data Capture (CDC) and Simple Recovery Mode

One of the misconceptions that some people have about change data capture (CDC) in SQL Server is that it needs the database to be set up with full recovery mode. If we look at the following TechNet article, it mentioned:

It should be noted that change data capture works with all recovery models. But when change data capture is enabled, operations that would normally be minimally logged in simple or bulk-logged recovery models are fully logged to enable change data capture to capture all changes.

So Change Data Capture can be enabled on database with simple recovery mode. This can be helpful as typically in some development environment, the databases are being set with simple recovery mode. One thing to be careful is that if we have a database with simple recovery mode and have CDC enabled, but it doesn’t have the CDC capture job running, as the database process transactions, the database’s transaction log would use more space. This is because those transactions will not get release until the CDC capture process those transactions. Let me try to demonstrate this.

By default, when CDC is enabled, it would create 2 SQL Agent jobs, one is to capture the changes and another one is to clean up the CDC data older than 3 days (default, but it can be modify). For this testing, if you have SQL Agent running, you might want to stop and disabled the CDC capture job. You can do that using the following script:

The database's transaction log is now using about 116 MB of disk space, much higher than the 0.8 MB of disk space in the first test. If we look at the log_reuse_wait_desc column on sys.databases table for CdcTest database:

The reason that the database could not reuse the log space is because "replication", which actually make sense since under the hood CDC is using the same technology as replication. Basically, the database will not release the transactions in the transaction log until they are processed by the CDC capture job/process.

Now, let's try to process the CDC changes, do a checkpoint and check the database's transaction log space usage:

Additional thing to note is that, on the above example, we did those transactions on a table that is being tracked with CDC. This would not happen if the transactions are run against any table that is not being tracked by CDC on CDC enabled database. The database transaction log will reuse the log space after a checkpoint. The following script can help to illustrate this:

Clean Up

Take Away

Some take away points:

CDC can be enabled on database with any recovery models, including database with simple recovery mode.

Make sure that the CDC capture job is running, as transactions against CDC tracked table(s) will be preserved in the transaction log until they are processed by CDC capture job/process. This is especially true if you restore CDC enabled database with KEEP_CDC option, as the CDC capture and clean up jobs would not get created by default

If you have CDC tracked tables and you don't have the CDC capture job running, you can potentially see your database's transaction log grow even if it is in a simple recovery mode.

It is an interesting question. Obviously, CDC is available in Enterprise version of SQL Server (it is also available in Evaluation and Developer version of SQL Server). If we are looking to do log shipping and maintain CDC data to Standard Edition of secondary server, that would not work/ supported.

But how about if we are looking to just do the log shipping without the CDC data? Supposedly according to Microsoft documentation (https://msdn.microsoft.com/en-us/library/cc645938.aspx#RestoreOrAttach), if we are to restore the database without the keep_cdc option, the CDC would be disabled. Technically, we could restore database without CDC to Standard Edition of SQL Server.

But when I tested this out, unfortunately, it does not seems to work. I was able to restore the database with norecovery option just fine (I used Express Edition for testing). But whenever I tried to bring the database online using recovery option or standby option (to make it readable), I would get error message "SQL Server Cannot load database "xxx" because Change Data Capture is enabled. The currently installed edition of SQL Server does not support Change Data Capture. Either disable Change Data Capture in the database by using a supported edition of SQL Server, or upgrade the instance to one that supports Change Data Capture."

The only way that I can bring it online is if I disable the CDC on the primary server using sys.sp_cdc_disable_db, do another transaction log on the primary server and then apply that one to the secondary instance and bring the database online on the secondary server. But that basically you eliminate CDC on the primary server as well.