Search This Blog

Tuesday, 12 February 2008

Change Data Capture (CDC) - I

Oracle 9i has this feature since 2003 as I can recollect, but being a SQL Server developer I am quite pleased its available in Katmai. To give a bit of background as to why CDC is a crucial feature , we will try to understand ETL in Data warehousing. A crucial function of data warehousing is extracting data on a regular basis applying some kind of transform if necessary to the data and storing it in the warehouse. So if i have a database replication setup I would run DTS packages which replicate and transform all data between the transactional database and the warehouse database. If you are a DBA or a developer involved in a data integration project that involves the regular extraction and transportation of a large amount of data from one system to another system or systems, you could consider CDC to help reduce your ETL time. The point is its more efficient than our previous replication methodologies for sure and it is particularly effective as we scale up on the database size. In SQL Server 2000, 2005 you would have written triggers to update data into another table where the metadata on the change is stored and then further right more plumbing code to put it all together ... I work on a product which has a similar concept.. Believe me it’s not a easy on 2000 and 2005. We where able to track data even though it was complicated but now in Katmai we will be able to capture the details of the event which caused data change on the relevant database table.

In Katmai, when we use Change Data Capture feature on a database table, a mirror of the tracked table schema is created with additional columns that include metadata. The metadata holds all the information about the change.

To know if CDC is enabled for a database, we could run the following query

SELECT * FROM sys.databases WHERE is_cdc_enabled = 1

Enabling the database for CDC

USE AdventureWorks

GO

EXEC sys.sp_cdc_enable_db_change_data_capture

GO

When you execute this command five tables in a schema called cdc are created. In addition to this there are two table valued functions added on to the cdc schema of the database. These functions behave like stubs for table based CDC functions

The five tables created are as below

cdc.captured_columns

cdc.ddl_history

cdc.index_columns

cdc.lsn_time_mapping

cdc.change_tables

In Katmai, when we use Change Data Capture feature on a database table, a mirror of the tracked table schema is created with additional columns that include metadata. The metadata holds all the information about the change.

In addition to this you will notice there are two Jobs one for Capture and one for cleanup added under the SQL Server Agent jobs for the database CDC functionality

CDC Schema

Enabling the table for CDC

On the AdventureWorks database to enable CDC on HumanResources.Employee table, we should use the following functions

USE Adventureworks

GO

EXEC sys.sp_cdc_enable_table_change_data_capture

@source_schema = 'HumanResources',

@source_name = 'Employee',

@role_name = 'cdc_AdventureWorks',

@capture_instance='HumanResources_Employee_CT'

GO

When we execute this command a new capture instance is created a table with this name of the capture instance is created with the schema as shown in the diagram

Note that columns as in the Employee table are created, In addition to which there are five extra columns which store metadata about the data.

Now we update a row on the HumanResources.Employee table as below

UPDATE HumanResources.Employee

SET LoginID = 'adventureworks\guy2'

WHERE employeeid = 1

Now select the data in the HumanResources_Employee_CT, we will find two rows inserted in this table, One representing the data before the update and the other after the data was inserted

Disabling CDC on a table/capture instance

USE Adventureworks

GO

EXEC sys.sp_cdc_disable_table_change_data_capture

@source_schema = 'HumanResources',

@source_name = 'Employee',

@capture_instance = 'HumanResources_Employee_CT'

GO

I guess I will pause here and continue on the rest in my next blog…. There are several cdc related system functions which could be used for different purposes, I will detail each of these in my next blog.. and i will also detail the schema a bit when i get my head around it hopefully