Type 2 Slowly Changing Dimensions are used in the Data Warehouses for tracking changes to the data by preserving historical values. This is achieved by creating a new record in the dimension whenever a value in the set of key columns is modified and maintaining start and end date for the records. The latest records are either identified by querying for records that are not end dated or by maintaining a flag (eg. current_flg) to easily identify them.

Testing SCD Type 2 Dimensions

Testing SCD Type 2 Dimensions is tricky because it cannot be achieved by a simple comparison of the source and target data. In this article we will examine different aspects of Type 2 SCD that can be tested using ETL Validator.

For the sake of this article, lets consider an Employee dimension (EMPLOYEE_D) of SCD Type 2 which is sourced from a table called EMPLOYEE in the source system.

Test 2: Verifying the uniqueness of the key columns in the SCD

The combination of the key columns in the SCD should be Unique. For the above example, the columns EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY comprise of an unique key in the EMPLOYEE_DIM dimension. This can be easily verified using the Duplicate Check Rule in the Data Rules test plan of ETL Validator. The query generated by ETL Validator using the Duplicate Check Rule should be something like below:

Test 3: Verifying that historical data is preserved and new records are getting created

Whenever there is a change to the values in the key columns a new record should be inserted in the EMPLOYEE_DIM and the old record should be end dated. ETL Validator's Component Test Case can be used to verify this functionality. The Component test case has a feature of Baseline and Compare which can be used to identify the changes in the EMPLOYEE_DIM.

Below are the steps :

Create a Component test case and take a snapshot of the current values in the EMPLOYEE_DIM (called Baseline)

Modify a few records in the source EMPLOYEE table by updating the values in the key columns such as SALARY, LAST_NAME

Execute the ETL process so the the EMPLOYEE_DIM has the latest data

Run the Component test case to compare the Baseline data with the Result table and identify the differences. Verify that the differences are as expected.