Detecting Changes to a Table

Introduction

It's sometimes useful to be able to detect which tables have changed (any insert, update, or deletion) since a particular point in time. For example, a data export job to an external system may need full table data each time (rather than incremental change records), but it is useful to only send through data for tables that have changed since the last export.

Traditionally, requirements such as these can be met using triggers that populate a meta-data table, or a timestamp column that is compared to the same. Code Listing 1 shows an example approach using triggers.

While reading BOL recently, I came across the SQL Server CHECKSUM(), BINARY_CHECKSUM(), and CHECKSUM_AGG() built in functions, the last of which is suggested specifically as a method to "detect changes in a table". However, experimenting further with the CHECKSUM_AGG() function has convinced me that it has a number of significant flaws that render it unsuitable for this kind of application.

This code was tested on SQL 2005, version 9.00.4035.00 (SP3), and also on SQL 2008 R2, version 10.50.1753.0 (RTM with cu5).

Using CheckSum() and CheckSum_Agg() Functions

Let's look at using the CHECKSUM_AGG() function, mentioned in Books OnLine and in several websites, touted as a useful way to determine if data in a table has changed. This could be a nice self-contained approach that does not require the use of Triggers, although we may hesitate in very large tables due to its full table scanning nature.

We still need our meta-data table to track which tables have changed, but this time, rather than using LastUpdate, we create a LastChkSum field, which stores the value of the output of the CHECKSUM_AGG(BINARY_CHECKSUM(*)) functions, which generates a single checksum value across the entire table.

As you can see, the approach works well for single changes, but comes apart for a symmetric change. (If you are unsure of what I mean by a symmetric change, please see the addendum below where this is defined.)

I find this very surprising. If calculated checkum and binary checkum fields are added to the CheckSumTest table to see what is going on under the hood (see Code Listing 3 below) and the tests are re-run, we can see that different checksum values are generated for every row changed, as we would expect, but the overall table wide CHECKSUM_AGG() is still the same in the case of a symmetric change(!)

Conclusion

Although CHECKSUM_AGG() is described in BOL as "can be used to detect changes in a table". It clearly falls-down when symmetric changes are applied As a disclaimer, BOL does go on to state: "If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change."

If there is very little chance that you will get symmetric changes in your data, then this statement probably holds true, but for me, the fact that the algorithm is short of water-tight is a deal-breaker. I would have preferred to have seen the specific scenario of symmetrc changes mentioned in the documentation as a short-coming of the CHECKSUM_AGG() function.

Adendum : Definition of Symmetric Change

A symmetric change is two changes that mirror each other, commonly used to swap information from one record to another and vice-versa. For example, I have two rows. The first is ID=1, and name "John Smith". The second, ID=2, and name="Jane Doe". If I run the following two updates to swap these records, I consider this to be a "symmetric" change.

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands
of articles and SQL scripts, a library of free eBooks, a weekly database news roundup,
a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals
that makes it such a success.