Warning: ob_start(): non-static method wpGoogleAnalytics::get_links() should not be called statically in /home/scolin/blog.scolin.com/wp-content/plugins/wp-google-analytics/wp-google-analytics.php on line 259Audit Your ETL With CHECKSUM « from the gut

Loading your target incrementally offers a huge performance benefit over running full truncate/reloads, but there is a danger of missing inserts or updates in your source system. It can take hours or days to track down the source of these problems (if it can be done at all!) and problems are generally not found until the customer picks up on it.

So anything you can do to verify the target data with the source is good for your business. You can do simple rowcounts, but that’s an unreliable test. For example, if you’re missing one inserted row and one deleted row, your rowcount test will still pass.

A good checksum technique is a lightweight alternative that can be run asyncronously or run after the load is complete. In this example, I am querying a SQL Server target and using ODBC to connect to an Oracle source to get a list of keys that are not syncronized.