Great Links

If T-SQL Merge can only handle one changed record at a time, how can we use it for our initial/historical load? Further, how can we use Merge in those situations where the incremental load may produce more than 1 changed record per entity? Well, I’m glad you asked.

Welcome to the next article in the Urge to Merge Series. In this series I’ve tried to clear up many misunderstandings about how to use T-SQL Merge effectively, with a focus on Data Warehousing.

The most important part of this query is the RankNo column. Notice that I am assigning a Row_Number per entity (Business Key) by descending ChangeDate. This means the oldest records per CustomerNum get the highest RankNo value. My looping Merge is set to loop from highest rank down to 1, which may seem counter-intuitive to some. The reason I do this will become clear in a minute. Anyway, the above query will create a data set like the following in Staging.DimCustomer:

Notice that I’ve filtered for Yoda only for simplicity.

If you’ve been following along in this series, I’ve repeatedly mentioned a limitation of T-SQL Merge. On each execution of the Merge statement, there will only be 1 record per entity to Merge. Well, we’re about to deal with that issue now. Just because we can only Merge one change record per entity at a time, doesn’t mean we can’t loop through Merge statements to accomplish an Initial (Historical) Dimension Load. The Merge code should look familiar as it builds on my previous articles.

As I mentioned in a previous article, the Type 1 Merge statement is performed last. Obviously, it has to be.

Notice that I’m using a simple While statement to loop.

The RankNo column was calculated by descending ChangeDate because this insures that the last record for each entity to be processed has a RankNo equal to 1. This makes the Type 1 Merge statement easier to write.

What about Transactions and Error Handling?

I’ve intentionally left those aspects out of this code for a couple of reasons. First, it’s easier to understand what’s happening without the extra code in the way. Second, you may not actually want explicit transactions in your Initial Data Warehouse load. It doesn’t take a Rocket Surgeon to figure out that if you load a huge Dimension historically, and you only use one transaction, your transaction log may grow out of control. This is particularly true if you are loading multiple Dimensions simultaneously. This is definitely an “It Depends” situation.

As I mentioned in the previous article, Extracting Historical Dimension Records using T-SQL, the Merge statement above should work in either Method 1 or 2. Also, because of the RankNo column, incremental Dimension loads which may contain more than 1 changed record per entity (CustomerNum) will work correctly as well.

This might a bit off topic but I was wondering how you would handle inferred dimensions in both a historical and normal dimension load? Typically this is done in fact load when the dimension look-up fails and the dimension is created based on the data in the fact table but here you are advocating use merge for both dimension and fact load. If this is something you’ll cover in future blog posting for fact loads I’ll patiently wait. Love the merge series.