Tag Archives: SCD2

In my previous post I wrote about how to use PowerPivot on top of a relational database that is modeled as a star-schema with slowly changing dimension type 2 (SCD2) historization. In this post I will advance this approach by adding other tables tables that are not related to our facts but have a relationship to our current dimension tables based on an other attribute.

I will take the example from my previous post and extend it by a new table called "WarehouseOwner". This table looks as follows:

WarehouseOwner

Warehouse Owner_SID

Warehouse Owner_BK

Manager

VALID_FROM

VALID_TO

1

John

Peter

2012-01-01

2012-01-03

2

Dave

Peter

2012-01-01

3

John

Dan

2012-01-03

This table holds information about the warehouse owner and its manager. In this example the manager of John changed from Peter to Dan with the 3rd of January. Remember that we have a WarehouseOwner-Column also in our Warehouse-table.

Logically this new table can be related to our Warehouse-table but as PowerPivot only supports 1:n relationships this is natively not supported and we have to use DAX to simulate this relationship. Most of the steps are similar to the steps that we have already done for our other SCD2-tables. First of all we create a ‘IsValid’-measure to identify rows that are valid for the selected referencedate:

The next step would be to mimic a relationship between our Inventory-table and our WarehouseOwner-table. But opposed to our old star-schema where each table was directly related to the Inventory-table, this is not possible with our new snowflake-schema anymore. So we have to relate our WarehouseOwner-table with our facts using the Warehouse-table. Without any relationship the result would look as follows:

This result is similar to a crossjoin of our Managers and our Warehouse-Owners regardless of time and validity. But we need to analyze which Manager was responsible for which warehouse owners and respectively their warehouse at any given point in time. To relate our tables we have to identify the combinations that are valid for the selected referencedate. If you reconsider these requirements, you will realize that this is similar to the relationships that we had to define to link our other tables to the inventory-table. Again we can use CONTAINTS()-function in combination with FILTER() and a IsValid-Measure but in this case we use them in our Warehouse-table to extend our Warehouse_IsValid-Measure:

Using our new Warehouse_IsValid-measure in a PivotTable reveals which combinations are valid for a given time:

Actually this is already all we have to do here to get the desired result, we do not have to touch any other formula in our model! By replacing our Warehouse_IsValid-Measure with our previous Amount-Measure we can see the final result:

When Dan becomes the Manager of John with the 3rd of January also the facts that are related to warehouses belonging to John are associated with Dan. As with the 4th of January John also becomes the owner of WH2 (now owning both/all warehouses) Dave and also its manager Peter do not have any facts related to them anymore.

As you can see this approach can be very easily extended by new tables even if the they are modeled in a snowflake-schema. All you have to do is to modify the IsValid-Measure of the main-table that relates to the new table.

Stay tuned for the next post where I will analyze the performance of this solution with large amounts of data!

Having worked a lot with Analysis Services Multidimensional Model in the past it has always been a pain when building models on facts and dimensions that are only valid for a given time-range e.g. following a Slowly Changing Dimension Type 2 historization. Opposed to relational engines, Analysis Services Multidimensional Model does not support complex relationships that would create more than one matching row like in terms of SCD2 a join using T-SQL’s BETWEEN. In Multidimensional Models each fact-row has to be linked to exactly one member in a dimension. There are some techniques to mimic BETWEEN-Joins like abusing ManyToMany-Relationships but all of them have several drawbacks and are just workarounds that do not solve the actual problem.

Fortunately PowerPivot (and in the future Analysis Services Tabular Models that are part of SQL Server 2012) offer more flexibility on modeling data and relationships that we can leverage to solve such scenarios.

In this post I will cover a warehouse-scenario where articles are stored in warehouses for a given period of time. The model should allow analysis of the current stock on a daily base. All tables follow a SCD2 historization and shall always display the currently valid records for the selected time.

Our model consists of 4 tables:

Article:

Article_SID

Article_BK

ArticleGroup

VALID_FROM

VALID_TO

1

A1

AG1

2012-01-01

2012-01-03

2

A2

AG2

2012-01-01

3

A1

AG2

2012-01-03

Warehouse:

Warehouse_SID

Warehouse_BK

WarehouseOwner_BK

VALID_FROM

VALID_TO

1

WH1

John

2012-01-01

2

WH2

Dave

2012-01-01

2012-01-04

3

WH2

John

2012-01-04

Date:

Contains Date-Values from 2012-01-01 till 2012-01-15

Inventory (Facts):

Article_BK

Warehouse_BK

Amount_BASE

VALID_FROM

VALID_TO

A1

WH1

100

2012-01-01

2012-01-03

A1

WH2

50

2012-01-02

2012-01-04

A1

WH1

20

2012-01-01

A2

WH1

10

2012-01-01

2012-01-04

A2

WH2

70

2012-01-03

2012-01-05

A2

WH2

30

2012-01-10

As you can see our facts are related using the BusinessKey (BK) and are thereby independent of any historical changes within any dimension.

This means that SCD2-changes in dimension have no impact on our fact-table reducing most of the complex ETL that is usually required to propagate those changes to the fact-table to allow in-time analysis

Once you have loaded these tables into PowerPivot and try to create relationships you will get errors as none of the tables can be related as the BKs are not unique in any table. But that’s OK as all tables have to be filtered on the ReferenceDate first to show the structure at selected date.

So, how can we find out which rows are currently active?

First of all we have to create a measure that returns the LastDate in the current time-slice (assuming that we always want to get the most current structures, e.g. 31st of December when doing analysis over a whole year etc.)

Using LASTDATE()-function this can be accomplished very easily:

ReferenceDate:=LASTDATE('Date'[Date])

All other tables have to be filtered on this ReferenceDate to get only the valid rows at this point in time.

I used a combination of COUNTROWS() and FILTER() to create a measure that identifies a value as valid or not:

Using a pivot-table we can see the WarehouseOwner changing over time. The owner of WH2 changes to John with 2012-01-04, as Dave does not own any warehouse by that time anymore, his entry also gets invalidated:

To get the valid Amount at a given ReferenceDate we have to sum only the valid rows for that date: