I am developing a star schema for a bank. The fact table maintains metrics about loans, and some of the dimensions regarding the loan (warning codes and statuses) may change frequently; in fact, some warning codes may change four to six times a day for a single loan. Our current strategy is to take a snapshot of the data from the operational system at the end of each business day and populate the data mart with this data. However, I am concerned that some dimensions would change daily, creating many rows in the dimension table with no corresponding updates to the fact table. How often can a Type II SCD change before a new modeling strategy is required? What is the best way to model such dimensions? Thanks in advance for your help!

If you are taking the snapshot at the end of the day, then you would have only the latest status/code for the loan. I am not sure how the different code changes within the day are captured here. Does the source system capture every status change as a separate transaction/record in the database?

The operational system does report all of the transactional changes; these are recorded in an audit table. The data warehouse will capture only the last change recorded at the end of the day. However, we could have many hundreds of loans with such changes in a day; potentially a new row would be recorded in the loan dimension each day for a particular loan, over a period of several weeks or months as the loan progresses through delinquency, loan modification, foreclosure processing, etc. Other loans would record few if any changes to the dimension. In my experience dimensions do not change quite so often, so I would like some feedback and thoughts on this situation from the community.
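To make the concern concrete, here is a minimal sketch of the Type II pattern described above, using Python's sqlite3 for illustration (the table and column names, such as `dim_loan` and `warning_code`, are hypothetical, not from the actual model). Each end-of-day change to a tracked attribute expires the current dimension row and inserts a new version, so a loan that changes status daily for weeks accumulates one row per day even though the fact grain never changes:

```python
import sqlite3

# Hypothetical Type II loan dimension: each end-of-day change to a tracked
# attribute (warning_code here) closes the current row and opens a new one.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_loan (
        loan_key     INTEGER PRIMARY KEY,
        loan_id      TEXT,
        warning_code TEXT,
        valid_from   TEXT,
        valid_to     TEXT,      -- '9999-12-31' marks the open-ended row
        is_current   INTEGER
    )
""")

def apply_type2_change(conn, loan_id, warning_code, as_of):
    """Expire the current row (if the attribute changed) and insert a new version."""
    row = conn.execute(
        "SELECT loan_key, warning_code FROM dim_loan "
        "WHERE loan_id = ? AND is_current = 1", (loan_id,)).fetchone()
    if row and row[1] == warning_code:
        return  # attribute unchanged: no new version needed
    if row:
        conn.execute(
            "UPDATE dim_loan SET valid_to = ?, is_current = 0 WHERE loan_key = ?",
            (as_of, row[0]))
    conn.execute(
        "INSERT INTO dim_loan (loan_id, warning_code, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, '9999-12-31', 1)",
        (loan_id, warning_code, as_of))

# Three end-of-day snapshots with two status changes -> three dimension rows.
apply_type2_change(conn, "L-1001", "NONE",       "2024-01-01")
apply_type2_change(conn, "L-1001", "DELINQUENT", "2024-01-02")
apply_type2_change(conn, "L-1001", "FORECLOSE",  "2024-01-03")
print(conn.execute("SELECT COUNT(*) FROM dim_loan").fetchone()[0])  # 3
```

At one new row per changing loan per day, hundreds of changing loans produce hundreds of dimension rows daily, which is exactly the growth pattern the question is worried about.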

The loan dimension is one of the more problematic dimensions for loan metrics. The first and foremost problem is the one-to-one relationship with the fact table, which is a red flag in dimensional modeling. Over time, the growth of the fact and loan dimension will degrade performance. Another issue is the high volume of changes and the typically large number of attributes that end up in the loan dimension. The solution is to drop the loan dimension altogether. Every attribute needs to be redesigned into a more meaningful dimension. You will end up with several junk dimensions, given the sheer number of columns driven by "I don't really need that, but store it just in case" requirements, but that is a much better alternative to the massive loan dimension.
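One way to picture the junk-dimension part of this suggestion: collect the low-cardinality flags and codes into a small dimension keyed by the distinct combination of values, so the fact row carries one surrogate key instead of a wide, fast-changing loan row. A minimal sketch, again with sqlite3 and invented names (`dim_loan_status_junk` and the domain values are assumptions for illustration):

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_loan_status_junk (
        junk_key          INTEGER PRIMARY KEY,
        warning_code      TEXT,
        delinquency       TEXT,
        modification_flag TEXT,
        UNIQUE (warning_code, delinquency, modification_flag)
    )
""")

# Pre-populate with the cross product of the known domain values; the fact
# row then references a single junk_key instead of a versioned loan row.
warning_codes = ["NONE", "W1", "W2"]
delinquency   = ["CURRENT", "30D", "60D", "90D+"]
mod_flags     = ["Y", "N"]
conn.executemany(
    "INSERT INTO dim_loan_status_junk (warning_code, delinquency, modification_flag) "
    "VALUES (?, ?, ?)",
    itertools.product(warning_codes, delinquency, mod_flags))

def junk_key_for(conn, wc, dq, mf):
    """Look up the surrogate key for a given combination of status values."""
    return conn.execute(
        "SELECT junk_key FROM dim_loan_status_junk "
        "WHERE warning_code = ? AND delinquency = ? AND modification_flag = ?",
        (wc, dq, mf)).fetchone()[0]

# 3 * 4 * 2 = 24 rows total, no matter how many loans change status daily.
print(conn.execute("SELECT COUNT(*) FROM dim_loan_status_junk").fetchone()[0])  # 24
```

The key property is that the junk dimension's size is bounded by the product of the attribute domains, not by the number of loans or the rate of change, so daily status churn stops generating new dimension rows.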

Thank you for your reply; it is an interesting perspective that I had not considered before. In reviewing the current model (and more requirements may be forthcoming from the business), if I were to remove the Loan Dimension and extract its attributes into more discrete dimensions, I count over 20 dimensions, plus another 13 constraints from the Date Dimension required for the loan fact. I suspect this number may grow as the data warehouse is expanded to include other lines of business. Wouldn't this also cause a performance issue? The platform is SQL Server, which technically supports up to 253 foreign keys per table, but I fear that query performance may suffer with so many constraints.

I think the concept of a dimension is misused here. The loan life cycle is long, and there will be a lot of changes to the loan over a life cycle of 3-25 years. It is better to capture them in a snapshot-type fact table rather than as a dimension. You will have many flags and degenerate dimensions, but on the upside, your ETL and reporting performance is better and the data storage requirement is also lower. We have built daily, weekly, and monthly snapshots of loan facts with the customer/account as the major dimensions.
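A minimal sketch of the periodic-snapshot approach described above, using sqlite3 with hypothetical names (`fact_loan_daily_snapshot` and its columns are assumptions, not the poster's actual schema). One row per loan per day carries the volatile status attributes directly on the fact row, with `loan_id` kept as a degenerate dimension, so no Type II versioning of a loan dimension is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily periodic-snapshot fact: one row per loan per day, with
# the fast-changing status attributes carried on the fact row itself.
conn.execute("""
    CREATE TABLE fact_loan_daily_snapshot (
        snapshot_date     TEXT,
        loan_id           TEXT,   -- degenerate dimension
        customer_key      INTEGER,
        principal_balance REAL,
        warning_code      TEXT,   -- volatile flag lives on the fact
        delinquency_flag  TEXT,
        PRIMARY KEY (snapshot_date, loan_id)
    )
""")

rows = [
    ("2024-01-01", "L-1001", 42, 250000.0, "NONE", "CURRENT"),
    ("2024-01-02", "L-1001", 42, 249800.0, "W1",   "30D"),
    ("2024-01-03", "L-1001", 42, 249800.0, "W2",   "30D"),
]
conn.executemany(
    "INSERT INTO fact_loan_daily_snapshot VALUES (?, ?, ?, ?, ?, ?)", rows)

# Reporting: the full status history comes straight from the fact table,
# with no effective-date joins against a versioned dimension.
hist = conn.execute(
    "SELECT snapshot_date, warning_code FROM fact_loan_daily_snapshot "
    "WHERE loan_id = 'L-1001' ORDER BY snapshot_date").fetchall()
print(hist)
```

The trade-off is a larger fact table (one row per loan per day), but the rows are narrow, the grain is uniform, and status-over-time queries become simple range scans rather than dimension-version joins.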

rschulz01 wrote:...The platform is SQL Server, which technically supports up to 253 foreign keys per table, but I fear that query performance may suffer with so many constraints.

Thanks again!

Rick

Actually, query performance will be very good. Only load times are impacted, due to the potentially large number of indexes. You don't have to create an index for every foreign key. You should also partition your indexes in conjunction with your fact table partitioning. Index partitioning alone can save hours on load time.