I was hoping one of you could help me with an issue we are having regarding the PK of a Dimension changing as versioning takes place.

We have the dimension dim_interviewer that contains the Interviewer's org structure, Interviewer code, Region and versioning columns. We don't store any identifiable information about this person like their social, name or address.

The Interviewer works on cases that are stored in the fact_case table using the Interviewer PK. However, when the Interviewer's manager changes or when the Interviewer moves to a new region, we version their data and create a new dim record, new PK, with their current org structure. Now we have one Interviewer with fact_case records under 2 different PK's.

Besides the complexity of the query to see how an Interviewer performed over the past few years, is there a better way to handle this?

Reporting is not about keys, its about attributes. While you may not wish to keep personal identification information, you do need to keep something that identifies the position. Is 'Interviewer Code' such an identity? If it is, would it not be the same value on both versions of the dimension row?

LAndrews wrote:It sounds like the dimension is behaving exactly as a type-2 SCD should. The entire purpose of the SCD type-2 is to easily like the facts with the correct version of the dimension.

One question - why do you not have the remainder of the interviewer information in the interview dimension?

Perhaps that is what is causing the complexity?

Thanks LAndrews for your speedy reply! To answer your first question: Security requirements dictate that we not store any Personally Identifiable Information (PII) in our DW. My concern with having multiple PK's is providing the ability to look back over specific periods to find out what a single Interviewers performance was. I will have to get back to you on the specific reporting requirement. Anyway, does it ever make sense to create a surrogate key to identify the Interviewer and maintain that over the life of the Interviewer, much like an SSN?

ngalemmo wrote:Reporting is not about keys, its about attributes. While you may not wish to keep personal identification information, you do need to keep something that identifies the position. Is 'Interviewer Code' such an identity? If it is, would it not be the same value on both versions of the dimension row?

Unfortuantely, Interviewer Code is not unique. Interviewer Code, Region and program make the Interviewer unique. Let me get some more info on the difficulty of the query required to perform this lookup. No response needed unless you want to clarify. thanks

So, you do have a unique identifier, albeit 3 columns. Fine. Why not generate an arbitrary ID attribute that remains the same across versions of the dimension member? Use that attribute to group information about an interviewer. It would identify a unique person, but not the person themselves.

ngalemmo wrote:So, you do have a unique identifier, albeit 3 columns. Fine. Why not generate an arbitrary ID attribute that remains the same across versions of the dimension member? Use that attribute to group information about an interviewer. It would identify a unique person, but not the person themselves.

Thanks ngalemmo, yes, that will solve the problem. I did find out the reporting issue and this will take care of the issue. thanks