SCD: Slowly Changing Dimensions or Schroedinger’s Cat Dilemma?

In the world of theoretical physics there is a theoretical cat owned by Erwin Schroedinger. This famously fickle feline posed a problem: he could not decide if he was alive or dead.

In the Schroedinger’s Cat thought experiment, a living cat is placed in a box with a nefarious device. This device consists of a Geiger counter, an extremely small radioactive substance with a half life of one hour, and some hydrocyanic acid. The radioactive substance has a 50/50 chance of decaying within an hour. If it does, the flask of acid will burst and the cat will die. If it does not, the flask will remain intact and the cat will live. But because the box is closed (and soundproof), we cannot conclude the final state of the cat, and the formulas of quantum mechanics would tell us that the cat is both living and dead.

One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer which shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The Ïˆ-function of the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.

There are three popular interpretations of this experiment (don’t worry; we’ll see what this has to do with Oracle soon):

The Copenhagen Interpretation

In this interpretation, the results of the test are not complete until an observation takes place. Measurement, or observation (whether by a conscious source such as a physicist, or a non-conscious source such as a mechanical device) will eliminate one of the two parts of the superposition (|decayed atom, cat dies> + |non-decayed atom, cat lives>), and the wave containing one possibility or the other will collapse, resulting in a single post-observation outcome. Whether the cat is alive or dead inside the box prior to observation is irrelevant; the fact remains that when you look into the box, the cat will either go on to chase theoretical mice or will be given a proper theoretical burial.

The Many Worlds Interpretation

The many worlds interpretation says that at the time an observation occurs, it does not collapse waveforms but decoheres the possible outcomes. Because the physicist (or device, no special relevance is given to conscious beings) has become entangled with the quantum states of the cat, observer-states are formed for both possibilities and multiple universes are spawned of the possible outcomes. That is to say that a universe will exist where the cat is alive and a universe will also exist where the cat is dead. These universes could theoretically spawn from every true-or-false outcome in existence, resulting in an unthinkable multiverse in which all binary possibilities have been played out in one form or another. A true-or-false decision (|alive> or |dead>) would result in two universes, each with the opportunity to spawn later true-or-false decisions and therefore more opportunities to create additional universes. I can’t help but think of Douglas Adams’ Infinite Improbability Drive from Hitchhiker’s Guide to the Galaxy when I think about this interpretation!

The Many Histories Interpretation

This interpretation is something of a mix of the previous two. It agrees with the many worlds interpretation that waveforms do not completely collapse, but disagrees that this will go on to form multiple futures. When an observation/measurement is made, the observer becomes entangled with the state of the cat and decoherence takes place, just as with the many worlds interpretation. However, the state that did not “really” happen is ignored. The waveforms associated with this possibility still exist, but are seen as inaccessible as only one true future can come of any true-or-false situation once an observation is made.

What the hell does this have to do with Oracle?

Oracle is only a computer program, and therefore does exactly what it is coded to do (most of the time). Quantum mechanics really do not seem to have any relevance in the Oracle world because it is coded with specific outcomes to every true-or-false question. Logic gates are applied to bits, and the programmed outcome takes place. Non-programmed outcomes result in errors (ORA-600, ORA-7445).

However, things get more complex when we are modeling new environments because we end up dealing with conceptual outcomes with multiple possibilities. This is especially true in dimensional modeling, because dimensions are supposed to hold factual lookup data. How can we record a fact when important data regarding that fact may change?

For example, imagine a tax company that keeps track of deductions found in their various offices and wishes to form a star schema that will allow past and present analysis. Its (incomplete) star schema may resemble the picture shown here:

The data will come together in our fact table to show that he had $25,000 in deductions for this combination of dimensional data. This “fact” is currently safe. During the next year, E. Schroedinger and his wife have a lovely baby boy (at least in this universe!).

The data will once again come together in our fact table, this time reporting $28,000 in deductions (I’m not a CPA, don’t get me on tax code) due to the extra dependent. This “fact” is currently safe, just as the last one was.

However, we now have a paradox in our data. In 2007, we report $28,000 in deductions, which was based on the fact that E. Schroedinger had four dependents. In 2006, we reported $25,000 in deductions due to E. Schroedinger’s 3 dependents. When we run our analytic reports for 2007, we will get great results; we will be able to break down the deductions and the number of dependents will play a proper role in these calculations. But when we run our reports against 2006, the deduction calculations will not compute properly. The CLIENT_DIM record will show 4 dependents, but the deduction amount for the 2006 year will have been based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over time.

This is known as a slowly changing dimension (or SCD). Though we may not realize it, almost every dimension is in fact slowly changing; stores may move, clients may die (especially if they hang out with sadistic quantum physicists with ready supplies of hydrocyanic acid), and even our human definition of time can change over time (consider the changes to daylight savings this year). However, we don’t have to worry about all of these possible changes; we only have to worry about the ones pertaining to the facts on which we are attempting to report. Our business needs, in the end, determine which dimensions must be slowly changing.

There are three types of slowly changing dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the designer of the star schema eliminate paradox from their dimensional model (just as the three interpretations of the Schroedinger’s Cat thought experiment tries to eliminate the paradox of the living dead).

Type 1: Overwrite the old value with the new value and call it a day. This is very useful when dealing with issues such as typos on the client’s name. We don’t care about the history in this case because it was incorrect anyways.

Type 2: Create a new record in the dimension with a new primary key. In the example we’ve given, there would be two records in CLIENT_DIM for E. Schroedinger, one in which he has 3 dependents and one in which he has 4. Though he is one person from the business point of view, he is two people from a dimensional point of view.

Type 3: Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change. This type of SCD resolution would be beneficial if there is a change that can happen once and only once (such as death).

These three types of SCD resolution usually help in resolving changes to “factual” lookup data. However, we can see clear correlations between these three types of resolutions and the three interpretations of the Schroedinger’s Cat thought experiment!

Bringing it all together

SCD Management Type 1 clearly matches up with the Copenhagen Interpretation of the Schroedinger’s Cat thought experiment. In the Copenhagen Interpretation, the state of the cat changes and all other states are discarded as the waveform collapses. Criticism to this interpretation applies as well to Type 1 SCD resolution. The Copenhagen Interpretation ignores the possibility of reconstruction; in quantum mechanics, it must be possible to return to any original state before measurement was taken place. In our star schema, it also ignores the possibility of reconstruction; we will not be able to return to the original state or even acknowledge that a previous state existed for the purpose of analytics.

SCD Management Type 2 matches with the Many Worlds Interpretation of the Schroedinger’s Cat thought experiment. Instead of completely destroying the other possible waveform, we simply maintain that the two possibilities decohere and form their own universes that will no longer share any correlation. This interpretation shares a similar problem with Type 2 SCD resolution. By spawning new records (universes) as the outcome of a changing event, we create multiple possibilities that no longer share any correlation. For instance, if E. Schroedinger has a new dependent, there will be two resulting rows: one in which has 3 dependents and one in which he has 4. If he then legally changes his last name to Schrodinger, we will have to record that change since it is important for tax records. Because of this, we will have three total records for this one client (E. Schroedinger with 3 dependents, E. Schroedinger with 4 dependents, and E. Schrodinger with 4 dependents). These three records will not have any correlation unless we create some sort of superkey that properly identifies a single person and their many instances. This will be important if we will be doing mining that incorporates multiple times, clients, and locations in our analysis.

SCD Management Type 3 matches with the Many Histories Interpretation of the Schroedinger’s Cat thought experiment. When an observed outcome occurs (like the birth of a new child), the old record is changed to reflect the new “real” outcome; in this case, the addition of a single dependent. However, the change is noted by the addition of a column such as an effective date, to show that this is not the only outcome that has ever existed, but it is THE outcome that does exist at this time. In effect, the old outcome (3 dependents) still exists, but is discarded, ignored, and irretrievable now that the outcomes are decoherent. In our star schema, this type of resolution will only provide us with a confusing result of “This is the case now, but it was not always so. It changed on ….” For some situations, as with Type 1 resolution, this will suffice, such as the death of a client. We only need to record that death once. However, if any new data enters our model after the death of the client (post-mortem taxes?) or if the status changes again (miraculous recovery!), we may have unreliable report output.

Conclusion

Just as a note, I understand that the Schroedinger’s Cat thought experiment and Slowly Changing Dimensions are not a true match. In quantum mechanics, we are calculating multiple events that happen at the same exact point in time (or over a period of unknown time); whereas in a data warehouse we are dealing with history and fully acknowledge the changes time may have on our data. However, one cannot help but notice the correlation between the two forms of paradox and their resolutions. In fact, in data warehousing the so-called Schroedinger’s Cat paradox becomes even more problematic because we are forced to not only predict future outcomes based on dimensional data, but to also report on past/present information based on the same data. Physicists attempting to provide interpretations of this thought experiment only have to worry about future conditions; the past and present are unmeasured, and therefore have no relevance on the problem except that they are in a quantum state. They seek to explain the future of the cat once observation/decoherence have taken place. If Schroedinger’s Cat were in a data warehouse, we would have to analyze the entire life of the cat, the cat inside that horrible box of doom, and try to figure out whether the cat will want dry food, wet food, or a good burial place after the experiment is finished.

Post navigation

8 comments

You’ve simply modeled this wrong. Number of dependents is not properly part of the Client dimension, it is a fact. As a fact, it has relationships with both the client dimension and the time dimension. Thus no paradox – client has 3 dependents associated with the 2006 tax year and 4 associated with the 2007 tax year.

Okay, granted. Perhaps I shouldn’t have used a numerical amount pertaining to the results as an example; however, the fact (no pun intended) still remains that slowly changing dimensions do exist, and require some resolution. Instead, I could have used the client’s last name or the location’s address as a changeable dimension value, as those could also cause problems with report output.

Had I filled in the model more (hey, I said it was incomplete) I might have had other examples that I could have used. 😉 I understand your point on the number of deductions, but SCD resolution is still a necessity for the reasons I mention in my article.

I actually had a nagging thought about the same thing you mention after looking at the post a couple times, but I wasn’t sure if I should change it once I had posted it. I appreciate you reading and commenting!

Hi Oracle Alchemist, I like your comparision of quantum physics with slowly changing dimensions. Currently working on a project ” Data Warehouse Design and implementation to address the problem of slowly changing dimensions”. I am using SCD type 6 (1+2+3). Just to add that apart from external triggering events (e.g. change of address), data model changes can also trigger SCD.
Following a purist approach, one can (given storage space) capture all change states of an entity but to what end?
Thanks for pointing out that in the end, the most important change to capture is data that is of relevance from the business perspective.

First, this is a beautifully written article to illustrate the need for the different approaches of SCD.
Second, allow me to use your example to rebut Mr. Flack.
Number of Dependents is indeed a “Non-additive” fact. However there is a stronger argument in favor of presenting it as a characteristic Attribute of the “Person” entity, much as one would with a Person’s “Salary”.
If one were to institute SCD Type 2 in the Client Dimension, one would arrive at 2 distinct surrogate keys representing two different characteristics of the Person in the two years in question; the Business or Natural Keys (e.g. SSN) would remain undisturbed.