Eine weitere WordPress-Website

A script to create a sample data warehouse – part 1 the dimension

In the last week, I have written a sample script to create a skeleton data warehouse. I will use this as a head start for future work where the skeleton will be used to implement an actual data warehouse.

The skeleton is written for a Teradata DBMS. This is the platform that I currently use.

The example data warehouse has two tables: a dimension table and a fact table. Hence we have a small example that can be expanded in the future.

Let us start with the dimension table.
See also the script with ExampleDW
The dimension table is created with:

This dimension has a surrogate key (SK) that is loaded with the DBMS facility. This leads to the situation that every record is given an attribute that can be used as a primary key. This SK is the primary key. It uniquely identifies a record within a table.
The table also has a business key (BK). This is something by which the business identifies an object (say a product, a contract or a customer). If we have a business key, we are able to know whether we have an earlier version of the object. If we get a new record on an object that we already have in the table, we may act accordingly.
In our case, we assume that we want to act in the following way whenever a new record is found on an object that we already have in the data warehouse.

The record is added to the table.

The record that describes the old state of the object is labelled as ‚old version‘ .

In the table we have an attribute „extra“, that acts as an example of a descriptive attribute. It can be a colour of the object, the size, the owner etc.

We also encounter an attribute „version_sdate“ which indicates the starting point of validity of the record. We assume that an object generates over time a series of records. Once the record is sent as the newest version of the object, it is stored in the data warehouse. We use the attribute version_sdate as the starting point in time after which the record is seen as the valid version of the object.

Likewise, we have a „version_edate“ that indicates the finishing point in time. If a record is stored in the data warehouse and if a new version is included in the data warehouse, the old record is updated with version_edate equals finishing point in time.

We first deleted all records in the source. This is followed by loading three records.

In an incremental load step, we may have three situations:

the record represents an object that is not yet included in the dimensional table. Whether or not this is the case can be assessed with the business key. If a new business key is encountered, we have this situation. The algorithm is quite simple: do we have a record is the source for which the business key is not yet included in the target dimension table. The sql reads as:

the record in the source is included exactly in the target table. In our case, we have a combination (BK, extra) that may already be present in the target table. In that case we have no reason to apply any change in the target table: the data are already present in the target. We may go on looking for records in the source that may need action.

the record in the source table represents an update on an object that is already present in the data warehouse. In that case we have a business key (BK) that is already present in the dimensional table. We also have a situation whereby the content in additional attributes are different from what we have in the dimensional table. In this case „extra“ in the source is not equal to „extra“ in the target table. We identify these records as: