Building the Data Warehouse

The appeal of the multidimensional approach to database design for data marts begins with the data model. For all of the practical use of a data model as one of the foundations of design, there are some shortcomings. Consider the simple data model in Figure 3.51.

The data model in the figure shows four simple entities with relationships. If all that is considered is the data model for database design, the inference can be drawn that all entities are equal. In other words, from a design standpoint the data model appears to make all entities peers with each other. Approaching database design for the data warehouse solely from the perspective of the data model produces a “flat” effect. In actuality, for a variety of reasons, entities in the world of data marts are anything but peers. Some entities demand their own special treatment.

To see why the data model perspective of the data and the relationships in an organization are distorted, consider the three-dimensional perspective shown in Figure 3.52. Here entities representing vendor, customer, product, and shipment will be sparsely populated, while entities for orders will be heavily populated. There will be many more occurrences of data residing in the table or tables representing the order entity than there will be for any other entity.

Figure 3.51 A simple two-dimensional data model gives the impression that all entities are equal.

Because of the massive volume of data populating entity order, a different design treatment is required.

The design structure that is required to manage large amounts of data residing in an entity in a data mart is called a “star join.” As a simple example of a star join, consider the data structure shown in Figure 3.53. ORDER is at the center of the star join and is the entity that will be heavily populated. Surrounding ORDER are the entities PART, DATE, SUPPLIER, and SHIPMENT. Each of the surrounding entities will have only a modest number of occurrences of data. The center of the star join-ORDER-is called the “fact table.” The surrounding entities—PART, DATE, SUPPLIER, and SHIPMENT—are called “dimension tables.” The fact table contains unique identifying data for ORDER, as well as data unique to the order itself. The fact table also contains prejoined foreign key references to tables outlying itself-the dimension tables. The foreign key relationships may be accompanied by nonforeign key information inside the star join if, in fact, the nonforeign key information is used frequently with the fact table. As an example, the description of a PART may be stored inside the fact table along with the PART number if, in fact, the description is used frequently as part of ORDER processing.