When we look together at their examples, though, I've always found a dimensional solution.

Here's a list of things people sometimes do not realize can be handled by standard dimensional techniques.

At the end of the post is a large collection of links to previous posts treat these topcis in more detail.

Note: This article should not be interpreted as a criticism of entity-relationship modeling. Rather, the point is to cover commonly overlooked capabilities of dimensional modeling.

Predictive Analytics: Storage of Granular Detail

I often hear people say: if you want to do predictive analytics, its important to capture all the relationships among data attributes. This is incorrect.

The job of predictive analytics is to tell us what is important--not the reverse! Neither data mining nor predictive analytics requires information to be stored in an entity-relationship format.

As people who do this work will tell you, they don't care how the data is supplied. Nor do they care what modelers believe are the important relationships. Many tools will completely de-normalize the data before doing anything else.

In order to support predictive analytics, standard best practices of dimensional modeling apply:

Fact tables record business activities and conditions at the lowest level of detail possible in granular, first line stars.

Transaction identifiers are carried in the model if available.

Surrounding dimensions are enriched with detail to the fullest extent possible

In the dimension tables, changes are tracked and time-stamped (“type 2”).

If we do these things with our dimensional data, we have not lost anything that is that is required to do predictive analytics.

Many-to-many Relationships: Multi-Valued Dimension Bridge Usually, each fact in a star schema can be linked to a single member of a dimension. This is often mistaken for a rule, which leads to the incorrect conclusion that dimensional models cannot accommodate a fact that may link to multiple members of a dimension.

(This is often generalized as “dimensional cannot handle many-to-many relationships.”)

In a dimensional model, this situation is referred to as a multi-valued dimension. We use a bridge table to link the facts to the dimension in question. This special table sets up a many-to-many relationship between the facts and dimensions, allowing any number of dimension members to be associated with any number of facts.

Here are some examples:

In sales, a bridge table may be used to associate a sale, as recorded in a fact table, with multiple sales people.

In insurance, a single claim (again, tracked in a fact table) can be associated with multiple parties.

In healthcare, a single encounter may be associated with multiple providers or tests.

In government, a single audit or inspection may be associated with multiple findings.

In law enforcement, a single arrest may be associated with multiple charges.

Repeating Attributes: Multi-Values Attribute Bridge

It is sometimes suggested that dimensional models cannot gracefully accommodate repeating attributes. The dimensional solution is again a bridge table. This time, it is placed between the dimension table and an outrigger that contains the attribute in question.

Examples include:

Companies that have multiple Standard Industry Classification codes

People with multiple phone numbers or addresses

Accounts with multiple account holders

Patients with multiple diagnoses

Documents with multiple keywords.

Recursive Hierarchies: Hierarchy Bridge Ragged hierarchies, unbalanced hierarchies, or recursive hierarchies are often cited as the downfall of the dimensional model. In fact, a solution exists, and it is extraordinarily powerful. The hierarchy bridge table allows facts to be aggregated either by rolling up or rolling down through the hierarchy, regardless of number of levels.

Examples include:

Parts that are made up of other parts

Departments that fall within other departments

Geographies that fall within other geographies

Companies that own other companies

Relationships Between Dimensions: Factless Fact Tables

A star schema does not include relationships between dimension tables. This has led to the misconception that you can't track these relationships.

In fact, any important relationship between dimension tables can and should be captured. It is done using factless fact tables. (Dimensions are never directly linked because of the implications this would have on slow change processing.)

Examples include:

Employees filling a job in a department

Marketing promotions in effect in a geographical market

Students registered for courses

The primary care physician assigned to an insured party

Brokers assigned to clients

Subtyping: Core and Custom

Another situation where modelers often believe they must “fall back” on ER techniques is when the attributes of a dimension table vary by type. This variation is often misconstrued as calling for the ER construct known as subtyping. Similar variation might also be found with associated facts

In the dimensional model, heterogeneous attributes are handled via the core and custom technique.

A core dimension captures common attributes, and type specific replicas capture the core attributes along with those specific to the subtype. Dimensions can then be joined to the fact table according to the analytic requirement. If there is variation in the facts, the same can be done with fact tables.

Examples include:

Products with attributes that vary by type

Customers that have different characteristics depending on whether they are businesses or individuals

In insurance, policies that have different characteristics depending on whether they are group or individual polices

In healthcare, medical procedures or tests that have different characteristics and result metrics depending on the test

In retail, stores that have varying characteristics depending on the type (e.g. owned, franchise, pocket)

Non Additive Metrics: Snapshots or Summary Tables

In a classic star schema design, facts are recorded at a granular level and “rolled up” across various dimensions at query time. Detractors often assume this means that non-additive facts have no place in a dimensional model.

In fact, dimensional modelers have several tools for handling non-additive facts.

Those that can be broken down into additive components are captured at the component level. This is common for many key business metrics such as:

Margin rate: stored as margin amount and cost amount in a single fact table. The ratio is computed after queries aggregate the detail.

Yield or conversion percentage: stored as quote count and order count in two separate fact tables, with ratio converted after aggregation at query time.

Other non-additive facts cannot be broken down into fully additive components. These are usually captured at the appropriate level of detail, and stored in snapshot tables or summary tables. Common examples include:

Period-to-date amounts, stored in a snapshot table or summary table

Distinct counts, stored in a summary table

Non-numeric grades, stored in a transaction-grained fact table

While these non-additive facts are flexible than additive facts in terms of how they can be used, this is not a result of dimensional representation.

Conclusion

Every technique mentioned here is part of dimensional modeling cannon. None are stopgaps or workarounds. While some may prove problematic for some of our BI software, these problems are not unique to the dimensional world.

In the end, the dimensional model can represent the same real-world complexities that entity-relationship models can. No ER fallback required.

- Chris

Learn More
All of these topics have been covered previously on this blog. Here are some links to get you started.