Denormalization?

Previously, I normalized my tables to Sixth Normal Form (6NF). Now, I want to consider denormalizing, if and when it's appropriate.

Denormalizing is different than an un-normalized schema, which is never, ever, recommended.

Strategies for denormalization appear in data-warehousing designs, specifically in OLAP star schemas. However, a snowflake schema is often a viable (and normalized) alternative in data-warehouse environments.

For all the complexity in normal form definitions [I neglected to mention Domain-Key Normal Form (DKNF) which would bring us to eight separate definitions], in practice normalizing a schema is relatively simple. Splitting a table that can be split (without data loss) is normalizing, preserving or merging is denormalizing.

Storage

The nature of relational database engines rarely justifies denormalizing, and often the consequences of denormalizing can incur explosive storage growth.

That's 200-trillion and 4-quadrillion in the 2NF, and 40-quintillion in the 1NF upper bound. The reason is that these tables implicitly contain many-to-many mappings which have combinatorial growth. With terabyte storage you'd still need a few million servers in the 1NF case. The culprits are combinations of users, domains, roles, and partners -- normalizing all of these didn't happen till 4NF and 5NF.

When to Denormalize

I denormalize when there's a justifiable reason not to split and I'm absolutely certain it won't impact storage growth.

Third-party software (e.g, for OLAP cube analysis) may require a denormalized star scheme as an integration point. It is recommended to provide a denormalized view that abstracts the normalized schema, but not to denormalize the schema itself (e.g., a star-schema view of a snowflake schema).

For a practical example of denormalizing, let's consider the previous schema. I purposely left-out any auto-increment id's. If we were to add these in (and maintain 6NF) there would be a new table for each id -- which would nearly double the number of tables in the schema. Let's add only user_id, domain_id, and a login_id.

I want to keep this manageable, so I will collapse the simple tables, and preserve the new auto-increment id's. Done correctly, I can denormalize without impacting any of the functional requirements that were solved by normalizing:

This is about as denormalized as I would go in production code. In fact, each table appears to be in DKNF, and most are in 6NF. Importantly, I've normalized all mappings to avoid explosive storage requirements and denormalized the non-mapping tables into appropriate domains.