Why is data denormalized in a Data Warehouse but normalized in OLTP?

It depends on the approach chosen for the business requirements. There are practical implementations that are completely normalized data warehouses; others have a dimensional (star schema) architecture with normalized fact tables and denormalized dimensions; and sometimes it is a combination of both. The main objective is to look at the data from the business-needs perspective: to slice, dice, drill down and drill across the entities and dimensions.

Normalization is the process of efficiently organizing data in a database. The main purpose of the process is to

eliminate redundant data (for example, storing the same data in more than one table)

This reduces the amount of space a database consumes and ensures that data is stored logically.

There are several normal forms, 1NF through 5NF. Of these, 1NF, 2NF and 3NF are commonly used in data models; 4NF and 5NF are rarely applied.

1NF sets the most basic rules for an organized database:

Eliminate duplicative columns from the same table.

Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
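To make these rules concrete, here is a minimal sketch (using SQLite with hypothetical table and column names) of a table that violates 1NF through repeating phone columns, and its 1NF equivalent where each value gets its own row identified by a primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 1NF: repeating phone1/phone2 columns in the same table.
cur.execute("""CREATE TABLE customer_bad (
    id INTEGER PRIMARY KEY, name TEXT, phone1 TEXT, phone2 TEXT)""")

# 1NF: one value per row, each row identified by a primary key.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE customer_phone (
    customer_id INTEGER REFERENCES customer(id),
    phone TEXT,
    PRIMARY KEY (customer_id, phone))""")

cur.execute("INSERT INTO customer VALUES (1, 'Alice')")
cur.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                [(1, '555-0100'), (1, '555-0101')])

# A customer can now have any number of phones without new columns.
count = cur.execute(
    "SELECT COUNT(*) FROM customer_phone WHERE customer_id = 1").fetchone()[0]
print(count)  # 2
```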

2NF further addresses the concept of removing duplicate data:

Meet all the requirements of the first normal form.

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Create relationships between these new tables and their predecessors using foreign keys.
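The steps above can be sketched as follows (again with hypothetical names): a line-item table where the product name depends on only part of the composite key violates 2NF, so that subset is moved to its own table and linked back with a foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 2NF: product_name depends only on product_id,
# a subset of the composite key (order_id, product_id).
cur.execute("""CREATE TABLE order_item_bad (
    order_id INTEGER, product_id INTEGER, product_name TEXT, qty INTEGER,
    PRIMARY KEY (order_id, product_id))""")

# 2NF: the partial dependency moves to its own table,
# related back to its predecessor via a foreign key.
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT)")
cur.execute("""CREATE TABLE order_item (
    order_id INTEGER,
    product_id INTEGER REFERENCES product(product_id),
    qty INTEGER,
    PRIMARY KEY (order_id, product_id))""")

cur.execute("INSERT INTO product VALUES (10, 'Widget')")
cur.executemany("INSERT INTO order_item VALUES (?, ?, ?)", [(1, 10, 2), (2, 10, 5)])

# The product name is stored once, no matter how many orders reference it.
row = cur.execute("""SELECT p.product_name, SUM(oi.qty)
                     FROM order_item oi JOIN product p USING (product_id)
                     GROUP BY p.product_name""").fetchone()
print(row)  # ('Widget', 7)
```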

3NF goes one step further:

Meet all the requirements of the second normal form.

Remove columns that are not directly dependent upon the primary key (transitive dependencies).
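A short sketch of the 3NF rule, with hypothetical names: the department name depends on the department id rather than on the employee's primary key, so it moves to its own table, and a change to it becomes a single-row update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 3NF: dept_name depends on dept_id, not on the
# primary key emp_id (a transitive dependency).
cur.execute("""CREATE TABLE employee_bad (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, dept_name TEXT)""")

# 3NF: the transitively dependent column moves to its own table.
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT,
    dept_id INTEGER REFERENCES department(dept_id))""")

cur.execute("INSERT INTO department VALUES (1, 'Sales')")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(100, 'Bob', 1), (101, 'Carol', 1)])

# Renaming the department is now a single-row update.
cur.execute("UPDATE department SET dept_name = 'Field Sales' WHERE dept_id = 1")
names = cur.execute("""SELECT DISTINCT d.dept_name
                       FROM employee e JOIN department d USING (dept_id)""").fetchall()
print(names)  # [('Field Sales',)]
```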

Coming to the actual reasons why a Data Warehouse is usually denormalized while an OLTP system is normalized:

3NF (third normal form) data models work well when the system you are designing is highly transactional, meaning there are a lot of INSERT, UPDATE and DELETE database operations. But because data warehouses are mostly used for querying historical data, we try to minimize expensive JOINs. A denormalized data model is one way of achieving this.
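A minimal star-schema sketch illustrates the trade-off (table and column names here are hypothetical). City, region and country are flattened into one denormalized dimension table, so a fact query needs a single join rather than a chain of joins through separate city, region and country tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized dimension: geography attributes flattened into dim_store,
# so the fact table needs only one join to roll up by country.
cur.execute("""CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY, store_name TEXT,
    city TEXT, region TEXT, country TEXT)""")
cur.execute("""CREATE TABLE fact_sales (
    store_key INTEGER REFERENCES dim_store(store_key), amount REAL)""")

cur.execute("INSERT INTO dim_store VALUES (1, 'Downtown', 'Austin', 'TX', 'USA')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(1, 100.0), (1, 250.0)])

# One join instead of three: fact -> store -> (city -> region -> country).
total = cur.execute("""SELECT d.country, SUM(f.amount)
                       FROM fact_sales f JOIN dim_store d USING (store_key)
                       GROUP BY d.country""").fetchone()
print(total)  # ('USA', 350.0)
```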

A Data Warehouse’s main purpose is to support:

business-level decisions

data aggregated across various departments of an enterprise, to make effective centralized decisions

historical data, which you would most often scan to compute statistics like counts, sums and averages

storage of huge amounts of data
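The historical-scan workload described above can be sketched in a few lines (hypothetical table name, SQLite for illustration). The typical warehouse query scans history and aggregates, rather than touching individual rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE daily_sales (sale_date TEXT, amount REAL)")
cur.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                [('2024-01-01', 100.0), ('2024-01-02', 300.0), ('2024-01-03', 200.0)])

# The classic warehouse workload: scan the history, compute statistics.
stats = cur.execute(
    "SELECT COUNT(*), SUM(amount), AVG(amount) FROM daily_sales").fetchone()
print(stats)  # (3, 600.0, 200.0)
```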

In addition, normalized models do not suit business people well. These models can be difficult for them to read and fully understand, turning decision-making into a rather complex process instead of what they really want: a simple, understandable one.

So, as mentioned earlier, which approach to go with depends on what we need out of the warehouse and on how, and how frequently, the data is gathered and loaded into it. A denormalized warehouse can greatly improve performance under heavy read loads, but updates and inserts become more complex: because the data is duplicated, it has to be updated or inserted in more than one place.
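The update cost of duplication is easy to demonstrate (hypothetical names again). In a denormalized table the department name is repeated on every employee row, so renaming it touches every copy, whereas in a normalized model it would be a single-row update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: dept_name is repeated on every employee row.
cur.execute("""CREATE TABLE employee_denorm (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)""")
cur.executemany("INSERT INTO employee_denorm VALUES (?, ?, ?)",
                [(1, 'Bob', 'Sales'), (2, 'Carol', 'Sales'), (3, 'Dave', 'Sales')])

# Renaming the department must touch every duplicated copy.
cur.execute("UPDATE employee_denorm SET dept_name = 'Field Sales' "
            "WHERE dept_name = 'Sales'")
print(cur.rowcount)  # 3 rows updated instead of 1
```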


Business Intelligence Consultant and Trainer with 13+ years of extensive work experience on various client engagements. Delivered many large data warehousing projects and trained numerous professionals on business intelligence technologies. Extensively worked on all facets of data warehousing including requirement gathering, gap analysis, database design, data integration, data modeling, enterprise reporting, data analytics, data quality, data visualization, OLAP.
Has worked on a broad range of business verticals and holds exceptional expertise in various ETL tools like Informatica Powercenter, SSIS, ODI and IDQ, as well as Data Virtualization, DVO and MDM.