Data Warehouse

What it is and why it matters

A data warehouse (or enterprise data warehouse) stores large amounts of data that have been collected and integrated from multiple sources. Because organizations depend on this data for analytics and reporting, the data needs to be consistently formatted and easily accessible. Those two qualities define data warehousing and make it essential to today's businesses.

History of the Data Warehouse

In the 1970s and '80s, data began to proliferate, and organizations needed an easy way to store and access their information. Computer scientist Bill Inmon, who is considered the father of data warehousing, began to define the concept in the 1970s and is credited with coining the term “data warehouse.” In 1992 he published Building the Data Warehouse, a book widely regarded as a foundational text on data warehousing technology. Inmon's definition takes a “top-down” approach: a centralized repository is established first, and data marts, which contain specific subsets of data, are then created from that repository.

Ralph Kimball, another technology expert, who published The Data Warehouse Toolkit in the mid-'90s, took a different tack with his “bottom-up” approach: individual data marts are developed first and later integrated to form a data warehouse.

Data warehousing remains relevant today, but it is evolving as the industry changes to accommodate cloud storage and real-time analytics. One emerging storage approach related to the data warehouse is the data lake, made practical by disruptive, low-cost technologies such as Apache Hadoop. Data lakes are often used to capture data as it streams in, storing it in raw form without first processing it or defining a schema.


Why are data warehouses important?

Organizations rely on data to make informed decisions, and data warehouses matter because they are where much of that data is stored and prepared for use. Data warehouses can:

Store large amounts of data in a central database – and in a standard format.

Integrate data from many different sources and standardize it, so it’s ready for analytics or reporting.

Maintain historical records, since they can store months or even years of data.

Keep data secure by storing it in a single location, where access can be limited to those who need specific data.
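One common way a warehouse keeps history, shown here only as an illustration and not as any particular product's method, is the "Type 2 slowly changing dimension" technique: instead of overwriting a value that changed in the source, the warehouse closes out the old row and adds a new one with its own validity dates. The `CustomerRow` table, its `customer_id` and `city` fields, and the helper functions below are hypothetical; this is a minimal in-memory sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    customer_id: int          # hypothetical business key from a source system
    city: str                 # hypothetical attribute whose history we keep
    valid_from: date          # first day this version was in effect
    valid_to: Optional[date]  # None means "still the current version"


def apply_change(history: list[CustomerRow], customer_id: int,
                 city: str, as_of: date) -> None:
    """Record a new value without overwriting the old one (SCD Type 2)."""
    current = next((r for r in history
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None:
        if current.city == city:
            return  # value unchanged: keep the existing current version
        current.valid_to = as_of  # close out the previous version
    history.append(CustomerRow(customer_id, city, as_of, None))


def city_on(history: list[CustomerRow], customer_id: int,
            day: date) -> Optional[str]:
    """Return the value that was in effect for a customer on a given day."""
    for r in history:
        if (r.customer_id == customer_id and r.valid_from <= day
                and (r.valid_to is None or day < r.valid_to)):
            return r.city
    return None


history: list[CustomerRow] = []
apply_change(history, 42, "Raleigh", date(2022, 1, 1))
apply_change(history, 42, "Denver", date(2023, 6, 1))
print(city_on(history, 42, date(2022, 12, 31)))  # Raleigh
print(city_on(history, 42, date(2024, 1, 1)))    # Denver
print(len(history))                              # 2
```

Because old rows are closed rather than deleted, a report about 2022 still sees the 2022 value, which is what lets the warehouse answer questions about months or years of past data.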

Data Warehouses vs. Other Storage Systems

While a data warehouse is a common way to store data, it's not the only option. Here's how data warehouses compare with similar types of technology.

Data Warehouse

Stores a large amount of enterprise data encompassing several subject areas.

Can be difficult to build.

Large size.

Data is structured and ready to use for analytics or reporting.

Data Mart

Stores a smaller amount of data; data typically covers a single subject area and is used by one department, such as marketing or sales.

Faster and easier to build than a data warehouse.

Smaller size.

Data is structured and ready to use for analytics or reporting.

Data Lake

Stores a large amount of raw data.

Data remains in its unaltered state until it’s needed.

Enables users to query smaller, more relevant and more flexible data sets.
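The practical difference between these three can be sketched in a few lines of Python. In this illustration, which assumes hypothetical JSON order records, the lake keeps every record exactly as it arrived; the warehouse enforces a fixed schema when data is loaded (often called "schema-on-write") and rejects records that don't fit; the data mart is a single-subject subset of the warehouse; and the lake's raw records are only interpreted when someone queries them ("schema-on-read"). The field names, the region-based mart, and the `lake_coupons` helper are all assumptions made for the example.

```python
import json

# Hypothetical raw records as they might arrive from different sources.
raw_events = [
    '{"order_id": 1, "amount": "19.99", "region": "East"}',
    '{"order_id": 2, "amount": 5, "region": "west", "coupon": "SPRING"}',
    '{"order_id": "x3", "amount": null}',
]

# Data lake: store every record unaltered.
lake: list[str] = list(raw_events)

# Data warehouse: apply a fixed schema at load time (schema-on-write).
warehouse: list[dict] = []
rejected: list[str] = []
for line in raw_events:
    rec = json.loads(line)
    try:
        row = {
            "order_id": int(rec["order_id"]),
            "amount": float(rec["amount"]),
            "region": str(rec["region"]).title(),  # standardize casing
        }
    except (KeyError, TypeError, ValueError):
        rejected.append(line)  # does not fit the schema, so it is not loaded
        continue
    warehouse.append(row)

# Data mart: a smaller, single-subject subset (here, one sales region).
east_mart = [r for r in warehouse if r["region"] == "East"]


# Schema-on-read: raw lake records are interpreted only at query time,
# so fields the warehouse schema ignores (like "coupon") are still available.
def lake_coupons(records: list[str]) -> list[str]:
    coupons = (json.loads(rec).get("coupon") for rec in records)
    return [c for c in coupons if c]


print(len(warehouse), len(rejected), len(east_mart))  # 2 1 1
print(lake_coupons(lake))                             # ['SPRING']
```

The trade-off mirrors the lists above: the warehouse and mart hold clean, ready-to-query rows, while the lake keeps everything, including data nobody has decided how to use yet.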

How It Works

A data warehouse begins with the data itself, which is collected from both internal and external sources. Data is typically stored in a data warehouse through an extract, transform and load (ETL) process: information is extracted from the source systems, transformed (for example, cleansed and standardized) into consistent, high-quality data, and then loaded into the warehouse. Businesses run this process on a regular basis so the data stays current and ready for analytics or reporting.
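The three ETL steps can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. This is a minimal illustration under stated assumptions, not a production pipeline: the two source systems (`crm` and `web`), their date formats (the web source is assumed to use day/month/year), the country mapping and the `customers` table are all hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical mapping used to standardize free-text country values.
COUNTRY_CODES = {"us": "US", "united states": "US"}


def extract() -> list[dict]:
    """Extract: pull records from two hypothetical source systems."""
    crm = [{"id": "C1", "signup": "2024-03-05", "country": "us"}]
    web = [{"id": "W7", "signup": "05/03/2024", "country": "United States"}]
    return ([{**r, "source": "crm"} for r in crm]
            + [{**r, "source": "web"} for r in web])


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: convert dates and countries into one standard format."""
    out = []
    for r in rows:
        # Assumed formats: CRM uses ISO dates, the web source uses day/month/year.
        fmt = "%Y-%m-%d" if r["source"] == "crm" else "%d/%m/%Y"
        signup = datetime.strptime(r["signup"], fmt).date().isoformat()
        country = COUNTRY_CODES.get(r["country"].strip().lower())
        if country is None:
            continue  # skip rows that cannot be standardized
        out.append((r["source"], r["id"], signup, country))
    return out


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS customers (
        source TEXT, source_id TEXT, signup_date TEXT, country TEXT,
        PRIMARY KEY (source, source_id))""")
    # INSERT OR REPLACE lets the job be rerun on a schedule without duplicates.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM customers ORDER BY source").fetchall())
# [('crm', 'C1', '2024-03-05', 'US'), ('web', 'W7', '2024-03-05', 'US')]
```

Both sources describe the same signup date and country in different ways; after the transform step they are stored identically, which is the "consistently formatted" property described at the top of this article.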

When an organization is ready to use its data for analytics or reporting, the focus shifts from data warehousing to business intelligence tools. Technologies such as visual analytics and data exploration help businesses gain important insights from that data.
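Behind a typical report or dashboard chart sits an aggregate query against the warehouse. As a hedged sketch, again using `sqlite3` as a stand-in warehouse with a hypothetical `sales` table and made-up rows, a monthly sales-by-region report might boil down to something like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-cleaned warehouse table with illustrative rows.
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2024-01-15', 'East', 120.0),
  ('2024-01-20', 'West',  80.0),
  ('2024-02-03', 'East', 200.0),
  ('2024-02-11', 'East',  50.0);
""")

# Reporting query: total sales per month and region.
# substr(sale_date, 1, 7) keeps the 'YYYY-MM' part of an ISO date string.
report = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS total
    FROM sales
    GROUP BY month, region
    ORDER BY month, region
""").fetchall()

for month, region, total in report:
    print(f"{month}  {region:<5} {total:>8.2f}")
```

The query is short only because the ETL step already standardized the dates and region names; against raw, inconsistent data the same report would need much more cleanup logic.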