In this article, Baya Pavliashvili provides a tutorial for populating data warehouses. The article includes a number of tips for choosing the storage model, indexing fact and dimension tables, and much more.

Like this article? We recommend

Like this article? We recommend

Building and maintaining data warehouses (DW) is one of the hottest topics in
today's IT industry. Companies that have been collecting transactional data
for decades have also been thinking about using such data effectively. Many
businesses have been utilizing standard reports that have a
"set-in-stone" structure. Others have their programmers run ad-hoc
reports when needed. However, such reports no longer can provide strategic
advantage to companies. What they need instead is a cost-effective set of tools
to let them slice and dice their data along appropriate dimensions to get them
the particular information that they need each time. Data warehouses happen to
be such tools.

Even though data warehousing has been around for more than 20 years, the
total cost of owning a DW has been unaffordable for small to midsize companies.
The introduction of data warehousing support with SQL Server versions 7.0 and
2000 has made data warehouses much more affordable.

Many complex steps are involved in building and maintaining a successful data
warehouse. First, you have to identify all the reporting needs of the
organization. Next you need to figure where the data will come from and how to
integrate data originating from various sources. Then you need to build a
dimensional data model that will support the reporting needs and populate this
data model with data.

This article talks about populating the data warehouses. Although most
principles covered in this article apply to any database management system
utilized for building a data warehouse, I will cover specific implementation
details only for Microsoft SQL Server 2000.

50,000-Foot View

All data warehouses consist of measures and dimensions. The measures are
individual facts to be represented on reports, whereas the dimensions are how
the facts need to be broken down. For instance, a data warehouse for a grocery
chain might have dimensions for customers, suppliers, stores, managers, and
measures of costs and revenues.

When the dimensions and measures have been identified and the dimensional
model has been built, it's time to move on to the physical implementation.
This phase consists of several steps:

I've put step 5 at the end for a reason. Populating the tables and
building cubes manually the first time gives you an idea of what it takes to
refresh data in the fact and dimension tables, how long it takes to build
aggregations, and which dimensions need to be continuously updated. Therefore,
physical implementation is more likely to be successful with a bottom-up
approach: First perform your steps manually, and then automate as many of the
tasks as possible through your code.