Data warehouses do not, generally speaking, create their own data. Instead they are always "fed" data from other sources—in many cases, from a myriad of sources. In a perfect world, data could be loaded directly from the sources into the DW, but that is seldom (if ever!) the case. Over 80% of a DW project is dedicated to the Extract, Transformation and Load (or more recently Extract, Load, then Transform), so it is imperative that data sources be carefully understood.

Given the three data loading scenarios (initial, incremental, real-time), data source characteristics can present numerous challenges to the ETL/ELT process.

Best Practices

Following are some best practices:

Manage the data source identification process:

Identify Subject Matter Experts (SMEs).

Identify dimension data sources.

Identify fact data sources.

When the major data sources have been identified it is time to quickly gain detailed understanding of each one: