Profiling in data warehousing projects

Profiling in data warehousing and business projects contributes greatly to their success.

Primary expectation

A good profiler analyzes the data, its structure and every element with one basic attitude:

EVERYTHING IS POSSIBLE!

All data must be analyzed; never assume that a source is accurate. Humans are not perfect and make mistakes.

Overview

Definitions

Data dictionary:

It’s a collection of basic metadata about data attributes. It includes basic attribute listings, detailed descriptions and usage patterns, as well as reference information, including valid values and their meanings, default values, etc.
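
As a rough illustration, a data dictionary entry can be modeled as a small record holding the attribute's name, its description, its valid values with their meanings, and its default value. The Python sketch below assumes exactly that shape; the class `AttributeEntry`, the attribute `customer_status` and its codes are hypothetical examples, not a standard dictionary format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AttributeEntry:
    """One (hypothetical) data dictionary entry describing an attribute."""
    name: str
    description: str
    valid_values: Dict[str, str] = field(default_factory=dict)  # code -> meaning
    default: Optional[str] = None

    def meaning(self, code: str) -> str:
        """Return the documented meaning of a code, or flag it as undocumented."""
        return self.valid_values.get(code, "<undocumented value>")


# Hypothetical dictionary with a single entry.
data_dictionary = {
    "customer_status": AttributeEntry(
        name="customer_status",
        description="Lifecycle status of a customer account",
        valid_values={"A": "Active", "S": "Suspended", "C": "Closed"},
        default="A",
    )
}

entry = data_dictionary["customer_status"]
print(entry.meaning("S"))  # Suspended
print(entry.meaning("X"))  # <undocumented value>
```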

Data models:

Subject area models define the main data subjects: categories of high-level business objects whose data is stored in the database. Relational data models depict the logical relationships between entities and attributes.

Data profiling:

Data models and data dictionaries are the initial source of knowledge about the data. Data profiling is a group of experimental techniques for examining the data itself to understand its actual structure and dependencies.

Importance

Data profiling is important because actual data is often very different from what is theoretically expected: over time, data models and dictionaries become inaccurate. Data profiling is like an X-ray that reveals the hidden truth, and it is key to building correct data mappings and quality rules. As a rule of thumb, the more in-depth the analysis and profiling, the easier it is to design a comprehensive set of data mappings and quality rules, and the greater the success of data conversion and consolidation.
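
To make the gap between theory and reality concrete, here is a minimal Python sketch that checks a column against its documented definition. It assumes a hypothetical `gender_code` column documented as non-nullable with the values `M` and `F`; the sample values and the helper `compare_to_documentation` are illustrative only.

```python
from collections import Counter

# Hypothetical documented definition of a "gender_code" column.
DOCUMENTED_VALUES = {"M", "F"}
DOCUMENTED_NULLABLE = False

# Hypothetical values actually found in a source extract.
actual_values = ["M", "F", "F", "m", None, "U", "M", "", "F"]


def compare_to_documentation(values, allowed, nullable):
    """Report values that contradict the documented definition of a column."""
    counts = Counter(values)
    # Any non-null value missing from the documented list is a finding.
    undocumented = {v: n for v, n in counts.items() if v is not None and v not in allowed}
    null_count = counts.get(None, 0)
    return {
        "undocumented_values": undocumented,
        # Nulls only count as a finding when the column is documented as mandatory.
        "unexpected_nulls": 0 if nullable else null_count,
    }


report = compare_to_documentation(actual_values, DOCUMENTED_VALUES, DOCUMENTED_NULLABLE)
print(report)
# {'undocumented_values': {'m': 1, 'U': 1, '': 1}, 'unexpected_nulls': 1}
```

Here the check reports the lowercase `m`, the unknown `U` and the blank string as undocumented values, plus one null in a column documented as mandatory: the kind of finding that should feed the data mappings and quality rules.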

Profiling techniques

Data profiling is often mistakenly equated with attribute profiling, a misconception fueled by the proliferation of efficient attribute profiling tools. However, comprehensive data profiling is a far broader exercise.

The techniques are:

Subject profiling

examines subjects in different tables or on different systems and helps to find where the information about each subject is stored;
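
One possible way to locate a subject across systems, assuming you already hold a reference set of identifiers for it (here, hypothetical customer IDs), is to measure what share of each column's distinct values match that set. This value-overlap heuristic is an assumption made for illustration, not the only way to do subject profiling; the table names, column names and the 0.5 threshold are hypothetical.

```python
# Hypothetical extracts: "system.table" -> column -> sampled values.
tables = {
    "crm.contacts": {
        "contact_id": ["K1", "K2", "K3", "K4"],
        "cust_ref": ["C001", "C002", "C003", "C009"],
    },
    "billing.invoices": {
        "invoice_id": ["I1", "I2", "I3"],
        "client_no": ["C001", "C003", "C003"],
    },
    "billing.products": {"sku": ["P1", "P2"]},
}

# Reference set of identifiers for the "customer" subject (hypothetical).
known_customer_ids = {"C001", "C002", "C003", "C004"}


def locate_subject(tables, subject_keys, threshold=0.5):
    """Return (table, column, overlap) for columns whose distinct values
    mostly match the subject's known identifiers, best matches first."""
    hits = []
    for table, columns in tables.items():
        for column, values in columns.items():
            distinct = set(values)
            if not distinct:
                continue
            # Share of the column's distinct values that are known subject keys.
            overlap = len(distinct & subject_keys) / len(distinct)
            if overlap >= threshold:
                hits.append((table, column, round(overlap, 2)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


print(locate_subject(tables, known_customer_ids))
# [('billing.invoices', 'client_no', 1.0), ('crm.contacts', 'cust_ref', 0.75)]
```

The heuristic finds customer data under two different column names in two systems, which is exactly what subject profiling is meant to reveal; the unmatched `C009` in the CRM would then deserve its own investigation.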

Relationship profiling

is an exercise in identifying entity keys and relationships as well as counting occurrences for each relationship in the data model. It is necessary to validate existing relational data models or build them when none are available;
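
A minimal Python sketch of relationship profiling, assuming rows are available as dictionaries: it tests whether a column can serve as a key (no nulls, no duplicates) and, for a hypothetical customer-to-order relationship, counts how many parents have zero, one, two or more children, and how many children have no parent. The data and function names are illustrative.

```python
from collections import Counter

# Hypothetical parent (customers) and child (orders) tables.
customers = [{"customer_id": "C1"}, {"customer_id": "C2"}, {"customer_id": "C3"}]
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C1"},
    {"order_id": 3, "customer_id": "C2"},
    {"order_id": 4, "customer_id": "C9"},  # refers to a non-existent customer
]


def is_candidate_key(rows, column):
    """A column can serve as a key if it has no nulls and no duplicates."""
    values = [row[column] for row in rows]
    return None not in values and len(set(values)) == len(values)


def profile_relationship(parent_rows, child_rows, key):
    """Count children per parent and children without a parent."""
    parent_keys = {row[key] for row in parent_rows}
    child_counts = Counter(row[key] for row in child_rows)
    # Distribution: number of children -> number of parents having that many.
    children_per_parent = Counter(child_counts.get(k, 0) for k in parent_keys)
    orphans = sum(n for k, n in child_counts.items() if k not in parent_keys)
    return {
        "children_per_parent": dict(sorted(children_per_parent.items())),
        "orphan_children": orphans,
    }


print(is_candidate_key(orders, "order_id"))     # True
print(is_candidate_key(orders, "customer_id"))  # False
print(profile_relationship(customers, orders, "customer_id"))
# {'children_per_parent': {0: 1, 1: 1, 2: 1}, 'orphan_children': 1}
```

In this sample, `order_id` qualifies as a key while `customer_id` in the orders does not, one customer has no orders, and one order points to a customer (`C9`) that does not exist.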

Attribute profiling

examines values of individual data attributes and provides information about frequencies and distributions of their values. It helps to identify meaning and allowed values for an attribute;
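
The following minimal Python sketch computes typical attribute profiling statistics for a single column: counts of nulls and blanks, number of distinct values, most frequent values and character patterns. The pattern convention used here (digits become `9`, letters become `A`) is a common profiling convention assumed for illustration; the postal-code sample is hypothetical.

```python
from collections import Counter


def value_pattern(value):
    """Map digits to '9' and letters to 'A', keeping other characters."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile_attribute(values):
    """Basic statistics for one attribute: completeness, cardinality, frequencies, patterns."""
    non_blank = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls_or_blanks": len(values) - len(non_blank),
        "distinct": len(set(non_blank)),
        "top_values": Counter(non_blank).most_common(3),
        "patterns": Counter(value_pattern(v) for v in non_blank).most_common(),
    }


# Hypothetical postal codes expected to be five digits.
postal_codes = ["75001", "75002", "75001", "7500", None, "", "ABC12", "75001"]
profile = profile_attribute(postal_codes)
for name, result in profile.items():
    print(f"{name}: {result}")
```

On this sample, the pattern frequencies immediately expose the four-digit `7500` and the alphanumeric `ABC12` as suspicious entries among the five-digit codes.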

Timeline profiling

looks for patterns in historical data, such as the temporal distribution of the data, the patterns of values in different time periods, etc.;
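
A simple form of timeline profiling counts records per month, tracks how values are distributed within each period and lists months with no data at all. The Python sketch below assumes hypothetical `(date, channel)` sales records; the helper names are illustrative.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sales records: (transaction date, sales channel).
records = [
    (date(2023, 1, 5), "WEB"),
    (date(2023, 1, 20), "STORE"),
    (date(2023, 2, 2), "WEB"),
    (date(2023, 4, 11), "WEB"),
    (date(2023, 4, 12), "APP"),
]


def timeline_profile(rows):
    """Count records per month and the value distribution within each month."""
    per_month = Counter()
    values_per_month = defaultdict(Counter)
    for day, channel in rows:
        period = day.strftime("%Y-%m")
        per_month[period] += 1
        values_per_month[period][channel] += 1
    return per_month, values_per_month


def missing_months(periods):
    """List 'YYYY-MM' months with no records between the first and last observed month."""
    keys = sorted(periods)
    if not keys:
        return []
    year, month = map(int, keys[0].split("-"))
    last_year, last_month = map(int, keys[-1].split("-"))
    gaps = []
    while (year, month) <= (last_year, last_month):
        key = f"{year:04d}-{month:02d}"
        if key not in periods:
            gaps.append(key)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


per_month, values_per_month = timeline_profile(records)
print(dict(per_month))                    # {'2023-01': 2, '2023-02': 1, '2023-04': 2}
print(missing_months(per_month))          # ['2023-03']
print(dict(values_per_month["2023-04"]))  # {'WEB': 1, 'APP': 1}
```

Here the profile reveals a gap in March 2023 and a new channel value, `APP`, that appears only in April.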

State-transition model profiling

examines the lifecycle of state-dependent objects and provides actual information about the order and characteristics of states and actions. It helps build or validate state-transition models.
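
The sketch below derives observed state transitions from a hypothetical event log of `(object_id, timestamp, state)` rows, ordering each object's history by timestamp, and compares them with an expected state-transition model. It also counts final states as one simple characteristic of the lifecycle. The states, the expected model and the integer timestamps are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (object id, timestamp, state). O3 arrives out of order.
events = [
    ("O1", 1, "CREATED"), ("O1", 2, "PAID"), ("O1", 3, "SHIPPED"),
    ("O2", 1, "CREATED"), ("O2", 2, "SHIPPED"),
    ("O3", 2, "PAID"), ("O3", 1, "CREATED"),
]

# Hypothetical expected state-transition model.
EXPECTED = {("CREATED", "PAID"), ("PAID", "SHIPPED")}


def observed_transitions(rows):
    """Order each object's history by timestamp and count consecutive state pairs."""
    histories = defaultdict(list)
    for obj, ts, state in rows:
        histories[obj].append((ts, state))
    transitions = Counter()
    final_states = Counter()
    for history in histories.values():
        history.sort()  # chronological order per object
        states = [state for _, state in history]
        transitions.update(zip(states, states[1:]))
        final_states[states[-1]] += 1
    return transitions, final_states


transitions, final_states = observed_transitions(events)
unexpected = {t: n for t, n in transitions.items() if t not in EXPECTED}
print(unexpected)          # {('CREATED', 'SHIPPED'): 1}
print(dict(final_states))  # {'SHIPPED': 2, 'PAID': 1}
```

In this sample, order `O2` goes from `CREATED` directly to `SHIPPED`, a transition the expected model does not allow, while `O3` is handled correctly even though its events arrive out of order.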