Building a successfull Data Hub

This article is the first of a serie which aims to demonstrate that with the right approach and the right modelisation method, your Data Warehouse initiative, can be successful at a very reasonable price.

2 visions, 1 way !

For many years, two classical visions clash in what concerns the modeling of Data Warehouses. The Inmon’s approach by subject and normalized and the Kimball’s approach using star schemas and where the integration into an Enterprise Data Warehouse is ensured by Conformed Dimensions and the use of a Matrix Bus.

Although less present than the two conventional approaches, there is a third way: the Data Vault modeling approach advocated by its inventor Dan Linstedt since the beginning of the 2000s.

Data Vault modeling is a kind of terraced approach between Inmon and Kimball.

Data Vault Modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that, apart from the modeling aspect, deals with issues such as auditing, tracing of data, loading speed and resilience to change.

The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. (Sources : from Wikipedia, the free encyclopedia)

A Data Vault model is composed of three types of entities :

The Hubs, Links and Satellites

The Hubs are business concepts. These entities contain natural keys (business keys) that identify the concept and which are inherently very stable. They contain no data that describes the entity (these are kept in the Satellite entities described below). They are often the point of connection (hence the term ‘hub’) among several sectors of an organization.

The Links are associative entities. They bind together at least two Hubs; in other words, they are in relation to business concepts.

Satellites contain data describing the hubs and links at any given time and through time. These entities contain the context (from business process) of a hub or a link. As descriptive data changes often, the idea of satellites is to preserve changes when they occur. As its name suggests, a satellite is a dependent entity in relation to a hub or a link.

Conversely, a hub must always contain at least one satellite to describe it.

The separation

One of the central ideas of a Data Vault model is to separate the structural data (Hubs and Links between the Hubs) of the descriptive data that define the context of these data (Satellites). Structural concepts of an organization are thus separated from the contexts of use of these concepts.

The context of source systems

Another central idea of a Data Vault model is that it keeps intact the context of source systems. The data coming from the sources are integrated into a raw Data Vault type warehouse without transformations (beyond integration and historization). The data are therefore loaded quickly in their raw format including the date and the source of loading. It is therefore possible to reconstruct the image of a source at any moment in time. Not transforming the data is one of the fundamental differences with two conventional approaches.

It is said that a Data Vault Warehouse is a warehouse that can represent raw data (Raw Data Warehouse) and also can represent a business integrated view of data (Business DataWarehouse).

The Data Vault approach offers several advantages:

– ? It is flexible and resistant to change

– ? It is extensible

– ? Changes in the sources are quickly reflected in the warehouse

– ? It makes it easy to restore an image of source data to any moment in time

To be continued… (next article: Deep diving in to a successful modelisation method)