What is Data Integration?

Building a single view of the truth

Data Integration Defined

Data integration is a common industry term referring to the requirement to combine data from multiple separate business systems into a single unified view, often called a single view of the truth. This unified view is typically stored in a central data repository known as a data warehouse.

For example, customer data integration involves the extraction of information about each individual customer from disparate business systems such as sales, accounts, and marketing, which is then combined into a single view of the customer to be used for customer service, reporting and analysis.

ETL and ELT

There are a number of different data integration approaches which can be used to populate a data warehouse, two of the most common are:

ETL – Extract, Transform and Load. Data is extracted from multiple source systems, transformed into a usable format, and then loaded into a data warehouse where users can query and report on the unified view of the data. Learn more about ETL ›

ELT – Extract, Load and Transform. Data is extracted from source systems, loaded into a database staging area, and then transformed to a usable format within the database. This approach may be necessary to meet check-pointing, auditing, and security requirements.

The trade-off with ELT is that storage space and CPU usage within the database can be relatively expensive. Using ETL to perform the transformation step outside of the database reduces the storage and CPU load on the data warehouse, making ETL a more cost-effective approach for many applications.

Data Integration Criteria

Three major data integration criteria to consider when building a data warehouse are:

Freshness – Since data integration processes are executed periodically, data freshness refers to the delay between when a change occurs on a source system to when the change appears in the data warehouse.

Granularity – Data from source systems is often summarised or aggregated during the data integration process. Data granularity refers to the level of detail that will be stored in the data warehouse.

Matching – The business rules that define how data from different systems should be matched can be quite complex. For a simple example – is “Bill Smith” the same individual as “William Smith”?

These criteria will affect the performance and storage requirements for the data warehouse, and hence the cost of the hardware and software required for the project.

Discover a Smarter Approach to Data Integration

Learn 5 Tips to Break Through ELT Roadblocks

In the beginning, we had Extract, Transform, and Load (ETL). Soon after, as data volumes grew, people were quick to point out the value that databases could bring to data integration, so we had Extract, Load, and Transform (ELT).

The problem is that databases were never designed for ELT and, in fact, can create roadblocks that prevent organisations from addressing their needs.

Learn how to end to the ETL vs. ELT debate once and for all and discover a smarter approach that overcomes the limitations of conventional data integration tools.