Learn how to capture and retrieve metadata information such as extract, transform, load (ETL) rules by using CA Erwin Data Modeler. This data modeling excerpt also covers step by step of how to capture the ETL information using and the generating of reports with the captured information to communicate effectively to other teams.

The following is an excerpt from Documenting ETL Rules using CA Erwin Data Modeler, written by Sampath Kumar, Technology Architect, Information as Service (InAs) at Infosys Technologies Ltd. It is reprinted here with permission. Read the excerpt below to learn about features and functionality in CA Erwin Data Modeler that can help users overcome common challenges with data warehouse development projects. Learn how to use CA Erwin Data...

By submitting your personal information, you agree to receive emails regarding relevant products and special offers from TechTarget and its partners. You also agree that your personal information may be transferred and processed in the United States, and that you have read and agree to the Terms of Use and the Privacy Policy.

Modeler to capture metadata information such as extract, transform, load (ETL) rules and learn how to generate reports and share this information effectively across your organization.

Documenting ETL Rules using CA Erwin Data Modeler

Abstract In any data warehouse development project some of the major challenges include

Effective capture and maintenance of metadata information in data model such as data source, transformation rules and data synchronization, etc.

Effective communication of captured metadata information by data modeler to other teams such as ETL

This document covers features in CA Erwin Data Modeler which can be leveraged for capturing the metadata information such as Extract Transform Load (ETL) rules. This document explains step by step of how to capture the ETL information using Erwin and also covers the generation of reports with the captured information to communicate effectively to other teams.

Introduction The data warehouse combines information from several Online Transactional Processing (OLTP) systems and archive data into a single decision support system. It can be either relational or non relational data source (both structured and unstructured data). In order to keep the data in synch with the operational system it's very essential to capture the data source for each column in the data warehouse and information of when and how the data is updated.

So in a nutshell the following information needs to be captured in any data warehouse environment.

Source of data

Transformation rules-The method in which the data is getting extracted, transformed and loaded

Frequency: The frequency and timing of data warehouse updates.

In many organizations it used to be a separate document apart from data model but it becomes very hard to maintain document and data model in synch.

More on ETL and data integration

Why it's important Data modeling is the first step which converts the business rules into a data model and the data modeler is the one who understand the rules from business counterparts (both in a structured and unstructured way). As a result of this, the data modeler captures most of the business rules directly in the data model and some of them (such as data source, transformation rules and frequency above) needs to be passed on to other teams such as DBA and ETL .It's very essential to capture all data related business rules as a part of data modeling effort to avoid getting lost. CA Erwin Data Modeler provides effective way to capture this information and as a data modeler it should be captured as part of modeling efforts.

Approach This document covers how the above challenges can be addressed using Data Transformation and Data Movement features available in CA Erwin Data Modeler. To explain better there will be a simple running example throughout this document which will navigate step by step.

Overview The CA Erwin Data Modeler has come up with the following salient features to capture the metadata information effectively.

Let's explore these features in detail in the rest of the document using simple example of Customer_Dim.

Customer_Dim Let's take a fictitious example of an entity Customer Dimension to explain the above features. Let's assume that it's sourced from multiple operational systems (relational DB), attributes having different transformation comments and the frequency of customer information getting updated is daily.

Capturing Data Source In order to proceed further lets create the empty data model using "Create Model" (File –>New) of the model type Logical/Physical and target database as Teradata

Let's create the Customer_Dim table and add the attributes

In order to make this model as dimensional model and to capture the data movement rules goto Model->Model Properties and select the check box for Dimensional and Data Movement

About the author

Sampath Kumar is a technology architect, Information as Service (InAs) at Infosys Technologies Ltd. Banking and Capital Market Business unit. He functions as lead information architect for complex, high volume, full cycle implementations worldwide -- utilizing the disciplines of data warehouse (DW), business intelligence (BI) and master data management (MDM). He has 11 years of rich and varied experience in IT industry and project management. Prior to joining Infosys, he was a senior data analyst at American Express Credit Cards for their Worldwide Risk Information Management group. He holds masters in software engineering and can be reached at sampath_kumar01@infosys.com.

E-Zine

0 comments

E-Mail

Username / Password

Password

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy