How to analyze repository items - 6.2

Talend Studio provides advanced capabilities for analyzing any
given item, or even a Job, in the Repository tree view.
The analysis takes two forms of navigation: moving forward to discover descendant items up to
the target component (Impact Analysis) and moving backward to discover the ancestor
items back to the source component (Data Lineage). The results of the analysis
show where data comes from, how it is transformed, and where it goes, or the
same path traced in the reverse direction.

Warning

All items on which you want to execute impact analysis or
data lineage must be centralized in the Repository
tree view under any of the following nodes: Joblet
Designs, Contexts, SQL Templates, Referenced
project or Metadata.

Impact analysis

Impact analysis helps to identify all the Jobs that use any of the items
centralized in the Repository tree view and that
will be impacted by a change in the parameters of a repository item.

Impact analysis also analyzes the data flow in each of the listed Jobs to show all
the components and stages the data flow passes through and the transformations applied
to the data from the source component up to the target component.
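
The two parts of an impact analysis described above can be pictured as a lookup followed by a forward walk over a graph. The following is only a conceptual sketch, not how the Studio implements the feature: the Job names, the item identifiers, the component flow and the function names impacted_jobs and trace_forward are all hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical data: the repository items each Job references.
JOB_ITEMS = {
    "LoadEmployees": {"mysql.employees"},
    "ExportOrders": {"mysql.orders"},
    "SyncStaff": {"mysql.employees", "ctx.default"},
}

# Hypothetical data flow of one Job: component -> downstream components.
FLOW = {
    "tMysqlInput_1": ["tMap_1"],
    "tMap_1": ["tFilterRow_1", "tLogRow_1"],
    "tFilterRow_1": ["tFileOutputDelimited_1"],
    "tLogRow_1": [],
    "tFileOutputDelimited_1": [],
}


def impacted_jobs(item, job_items):
    """Return the sorted names of the Jobs that reference the given item."""
    return sorted(job for job, items in job_items.items() if item in items)


def trace_forward(source, flow):
    """Breadth-first walk from the source component to all its descendants."""
    seen, order, queue = {source}, [], deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in flow.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(impacted_jobs("mysql.employees", JOB_ITEMS))
print(trace_forward("tMysqlInput_1", FLOW))
```

The first call lists the Jobs that a change to the item would affect; the second lists the components reached from the source component, in the order the walk discovers them.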

The example below shows an impact analysis done on a database connection item
stored under the Metadata node in the Repository tree view.

To analyze data flow in each of the listed Jobs from the source component up to
the target component, complete the following:

In the Repository tree view, expand
Metadata and browse to the metadata entry
you want to analyze, employees under the DB connection
mysql in this example.

Right-click the entry you want to analyze and select Impact Analysis.

A progress bar indicates the progress of checking for all Jobs that use the
modified metadata parameter. The [Impact
Analysis] view appears in the Studio to list all Jobs that use
the selected metadata entry. The names of the selected database connection
and table schema are displayed in the corresponding fields.

Note

You can also open this view by selecting Window
> Show View > Talend > Impact Analysis.

Right-click any of the listed Jobs and select:

Select...          To...
Open Job           open the corresponding Job in the Studio workspace.
Expand/Collapse    expand/collapse all the items included in the selected Job.

Thus, you have an outline of the Jobs that use the selected metadata
entry.

From the Column list, select the column
name for which you want to analyze the data flow from the data source (input
component), through various components and stages, to the data destination
(output component), Name in this example.

Note

The Last version check box is
selected by default. With this option, the analysis results show only the
latest version of each Job instead of all versions of the Job.

Click Analysis....

A progress bar appears to indicate the progress of the analysis operation, and the
analysis results are displayed in the view.

Note

Alternatively, you can directly right-click a particular column in the
Repository tree view and select Impact Analysis from the contextual menu to display the
analysis results regarding that column in the [Impact
Analysis] view.

The impact analysis results trace the components and transformations the data in
the source column Name passes through before being written in
the output column Name.

Data lineage

Data lineage shows the data flow from the data destination (output component),
through various components and stages, to the data source (input component). The
data lineage results trace the life cycle of the data flow between different
components, including the operations that are performed upon the data.
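
The backward direction of data lineage can be sketched as a walk from the output component toward the input component that collects the operation recorded on each hop. This is a conceptual illustration only, not the Studio's implementation: the component flow, the operation descriptions and the function name trace_backward are hypothetical, and the sketch assumes that each component has at most one upstream component for the analyzed column.

```python
# Hypothetical flow: (upstream component, downstream component,
# operation applied to the column on that hop).
EDGES = [
    ("tMysqlInput_1", "tMap_1", "read column Name"),
    ("tMap_1", "tFilterRow_1", "upper-case Name"),
    ("tFilterRow_1", "tMysqlOutput_1", "keep rows where Name is not empty"),
]


def trace_backward(target, edges):
    """Walk from the output component back to the source, one hop at a time.

    Assumes at most one upstream component per component; stops if a
    component is met twice, so a cyclic flow cannot loop forever.
    """
    upstream = {dst: (src, op) for src, dst, op in edges}
    steps, node, visited = [], target, {target}
    while node in upstream:
        src, op = upstream[node]
        steps.append((node, src, op))
        if src in visited:
            break
        visited.add(src)
        node = src
    return steps


for dst, src, op in trace_backward("tMysqlOutput_1", EDGES):
    print(f"{dst} <- {src}: {op}")
```

Each printed line is one hop of the lineage, starting at the data destination and ending at the data source.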

The example below shows a data lineage performed on a database connection item stored
under the Metadata node in the Repository tree view.

To launch a data lineage on a metadata item, complete the following:

In the Repository tree view, expand
Metadata > Db
Connection and then expand the database connection you want to
analyze, mysql in this example.

Right-click the centralized table schema for which you want to analyze the
life cycle of the data flow, employees in this
example, and select Impact Analysis.

The Impact Analysis view displays the
Jobs that use the selected table schema. The names of the selected database
connection and table schema are displayed in the corresponding
fields.

From the Column list, select the column
name for which you want to analyze the data flow from the data destination
(output component), through various components and stages, to the data
source (input component). The column to be analyzed in this example is
called Name.

You can skip this step by right-clicking the column
Name in the Repository
tree view and selecting Impact Analysis
from the contextual menu.

Click Data Lineage.

A bar appears to indicate the progress of the analysis operation and the
analysis results are displayed in the view.

Right-click a listed Job and select Open
Job from the contextual menu.

The Job opens in the design workspace.

Starting from the output column Name, the data lineage results trace
backward the components and transformations the data passes through
before being written to this column.

How to export the results of impact analysis/data lineage to HTML

Talend Studio allows you to produce detailed documentation in
HTML of the results of the impact analysis or data lineage done on the selected
repository element. This documentation offers information related to the Jobs that
use this repository element including: project and author detail, project
description and a preview of the graphical results of the analysis done on the
impacted Jobs.

To generate an HTML document of an impact analysis or data lineage with
customization, complete the following:

After you analyze a given repository item as outlined in Impact analysis or Data lineage, click the Export to HTML button in the Impact Analysis view.

The [Generate Documentation] dialog box
opens.

Enter the path to where you want to store the generated documentation
archive or browse to the desired location and then give a name for this HTML
archive.

Select the Custom CSS template to export
check box to activate the CSS File field if
you need to use your own CSS file to customize the exported HTML files. The
destination folder for HTML will contain the HTML file, a CSS file, an XML
file and a pictures folder.

Click Finish to validate the operation
and close the dialog box.

An archive file that contains all required files along with the HTML
output file is created in the specified path.

Double-click the HTML file in the generated archive to open it in your
favorite browser.

Note

The archive file gathers all generated documents including the HTML that gives a
description of the project that holds the analyzed Jobs in addition to a preview of
the analysis graphical results.
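
To check the contents of the generated archive without opening it by hand, a short script can count its members by file type. This is a minimal sketch that assumes the archive is a ZIP file, which the text above does not state; the function name summarize_archive is hypothetical, and the path to the archive is passed on the command line.

```python
import sys
import zipfile
from pathlib import PurePosixPath


def summarize_archive(archive_path):
    """Count the archive members by file extension (lower-cased).

    Assumption: the documentation archive is a ZIP file.
    """
    counts = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower() or "(none)"
            counts[suffix] = counts.get(suffix, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The archive path is supplied by the user as the first argument.
    for suffix, count in sorted(summarize_archive(sys.argv[1]).items()):
        print(f"{suffix}: {count}")
```

An archive produced as described above would be expected to show at least one .html, one .css and one .xml entry, plus the image files stored in the pictures folder.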

How to export the results of impact analysis/data lineage to XML

Talend Studio also allows you to export the results of the
impact analysis or data lineage done on the selected repository element to an XML
document. This tree-structured documentation can be processed by automated
analytical applications for Job analysis and reporting purposes.

To generate an XML document of the results of impact analysis or data lineage on
the selected repository item, complete the following:

After you analyze a given repository item as outlined in Impact analysis or Data lineage, click the Export to XML button in the Impact Analysis view.

The [Generate XML] dialog box
appears.

Enter the path to where you want to store the generated XML document or
browse to the desired location and then give a name for this XML
file.

Select the Overwrite existing files without
warning check box to suppress the warning message if the
specified filename already exists.

Click Finish to validate the operation
and close the dialog box.

An XML file that contains the impact analysis or data lineage information
is created in the specified path.

The figure below illustrates an example of a generated XML file, opened in
a text editor.
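
Once the XML file has been exported, an automated tool can start by inspecting its tree structure. The sketch below makes no assumption about the element names Talend Studio writes, since they are not described here: it only reports the root element, the nesting depth and how often each tag occurs. The function names tag_counts and max_depth are hypothetical, and the path to the exported file is passed on the command line.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(root):
    """Count how many times each element tag occurs in the tree."""
    return Counter(element.tag for element in root.iter())


def max_depth(element):
    """Return the nesting depth of the tree rooted at the element."""
    return 1 + max((max_depth(child) for child in element), default=0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The exported XML file is supplied by the user as the first argument.
    root = ET.parse(sys.argv[1]).getroot()
    print("root element:", root.tag)
    print("depth:", max_depth(root))
    for tag, count in tag_counts(root).most_common():
        print(f"{tag}: {count}")
```

A summary like this is a reasonable first step before writing a report that depends on specific elements, because it shows which tags actually appear in the exported file.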