How to add a regular expression or an SQL pattern to a column analysis

You can add to any column analysis one or more regular expressions or SQL patterns
against which you can match the content of the column to be analyzed.

Warning

If the database you are using does not support regular
expressions or if the query template is not defined in the studio, you need
first to declare the user defined function and define the query template
before being able to add any of the specified patterns to the column
analysis. For more information, see Managing User-Defined Functions in databases.

Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis is
open in the analysis editor.

To add a regular expression or an SQL pattern to a column analysis, do the
following:

How to edit a pattern in the column analysis

Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis is
open in the analysis editor.

To edit a pattern added to an analyzed column:

In the Analyze Columns view in the
analysis editor, right-click the pattern you want to edit and select
Edit pattern from the contextual
menu.

The pattern editor opens showing the selected pattern metadata.

In the Pattern Definition view, edit the
pattern definition, or change the selected database, or add other patterns
specific to available databases using the [+] button.

If the regular pattern is simple enough to be used in all databases,
select Default in the list.

When you edit a pattern through the analysis editor, you modify the
pattern in the studio repository. Make sure that your modifications are
suitable for all other analyses that may be using the modified
pattern.

Save your changes.

How to view the data analyzed against patterns

When you add one or more patterns to an analyzed column, you check all existing
data in the column against the specified pattern(s). After the execution of the
column analysis, using the java or the SQL engine you can access a list of all the
valid/invalid data in the analyzed column.

When you use the Java engine to run the analysis, the view of the actual data will
open in the Profiling perspective of the studio. While if
you use the SQL engine to execute the analysis, the view of the actual data will
open in the Data Explorer perspective.

Prerequisite(s):

You have selected the Profiling
perspective in the studio.

You have installed in the studio the SQL explorer libraries that are
required for data quality.

If you do not install these libraries, the Data Explorer
perspective will be missing from the studio and many features will not be available.
For further information about identifying and installing external modules, see the
Talend Installation Guide.

To view the actual data in the column analyzed against a specific pattern, do the
following:

When using the SQL engine, the view opens in the Data Explorer
perspective listing valid/invalid rows or values of the analyzed data according to
the limits set in the data explorer.

This explorer view will also give some basic information about the analysis
itself. Such information is of great help when working with multiple analysis at the
same time.

The data explorer does not support connections which has empty user name, such as
Single sign-on of MS SQL Server. If you analyze data using such connection and you
try to view data rows and values in the Data Explorer
perspective, a warning message prompt you to set your connection credentials to the
SQL Server.

When using the Java engine, the view opens in the Profiling perspective of the studio listing
the number of valid/invalid data according to the row limit you set in the Analysis parameters view of the analysis editor. For more
information, see Using the Java or the SQL engine.

You can save the executed query and list it under the Libraries > Source Files folders in the DQ
Repository tree view if you click the save icon on the SQL editor
toolbar. For more information, see Saving the queries executed on indicators.

How to recuperate valid and /or invalid rows

When you add one or more patterns to an analyzed column, you check all existing
data in the column against the specified pattern(s). After the execution of the
column analysis, you can generate a ready-to-use Job that recuperates the valid,
invalid or both types of rows and write them in output files or databases.