
Talend Open Studio for MDM Getting Started Guide 6.0.0

Talend Open Studio for MDM

Adapted for v Supersedes previous releases.

Publication date: July 2, 2015

Copyleft

This documentation is provided under the terms of the Creative Commons Public License (CCPL). For more information about what you can and cannot do with this documentation in accordance with the CCPL, please read:

Notices

Talend is a trademark of Talend, Inc. All brands, product names, company names, trademarks and service marks are the properties of their respective owners.

Table of Contents

Preface
    General information
        Purpose
        Audience
        Typographical conventions
    Feedback and Support
Chapter 1. Getting Started with Talend Studio
    Launching Talend Studio
        How to launch the Studio for the first time
        How to connect to TalendForge
        How to access a Repository
        How to set up a project
    Working with different workspace directories
        How to create a new workspace directory
        How to connect to a different workspace directory
    Working with projects
        How to create a project
        How to import the demo project
        How to import projects
        How to open a project
        How to delete a project
        How to export a project
    Multi-perspective approach
        Switching between different perspectives
        Saving the configuration of a perspective
Chapter 2. Working in Talend Studio - basic Job examples
    Getting started with a basic Job
        Creating a Job
        Adding components to the Job
        Connecting the components together
        Configuring the components
        Executing the Job
Chapter 3. Profiling data
    Profiling customer data
        Identifying data anomalies
Chapter 4. Building a simple MDM project
    Preparing your project in the Studio
        Setting up a data model and creating some business entities
        Defining a data container
        Creating Views
    Working with the data in the Web User Interface
        Opening MDM Web User Interface
        Creating a new data record


Preface

General information

Purpose

This guide aims to help users get started with Talend Open Studio for MDM quickly. For detailed explanations of the features and functions of Talend Open Studio for MDM, see the other documentation delivered with the product.

Information presented in this document applies to Talend Open Studio for MDM.

Audience

This guide is for users and administrators of Talend Open Studio for MDM.

The layout of GUI screens provided in this document may vary slightly from your actual GUI.

Typographical conventions

This guide uses the following typographical conventions:

- text in bold: window and dialog box buttons and fields, keyboard keys, menus, and menu options,
- text in [bold]: window, wizard, and dialog box titles,
- text in courier: system parameters typed in by the user,
- text in italics: file, schema, column, row, and variable names,
- One icon indicates an item that provides additional information about an important point. It is also used to add comments related to a table or a figure,
- Another icon indicates a message that gives information about the execution requirements or recommendation type. It is also used to refer to situations or information the end-user needs to be aware of or pay special attention to.
- Any command is highlighted with a grey background or code typeface.

Feedback and Support

Your feedback is valuable. Do not hesitate to give your input, make suggestions or requests regarding this documentation or product, and find support from the Talend team, on Talend's Forum website at:


Chapter 1. Getting Started with Talend Studio

This chapter provides basic information required to get started with Talend Studio, including launching Talend Studio and creating projects.

Launching Talend Studio

This section guides you through the basics of launching Talend Studio for the first time and opening your first project in the Studio, and provides information on setting up a project.

How to launch the Studio for the first time

To open Talend Studio for the first time, complete the following:

1. Uncompress the Talend Studio zip file and, in the folder, double-click the executable file corresponding to your operating system.

   The Studio zip archive contains binaries for several platforms including Mac OS X and Linux/Unix.

2. In the [User License Agreement] dialog box that opens, read and accept the terms of the end user license agreement to proceed.

3. In the Talend Studio login window, select an option to define the project that will hold all Jobs and Business models designed in the Studio.

   This login window appears only when the Studio is started for the first time. When you launch the Studio again, the normal login window opens, which provides one more option, a connection list box, for subscription-based users to select a repository connection when launching the Studio.

   If you plan to use the same repository connection and/or project at your next Studio launch, you can skip the login window to speed up Studio launch by clearing the Always ask me at startup check box. Then, if you want to see the login window again, go to Window > Preferences to open the [Preferences] window, select Talend, and select the Always show project dialog at startup check box.

   - Select Create a new project, specify a project name and click Finish to create a new project. For more information, see How to create a project.
   - Select Import a demo project and click Finish to import a demo project that includes numerous samples of ready-to-use Jobs. This Demo project can help you understand the functionalities of different Talend components. For more information, see How to import the demo project.
   - Select Import an existing project and click Finish to import an existing project. For more information, see How to import projects.

   If you want to modify the default repository connection, click Manage Connections to set up your connection before setting up a project. For further information about connecting to a repository, see How to access a Repository.

   As the purpose of this procedure is to create a new project, select Create a new project, fill in a project name in the text field, and click Finish.

   The [Welcome] window opens. From this window you have direct links to Demo projects, user documentation, tutorials, the Talend forum, Talend on-demand training and the latest Talend news.

4. Click Start now! to open the Talend Studio main window, which displays a welcome page that provides useful tips for beginners on how to get started with the Studio. Clicking an underlined link brings you to the corresponding tab view or opens the corresponding dialog box.

   For more information on how to open a project, see How to open a project.

5. When the [Additional Talend Packages] wizard opens, install additional packages such as language packs if needed. For more information, see the section about installing additional packages in the Talend Installation and Upgrade Guide.

   You can skip this installation step and close the wizard by clicking Cancel. This wizard appears each time you launch the Studio if any additional package is available for installation, unless you select the Do not show this again check box. You can also display this wizard by selecting Help > Install Additional Packages from the menu bar.

How to connect to TalendForge

Every fourth time you launch Talend Studio, until you are connected to the Talend Community, the [Connect to TalendForge] dialog box opens. It invites you to connect to the Talend Community so that you can check, download, and install external components, and upload your own components to share with other Talend users, directly in the Exchange view of your Job designer in the Studio.

To learn more about the Talend Community, click the TalendForge Terms of Use link. For more information on using and sharing community components, see the section on how to download/upload Talend community components of your Studio User Guide.

If you want to connect to the Talend Community later, click Skip this Step to continue launching the Studio without setting up a connection to the Talend Community.

By default, the Studio automatically collects product usage data and sends it periodically to servers hosted by Talend for product usage analysis and sharing purposes only. If you do not want the Studio to do so, clear the I want to help to improve Talend by sharing anonymous usage statistics check box. You can also turn usage data collection on or off from the [Preferences] dialog box (Talend > Usage Data Collector). For more information, see the section on setting Talend Studio preferences of your Studio User Guide.

Fill in the required information, select the I Agree to the TalendForge Terms of Use check box, and click CREATE ACCOUNT to create your account, connect to the Talend Community automatically, and continue launching the Studio.

Be assured that any personal information you may provide to Talend will never be transmitted to third parties nor used for any purpose other than joining and logging in to the Talend Community and being informed of the latest Talend updates.

If you have already created a Talend Community account, click Connect to Existing Account, fill in your user name and password, and click CONNECT TO MY ACCOUNT to sign in to the Talend Community and continue launching the Studio.

This page will not appear again when the Studio starts up once you have successfully connected to the Talend Community. To show this page again, select Talend > Exchange from the [Preferences] dialog box, and click Sign In. For more information, see the section on setting Talend Studio preferences of your Studio User Guide.

How to access a Repository

When launching Talend Studio, you can connect to a local repository where you store the data for your projects, including Jobs and business models, metadata, routines, etc.

You can also connect to a remote repository where you store the same type of data to work collaboratively on projects.

How to connect to a local repository

To set a connection to a local repository, do the following:

1. On the login window of Talend Studio, click the Manage Connections button to open the repository connection setup dialog box.

   Depending on the Studio product you are using, the product information displayed in your Studio may differ slightly from what is shown above.

2. If needed, type in a name and a description for your connection in the relevant fields.

3. In the User field, type in the email address that will be used as your user login. This field is compulsory to be able to use Talend Studio.

   Be aware that the email address entered is never used for purposes other than logging in.

4. By default, the Workspace field shows the path to the current workspace directory, which contains all of the folders belonging to the project created. To change the workspace directory, type in the name of an existing directory or click the [...] button next to the Workspace field and browse to your preferred workspace directory.

   Upon changing your workspace directory, unless it is the first startup, you need to restart Talend Studio by clicking the Restart button back on the login window for your change to take effect. For more information about workspace directories, see Working with different workspace directories.

5. Click OK to validate your changes and return to the login window.

How to set up a project

To open Talend Studio, you must first set up a project. You can set up a project by:

- creating a new project. For more information, see How to create a project.
- importing one or more projects you already created in other sessions of Talend Studio. For more information, see How to import projects.

- importing the Demo project. For more information, see How to import the demo project.

Working with different workspace directories

Talend Studio makes it possible to create many workspace directories and connect to a workspace different from the one you are currently working on, if necessary.

This flexibility enables you to store these directories wherever you want and give the same project name to two or more different projects as long as you store the projects in different directories.

How to create a new workspace directory

Talend Studio is delivered with a default workspace directory. However, you can create as many new directories as you want and store your project folders in them according to your preferences.

1. If you have already started the Studio, select File > Switch Project or Workspace from the menu bar to restart the Studio.

2. On the login window, click Manage Connections to open the connection setup dialog box.

3. On the connection setup dialog box, click the [...] button next to the Workspace field.

4. In the [Browse For Folder] dialog box, browse to the parent directory under which you want to create a new workspace directory, click Make New Folder, and enter the name of your new workspace directory. Then click OK to validate directory creation and close the dialog box.

5. Click OK to validate your connection setup and go back to the login window.

6. Back on the login window, click the Restart button to restart your Talend Studio for the change to take effect.

How to connect to a different workspace directory

In Talend Studio, you can select the workspace directory you want to store your project folders in according to your preferences.

1. If you have already started the Studio, select File > Switch Project or Workspace from the menu bar to restart the Studio.

2. On the login window, click the Manage Connections button to open the connection setup dialog box.

3. On the connection setup dialog box, click the [...] button next to the Workspace field.

4. In the [Browse For Folder] dialog box, browse to your preferred folder to use as the new workspace directory, and click OK to validate your directory selection and close the dialog box.

5. Click OK to validate your connection setup and go back to the login window.

6. Back on the login window, click the Restart button to restart your Talend Studio for the change to take effect.

Working with projects

In Talend Studio, the highest physical structure for storing all the different types of data integration Jobs, metadata, routines, etc. is the "project".

From the login window of the Studio, you can:

- create a local project. When you launch the Studio for the first time, there are no default projects listed. You need to create a project that will hold all data integration Jobs and business models you design in the current instance of the Studio. You can create as many projects as you need to store your data of different instances of your Studio. When you create a new project, a folder tree is automatically created in the workspace directory on your repository server. This corresponds to the Repository tree view displayed on the main window of the Studio. For more information, see How to create a project.
- import the Demo project to discover the features of Talend Studio based on samples of different ready-to-use Jobs. When you import the Demo project, it is automatically installed in the workspace directory of the current session of the Studio. For more information, see How to import the demo project.
- import projects you have already created with previous releases of Talend Studio into your current Talend Studio workspace directory. For more information, see How to import projects.
- open a project you created or imported in the Studio. For more information, see How to open a project.
- delete local projects that you already created or imported and that you no longer need. For more information, see How to delete a project.

Once you launch Talend Studio, you can export the resources of one or more of the projects created in the current instance of the Studio. For more information, see How to export a project.

How to create a project

To create a project at the initial startup of the Studio, do the following:

1. Launch Talend Studio.

2. On the login window, select the Create a new project option and enter a project name in the field.

3. Click Finish to create the project and open it in the Studio.

To create a new project after the initial startup of the Studio, do the following:

1. On the login window, select the Create a new project option and enter a project name in the field.

2. Click Create to create the project. The newly created project is displayed on the list of existing projects.

3. Select the project on the list and click Finish to open the project in the Studio.

Later, if you want to switch between projects, use File > Switch Project or Workspace on the Studio menu bar.

How to import the demo project

You can import one or more demo projects that include numerous samples of ready-to-use Jobs into your Talend Studio to help you understand the functionalities of different Talend components.

To import a demo project, proceed as follows:

1. When launching your Talend Studio, select the Import a demo project option on the Studio login window and click Select, or click the Demos link on the welcome window, to open the [Import Demo Project] dialog box.

   After launching the Studio, click the corresponding button on the toolbar, or select Help > Welcome from the Studio menu bar to open the welcome window and then click the Demos link, to open the [Import Demo Project] dialog box.

2. In the [Import Demo Project] dialog box, select the demo project you want to import and view the description on the right panel.

   The demo projects available in the dialog box may vary depending on the product you are using.

3. Click Finish to close the dialog box.

4. In the new dialog box that opens, type in a new project name and description information if needed.

5. Click Finish to create the project.

   All the samples of the demo project are imported into the newly created project, and the name of the new project is displayed in the Project list on the login screen.

6. To open the imported demo project in Talend Studio, back on the login window, select it from the Project list and then click Finish.

   The Job samples in the open demo project are automatically imported into your workspace directory and made available in the Repository tree view under the Job Designs folder.

How to import projects

In Talend Studio, you can import one or more projects you already created with previous releases of the Studio.

To import a single project, do the following:

1. From the Studio login window, select Import an existing project then click Select to open the [Import] wizard.

2. Click the Import project as button and enter a name for your new project in the Project Name field.

3. Click Select root directory or Select archive file depending on the source you want to import from.

4. Click Browse... to select the workspace directory/archive file of the specific project folder. By default, the workspace selected is that of the current release. Browse up to reach the previous release workspace directory or the archive file containing the projects to import.

5. Click Finish to validate the operation and return to the login window.

To import several projects simultaneously, do the following:

1. From the Studio login window, select Import an existing project then click Select to open the [Import] wizard.

2. Click Import several projects.

3. Click Select root directory or Select archive file depending on the source you want to import from.

4. Click Browse... to select the workspace directory/archive file of the specific project folder. By default, the workspace selected is that of the current release. Browse up to reach the previous release workspace directory or the archive file containing the projects to import.

5. Select the Copy projects into workspace check box to make a copy of the imported project instead of moving it.

   This option is available only when you import several projects from a root directory. If you want to remove the original project folders from the Talend Studio workspace directory you import from, clear this check box. However, we strongly recommend that you keep it selected for backup purposes.

6. Select the Hide projects that already exist in the workspace check box to hide existing projects from the Projects list. This option is available only when you import several projects.

7. From the Projects list, select the projects to import and click Finish to validate the operation.

Upon successful project import, the names of the imported projects are displayed on the Project list of the login window.

You can now select the imported project you want to open in Talend Studio and click Finish to launch the Studio.

A generation initialization window might come up when launching the application. Wait until the initialization is complete.

How to open a project

When you launch Talend Studio for the first time, no project names are displayed on the Project list. First you need to create a project or import a Demo project in order to populate the Project list with the corresponding project names that you can then open in the Studio.

To open a project in Talend Studio:

On the Studio login screen, select the project of interest from the Project list and click Finish.

A progress bar appears. Wait until the task is complete and the Talend Studio main window opens.

When you open a project imported from a previous version of the Studio, an information window pops up to list a short description of the successful migration tasks.

How to delete a project

1. On the login screen, click Manage Connections, then on the dialog box that opens click Delete Existing Project(s) to open the [Select Project] dialog box.

2. Select the check box(es) of the project(s) you want to delete.

3. Click OK to validate the deletion. The project list on the login window is refreshed accordingly.

   Be careful: this action is irreversible. When you click OK, there is no way to recover the deleted project(s).

   If you select the Do not delete projects physically check box, the selected project(s) are removed only from the project list and remain in the workspace directory of Talend Studio. You can then recover the deleted project(s) at any time using the Import existing project(s) as local option on the Project list from the login window.

How to export a project

Talend Studio allows you to export projects created or imported in the current instance of Talend Studio.

1. On the toolbar of the Studio main window, click the export button to open the [Export Talend projects in archive file] dialog box.

2. Select the check boxes of the projects you want to export. If need be, you can select only parts of the project through the Filter Types... link (for advanced users).

3. In the To archive file field, type in the name of or browse to the archive file where you want to export the selected projects.

4. In the Option area, select the compression format and the structure type you prefer.

5. Click Finish to validate the changes.

The archive file that holds the exported projects is created in the defined location.

Multi-perspective approach

Talend Studio offers a comprehensive set of tools and functions for all its key capabilities including data and application integration, data profiling and master data management. These tools are all accessible from different perspectives within the Studio.

Switching between different perspectives

There are different ways to switch between perspectives in the Studio.

To switch between perspectives using quick access icons, do the following:

In the top right corner of the Studio, select the icon to:

- open the Integration perspective where you have access to a set of components and routines dedicated to data integration.
- open the Profiling perspective where you can examine data in different data sources and design data cleansing analyses.
- open the MDM perspective where you can build data models and define the rules master data has to follow.
- open the Mediation perspective where you can carry out application integration processes.
- open the BPM perspective where you can design business workflows using graphical tools.

Click the quick access icon in the top left corner of the Studio to switch between the perspectives.

Alternatively, you may switch between perspectives using the menu bar:

1. On the menu bar, click Window > Perspective.

2. Select from the list the item to:

   - Profiling: open the data profiler perspective where you can examine data available in different data sources.
   - Data Explorer: open the data explorer perspective where you can browse and query analyzed data.
   - Other...: open a dialog box from which you can select to open different perspectives that extend the Studio functionalities.

It is also possible, using Window > Show view..., to show views from other perspectives in the open perspective.

Saving the configuration of a perspective

You can save the configuration of your current perspective in order to list it as a new perspective in the perspective dialog box.

To save the configuration of the current perspective, do the following:

1. On the menu bar, click Window > Save Perspective As...

2. In the Name field, enter a name.

3. Click OK.

The current perspective is saved as a new perspective under the new name. You can open this perspective any time by selecting it from the [Open Perspective] dialog box. For further information, see Switching between different perspectives.


Chapter 2. Working in Talend Studio - basic Job examples

This chapter provides basic Job examples to help users get started with Talend Studio.

Getting started with a basic Job

This section provides a continuous example that will help you create, add components to, configure, and execute a simple Job. This Job will be named A_Basic_Job and will read a text file, display its content on the Run console, and then write the data into another text file.

Creating a Job

Talend Studio enables you to create a Job by dropping different technical components from the Palette onto the design workspace and then connecting these components together.

To create the example Job described in this section, proceed as follows:

1. In the Repository tree view of the Integration perspective, right-click the Job Designs node and select Create job from the contextual menu.

   The [New Job] wizard opens to help you define the main properties of the new Job.

2. Fill the Job properties as shown in the previous screenshot.

   The fields correspond to the following properties:

   - Name: the name of the new Job. Note that a message comes up if you enter prohibited characters.
   - Purpose: Job purpose or any useful information regarding the Job use.
   - Description: Job description containing any information that helps you describe what the Job does and how it does it.
   - Author: a read-only field that shows by default the current user login.
   - Locker: a read-only field that shows by default the login of the user who owns the lock on the current Job. This field is empty when you are creating a Job and has data only when you are editing the properties of an existing Job.
   - Version: a read-only field. You can manually increment the version using the M and m buttons.
   - Status: a list from which to select the status of the Job you are creating.
   - Path: a list from which to select the folder in which the Job will be created.

An empty design workspace opens up showing the name of the Job as a tab label.

The Job you created is now listed under the Job Designs node in the Repository tree view.

You can open one or more of the created Jobs by simply double-clicking the Job label in the Repository tree view.

Related topics:

- Classify the Jobs you created by creating folders. For more information, see your Talend Studio User Guide.
- Create a data integration Job. For more information, see your Talend Studio User Guide.
- Customize the workspace. For more information, see your Talend Studio User Guide.

Adding components to the Job

Now that the Job is created, components have to be added to the design workspace: a tFileInputDelimited, a tLogRow, and a tFileOutputDelimited in this example.

There are several ways to add a component onto the design workspace. You can:

- find your component on the Palette by typing the search keyword(s) in the search field of the Palette and drop it onto the design workspace.
- add a component by directly typing your search keyword(s) on the design workspace.
- add an output component by dragging from an input component already existing on the design workspace.
- drag and drop a centralized metadata item from the Metadata node onto the design workspace, and then select the component of interest from the Components dialog box.

This section describes the first three methods. For details about how to drop a component from the Metadata node, see your Talend Studio User Guide.

Dropping the first component from the Palette

The first component of this example will be added from the Palette. This component defines the first task executed by the Job. In this example, as you first want to read a text file, you will use the tFileInputDelimited component.

For more information regarding components and their functions, see Talend Open Studio Components Reference Guide.

To drop a component from the Palette, proceed as follows:

1. Enter the search keyword(s) in the search field of the Palette and press Enter to validate your search.

   The keyword(s) can be the partial or full name of the component, or a phrase describing its functionality if you don't know its name, for example, tfileinputde, fileinput, or read file row by row.

   To use a descriptive phrase as keywords for a fuzzy search, make sure the Also search from Help when performing a component searching check box is selected on the Preferences > Palette Settings view. For more information, see your Talend Studio User Guide.

2. Select the component you want to use and click on the design workspace where you want to drop the component.

Each newly added component is shown in a blue box to show that it is an individual subjob.

Adding the second component by typing on the design workspace

The second component of our Job will be added by typing its name directly on the workspace, instead of dropping it from the Palette or from the Metadata node.
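Both the Palette search field and typing on the design workspace accept a partial component name as the keyword. As a rough illustration of partial-name matching only, the Python sketch below filters a list of names by case-insensitive substring and sorts the result alphabetically. It is not the Studio's search implementation and does not model the fuzzy search over Help content. The function `match_components` and the `COMPONENTS` list are hypothetical names introduced for this example.

```python
def match_components(keyword, component_names):
    """Return the names that contain the keyword, ignoring case, in alphabetical order."""
    needle = keyword.strip().lower()
    if not needle:
        # An empty keyword matches nothing in this sketch.
        return []
    matches = (name for name in component_names if needle in name.lower())
    return sorted(matches, key=str.lower)


# Hypothetical subset of component names, for illustration only.
COMPONENTS = ["tLogRow", "tFileInputDelimited", "tFileOutputDelimited"]

print(match_components("fileinput", COMPONENTS))
print(match_components("tlog", COMPONENTS))
```

A partial keyword such as tfile matches both file components in this sketch, which is why typing more of the name narrows the list.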

Prerequisite: Make sure you have selected the Enable Component Creation Assistant check box in the Studio preferences. For more information, see your Talend Studio User Guide.

To add a component directly on the workspace, proceed as follows:

1. Click where you want to add the component on the design workspace, and type your keywords, which can be the full or partial name of the component, or a phrase describing its functionality if you don't know its name. In our example, start typing tlog.

   To use a descriptive phrase as keywords for a fuzzy search, make sure the Also search from Help when performing a component searching check box is selected on the Preferences > Palette Settings view. For more information, see your Talend Studio User Guide.

   A list box appears below the text field displaying all the matching components in alphabetical order.

2. Double-click the desired component to add it on the workspace, tLogRow in our example.

Adding an output component by dragging from an input one

Now you will add the third component, a tFileOutputDelimited, to write the data read from the source file into another text file. We will add the component by dragging from the tLogRow component, which serves as an input component to the new one to be added.

1. Click the tLogRow component to show the o icon docked to it.

2. Drag and drop the o icon where you want to add a new component.

   A text field and a component list appear. The component list shows all the components that can be connected with the input component.

3. To narrow the search, type in the text field the name of the component you want to add or part of it, or a phrase describing the component's functionality if you don't know its name, and then double-click the component of interest, tFileOutputDelimited in this example, on the component list to add it onto the design workspace.
   The new component is automatically connected with the input component tLogRow, using a Row > Main connection.
   To use a descriptive phrase as keywords for a fuzzy search, make sure the Also search from Help when performing a component searching check box is selected on the Preferences > Palette Settings view. For more information, see your Talend Studio User Guide.

Connecting the components together

Now that the components have been added on the workspace, they have to be connected together. Components connected together form a subjob. Jobs are composed of one or several subjobs carrying out various processes.

In this example, as the tLogRow and tFileOutputDelimited components are already connected, you only need to connect the tFileInputDelimited to the tLogRow component.

To connect the components together, proceed as follows:

1. Right-click the source component, tFileInputDelimited in this example.
2. In the contextual menu that opens, select the type of connection you want to use to link the components, Row > Main in this example.
3. Click the target component to create the link, tLogRow in this example.

Note that a black crossed circle is displayed if the target component is not compatible with the link.

According to the nature and the role of the components you want to link together, several types of link are available. Only the authorized connections are listed in the contextual menu.

Configuring the components

Now that the components are linked, their properties should be defined.

Configuring the tFileInputDelimited component

1. Double-click the tFileInputDelimited component to open its Basic settings view.
2. Click the [...] button next to the File Name/Stream field.

3. Browse your system or enter the path to the input file, customers.txt in this example.
4. In the Header field, enter the number of header rows to skip at the beginning of the file.
5. Click the [...] button next to Edit schema.
6. In the Schema Editor that opens, click the [+] button three times to add three columns.
7. Name the three columns id, CustomerName and CustomerAddress respectively and click OK to close the editor.
8. In the pop-up that opens, click OK to accept the propagation of the changes.
   This allows you to copy the schema you created to the next component, tLogRow in this example.

Configuring the tLogRow component

1. Double-click the tLogRow component to open its Basic settings view.
2. In the Mode area, select Table (print values in cells of a table).
   By doing so, the contents of the customers.txt file will be printed in a table and therefore more readable.
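As a rough illustration of what the Table mode buys you, the sketch below renders rows under the id, CustomerName and CustomerAddress schema as a bordered text table. The function name `format_table` and the exact border layout are assumptions made for this example; the sketch does not reproduce the console output of tLogRow itself.

```python
def format_table(columns, rows):
    """Render rows as a bordered text table, one cell per value.

    Illustrative only: the layout is an assumption, not tLogRow's own format.
    """
    # Each column is as wide as its longest cell (header included).
    widths = [len(str(column)) for column in columns]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))

    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        cells = (f" {str(v).ljust(w)} " for v, w in zip(values, widths))
        return "|" + "|".join(cells) + "|"

    out = [border, line(columns), border]
    out.extend(line(row) for row in rows)
    out.append(border)
    return "\n".join(out)


if __name__ == "__main__":
    schema = ["id", "CustomerName", "CustomerAddress"]
    sample = [("1", "Ann Smith", "12 Main St"), ("2", "Bob Lee", "7 Oak Ave")]
    print(format_table(schema, sample))
```

Compared with printing each row as one delimited string, aligned cells make missing or shifted fields easier to spot.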

Configuring the tFileOutputDelimited component

1. Double-click the tFileOutputDelimited component to open its Basic settings view.
2. Click the [...] button next to the File Name field.
3. Browse your system or enter the path to the output file, customers.csv in this example.
4. Select the Include Header check box.
5. If needed, click the Sync columns button to retrieve the schema from the input component.

Executing the Job

Now that components are configured, the Job can be executed. To do so, proceed as follows:

1. Press Ctrl+S to save the Job.
2. Go to the Run tab, and click Run to execute the Job.

The file is read row by row and the extracted fields are displayed on the Run console and written to the specified output file.
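For readers who think in code, the data flow of this Job can be approximated in a few lines of Python. This is only a sketch of the behavior described above, not the code the Studio generates. The function name `run_job`, the ";" field separator, the single header row and the UTF-8 encoding are all assumptions chosen for the example.

```python
import csv

# Schema defined in the tFileInputDelimited component.
COLUMNS = ["id", "CustomerName", "CustomerAddress"]


def run_job(input_path, output_path, header_rows=1, separator=";"):
    """Read a delimited file row by row, echo each row, and write it out.

    header_rows and separator are assumed values for this illustration.
    Returns the number of data rows processed.
    """
    processed = 0
    with open(input_path, newline="", encoding="utf-8") as src, \
         open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=separator)
        writer = csv.writer(dst, delimiter=separator)
        writer.writerow(COLUMNS)  # the "Include Header" option
        for index, record in enumerate(reader):
            if index < header_rows or not record:
                continue  # skip header rows and empty lines
            row = record[:len(COLUMNS)]
            print(" | ".join(row))  # console output, one row at a time
            writer.writerow(row)
            processed += 1
    return processed


if __name__ == "__main__":
    print(run_job("customers.txt", "customers.csv"), "rows written")
```

Because each row is printed and written as soon as it is read, memory use stays flat however large the input file is.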


Chapter 3. Profiling data

This chapter is aimed at users of Talend Data Quality who seek a real-life use case to help them take full control over data quality products. It describes how to use the Profiling perspective in Talend Studio to profile data.

Profiling customer data

Incorporating appropriate data quality tools in your business processes is vital at the beginning of any project and throughout the project plan, in order to see what type of data quality you have and decide what data to resolve and how.

Suppose, for example, that you want to start a campaign for your sales and marketing groups, or you need to contact customers for billing and payment, and your main source to contact appropriate people is email and postal addresses. Having consistent and correct address data is vital in such a campaign to be able to reach all people.

This section provides an example of profiling US customer email and postal addresses.

Identifying data anomalies

The first step in this example is to profile the customer contact information in a MySQL database. The profiling results provide you with statistics about the values within each column.

How to profile address columns

You will use the Studio to analyze a few customer columns including email and postal. Using out-of-box indicators and patterns on these columns, you can show in the analysis results the matching and non-matching address data, the number of most frequent records for each distinct pattern and the row, duplicate and blank counts in each column.

Defining the column analysis

1. In the DQ Repository tree view, right-click the Analysis folder and select New Analysis.
   The [Create New Analysis] wizard opens.

2. Start typing column in the search field, select Column Analysis from the list and click Next.
3. In the Name field, enter a name for the current column analysis.
   Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", " ", " ", """, "«", "»", "<", ">".
   These characters are all replaced with "_" in the file system and you may end up creating duplicate items.
4. Set column analysis metadata (purpose, description and author name) in the corresponding fields and click Next.

Select the columns and click Finish to close the wizard.

A file for the newly created column analysis is listed under the Analysis node in the DQ Repository tree view, and the analysis editor opens with the analysis metadata.
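The naming caveat from the wizard is easy to underestimate: because every special character is stored as "_", two distinct item names can end up with the same name on disk. The sketch below illustrates that collision. The helper names `stored_name` and `find_collisions` are hypothetical, and the character set contains only the clearly legible characters from the list in the wizard steps; it is an approximation of the rule, not the Studio's own implementation.

```python
# Characters assumed to be replaced with "_" (legible entries of the list only).
SPECIAL_CHARACTERS = frozenset("~!`#^&*\\/?:;\".()'«»<>")


def stored_name(item_name):
    """Return the name as it would be stored, with special characters as "_"."""
    return "".join("_" if ch in SPECIAL_CHARACTERS else ch for ch in item_name)


def find_collisions(item_names):
    """Group item names that map to the same stored name."""
    by_stored = {}
    for name in item_names:
        by_stored.setdefault(stored_name(name), []).append(name)
    return {stored: names for stored, names in by_stored.items() if len(names) > 1}


if __name__ == "__main__":
    names = ["email:check", "email?check", "postal_check"]
    print(find_collisions(names))
```

Sticking to letters, digits and underscores in item names avoids the problem altogether.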

3. In the Data preview view, click Refresh Data.
   The data in the selected columns is displayed in the table.
   You can change your data source and your selected columns by using the New Connection and Select Data buttons respectively.
4. In the Limit field, enter 50 as the number of data records you want to display in the table and use as sample data.
5. Select n random rows to list 50 random records from the selected columns.

Setting system indicators

1. From the Data preview view in the analysis editor, click Select indicators to open the [Indicator Selection] dialog box.

2. Click in the cells next to indicator names to set indicator parameters for the analyzed columns and click OK.
   You want to see the row, blank and duplicate counts in all columns to see how consistent the data is. You also want to use the Pattern Frequency Table indicator on the email and postal columns in order to compute the number of most frequent records for each distinct pattern or value.
   Indicators are added accordingly to the columns in the Analyzed Columns view.

3. Click the option icon next to the Blank Count indicator and set 0 in the Upper threshold field.
   Defining thresholds on indicators is very helpful as it will write in red the count of the null values in the analysis results.

Setting patterns

You now want to match the content of the email column against a standard email format and the postal column against a standard US zip code format. This will define the content, structure and quality of emails and zip codes and give the percentage of the data that matches the standard formats and of the data that does not.

1. In the Analyzed Columns view, click the icon next to email.

2. In the [Pattern Selector] dialog box, expand Regex and browse to Email Address in the internet folder, and then click OK.
3. Click the option icon next to the Email Address indicator and set 98.0 in the Lower threshold (%) field.
   If fewer than 98% of the records match the pattern, the result will be written in red in the analysis results.
4. Do the same to add to the postal column the US Zipcode Validation pattern from the address folder.

Executing the analysis and displaying the profiling results

1. Save the column analysis in the analysis editor and then press F6 to execute it.
   A group of graphics is displayed in the Graphics panel to the right of the analysis editor showing the results of the column analysis including those for pattern matching.
2. Click the Analysis Results tab at the bottom of the analysis editor to access a more detailed result view.
   These results show the generated graphics for the analyzed columns accompanied with tables that detail the statistic and pattern matching results.

The results for the email column look as follows:

The pattern matching results show that about 10% of the email records do not match the standard email pattern. The simple statistic results show that about 8% of the email records are blank and that about 5% are duplicates. The pattern frequency results give the number of most frequent records for each distinct pattern. This shows that the data is not consistent and that you need to correct and cleanse the email data before starting your campaign.

The results for the postal column look as follows:

The result sets for the postal column give the count of the records that match and those that do not match a standard US zip code format. The result sets also give the blank and duplicate counts and the number of most frequent records for each distinct pattern. These results show clearly that your data is not very consistent and that it needs to be corrected: some percentage of the customers cannot be contacted by either email or US mail service.

How to view analyzed data

After running the column analysis using the SQL engine, you can right-click any of the rows or bars in the result tables or charts in the Analysis Results view of the analysis editor and access a view of the actual analyzed data. This can be very helpful to see invalid rows, for example, and to start analyzing what needs to be done to clean such data.

To view and export the analyzed data, do the following:

1. At the bottom of the analysis editor, click the Analysis Results tab to open a detailed view of the analysis results.

2. Right-click a data row in the statistic results of the email column and select View invalid rows for example.
   The Data Explorer perspective opens listing the invalid rows in the email column.
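To make the indicators used in this chapter more concrete, the following sketch computes comparable figures over a plain Python list of strings. Several points are assumptions made for illustration: the two regular expressions are simplified stand-ins, not the Studio's Email Address and US Zipcode Validation patterns; the duplicate count is taken to mean the number of distinct values that occur more than once; and the pattern frequency replaces lowercase letters with a, uppercase letters with A and digits with 9. All function names are hypothetical.

```python
import re
from collections import Counter

# Simplified stand-in patterns (assumptions, not the Studio's own regexes).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def simple_statistics(values):
    """Row, blank and duplicate counts for one column of strings."""
    counts = Counter(values)
    return {
        "row_count": len(values),
        "blank_count": sum(1 for v in values if v.strip() == ""),
        # Assumed meaning: distinct values that appear more than once.
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
    }


def value_pattern(value):
    """Map a value to its shape: a/A for letters, 9 for digits."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A" if ch.isupper() else "a")
        else:
            shape.append(ch)
    return "".join(shape)


def pattern_frequency(values, top=10):
    """Most frequent shapes among the non-empty values."""
    return Counter(value_pattern(v) for v in values if v).most_common(top)


def pattern_match(values, pattern, lower_threshold_pct=98.0):
    """Share of values matching the pattern, plus the invalid rows."""
    invalid = [v for v in values if not pattern.fullmatch(v)]
    matched = len(values) - len(invalid)
    pct = 100.0 * matched / len(values) if values else 0.0
    return {
        "match_pct": pct,
        "invalid_rows": invalid,
        "below_threshold": pct < lower_threshold_pct,  # shown in red in the Studio
    }


if __name__ == "__main__":
    emails = ["ann@example.com", "bob@example", "", "ann@example.com"]
    print(simple_statistics(emails))
    print(pattern_frequency(emails))
    print(pattern_match(emails, EMAIL_PATTERN))
```

The `invalid_rows` list plays the role of the View invalid rows action: it is the starting point for deciding how the data should be cleansed.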

Chapter 4. Building a simple MDM project

This chapter takes you through the main steps involved in building a simple MDM project.

Preparing your project in the Studio

This short scenario walks you through the main steps involved in setting up a simple MDM project.

In this example, you recreate some of the content included in the MDM Demo Project. If you already imported the MDM Demo Project, you will have conflicts with the name of the Data Model and other elements, so make sure you create a new project from scratch.

First, in Talend Studio, you define the data model and data container, and then you set up a view that you can use to interact with the data they contain using Talend MDM Web User Interface.

More complex actions such as the creation of processes and triggers are beyond the scope of this scenario.

Before you begin this scenario, make sure you have a valid connection to an MDM Server and have created an empty project.

Setting up a data model and creating some business entities

The first step in any MDM project involves setting up a data model and creating business entities in this data model.

Create a data model

To create a data model, do the following:

1. In the MDM Repository tree view, right-click Data Model and select New from the contextual menu.
2. Name your data model Product, and then click OK.
   By default, the Create the corresponding Data Container at the same time check box is selected, so that the corresponding data container with the same name will be created. If needed, you can clear this check box and create the corresponding data container with the same name later. For more information, see Defining a data container.
   A data container and its corresponding data model must have the same name.

In the Studio workspace, an editor opens where you can define some of the details of your new data model.

Create business entities in the data model

Once you have created your data model, you need to populate it with some business entities.

To create a business entity in your data model, do the following:

1. In the editor, right-click anywhere in the Data Model Entities panel, and then click New Entity.
2. In the [New Entity] dialog box that opens, enter a name for your new entity in the Name field: Product.
3. Select the Complex Type option.
   You use the Simple type option if you want to define a single element type such as a phone number or an email address, and the Complex type option if you want to define a more complete structure, such as an address or, in this case, the different attributes that describe a product.

4. Leave the other options unchanged and click OK to add your new entity to the editor.
   The created business entity is listed in the Data Model Entities panel with a default record, which takes its name from the entity name with the suffix Id, and the complex type, if any, is displayed in the Data Model Types panel.

Each time you create a new business entity, a default Primary Key record, which takes its name from the entity name with the suffix Id, and a Unique Key record which has the same name as the Entity are automatically created. For example, if you create a new business entity and name it Agency, the Primary Key record AgencyId will be created automatically.

A Primary Key can be an integer but a Foreign Key must always be a string. The server surrounds Foreign Keys with square brackets to support compound keys.

Define attributes

The next step in this scenario involves defining different attributes for the Product entity you have just created: its name, description and price.

To define attributes for the Product business entity, do the following:

1. Expand the Product business entity and anonymous type, right-click the default Primary Key record and then click Edit Element in the contextual menu.
2. Change the name to Id, set the minimum and maximum occurrences to 1, and then click OK to close the dialog box.
3. Right-click Id, click Add Element (after) in the contextual menu, and then add each of the following elements with the characteristics shown in the table below.

   Element type   Element name   Minimum occurrence   Maximum occurrence
   String         Name           1                    1
   Decimal        Price          1                    1

4. Save your changes.
   A [Validation Result Dialog] dialog box opens to show the validation result.
5. In the MDM Repository tree view, expand the Data Model node, right-click the Product data model, and then select one of the deployment options to deploy your changes to the MDM Server.
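The entity defined in these steps can be pictured as a small set of element rules. The sketch below is one way to express the Id, Name and Price elements and their occurrence constraints in Python so that a candidate record can be checked against them. It is a conceptual illustration only, not how the MDM Server stores or validates a data model. The names `ElementSpec`, `PRODUCT_ELEMENTS` and `validate_record` are hypothetical, and treating Id as a string is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ElementSpec:
    """One element of an entity, with its occurrence constraints."""
    name: str
    value_type: type
    min_occurs: int = 1
    max_occurs: int = 1


# Elements of the Product entity. The type of Id is an assumption.
PRODUCT_ELEMENTS = (
    ElementSpec("Id", str),
    ElementSpec("Name", str),
    ElementSpec("Price", Decimal),
)


def validate_record(record, elements=PRODUCT_ELEMENTS):
    """Return a list of problems found in a record (empty if it is valid)."""
    errors = []
    for spec in elements:
        values = record.get(spec.name, [])
        if not isinstance(values, list):
            values = [values]  # a single value counts as one occurrence
        if not spec.min_occurs <= len(values) <= spec.max_occurs:
            errors.append(
                f"{spec.name}: expected {spec.min_occurs} to {spec.max_occurs} "
                f"occurrence(s), found {len(values)}"
            )
        for value in values:
            if not isinstance(value, spec.value_type):
                errors.append(f"{spec.name}: expected {spec.value_type.__name__}")
    known = {spec.name for spec in elements}
    for name in sorted(set(record) - known):
        errors.append(f"{name}: not defined in the entity")
    return errors


if __name__ == "__main__":
    print(validate_record({"Id": "1", "Name": "Mug", "Price": Decimal("9.90")}))
    print(validate_record({"Id": "1", "Price": "9.90"}))
```

Setting both the minimum and maximum occurrence to 1 makes an element mandatory and single-valued, which is why a record without Name is rejected.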
