Chapter 2: Getting Started with the Hub Console
About the Hub Console, page 18
Starting the Hub Console, page 19

Chapter 7: State Management
Before You Begin, page 206
About State Management in Siperian Hub, page 206
    About System States, page 206
    About the Hub State Indicator, page 207
    Protecting Pending Records Using the Interaction ID, page 208
    State Transition Rules for State Management, page 208
    Hub States and Base Object Record Value Survivorship, page 211
Configuring State Management for Base Objects, page 211
    Enabling State Management, page 211
    Enabling the History of Cross-Reference Promotion, page 213

Preface
Welcome to the Siperian Hub™ Administrator Guide. This guide explains how to administer, manage, and configure Siperian Hub.

Intended Audience
This guide is intended for Siperian Hub administrators. These are the IT people responsible for configuring or updating a Hub Store so that it provides the rules and functionality required by the data stewards. Administrators should have an excellent knowledge of database administration.

Organization
This guide contains the following chapters:
Part 1, “Introduction”: Provides an overview of Siperian Hub administration and explains how to navigate the Hub Console.
Chapter 1, “Introduction”: Introduces Siperian Hub administration phases, tools, and tasks.
Chapter 2, “Getting Started with the Hub Console”: Introduces tools in the Hub Console and provides general navigation instructions.
Part 2, “Building the Data Model”: Describes how to construct the schema (data model) used in your Siperian Hub implementation and stored in the Hub Store. It provides instructions on using Hub Console tools to configure Operational Record Stores (ORSs), datasources, the data model, queries, packages, hierarchies, and other objects.

Chapter 3, “About the Hub Store”: Describes the key components of the Hub Store: the Master Database and Operational Record Stores (ORS).

Chapter 4, “Configuring Operational Record Stores and Datasources”: Explains how to configure Operational Record Stores (ORS) and datasources.
Chapter 5, “Building the Schema”: Describes the Hub Store schema and provides instructions on building the schema for your Siperian Hub implementation.
Chapter 6, “Configuring Queries and Packages”: Explains how to use and create Siperian Hub queries and packages.
Chapter 7, “State Management”: Describes state management concepts and provides instructions for configuring state management in your Siperian Hub implementation.
Chapter 8, “Configuring Hierarchies”: Explains how to configure Siperian Hierarchy Manager (HM) and describes how to create and configure relationships based on foreign keys.
Part 3, “Configuring the Data Flow”: Describes the flow of data through the Siperian Hub via a series of processes (land, stage, load, match, consolidate, and distribute), and provides instructions for configuring each process using tools in the Hub Console.
Chapter 9: Describes the flow of data through the Siperian Hub via batch processes, starting with the land process and concluding with the distribution process.
Chapter 10: Describes the data landing process and explains how to configure source systems and landing tables.
Chapter 11: Describes the data staging process and explains how to configure staging tables, mappings, and other settings that affect Stage jobs.

Chapter 12, “Configuring Data Cleansing”: Explains how to configure data cleansing rules that are run during Stage jobs.
Chapter 13, “Configuring the Load Process”: Explains how to use the load process, and how to define trust and validation rules.
Chapter 14, “Configuring the Match Process”: Explains how to configure your Hub Store to match data.
Chapter 15, “Configuring the Consolidate Process”: Explains how to configure your Hub Store to consolidate data.

Chapter 16: Explains how to configure Siperian Hub to write changes to a message queue.
Part 4, “Executing Siperian Hub Processes”: Describes how to use Hub Console tools to run Siperian Hub processes via batch jobs, and how to use third-party job management tools to schedule and manage Siperian Hub processes via stored procedures.
Chapter 17, “Using Batch Jobs”: Explains how to use the Siperian Hub batch jobs and batch groups.
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”: Explains how to schedule Siperian Hub batch jobs using job execution scripts.
Part 5, “Configuring Application Access”: Describes how to use Hub Console tools to configure Siperian Hub client applications that access Siperian Hub using Services Integration Framework (SIF) requests.
Chapter 19, “Generating ORS-specific APIs and Message Schemas”: Describes how to generate ORS-specific SIF APIs using the SIF Manager tool in the Hub Console.
Chapter 20, “Setting Up Security”: Explains how to set up security for users who will access Siperian Hub resources via the Hub Console or third-party applications.
Chapter 21, “Viewing Registered Custom Code”: Explains how to register custom code using the User Object Registry tool in the Hub Console.
Chapter 22, “Auditing Siperian Hub Services and Events”: Describes how to set up auditing and debugging in the Hub Console.
Part 6, “Appendixes”: Describes other administration-related topics.
Appendix A, “Configuring International Data Support”: Describes how to configure different character sets for internationalization purposes.
Appendix B, “Backing Up and Restoring Siperian Hub”: Explains how to back up and restore a Siperian Hub implementation.
Appendix C, “Configuring User Exits”: Explains how to configure user exits, which are user-customized, unencrypted stored procedures that are configured to execute at a specific point during batch job execution.
Appendix D, “Viewing Configuration Details”: Explains how to view details of your Siperian Hub implementation using the Enterprise Manager tool in the Hub Console.
Appendix E, “Implementing Custom Buttons in Hub Console Tools”: Explains how to add custom buttons to tools in the Hub Console that allow users to invoke external services on demand.

Learning About Siperian Hub

Siperian Hub Overview
The Siperian Hub Overview introduces Siperian Hub, describes the product architecture, and explains core concepts that all users need to understand before using the product.

Siperian Hub Installation Guide
The Siperian Hub Installation Guide explains to installers how to set up Siperian Hub, the Hub Store, Cleanse Match Servers, and other components. There is a Siperian Hub Installation Guide for each supported platform.

Siperian Hub Resource Kit Guide
The Siperian Hub Resource Kit Guide explains how to install and use the Siperian Hub Resource Kit, which is a set of utilities, examples, and libraries that assist developers with integrating the Siperian Hub into their applications and workflows. This document provides a description of the various sample applications that are included with the Resource Kit.

Siperian Hub Insight Manager Guide
The Siperian Hub Insight Manager Guide explains how to install, configure, and use the Siperian Hub Insight Manager to generate reporting metadata for the data managed in the Hub Store. It provides a description of how to use this reporting metadata with third-party reporting tools to create reports and metrics for this data.

Siperian Training and Materials
Siperian provides live, instructor-based training to help professionals become proficient users as quickly as possible. From initial installation onward, a dedicated team of qualified trainers ensures that an organization’s staff is equipped to take advantage of this powerful platform. To inquire about training classes or to find out where and when the next training session is offered, please visit Siperian’s web site or contact Siperian directly.

Contacting Siperian
Technical support is available to answer your questions and to help you with any problems encountered using Siperian products. Please contact your local Siperian representative or distributor as specified in your support agreement. If you have a current Siperian Support Agreement, you can contact Siperian Technical Support:
Method: Contact Information
World Wide Web: http://www.siperian.com
Email: support@siperian.com
Voice: U.S.: 1-866-SIPERIAN (747-3742)

1
Introduction
This chapter introduces and provides an overview of administering Siperian MDM Hub™ (hereinafter referred to as Siperian Hub). It is recommended for anyone who manages a Siperian Hub implementation. Note: This document assumes that you have read the Siperian Hub Overview and have a basic understanding of Siperian Hub architecture and key concepts.

Hub Installation Guide for your platform. For instructions on setting up a cleanse adapter, see the Siperian Hub Cleanse Adapter Guide. Note: The instructions in this document assume that you have already completed the startup phase and are ready to begin configuring your Siperian Hub implementation.

Configuration Phase
After Siperian Hub has been installed and set up, administrators can begin configuring and testing Siperian Hub functionality—the data model and other objects in the Hub Store, data management processes, external application access, and so on. This phase involves a dynamic, iterative process of building and testing Siperian Hub functionality to meet the stated requirements of an organization. The bulk of the material in this document refers to tasks associated with the configuration phase. After a schema has been sufficiently built and the Siperian Hub has been properly configured, developers can build external applications to access Siperian Hub functionality and resources. For instructions on developing external applications, see the Siperian Services Integration Framework Guide.

Production Phase
After a Siperian Hub implementation has been sufficiently configured and tested, administrators deploy the Siperian Hub in a production environment. In addition to managing ongoing Siperian Hub operations, this phase can involve performance tuning to optimize the processing of actual business data.

Summary of Administration Tasks
This section provides a summary of administration tasks.

Setting Up Security
In this document, Chapter 20, “Setting Up Security,” describes the tasks associated with setting up security in a Siperian Hub implementation. Setup tasks vary depending on the particular security requirements of your Siperian Hub implementation, as described in “Security Implementation Scenarios” on page 836. Additional security tasks are involved if external applications access your Siperian Hub implementation using Services Integration Framework (SIF) requests. For more information, see “About Setting Up Security” on page 832, “Summary of Security Configuration Tasks” on page 838, and “Configuration Tasks For Security Scenarios” on page 839. To configure security for a Siperian Hub implementation using Siperian Hub’s internal security framework, you complete the following tasks using tools in the Hub Console:
High-Level Tasks for Setting Up Security

Usage
• Required if you are using external security providers to handle any portion of security in your Siperian Hub implementation.
• Required to provide non-administrator users with access to Hub Console tools.

Building the Data Model
In this document, Part 2, “Building the Data Model,” describes how to construct the schema (data model) used in your Siperian Hub implementation and stored in the Hub Store. It provides instructions for using Hub Console tools to configure Operational Record Stores (ORSs), datasources, the data model, queries, packages, hierarchies, and other metadata.
High-Level Tasks for Building the Data Model

Usage
• Required for all Siperian Hub implementations. For more information, see the instructions for installing the Hub Store in the Siperian Hub Installation Guide for your platform.
• Required for all Siperian Hub implementations. You must register an ORS so that Siperian Hub can connect to it. For more information, see “Databases in the Hub Store” on page 56.
• Required only if the datasource was not automatically created upon registering an ORS. Every ORS requires a datasource definition in the application server environment. For more information, see “About Datasources” on page 77.
• Required for each base object in your schema. Base objects are used for a central business entity (such as customer, product, or employee) or a lookup table (such as country or state). For more information, see “About the Schema” on page 82, “Process Overview for Defining Base Objects” on page 94, and “About Base Objects” on page 92.

Usage
• Required for all base objects, dependent objects, landing tables, and staging tables. For more information, see “About Columns” on page 126.
• Required only when you want to explicitly define a foreign-key relationship (parent-child) between two base objects. For more information, see “Process Overview for Defining Foreign-Key Relationships” on page 143 and “About Foreign Key Relationships” on page 140. For Hierarchy Manager, see “Configuring Hierarchies” on page 223 instead.
• “Configuring Dependent Objects” on page 117: Required only if a base object has a dependent object, which is a table that is used to store detailed information about the records in a base object (such as supplemental notes). For more information, see “About the Schema” on page 82, “Process Overview for Defining Dependent Objects” on page 119, and “About Dependent Objects” on page 117.
• “Viewing Your Schema” on page 148: Useful for visualizing your schema in a graphical format.
• “Configuring Queries” on page 162: Required for creating queries used in packages. For more information, see “About Queries” on page 162 and “Configuring Packages” on page 196.
• Required for queries used by data stewards in the Merge Manager tool. For more information, see the Siperian Hub Data Steward Guide.
• “Configuring Packages” on page 196: Required to allow external application users to access Siperian Hub functionality using Services Integration Framework (SIF) requests. For more information, see the Siperian Services Integration Framework Guide and “About Packages” on page 196. Also required to allow data stewards to merge and update records in the Hub Store using the Merge Manager and Data Manager tools. For more information, see the Siperian Hub Data Steward Guide.

Configuring the Data Flow
In this document, Part 3, “Configuring the Data Flow,” describes the flow of data through the Siperian Hub through a series of processes (land, stage, load, match, consolidate, and publish), and provides instructions for configuring each process using tools in the Hub Console.

Configuring the Land Process
To configure the land process for a base object, see “Land Process” on page 292, “Configuring the Land Process” on page 347, and the following topics:
High-Level Tasks for Configuring the Land Process

Task: Usage
• “Configuring Source Systems” on page 348: Required to define a unique internal name for each source system (external applications or systems that provide data to Siperian Hub). For more information, see “About Source Systems” on page 348.
• “Configuring Landing Tables” on page 355: Required to create landing tables, which provide intermediate storage in the flow of data from source systems into Siperian Hub. For more information, see “About Landing Tables” on page 355.

Configuring the Stage Process
To configure the stage process for a base object, see “Stage Process” on page 295, “Configuring the Stage Process” on page 363, and the following topics:
High-Level Tasks for Configuring the Stage Process

• “Mapping Columns Between Landing and Staging Tables” on page 380: Required to enable Siperian Hub to move data from a landing table to a staging table during the stage process, and also to specify cleanse operations on columns of data that are moved. To learn more, see “About Mapping Columns” on page 380.
• “Configuring Data Cleansing” on page 405: Required to set up data cleansing for a base object during the stage process using the Siperian Hub internal cleanse functionality. To learn more, see “About Data Cleansing in Siperian Hub” on page 406 and the following topics:
    • “Configuring Cleanse Match Servers” on page 407 to deploy Cleanse Match Servers that execute cleanse operations and the match process for an Operational Record Store (ORS). For more information, see “About the Cleanse Match Server” on page 407.
    • “Configuring Cleanse Lists” on page 440 to specify a logical grouping of cleanse functions that are executed at run time in a predefined order. For more information, see “About Cleanse Lists” on page 440.
    • “Using Cleanse Functions” on page 414 to build and execute cleanse functions that cleanse (standardize or verify) data. For more information, see “About Cleanse Functions” on page 414.

Configuring the Load Process
To configure the load process for a base object, see “Load Process” on page 299, “Configuring the Load Process” on page 453, and the following topics:
High-Level Tasks for Configuring the Load Process

Task: Usage
• “Configuring Trust for Source Systems” on page 455: Used when multiple source systems contribute data to a column in a base object. Required if you want to designate the relative trust level (confidence factor) for each contributing source system. For more information, see “About Trust” on page 455.
• “Configuring Validation Rules” on page 468: Required if you want to use validation rules to downgrade trust scores for cell data based on configured conditions. For more information, see “About Validation Rules” on page 468.

Configuring the Match Process
To configure the match process for a base object, see “Match Process” on page 317, “Configuring the Match Process” on page 483, and the following topics:
High-Level Tasks for Configuring the Match Process

Usage
• Required for each base object that will be involved in matching. For more information, see “Match Properties” on page 490.
• Required for match column rules involving related records in either separate tables or in the same table. For more information, see “About Match Paths” on page 497.
• Required to specify the base object columns to use in match column rules. For more information, see “About Match Columns” on page 515.
• Required if you want to use match rule sets to execute different sets of match column rules at different stages in the match process. For more information, see “About Match Rule Sets” on page 531.
• Required to specify match column rules that determine whether two records for a base object are similar enough to consolidate. For more information, see “About Match Column Rules” on page 542.
• Required to specify the base object columns (primary keys) to use in primary key match rules. For more information, see “About Primary Key Match Rules” on page 578.
• Useful for investigating the distribution of generated match keys upon completion of the match process. For more information, see “About Match Keys Distribution” on page 583.

Configuring the Consolidation Process
To configure the consolidation process for a base object, see “Consolidate Process” on page 335 and “Configuring the Consolidate Process” on page 593.

Configuring the Publish Process
To configure the publish process for a base object, see “Publish Process” on page 342, “Configuring the Publish Process” on page 601, and the following topics:
High-Level Tasks for Configuring the Publish Process

Usage
• Required to specify global settings for all message queues involving outbound Siperian Hub messages.
• Required to set up one or more message queue servers that Siperian Hub will use for incoming and outgoing messages. The message queue server must already be defined in your application server environment according to the application server instructions. For more information, see “About Message Queue Servers” on page 605.
• Required to set up one or more outbound message queues for a message queue server. For more information, see “About Message Queues” on page 608.

Executing Siperian Hub Processes
In this document, Part 4, “Executing Siperian Hub Processes,” describes how to use Hub Console tools to run Siperian Hub processes, either:
• as batch jobs from the Hub Console, or
• as stored procedures using third-party job management tools to schedule and manage job execution

Executing Processes in the Hub Console
To execute Siperian Hub processes using tools in the Hub Console, see “About Siperian Hub Batch Jobs” on page 668, “Using Batch Jobs” on page 667, and the following topics:
High-Level Tasks for Executing Siperian Hub Processes in the Hub Console

Task: Usage
• “Running Batch Jobs Using the Batch Viewer Tool” on page 674: Required if you want to run individual batch jobs from the Hub Console using the Batch Viewer tool. For more information, see “Batch Viewer Tool” on page 674.
• “Running Batch Jobs Using the Batch Group Tool” on page 688: Required if you want to run batch jobs in a group from the Hub Console, allowing you to configure the execution sequence for batch jobs and to execute batch jobs in parallel. For more information, see “About Batch Groups” on page 688.

Configuring Workflow Integration
If your Siperian Hub implementation integrates with a supported workflow engine, you need to enable states for base objects and configure other settings. For more information, see “Configuring State Management for Base Objects” on page 211.

Other Administration Tasks
In this document, Part 5, “Configuring Application Access,” and Part 6, “Appendixes,” provide additional information about administration-related topics.
Other High-Level Administration Tasks

• “Auditing Siperian Hub Services and Events” on page 919: Used for integration auditing to track activities associated with the exchange of data between Siperian Hub and external systems. For more information, see “About Integration Auditing” on page 920.
• “Backing Up and Restoring Siperian Hub” on page 951: Used for backing up and restoring a Siperian Hub implementation.
• “Configuring International Data Support” on page 939: Required only to configure different character sets in a Siperian Hub implementation.
• “Configuring User Exits” on page 955: Required only if user exits are used. For more information, see “About User Exits” on page 956.
• “Viewing Configuration Details” on page 967: Used for remotely monitoring a Siperian Hub environment, showing configuration settings for the Hub Server, Cleanse Match Servers, Master Database, and Operational Record Stores.
• “Implementing Custom Buttons in Hub Console Tools” on page 977: Used only if you want to create custom buttons for Hub Console users to provide on-demand, real-time access to specialized data services. Applies only to the Merge Manager, Data Manager, and Hierarchy Manager tools.

2
Getting Started with the Hub Console
This chapter introduces the Hub Console and provides a high-level overview of the tools involved in configuring your Siperian Hub implementation.

About the Hub Console
Administrators and data stewards can access Siperian Hub features via the Siperian Hub user interface, which is called the Hub Console. The Hub Console comprises a set of tools. Each tool allows you to perform a specific action, or a set of related actions.

Starting the Hub Console
To access the Hub Console:
1. Open a browser window and enter the following URL:
http://YourHubHost:port/cmx/

where YourHubHost is your local Siperian Hub host and port is the port number. Check with your administrator for the correct port number.
Note: You must use an HTTP connection to start the Hub Console. SSL connections are not supported.
The Siperian Hub launch screen is displayed.

2. Click the Launch button.

The first time (only) that you launch Hub Console from a client machine, Java Web Start downloads application files and displays a progress bar.

The Siperian Hub Login dialog box is displayed.

3. Enter your user name and password.
Note: If you do not have any user names set up, contact Siperian support.

4. Click OK.

After you have logged in with a valid user name and password, Siperian Hub prompts you to choose a target database to work with: the Master Database or an Operational Record Store (ORS).

The list of databases to which you can connect is determined by your security profile.
• The Master Database stores Siperian Hub environment configuration settings, such as user accounts, security configuration, the ORS registry, and message queue settings. A given Siperian Hub environment can have only one Master Database.
• An Operational Record Store (ORS) stores the rules for processing the master data and for managing the set of master data objects, along with the processing rules and auxiliary logic that Siperian Hub uses to define the best version of the truth (BVT). A Siperian Hub configuration can have one or more ORS databases.

Throughout the Hub Console, an icon next to an ORS indicates whether it has been validated and, if so, whether the most recent validation resulted in issues.
Image: Meaning
• Unknown. The ORS has not been validated since it was initially created, or since the last time it was updated.
• The ORS has been validated with no issues. No change has been made to the ORS since the validation process was run.
• The ORS has been validated with warnings.
• The ORS has been validated and errors were found.

For more information, see Chapter 3, “About the Hub Store.”

5. Select the Master Database or the ORS to which you want to connect.
6. Click Connect.
Note: You can easily change the target database once inside the Hub Console, as described in “Changing the Target Database” on page 31.
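
The URL entered in step 1 can also be assembled in a script, for example when opening the launch page for several environments. The following Python sketch is illustrative only: the function name `hub_console_url` and the sample host and port are assumptions and are not part of Siperian Hub. The scheme is fixed to http because SSL connections are not supported for starting the Hub Console.

```python
def hub_console_url(host: str, port: int) -> str:
    """Build the Hub Console launch URL in the form http://YourHubHost:port/cmx/."""
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Always plain HTTP: SSL connections are not supported for the launch page.
    return f"http://{host}:{port}/cmx/"


# Hypothetical host and port; check with your administrator for the real values.
print(hub_console_url("hubhost.example.com", 8080))
```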

The Hub Console screen is displayed, as shown in the following example (in which the Schema Manager is selected from the Model workbench).
[Figure: Hub Console screen with the Menu, the Workbenches/Processes pane, the Navigation Tree, and the Properties Panel labeled]

When you select a tool from the Workbenches page or start a process from the Processes page, the window is typically divided into several panes:
Pane: Description
• Workbenches / Processes: Displays one of the following:
    • List of workbenches and tools to which you have access (as shown in the previous figure).
    • List of the steps in the process that you are running.
  Note: The workbenches and tools that you see depend on what your company has purchased and on the access that your administrator has given you. If you do not see a particular workbench or tool when you log into the Hub Console, then your user account has not been assigned permission to access it.
• Navigation Tree: Allows you to navigate items (a list of objects) in the current tool. For example, in the Schema Manager, the middle pane contains a list of schema objects (base objects, landing tables, and so on).
• Properties Panel: Shows details (properties) for the selected item in the navigation tree, and possibly other panels if available in the current tool. Some of the properties might be editable.

Navigating the Hub Console
This section describes how to navigate the Hub Console interface. Hub Console is a collection of tools that you use to configure and manage your Siperian Hub implementation (see “Siperian Hub Workbenches and Tools” on page 48 for a complete list). Each tool allows you to focus on a particular area of your Siperian Hub implementation.

Toggling Between the Processes and Workbenches Views
Siperian Hub groups its tools in two different ways:
Grouping: Description
• By Workbenches: Similar tools are grouped together by workbench, a logical collection of related tools.
• By Process: Tools are grouped into a logical workflow that walks you through the tools and steps required for completing a task.

You can click the tabs at the left-most side of the Hub Console window to toggle between the Processes and Workbenches views. Note: When you log into Siperian Hub, you see only those workbenches and processes that contain the tools that your Siperian Hub security administrator has authorized you to use. The screen shots in this document show the full set of workbenches, processes, and tools available.

Workbenches View
To view tools by workbench:
• Click the Workbenches tab on the left side of the page.
Hub Console displays a list of available workbenches on the Workbenches tab. The Workbenches view organizes Hub Console tools by similar functionality, as shown in the following example.

[Figure: Workbenches view showing the Utilities workbench and the tools in the Utilities workbench]

The workbench names and tool descriptions are metadata-driven, as is the way in which tools are grouped. It is possible to have customized tool groupings. Therefore, the arrangement of tools and workbenches that you see after you log in to Hub Console might differ somewhat from the previous figure.

Processes View
To view tools by process:
• Click the Processes tab on the left side of the page.
Hub Console displays a list of available processes on the Processes tab. Tools are organized into common sequences or processes, as shown in the following example.

[Figure: Processes view showing the available processes]

Processes step you through a logical sequence of tools to complete a specific task. The same tool can belong to several processes, and can appear many times in one process.

Starting a Tool in the Workbenches View
To start a Hub Console tool from the Workbenches view:
1. In the Workbenches view, expand the workbench that contains the tool that you want to start (see “Siperian Hub Workbenches and Tools” on page 48).
2. If necessary, expand the workbench node to show the tools associated with that workbench.
3. Click the tool.
If you selected a tool that requires a different database, the Hub Console prompts you to select it. All tools in the Configuration workbench (Databases, Users, Security Providers, Tool Access, Message Queues, Metadata Manager, and Enterprise Manager) require a connection to the Master Database. All other tools require a connection to an ORS.
The Hub Console displays the tool that you selected.
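
The rule above (Configuration workbench tools need the Master Database, all other tools need an ORS) can be written as a simple lookup. The following Python sketch is illustrative only: the function names and the string labels are assumptions, and the tool names are taken from the list in this section.

```python
# Tools that this section lists in the Configuration workbench.
CONFIGURATION_TOOLS = frozenset({
    "Databases", "Users", "Security Providers", "Tool Access",
    "Message Queues", "Metadata Manager", "Enterprise Manager",
})


def required_database(tool_name: str) -> str:
    """Return the kind of target database that a tool needs."""
    if tool_name in CONFIGURATION_TOOLS:
        return "Master Database"
    return "ORS"


def needs_database_prompt(tool_name: str, connected_to: str) -> bool:
    """True when the current connection is the wrong kind for the tool."""
    return required_database(tool_name) != connected_to


print(required_database("Users"))                                  # Master Database
print(needs_database_prompt("Schema Manager", "Master Database"))  # True
```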

Acquiring Locks to Change Settings in the Hub Console
In the Hub Console, a lock is required to make changes to the underlying schema. All non-data steward tools (except the ORS security tools) are in read-only mode unless you acquire a lock. Hub Console locking allows multiple users to make changes to the Siperian Hub schema at the same time.

Types of Locks
In the Hub Console, the Write Lock menu provides two types of locks:
Type of Lock | Description
exclusive lock | Allows only one user to make changes to the underlying ORS, preventing any other users from changing the ORS while the exclusive lock is in effect. For more information, see “Acquiring an Exclusive Lock” on page 30.
write lock | Allows multiple users to make changes to the underlying metadata at the same time. Write locks can be obtained on the Master Database or on an ORS. For more information, see “Acquiring a Write Lock” on page 30.

Note: Locks cannot be obtained on an ORS that is in production mode. If an ORS is in production mode and you attempt to obtain a write lock, you will see a message stating that you cannot acquire the lock. For more information, see “Editing ORS Properties” on page 69.

Note: The data steward tools—Data Manager, Merge Manager, and Hierarchy Manager—do not require write locks. For more information about these tools, see the Siperian Hub Data Steward Guide. The Audit Manager does not require write locks, either.

Automatic Lock Expiration
The Hub Console takes care of refreshing the lock every 60 seconds on the current connection. The user can manually release a lock according to the instructions in “Releasing a Lock” on page 30. If a user switches to a different database while holding a lock, then the lock is automatically released. If the Hub Console is terminated, then the lock expires after one minute.
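The refresh-and-expire behavior described above resembles a lease that stays valid only while it is renewed. The following is a minimal sketch of that idea, assuming a simple timestamp comparison; the class and its methods are hypothetical and do not represent how the Hub Console implements locks internally.

```python
from dataclasses import dataclass

REFRESH_INTERVAL_SECONDS = 60  # from the text: the lock is refreshed every 60 seconds
EXPIRY_SECONDS = 60            # from the text: an unrefreshed lock expires after one minute


@dataclass
class LockLease:
    """Hypothetical model of a lock that stays alive only while it is refreshed."""
    holder: str
    last_refresh: float  # seconds on any monotonic clock

    def refresh(self, now: float) -> None:
        # Called by the (hypothetical) console heartbeat while the session is alive.
        self.last_refresh = now

    def is_expired(self, now: float) -> bool:
        # More than EXPIRY_SECONDS without a refresh means the holder is gone.
        return now - self.last_refresh > EXPIRY_SECONDS


lease = LockLease(holder="admin", last_refresh=0.0)
lease.refresh(now=float(REFRESH_INTERVAL_SECONDS))  # heartbeat at t = 60
still_held = not lease.is_expired(now=100.0)         # 40 s since the last refresh
expired_after_crash = lease.is_expired(now=121.0)    # 61 s since the last refresh
```

In this model, a terminated console simply stops calling refresh, so the lease lapses without any explicit release.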

Server Caching and Hub Console Locks
When no locks are in effect in the Hub Console, the Hub Server caches metadata and other configuration settings for performance reasons. As soon as a Hub Console user acquires a write lock or exclusive lock, caching is disabled, the cache is emptied, and Siperian Hub retrieves this information from the database instead. When all locks are released, caching is enabled again.
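The interaction between locks and caching can be pictured as a cache that is bypassed whenever at least one lock is held. The sketch below is an assumption-based illustration of that rule; the class name, the lock counter, and the loader callback are all hypothetical.

```python
class MetadataCache:
    """Hypothetical cache that is emptied and bypassed while any lock is held."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # callable: key -> value
        self._cache = {}
        self._active_locks = 0

    def lock_acquired(self):
        # Acquiring a write or exclusive lock disables caching and empties the cache.
        self._active_locks += 1
        self._cache.clear()

    def lock_released(self):
        # Caching resumes only after the last lock is released.
        self._active_locks = max(0, self._active_locks - 1)

    def get(self, key):
        if self._active_locks > 0:
            return self._load(key)  # caching disabled: always read the database
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]
```

One consequence of this design is that reads are always current while someone is editing metadata, at the cost of extra database round trips.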


Acquiring a Write Lock
Write locks allow multiple users to edit data in the Hub Console at the same time. However, write locks do not prevent those users from editing the same data at the same time. In such cases, the most recently saved changes prevail.
To acquire a write lock in Hub Console:
1. From the Write Lock menu, choose Acquire Lock.
   • If the lock has already been acquired by someone else, then the login name and machine address of that person is displayed.
   • If the ORS is in production mode, then a message is displayed explaining that you cannot acquire the lock.
   • If the lock is acquired successfully, then the tools are in read-write mode. Multiple users can have a write lock per ORS or in the Master Database.
2. When you are finished, you can explicitly release the write lock according to the instructions in “Releasing a Lock” on page 30.
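Because write locks do not serialize edits to the same data, the outcome is what is commonly called last-write-wins. The following sketch illustrates that outcome with a hypothetical setting object and made-up user names; it is not Siperian Hub code.

```python
class SharedSetting:
    """Hypothetical value that several write-lock holders may save to."""

    def __init__(self, value):
        self.value = value
        self.last_saved_by = None

    def save(self, user, new_value):
        # No conflict detection: a later save silently overwrites an earlier one.
        self.value = new_value
        self.last_saved_by = user


setting = SharedSetting("Customer")
setting.save("alice", "Customer Master")  # alice saves first
setting.save("bob", "Client")             # bob saves later, so his change prevails
```

After both saves, the setting holds the value from the second save, and the first change is lost without any warning to either user.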

Acquiring an Exclusive Lock
To acquire an exclusive lock in Hub Console:
1. From the Write Lock menu, choose Clear Lock to clear any write locks held by other users, as described in “Clearing Locks” on page 31.
2. From the Write Lock menu, choose Acquire Exclusive Lock. If the ORS is in production mode, then a message is displayed explaining that you cannot acquire the exclusive lock.
3. When you are finished making changes, release the exclusive lock, as described in “Releasing a Lock” on page 30.
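The relationship between write locks, exclusive locks, production mode, and clearing locks can be modeled as a small state machine. This is a sketch under stated assumptions: the class and method names are hypothetical, and the rule that an exclusive lock is refused while other users hold locks is inferred from the instruction to clear locks first.

```python
class LockError(Exception):
    """Raised when a lock cannot be acquired in this model."""


class OrsLocks:
    """Hypothetical model of the locks on one ORS."""

    def __init__(self, production_mode=False):
        self.production_mode = production_mode
        self.write_holders = set()
        self.exclusive_holder = None

    def _check_production(self):
        if self.production_mode:
            raise LockError("ORS is in production mode; lock cannot be acquired")

    def acquire_write(self, user):
        self._check_production()
        if self.exclusive_holder is not None and self.exclusive_holder != user:
            raise LockError(f"exclusive lock held by {self.exclusive_holder}")
        self.write_holders.add(user)  # several users may hold write locks

    def acquire_exclusive(self, user):
        self._check_production()
        others = self.write_holders - {user}
        if others or self.exclusive_holder not in (None, user):
            raise LockError("other users hold locks; clear them first")
        self.exclusive_holder = user

    def clear(self):
        # Forces the release of every lock, without warning the holders.
        self.write_holders.clear()
        self.exclusive_holder = None

    def release(self, user):
        self.write_holders.discard(user)
        if self.exclusive_holder == user:
            self.exclusive_holder = None
```

In this model, a user who wants an exclusive lock calls clear and then acquire_exclusive, which mirrors steps 1 and 2 of the procedure.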

Releasing a Lock
To release a lock in Hub Console: • From the Write Lock menu, choose Release Lock.


Clearing Locks
You can force the release of any locks—write or exclusive locks—held by other users. You might want to do this, for example, to obtain an exclusive lock on the ORS. Because other users are not warned to save changes before their write locks are released, you should use this only when necessary. To clear all locks: • From the Write Lock menu, choose Clear Lock. Hub Console releases any locks on the ORS.

Changing the Target Database
The status bar at the bottom of the Hub Console window always shows:
• the name of the target database to which you connected
• the user name you used to log in

To change the target database in the Hub Console:
1. On the status bar, click the database name. Hub Console prompts you to choose a target database with which to work. For a description of the types of databases that you can select, see “Starting the Hub Console” on page 19.
2. Select the Master Database or the ORS to which you want to connect.
3. Click Connect.

Logging in as a Different User
To log in as a different user in the Hub Console:
1. Click the user name on the status bar.
2. From the Options menu, choose Re-Login As....
3. Specify the user name and password for the user account that you want to use.

Changing the Password for a User
To change the password for the currently logged-in user in the Hub Console:
1. From the Options menu, choose Change Password.
2. Specify the password that you want to use instead.
3. Click OK.


Using the Navigation Tree in the Navigation Pane
The navigation tree in the Hub Console allows you to view and manage a hierarchical collection of objects. This section uses the Schema Manager as an example, but the functionality described in this section also applies to using the navigation tree for the following Hub Console tools: Message Queues, Mappings, Queries, Packages, Schema, Users and Groups, and the Batch Viewer.

Parent and Child Nodes
Each named object is represented as a node in the hierarchy tree. A node that contains other nodes is called a parent node. A node that belongs to a parent node is called a child node.


In the following example in the Schema Manager, the Address base object is the parent node to the associated child nodes (Columns, Cross-Reference, Dependent Objects, and so on).

Parent Node (Address Base Object)

Child Nodes (of Address)

Tree Options

Showing and Hiding Child Nodes
To show child nodes beneath a parent node: • Click the plus (+) sign next to the parent node. To hide child nodes beneath a parent node: • Click the minus (-) sign next to the parent node.


Sorting by Display Name
The display name is the name of an object as it appears in the navigation tree. You can change the order in which the objects are displayed in the navigation tree by clicking Sort By in the tree options area and selecting the appropriate sort option.

Choose from the following sort options: • Display Name (a-z) sorts the objects in the tree alphabetically according to display name. • Display Name (z-a) sorts the objects in the tree in descending alphabetical order according to display name.
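The two sort options correspond to an ascending and a descending sort on the display name. The sketch below assumes a case-insensitive comparison, which this section does not confirm, and uses made-up object names.

```python
# Hypothetical display names as they might appear in a navigation tree.
display_names = ["Party", "address", "Organization", "Contact"]

# Display Name (a-z): ascending alphabetical order (case-insensitivity is an assumption).
ascending = sorted(display_names, key=str.casefold)

# Display Name (z-a): the same comparison in descending order.
descending = sorted(display_names, key=str.casefold, reverse=True)
```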

Filtering Items
You can filter the items shown in the navigation tree by clicking the Filter area at the bottom of the left pane and selecting the appropriate filter option. The figures in this section are from the Schema Manager, but the same principles apply to other Hub Console tools for which filtering is available.

Choose from the following filter options:
• No Filter (All Items): Removes any filter that was previously defined.
• One Item: Displays a drop-down list above the navigation tree from which to select an item. In the Schema Manager, for example, you can choose Table type or Table.
   • If you choose Table type, you click the down arrow to display a list of table types from which to select for your filter.
   • If you choose Table, you click the down arrow to display a list of tables from which to select for your filter.
• Some Items: Allows you to select one or more items. For example, in the Schema Manager, you can choose tables based on either the table type or table name. When you choose Some Items, the Hub Console displays the Define Item Filter button above the navigation tree.
   • Click the Define Item Filter button.
   • Select the item(s) that you want to include in the filter, and then click OK.


Note: Use the No Filter (All Items) option to remove the filter.

Changing the Item View
Certain Hub Console tools show a View or View By area below the navigation tree.
• In the Schema Manager, you can show or hide the public Siperian Hub items by clicking the View area below the navigation tree and choosing the appropriate command. For example, you can view all system tables.
• In the Mappings tool, you can view items by mapping, staging table, or landing table.
• In the Packages tool, you can view items by package or by table.
• In the Users and Groups tool, you can display sub groups and sub users.
• In the Batch Viewer, you can group jobs by table, date, or procedure type.


Searching For Items
When there is no filter, or when the Some Items filter is selected, Hub Console displays a Find area above the navigation tree so that you can search for items by name. For example, in the Schema Manager, you can search for tables and columns.
1. Click anywhere in the Find area to display the Find window.
2. Type the name (or first few letters of the name) that you want to find.
3. Click the F3 - Find button. The Hub Console highlights the matched item(s). In the following example, the Schema Manager displays the list of tables and highlights the table that matches the find criteria:
4. Click anywhere in the Find area to hide the Find window.
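Searching by a name or its first few letters amounts to a prefix match over the item names in the tree. The following is a minimal sketch of such a match; the function is hypothetical, the table names are made up, and treating the search as case-insensitive is an assumption.

```python
def find_matches(names, typed):
    """Return the names that start with the typed text (hypothetical helper)."""
    needle = typed.casefold()  # case-insensitive matching is an assumption
    return [name for name in names if name.casefold().startswith(needle)]


# Hypothetical table names.
tables = ["C_ADDRESS", "C_PARTY", "C_ADDRESS_TYPE"]
matches = find_matches(tables, "c_addr")
```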

Running Commands On Objects in the Navigation Tree
To run commands on an object in the navigation tree, do one of the following: • Right-click an object name to display a pop-up menu of commands that you can perform on the object. OR • Select an object in the navigation tree, and then choose the command you want from the Hub Console menu at the top of the window.


Note: Whenever possible, this document describes the first approach—right-clicking an object in the navigation tree and choosing a command from the pop-up menu. Alternatively, however, you can always choose the command from the Hub Console menu. For example, in the Schema Manager, you can right-click on certain types of objects in the navigation tree to see a popup menu of the commands available for the selected object.

Popup Menu


Adding, Editing and Removing Objects Using Command Buttons
This section describes generally how you use command buttons to add, edit, and delete objects in the Hub Console.

Command Buttons
If you have access to create, modify, or delete objects in a Hub Console window, and if you have acquired a write lock (“Acquiring a Write Lock” on page 30), you might see some or all of the following command buttons in the Properties panel. There are other command buttons as well.
Button Name | Description
Add | Add a new object.
Edit | Edit a property for the selected item in the Properties panel. Indicates that the property is editable.
Delete | Remove the selected item.
Save | Save changes.


The following figure shows an example of command buttons on the right side of the properties panel for the Secure Resources tool.

Command Buttons

To see a description about what a command button does, hold the mouse over the button to display a tooltip, as shown in the following example.

Tooltip

Adding Objects
To add an object:
1. Acquire a write lock.
2. In the Hub Console tool, click the Add button. The Hub Console displays an Add object window, where object is the name of the type of object that you are adding.

In the Hub Console tool, select the object whose properties you want to edit. For each property that you want to edit, click the Edit button next to it, and specify the new value. Click the Save button to save your changes.

Specify the options you want, including:
• General tab: Specify whether to show wizard welcome screens, and whether to save window sizes and positions.
• Quick Launch tab: Specify tools that you want to appear as icons in a tool bar below the menu, as shown in the following example.
Toolbar


Showing Version Details
To show version details about the currently installed Siperian Hub:
1. In the Hub Console, choose Help | About. The Hub Console displays the About Siperian Hub dialog.
2. Click Installation Details. The Hub Console displays the Installation Details dialog.
3. Click Close.
4. Click Close.

Siperian Hub Workbenches and Tools
This section provides an overview of the Siperian Hub workbenches and tools.

Tools in the Configuration Workbench
Tool Name | Description
Databases | Register and manage Operational Record Stores (ORSs). To learn more, see Chapter 4, “Configuring Operational Record Stores and Datasources.”
Users | Define users and specify which databases they can access. Manage global and individual password policies. Note that Siperian Hub supports external authentication for users, such as LDAP. For more information, see Chapter 20, “Configuring Siperian Hub Users.”
Security Providers | Configure security providers, which are third-party organizations that provide security services (authentication, authorization, and user profile services) for users accessing Siperian Hub. For more information, see “Managing Security Providers” on page 889.
Tool Access | Define which Hub Console tools and processes a user can access. By default, new user accounts do not have access to any tools until access is explicitly assigned. For more information, see Appendix F, “Configuring Access to Hub Console Tools.”
Enterprise Manager | View configuration details and version information for the Hub Server, Cleanse Servers, the Master Database, and Operational Record Stores. For more information, see Appendix D, “Viewing Configuration Details.”

Tools in the Model Workbench
Tool Name | Description
Schema | Define base objects, dependent objects, relationships, history and security requirements, staging and landing tables, validation rules, match criteria, and other data model attributes. To learn more, see Chapter 5, “Building the Schema.”
Schema Viewer | View and navigate the current schema. For more information, see “Viewing Your Schema” on page 148.
Systems and Trust | Name the source systems that can provide data for consolidation in Siperian Hub. Define the trust settings associated with each source system for each base object column. For more information, see “Configuring Source Systems” on page 348 and “Configuring Trust for Source Systems” on page 455.
Queries | Define query groups and queries used by packages. To learn more, see “Configuring Queries” on page 162.
Packages | Define packages (table views). To learn more, see “Configuring Packages” on page 196.
Cleanse Functions | Define cleanse functions to perform on your data. For more information, see “Using Cleanse Functions” on page 414.
Mappings | Map cleansing function outputs to target columns in staging tables. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.
Hierarchies | Set up the structures required to view and manipulate data relationships in Hierarchy Manager. For more information, see Chapter 8, “Configuring Hierarchies.”

Tools in the Security Access Manager Workbench
Tool Name | Description
Secure Resources | Manage secure resources in Siperian Hub. Configure the status (Private, Secure) for each Siperian Hub resource, and define resource groups to organize secure resources. For more information, see “Securing Siperian Hub Resources” on page 841.
Roles | Define roles and privilege assignments to resources and resource groups. Assign roles to users and user groups. For more information, see “Configuring Roles” on page 854.
Users and Groups | Manage the users and user groups within a single Hub Store. To learn more, see Chapter 20, “Setting Up Security.”

Tools in the Data Steward Workbench
For more information about these tools, see the Siperian Hub Data Steward Guide.
Tool Name | Description
Data Manager | Manage the content of consolidated data, view cross-references, edit data, view history, and unmerge consolidated records. To learn more, see the Siperian Hub Data Steward Guide.
Merge Manager | Review and merge the matched records that have been queued for manual merging. For more information, see the Siperian Hub Data Steward Guide.
Hierarchy Manager | Define and manage hierarchical relationships in the Hub Store. For more information, see the Siperian Hub Data Steward Guide.


Tools in the Utilities Workbench
Tool Name | Description
Batch Group | Configure and run batch groups, which are collections of individual batch jobs (for example, Stage, Load, and Match jobs) that can be executed with a single command. For more information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.
Batch Viewer | Execute batch jobs to cleanse, load, match or auto-merge data, and view job logs. For more information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.
Cleanse Match Server | View Cleanse Match Server information, including name, port, server type, and whether the server is online or offline. For more information, see “About the Cleanse Match Server” on page 407.
Audit Manager | Configure auditing and debugging of application requests and message queue events. For more information, see Chapter 22, “Auditing Siperian Hub Services and Events.”
SIF Manager | Generate ORS-specific Services Integration Framework (SIF) request APIs. SIF Manager generates and deploys the code to support SIF request APIs for packages, remote packages, mappings, and cleanse functions in an ORS. Once generated, the ORS-specific APIs are available as a Web service and via the Siperian Client JAR. For more information, see Chapter 19, “Generating ORS-specific APIs and Message Schemas.”
User Object Registry | View registered user exits, user stored procedures, custom Java cleanse functions, and custom GUI functions for an ORS. For more information, see Chapter 21, “Viewing Registered Custom Code.”

Chapter 3: About the Hub Store
The Hub Store is where business data is stored and consolidated in Siperian Hub. The Hub Store contains common information about all of the databases that are part of your Siperian Hub implementation.

Databases in the Hub Store
The Hub Store is a collection of databases that includes:
Element | Description
Master Database | Contains the Siperian Hub environment configuration settings (user accounts, security configuration, ORS registry, message queue settings, and so on). A given Siperian Hub environment can have only one Master Database. The default name of the Master Database is CMX_SYSTEM. In the Hub Console, the tools in the Configuration workbench (Databases, Users, Security Providers, Tool Access, and Message Queues) manage configuration settings in the Master Database.
Operational Record Store (ORS) | Database that contains the master data, content metadata, the rules for processing the master data, the rules for managing the set of master data objects, along with the processing rules and auxiliary logic used by the Siperian Hub in defining the best version of the truth (BVT). A Siperian Hub configuration can have one or more ORS databases. The default name of an ORS is CMX_ORS.

Users for Hub Store databases are created globally—within the Master Database—and then assigned to specific ORSs. The Master Database also stores site-level information, such as the number of incorrect log-in attempts allowed before a user account is locked out.


How Hub Store Databases Are Related
A Siperian Hub implementation contains one Master Database and zero or more ORSs. If no ORS exists, then only the Configuration workbench tools are available in the Hub Console. A Siperian Hub implementation can have multiple ORSs, such as separate ORSs for development and production, or separate ORSs for each geographical location or for different parts of the organization.

You can access and manage multiple ORSs from one Master Database. The Master Database stores the connection settings and properties for each ORS. Note: An ORS can be registered in only one Master Database. Multiple Master Databases cannot share the same ORS. A single ORS cannot be associated with multiple Master Databases.
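The cardinality rules above (one Master Database, zero or more ORSs, and each ORS registered in only one Master Database) can be expressed as a small data model. This is an illustrative sketch only; the class, the shared ownership map, and the method names are hypothetical.

```python
class RegistrationError(Exception):
    """Raised when an ORS is already registered in another Master Database."""


class MasterDatabase:
    """Hypothetical model of a Master Database and its registered ORSs."""

    def __init__(self, name, ors_owner):
        self.name = name
        self._ors_owner = ors_owner  # shared dict: ORS name -> owning Master Database name
        self.ors_names = []          # zero or more ORSs

    def register(self, ors_name):
        owner = self._ors_owner.get(ors_name)
        if owner is not None and owner != self.name:
            raise RegistrationError(f"{ors_name} is already registered in {owner}")
        if owner is None:
            self._ors_owner[ors_name] = self.name
            self.ors_names.append(ors_name)

    def configuration_tools_only(self):
        # With no ORS, only the Configuration workbench tools are available.
        return not self.ors_names
```

The shared dictionary stands in for whatever mechanism prevents two Master Databases from sharing an ORS; the source does not describe that mechanism.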


Creating Hub Store Databases
Databases are initially created and configured when you install Siperian Hub. • To create the Master Database and one ORS, you run the setup.sql script. • To create an individual ORS, you run the setup_ors.sql script.

To learn more, see the Siperian Hub Installation Guide for your platform.

Version Requirements
Different versions of the Siperian Hub cannot operate together in the same environment. All components of your installation must be the same version, including the Siperian Hub software and the databases in the Hub Store. If you want to have multiple versions of Siperian Hub at your site, you must install each version in a separate environment. If you try to work with a different version of a database, you will receive a message telling you to upgrade the database to the current version.
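The version rule can be thought of as a simple equality check across all components. The sketch below is hypothetical: the function name and the version strings are placeholders, and it does not reflect how Siperian Hub performs its own check.

```python
def mismatched_databases(hub_version, database_versions):
    """Return the names of databases whose version differs from the Hub software."""
    return [name for name, version in database_versions.items()
            if version != hub_version]


# Placeholder version strings, not real Siperian Hub release numbers.
needs_upgrade = mismatched_databases(
    "2.0.0",
    {"CMX_SYSTEM": "2.0.0", "CMX_ORS": "1.9.0"},
)
```

In this example, only the ORS would trigger the message asking you to upgrade the database to the current version.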


Chapter 4: Configuring Operational Record Stores and Datasources
This chapter describes how to configure Operational Record Stores (ORSs) and datasources for the Hub Store using the Databases tool in the Hub Console.

Before You Begin
Before you begin, you must have installed Siperian Hub, created the Master Database and at least one ORS (running the setup.sql script creates both) according to the instructions in the Siperian Hub Installation Guide for your platform. You can create additional ORSs by running the setup_ors.sql script.

About the Databases Tool
After the Hub Store has been created, you can use the Databases tool in the Hub Console to complete the following tasks: • Register an ORS so that the Master Reference Manager can connect to it. Registration stores the database connection properties in the Master Database. • Define an ORS datasource in the application server environment for Siperian Hub. An ORS datasource contains a set of properties for the ORS, such as the location of the database server, the name of the database, the network protocol used to communicate with the server, the database user ID and password, and so on. Note: The Databases tool refers to an ORS as a database.


Starting the Databases Tool
To start the Databases tool:
1. In the Hub Console, connect to your Master Database. To learn more, see “Changing the Target Database” on page 31.
2. Expand the Siperian Configuration workbench and then click Databases. The Hub Console displays the Databases tool, as shown in the following example (in which a registered ORS is selected).

Registered ORSs

ORS Properties

The Databases tool displays the following areas:
Area | Description
Number of databases | Number of ORSs currently defined in the Hub Store.
Database List | List of registered Siperian Hub ORSs.
Database Properties | Database properties for the selected ORS.


Configuring Operational Record Stores
This section describes how to configure an ORS in your Hub Store. If you need assistance with configuring the ORS, consult with your database administrator. For more information about Operational Record Stores, see “Databases in the Hub Store” on page 56 and the Siperian Hub Installation Guide for your platform.

Registering an ORS
Note: Registering an ORS will fail if you try to register an ORS that does not contain the Siperian Hub repository objects or Siperian Hub procedures.
To register an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the button.
4. If you are registering a DB2 database, select DB2 in the Database type drop-down list. The Databases tool displays the Register Database dialog box for a DB2 database.
5. Specify the following settings. Note that Oracle and DB2 have slightly different settings.


Note: The Schema Name and the User Name are both the name of the ORS that was specified in the script used to create the ORS. If you need this information, consult your database administrator.
Property | Description

Identity
Database Display Name | Name for this ORS as it will be displayed in the Hub Console.
Machine Identifier | Prefix given to keys to uniquely identify records from this instance of the Hub Store.

Connection Properties
Database type | One of the following values: Oracle or DB2.
Database hostname | Oracle only. IP address or name (if supported on your network) of the server hosting the Oracle database.
Database server name | DB2 only. IP address or name (if supported on your network) of the database server.
Oracle SID | Oracle only. Oracle System Identifier (SID) that refers to the instance of the Oracle database running on the server.
Database name | DB2 only. Name of the DB2 database. Note: The DB2 database needs to be cataloged via the DB2 client on the application server machine.
Port | One of the following settings. Oracle: The TCP port of the Oracle listener running on the Oracle database server. The Oracle installation default is 1521. DB2: The TCP port on which the database server listens for connections. The DB2 installation default is 50000.
Oracle TNS Name | Oracle only. Name by which the database is known on your network as defined in the application server’s TNSNAMES.ORA file. For example: mydatabase.mycompany.com This value is set when you install Oracle. See your Oracle documentation to learn more about this name.
Schema Name | Name of the ORS.
User Name | User name for the ORS. By default, this is the user name that was specified in the script used to create the ORS. This user owns all of the ORS database objects in the Hub Store. If a proxy user has been configured for this ORS, then you can specify the proxy user instead. For instructions on running the setup_ors.sql script and defining proxy users, see the Siperian Hub Installation Guide.
Password | Password associated with the User Name for the ORS. For Oracle, this password is case-insensitive. For DB2, this password is case-sensitive. By default, this is the password associated with the user name that was specified in the script used to create the ORS. If a proxy user has been configured for this ORS, then you specify the password for the proxy user instead. For instructions on running the setup_ors.sql script and defining proxy users, see the Siperian Hub Installation Guide.
Create datasource after registration | Check (select) to create the datasource on the application server after registration. For WebLogic users, you will need to specify the WebLogic username and password.

6. If you want to create the datasource on the application server after registration, check (select) the Create datasource after registration check box. Siperian Hub uses the datasources provided by the application server and, therefore, does not write any data to the ORS at the time of registration. Note for WebLogic: If you are using WebLogic, a dialog box prompts you for your username and password. This process writes only to the Master Database. The ORS and datasource need not be available at registration time. If you do not check this option, then you will need to manually configure the datasource, as described in “Configuring Datasources” on page 77.

Note: When you register an ORS that has been used elsewhere, and if the ORS already has Cleanse Match Servers registered and no other servers get registered, then you need to re-register one of the Cleanse Match Servers. This updates the data in c_repos_db_release.
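The connection properties in the registration dialog correspond to the parts of a JDBC connection URL. The sketch below shows the standard URL forms used by the Oracle thin driver and the IBM DB2 JDBC driver, with the default ports taken from the settings table. The function is hypothetical, and this guide does not state how Siperian Hub itself assembles its connection string, so treat this as an illustration of how the fields relate rather than as the product's behavior.

```python
DEFAULT_PORTS = {"Oracle": 1521, "DB2": 50000}  # installation defaults from the settings table


def jdbc_url(database_type, host, name, port=None):
    """Build a standard JDBC URL; `name` is the Oracle SID or the DB2 database name."""
    if database_type not in DEFAULT_PORTS:
        raise ValueError(f"unsupported database type: {database_type}")
    if port is None:
        port = DEFAULT_PORTS[database_type]
    if database_type == "Oracle":
        # Oracle thin driver, SID form: host, listener port, SID.
        return f"jdbc:oracle:thin:@{host}:{port}:{name}"
    # IBM DB2 JDBC driver: server name, port, database name.
    return f"jdbc:db2://{host}:{port}/{name}"
```

The host names and database names that you pass in are whatever your database administrator supplied when the ORS was created.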

Editing ORS Registration Properties
Only certain ORS registration properties are editable. For non-editable properties, you must instead unregister and re-register the ORS with the new properties.
To edit registration settings for an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to configure.
4. Click the button. The Databases tool displays the Update Database Registration dialog box for the selected ORS.

Oracle Settings

DB2 Settings

5. Edit any of the following settings:
   • Database display name
   • Password: By default, this is the password associated with the user name that was specified when the ORS was created. If a proxy user has been configured for this ORS, then you specify the password for the proxy user instead. For instructions on running the setup_ors.sql script and defining proxy users, see the Siperian Hub Installation Guide.
   • Update datasource after registration check box
   • Oracle TNS name (Oracle only)
6. To update the datasource on the application server with the modified settings, select (check) the Update datasource after registration check box. Note: Updating the datasource settings might cause the JDBC connection pool settings to be reset to the default values. Be sure to check the JDBC connection pool settings before and after you click OK so that you can reapply any customizations to the JDBC connection pool settings.

Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30. Select the ORS that you want to configure. The Databases tool displays the database properties for the selected ORS.

o

Configuring Operational Record Stores and Datasources

69

Configuring Operational Record Stores

The following table describes these properties.
Property Database Type Database ID Description Oracle or DB2 Identification for the ORS. This ID is used in SIF requests. The database ID lookup is case-sensitive. The format for the database ID is: jdbc/siperian-hostname-sid-databasename Example: jdbc/siperian-aiz01-aix01-cmx_ors-ds When registering a new ORS, the host, server, and database names are normalized. • • Host name is converted to lowercase. Database name is converted to uppercase (the standard for schemas, tables, etc.). The normalization of each field can be done on a database-specific basis so that it can be changed if needed. JNDI Datasource Name Machine Identifier GETLIST Limit (records) Displays the datasource JNDI name for the selected ORS. This is the JNDI name that is configured for this JDBC connection on the application server. Prefix given to keys to uniquely identify records from this instance of the Hub Store. Limits the number of records returned through SIF search requests, such as searchQuery, searchMatch, getLookupValues, and so on.

Production Mode
    Specifies whether this ORS is in production mode.
    • If not enabled (unchecked, the default), production mode is disabled, allowing authorized users to edit metadata for this ORS in the Hub Console.
    • If enabled (checked), then production mode is enabled. Users cannot make changes to the metadata for this ORS. If a user attempts to acquire a write lock on an ORS in production mode, the Hub Console will display a message explaining that the lock cannot be obtained.
    Note: Only Siperian Hub administrator users can change this setting. For more information, see “Changing an ORS to Production Mode” on page 75.

4. To change a property, click the button next to it, and edit the property.
5. Click the Save button to save your changes.

If production mode is enabled for an ORS, then the Databases tool displays a lock icon next to it in the list.

[Figure: Databases list with a lock icon next to an ORS. Callout: Production mode enabled]

Testing ORS Connections
To test a Hub Store connection to an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to test.

4. Click the button.
   The Test Database command tests for:
   • the database connection parameters via the JDBC connection
   • the existence of the datasource
   • a valid connection via the datasource
   • a valid ORS version
   Note for WebSphere: If the test connection fails through the Hub Console, verify that the test connection is successful from the WebSphere Console. The JNDI name is case-sensitive and should match what is generated in the Hub Console.
5. Click OK.

Changing Passwords
To change passwords for the Master Database or an ORS, you need to make changes first on your database server and possibly on your application server as well.

Changing the Password for the Master Database
To change the Master Database password:
1. On your database server, change the password for the CMX_SYSTEM database.
2. Log in to the administration console for your application server, edit the datasource connection information to specify the new password for CMX_SYSTEM, and then save your changes.

Changing the Password for an ORS
To change the password for an ORS, there are two options.

Option One
1. On your database server, change the password for the ORS schema.
2. Start the Hub Console and select Master Database as the target database. To learn more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. Click the button. The Databases tool displays the Update Database Registration dialog box for the selected ORS.

Option Two
1. On your database server, change the password for the ORS schema.
2. Start the Hub Console and select Master Database as the target database. To learn more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. In the Database Properties panel, make a note of the JNDI Datasource Name for the selected ORS.
7. Log in to the administration console for your application server, edit the datasource connection information for this ORS to specify the new password for the noted JNDI Datasource Name, and then save your changes.

Encrypting Passwords
To change the schema password successfully, you must change it in the data sources defined in the application server. This password is not encrypted, because the application server protects it. In addition to updating the data sources on the application server, Siperian requires the password to be encrypted and stored in various tables.

Steps to Encrypt New Passwords
To encrypt the new password, execute the following command from the prompt:
java -classpath siperian-common.jar com.siperian.common.security.Blowfish

The results will be echoed to the terminal window:
Plaintext Password: your_new_password
Encrypted Password: encrypted password
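
If the encrypted value needs to be picked up by a script rather than copied by hand, the echoed text can be parsed. The following is a minimal Python sketch, assuming the utility prints one "Label: value" pair per line as shown above. The function name parse_blowfish_output and the sample encrypted value are hypothetical and are not part of Siperian Hub.

```python
def parse_blowfish_output(output: str) -> dict[str, str]:
    """Collect 'Label: value' pairs from the echoed output.

    Assumption: each field is printed on its own line. Only the first
    colon on a line separates label from value, so a password that
    contains colons is preserved.
    """
    fields = {}
    for line in output.splitlines():
        label, separator, value = line.partition(":")
        if separator:
            fields[label.strip()] = value.strip()
    return fields


# Illustrative captured output; "3A9F00C1" is a made-up placeholder value.
sample = "Plaintext Password: your_new_password\nEncrypted Password: 3A9F00C1\n"
encrypted = parse_blowfish_output(sample)["Encrypted Password"]
```

The sketch only reads text that has already been captured. How the command is invoked and where the encrypted value is stored are outside its scope.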

CMX_SYSTEM/ORS User and Passwords
User names and passwords that can be changed when installing or configuring the MRM:
• The CMX_SYSTEM user should not be changed.
• The CMX_SYSTEM password can be changed after the MRM is installed. You need to change the password for the CMX user in Oracle, and you need to set the same password in the datasource on the application server.
• The CMX_ORS user and password can be changed when the setup_ors.sql script is run. You need to use the same password when registering the ORS in the Hub Console.

Changing an ORS to Production Mode
The Hub Console allows administrators to lock the design of an ORS by enabling production mode. Once production mode is enabled, write locks and exclusive locks are not permitted, and no changes can be made to the schema definition in the ORS. When a Hub Console user attempts to place a lock on an ORS for which production mode is enabled, the Hub Console displays a message to the user explaining that the lock cannot be obtained because the ORS is in production mode. For more information, see “Acquiring Locks to Change Settings in the Hub Console” on page 28.

To change the production mode flag for an ORS:
1. Log in to the Hub Console with administrator-level privileges to the Siperian Hub implementation. In order to change this setting, you must have sufficient privileges to run the Databases tool and be able to obtain a lock on the Master Database.
2. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
3. Clear any exclusive locks on the ORS.
   Note: This setting cannot be changed if the ORS is locked exclusively.

4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure. The Databases tool displays the database properties for the selected ORS.
6. Change the setting of the Production Mode check box, as described in “Editing ORS Properties” on page 69. Select (check) the check box to enable production mode, or clear (uncheck) it to disable it.
7. Click the Save button to save your changes.
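
The effect of the production mode flag on lock requests can be pictured with a small model. The Python sketch below is illustrative only: the class Ors, the exception LockNotPermitted, and the ORS names are hypothetical, and the sketch does not represent how the Hub Console implements locking.

```python
class LockNotPermitted(Exception):
    """Raised when a lock is requested on an ORS that is in production mode."""


class Ors:
    """Toy model of an ORS with a production mode flag (hypothetical)."""

    def __init__(self, name: str, production_mode: bool = False):
        self.name = name
        self.production_mode = production_mode
        self.lock_holder = None

    def acquire_write_lock(self, user: str) -> None:
        # In production mode, no write lock is granted, so no schema
        # changes can be made through this lock.
        if self.production_mode:
            raise LockNotPermitted(
                f"Cannot obtain a lock: {self.name} is in production mode."
            )
        self.lock_holder = user


dev = Ors("CMX_ORS_DEV")
dev.acquire_write_lock("admin")        # succeeds

prod = Ors("CMX_ORS_PROD", production_mode=True)
try:
    prod.acquire_write_lock("admin")   # refused
except LockNotPermitted as error:
    message = str(error)
```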

Unregistering an ORS
Unregistering an ORS removes the connection information to this ORS from the Master Database and removes the datasource definition from the application server environment.

To unregister an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to unregister.
4. Click the button.
   Note: If you are running WebLogic, enter the WebLogic user name and password when prompted.
   The Databases tool prompts you to confirm unregistering the ORS.
5. Click Yes.

Configuring Datasources
This section describes how to configure datasources for an ORS. Every ORS requires a datasource definition in the application server environment.

About Datasources
In Siperian Hub, a datasource specifies properties for an ORS, such as the location of the database server, the name of the database, the database user ID and password, and so on. A Siperian Hub datasource points to a JDBC resource defined in your application server environment. To learn more about JDBC datasources, see your application server documentation.

Managing Datasources in WebLogic
For WebLogic application servers, whenever you attempt to add, delete, or update a datasource, Siperian Hub prompts you to specify the application server administrative username and password. If you are performing multiple operations in the Databases tool, this dialog box remembers the last username that was entered, but always requires you to enter the password.

Creating Datasources
You might need to explicitly create a datasource if, for example, you created an ORS using a different application server, or if you did not check (select) the Create datasource after registration check box when registering the ORS.

To create a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click the ORS in the Databases list, and then choose Create Datasource.
   Note: If you are running WebLogic, enter the WebLogic user name and password when prompted.

   The Databases tool creates the datasource and displays a progress message.
4. Click OK.

Removing Datasources
If you have registered an ORS with a configured datasource, you can use the Databases tool to manually remove its datasource definition from your application server. After removing the datasource definition, however, the ORS will still appear in the Hub Console. To completely remove a database from the Hub Console, you need to unregister it (see “Unregistering an ORS” on page 76).

To remove a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page 61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click an ORS in the Databases list, and then choose Remove Datasource.
   Note: If you are running WebLogic, enter the WebLogic user name and password when prompted.

   The Databases tool removes the datasource and displays a progress message.
4. Click OK.

Chapter 5: Building the Schema
This chapter explains how to design and build your schema in Siperian Hub.

Before You Begin
Before you begin, you must have installed Siperian Hub and created the Hub Store (including an Operational Record Store) according to the instructions in the Siperian Hub Installation Guide.

About the Schema
The schema is the data model that is used in your Siperian Hub implementation. Siperian Hub does not impose or require any particular schema. The schema exists inside Siperian Hub and is independent of the source systems providing data to Siperian Hub.

Note: The process of designing the schema for your Siperian Hub implementation is outside the scope of this document. It is assumed that you have developed a data model, using industry-standard data modeling methodologies, that is based on a thorough understanding of your organization’s requirements and in-depth knowledge of the data you are working with.

The Siperian schema is a flexible, repository-driven model that supports the data structure of any vertical business sector. The Hub Store is the database that underpins Siperian Hub and provides the foundation of Siperian Hub’s functionality. Every Siperian Hub installation has a Hub Store, which includes one Master Database and one or more Operational Record Store (ORS) databases. Depending on the configuration of your system, you can have multiple ORS databases in an installation. For example, you could have a development ORS, a testing ORS, and a production ORS. For more information, see Chapter 3, “About the Hub Store,” and Chapter 4, “Configuring Operational Record Stores and Datasources.”

Before you begin to implement the schema, you must understand the basic structure of the underlying Siperian Hub schema and its components. This section introduces the most important tables in an ORS and how they work together.

Note: You must use tools in the Hub Console to define and manage the consolidated schema; you cannot make changes directly to the database. For example, you must use the Schema Manager to define tables and columns. For details, see “Requirements for Defining Schema Objects” on page 87.

Types of Tables in an Operational Record Store
An ORS contains both tables that you configure and system support tables.

Configurable Tables
The following types of Siperian Hub tables are used to model business reference data. You must explicitly create and configure these tables.
Types of Configurable Tables in an ORS

Type of Table
    Description

base object
    Used to store data for a central business entity (such as customer, product, or employee) or a lookup table (such as country or state). In a base object table (or simply a base object), you can consolidate data from multiple source systems and use trust settings to determine the most reliable value of each base object cell. You can define one-to-many relationships between base objects. Base objects must be explicitly created and configured according to the instructions in “Process Overview for Defining Base Objects” on page 94.

dependent object
    Used to store detailed information about the records in a base object (for example, supplemental notes). One record in a base object can map to multiple records in a dependent object table (or simply a dependent object). Dependent objects must be explicitly created and configured according to the instructions in “Process Overview for Defining Dependent Objects” on page 119.

landing table
    Used to receive batch loads from a source system. Landing tables must be explicitly created and configured according to the instructions in “Configuring Landing Tables” on page 355.

staging table
    Used to load data into base objects and dependent objects. Mappings are defined between landing tables and staging tables to specify whether and how data is cleansed and standardized when it is moved from a landing table to a staging table. Staging tables must be explicitly created and configured according to the instructions in “Configuring Staging Tables” on page 364.

Infrastructure Tables
The following types of Siperian Hub infrastructure tables are used to manage and support the flow of data in the Hub Store. Siperian Hub automatically creates, configures, and maintains these tables whenever you configure base objects and dependent objects.
Types of Infrastructure Tables in an ORS

Type of Table
    Description

cross-reference table
    Used for tracking the origin of each record in the base object. Named according to the following pattern:
    C_baseObjectName_XREF
    where baseObjectName is the root name of the base object (for example, C_PARTY_XREF). For this reason, this table is sometimes referred to as the XREF table. When you create a base object, Siperian Hub automatically creates a cross-reference table to store information about data coming from source systems. For more information, see “Cross-Reference Tables” on page 97.

history table
    Used if history is enabled for a base object (see “Enable History” on page 102). Named according to the following pattern:
    • C_baseObjectName_HIST: base object history table, as described in “Base Object History Tables” on page 101.
    • C_baseObjectName_HXRF: cross-reference history table, as described in “Cross-Reference History Tables” on page 101.
    where baseObjectName is the root name of the base object (for example, C_PARTY_HIST and C_PARTY_HXRF). Siperian Hub creates and maintains several different history tables to provide detailed change-tracking options, including merge and unmerge history, history of the pre-cleansed data, history of the base object, and the cross-reference history.

match key table
    Contains the match keys that were generated for all base object records. Named according to the following pattern:
    C_baseObjectName_STRP
    where baseObjectName is the root name of the base object (for example, C_PARTY_STRP). For more information, see “Columns in Match Key Tables” on page 325.

match table
    Contains the pairs of matched records in the base object resulting from the execution of the match process on this base object. Named according to the following pattern:
    C_baseObjectName_MTCH
    where baseObjectName is the root name of the base object (for example, C_PARTY_MTCH). For more information, see “Populating the Match Table with Match Pairs” on page 330.

external match table
    Uses input (C_baseObjectName_EMI) and output (C_baseObjectName_EMO) tables.
    • The EMI table contains records to match against the records in the base object.
    • The EMO table contains the output data for External Match jobs. Each row in the EMO table represents a pair of matched records (one from the EMI table and one from the base object).
    For more information, see “External Match Jobs” on page 719 and “External Match Jobs” on page 766.
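
The naming patterns listed for infrastructure tables can be summarized in a short helper. This is a Python sketch for illustration only: the function infrastructure_table_names and the dictionary keys are hypothetical, and Siperian Hub creates these tables itself, so nothing here should be used to create tables in the database.

```python
# Suffixes taken from the naming patterns described above.
INFRASTRUCTURE_SUFFIXES = {
    "cross-reference table": "XREF",
    "base object history table": "HIST",
    "cross-reference history table": "HXRF",
    "match key table": "STRP",
    "match table": "MTCH",
    "external match input table": "EMI",
    "external match output table": "EMO",
}


def infrastructure_table_names(base_object_root: str) -> dict[str, str]:
    """Return the expected infrastructure table name for each table type.

    base_object_root is the root name of the base object, for example
    "PARTY" for the tables named C_PARTY_XREF, C_PARTY_HIST, and so on.
    """
    return {
        table_type: f"C_{base_object_root}_{suffix}"
        for table_type, suffix in INFRASTRUCTURE_SUFFIXES.items()
    }


names = infrastructure_table_names("PARTY")
```

A lookup such as names["match key table"] is convenient when writing read-only queries or reports against an ORS.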

Supported Relationships Among Data
Siperian Hub supports one:many and many:many relationships among tables, as well as hierarchical relationships between records in the same base object. In Siperian Hub, relationships between records can be defined in various ways.

The following table describes these types of relationships.

Type of Relationship
    Description

foreign key relationship between base objects
    One base object (the child) contains a foreign key column, which contains values that match values in the primary key column of another base object (the parent). For more information, see “Process Overview for Defining Foreign-Key Relationships” on page 143 and “Configuring Foreign-Key Relationships Between Base Objects” on page 140.

base object and dependent objects
    A base object (the parent) has a dependent object (the child). The foreign-key relationship is implicit between the dependent object and its parent base object. For example, a Customer base object could have an associated Notes dependent object to store free-form notes about a customer. For more information, see “Process Overview for Defining Dependent Objects” on page 119 and “Configuring Dependent Objects” on page 117.

records within the same base object
    Within a base object, records are related to each other hierarchically. Allows you to define many-to-many relationships within the base object. For more information, see “Intra-Table Paths” on page 502.

Once these relationships are configured in the Hub Console, you can use these relationships to configure match column rules by defining match paths between records. For more information, see “Configuring Match Paths for Related Records” on page 497.

Make Schema Changes Only in the Hub Console
Siperian Hub maintains schema consistency, provided that all model changes are done using the Hub Console tools, and that no changes are made directly to the database. Siperian Hub provides all the tools necessary for maintaining the schema.

Think Before You Change the Schema
Important: Schema changes can involve risk to data and should be approached in a managed and controlled manner. You should plan the changes to be made and analyze the impact of the changes before making them. You should also back up the database before making any changes.

You Must Have a Write Lock to Change the Schema
In order to make any changes to the schema, you must have a write lock. For more information, see “Acquiring a Write Lock” on page 30.

Adding Columns for Technical Reasons
For purely technical reasons, you might want to add columns to a base object. For example, for a segment match, you must add a segment column. For more information on adding columns for segment matches, see “Segment Matching” on page 562. We recommend that you distinguish columns added to base objects for purely technical reasons from those added for other business reasons, because you generally do not want to include these columns in most views used by data stewards. Prefixing these column names with a specific identifier, such as CSTM_, is one way to easily filter them out.
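
As an illustration of why a common prefix helps, the following Python sketch filters technical columns out of a column list before it is used to build a data steward view. The function steward_view_columns and the sample column names are hypothetical. Only the CSTM_ prefix comes from the recommendation above.

```python
TECHNICAL_PREFIX = "CSTM_"  # prefix suggested above for technical columns


def steward_view_columns(columns: list[str], prefix: str = TECHNICAL_PREFIX) -> list[str]:
    """Return the columns that do not carry the technical prefix."""
    return [name for name in columns if not name.startswith(prefix)]


# Hypothetical base object columns, including one segment column.
columns = ["FIRST_NAME", "LAST_NAME", "CSTM_SEGMENT", "CITY"]
visible = steward_view_columns(columns)
```

Because the rule is a simple prefix test, the same filter can be expressed wherever the view is defined, provided the prefix is applied consistently.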

Starting the Schema Manager
You use the Schema Manager in the Hub Console to define the schema, staging tables, and landing tables. The Schema Manager is also used to define rules for match and merge, validation, and message queues.

To start the Schema Manager:
• In the Hub Console, expand the Model workbench, and then click Schema.

The Hub Console displays the Schema Manager.

[Figure: Schema Manager window, with callouts for the Navigation Pane and the Properties Pane]

The Schema Manager is divided into two panes.
Pane
    Description

Navigation pane
    Shows (in a tree view) the core schema objects: base objects and landing tables. Expanding an object in the tree shows you the property groups available for that object.

Properties pane
    Shows the properties for the selected object in the left-hand pane. Clicking any node in the schema tree displays the corresponding properties page (that you can view and edit) in the right-hand pane.

For general instructions about using the Schema Manager, see “Navigating the Hub Console” on page 24. You must use the Schema Manager when defining tables in an ORS, as described in “Requirements for Defining Schema Objects” on page 87.

About Base Objects
In Siperian Hub, central business entities—such as customers, accounts, products, or employees—are represented in tables called base objects. A base object is a table in the Hub Store that contains collections of data about individual entities—such as customer A, customer B, customer C, and so on. Each individual entity has a single master record—the best version of the truth—for that entity. An individual entity might have additional records in the base object (contributing records) that contain the “multiple versions of the truth” that need to be consolidated into the master record. Consolidation is the process of merging duplicate records into a single consolidated record that contains the most reliable cell values from all of the source records.
[Figure: a master record assembled from its contributing records. Labels: Most Reliable Cell Value, Master Record, Contributing Records]
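
The idea of building a master record from the most reliable cell values can be sketched in a few lines of Python. This is a deliberately simplified model, not the Siperian Hub consolidation algorithm: it assumes that every cell arrives with a numeric trust score, and the function consolidate, the sample records, and the scores are all hypothetical.

```python
def consolidate(contributing_records: list[dict]) -> dict:
    """Build a master record by keeping, per column, the most trusted value.

    Each contributing record maps a column name to a (value, trust) pair.
    When two records have equal trust for a column, the earlier one wins.
    """
    best = {}  # column name -> (trust, value)
    for record in contributing_records:
        for column, (value, trust) in record.items():
            if column not in best or trust > best[column][0]:
                best[column] = (trust, value)
    return {column: value for column, (trust, value) in best.items()}


# Two contributing records for the same customer (illustrative data).
crm = {"NAME": ("Jon Smith", 60), "PHONE": ("555-0100", 90)}
erp = {"NAME": ("John Smith", 80), "PHONE": ("555-0199", 40)}
master = consolidate([crm, erp])
```

In this example the master record takes NAME from the ERP record and PHONE from the CRM record, so no single source record matches it exactly.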

Important: You must use the Schema Manager to define base objects—you cannot configure them directly in the database. For more information, see “Requirements for Defining Schema Objects” on page 87.

Configuring Base Objects

Relationships Between Base Objects and Other Tables in the Hub Store
The following figure shows base objects in relation to other tables in the Hub Store.

Process Overview for Defining Base Objects
To define a base object:
1. Using the Schema Manager, create a base object table according to the instructions in “Creating Base Objects” on page 107. The Schema Manager automatically adds system columns, as described in “Base Object Columns” on page 95.
2. Add the user-defined columns that will contain business data according to the instructions in “Configuring Columns in Tables” on page 125.
   Note: Column names cannot be longer than 26 characters.
3. While configuring column properties, specify which column(s) will use trust to determine the most reliable value when different source systems provide different values for the same cell. For more information, see “Configuring Trust for Source Systems” on page 455.
4. For this base object, create one staging table per source system according to the instructions in “Configuring Staging Tables” on page 364. For each staging table, select the base object columns that you want to include.
5. Create any landing tables that you need to store data from source systems. For more information, see “Configuring Landing Tables” on page 355.
6. Map the landing tables to the staging tables according to the instructions in “Mapping Columns Between Landing and Staging Tables” on page 380. If any columns need data cleansing, specify the cleanse function in the mapping according to the instructions in Chapter 12, “Configuring Data Cleansing.” Each staging table must get its data from one landing table (with any intervening cleanse functions), but the same landing table can provide data to more than one staging table. Map the primary key column of the landing table to the PKEY_SRC_OBJECT column in the staging table.
7. Populate each landing table with data using an ETL tool or some other process, as described in “Land Process” on page 292.
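
The mapping rule in step 6 (each staging table is fed by exactly one landing table, while a landing table may feed several staging tables) can be checked mechanically when planning the mappings. The Python sketch below is an illustration under stated assumptions: the function name and the table names are hypothetical, and the check covers only staging tables that appear in the planned mapping list.

```python
from collections import defaultdict


def staging_tables_with_invalid_sources(mappings: list[tuple[str, str]]) -> list[str]:
    """Return staging tables that are fed by more than one landing table.

    mappings is a list of (landing_table, staging_table) pairs. A landing
    table may appear in many pairs. A staging table must not be paired
    with two different landing tables.
    """
    sources = defaultdict(set)
    for landing_table, staging_table in mappings:
        sources[staging_table].add(landing_table)
    return sorted(name for name, landing in sources.items() if len(landing) != 1)


# Hypothetical planned mappings.
planned = [
    ("C_LND_CRM", "C_STG_CRM_PARTY"),
    ("C_LND_CRM", "C_STG_CRM_ADDRESS"),   # same landing table, second staging table: allowed
    ("C_LND_ERP", "C_STG_CRM_PARTY"),     # second landing table for one staging table: not allowed
]
problems = staging_tables_with_invalid_sources(planned)
```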

Base Object Columns
Base objects have two types of columns:
Column Type
    Description

system columns
    Columns that are automatically created and maintained by the Schema Manager.

user-defined columns
    Columns that have been added by users according to the instructions in “Configuring Columns in Tables” on page 125.

Base objects have the following system columns.
Physical Name    Data Type (Size)
    Description

ROWID_OBJECT    CHAR (14)
    Primary key. Unique value assigned by Siperian Hub whenever a new record is inserted into the base object.

CREATOR    VARCHAR (50)
    User or process responsible for creating the record.

CREATE_DATE    DATE
    Date on which the record was created.

UPDATED_BY    VARCHAR (50)
    User or process responsible for the most recent update on the record.

LAST_UPDATE_DATE    DATE
    Date of the most recent update to any cell on the record.

CONSOLIDATION_IND    INT
    Integer value indicating the consolidation state of this record. Valid values are:
    • 1=Consolidated
    • 2=Ready for merge
    • 3=Undergoing the match process
    • 4=Ready for match
    • 9=On Hold
    For more information, see “Consolidation Status for Base Object Records” on page 289.

DELETED_IND    INT
    Reserved for future use.

DELETED_BY    VARCHAR (50)
    Reserved for future use.

DELETED_DATE    DATE
    Reserved for future use.

LAST_ROWID_SYSTEM    CHAR (14)
    The identifier of the system responsible for the most recent update to any cell in the base object record. Foreign key referencing the ROWID_SYSTEM column on the C_REPOS_SYSTEM table.

DIRTY_IND    INT
    Used to determine whether the tokenize process generates match keys for this record. Valid values are:
    • 0 = record is up to date
    • 1 = record is new or has been updated and needs to be tokenized
    After the record has been tokenized, this flag is reset to zero (0). For more information, see “Base Object Records Flagged for Tokenization” on page 323.

INTERACTION_ID    INT
    For state-enabled base objects only. Interaction identifier that is used to protect a pending cross-reference record from updates that are not part of the same process as the original cross-reference record. For details, see “Protecting Pending Records Using the Interaction ID” on page 208.

HUB_STATE_IND    INT
    For state-enabled base objects only. Integer value indicating the state of this record. Valid values are:
    • 0=Pending
    • 1=Active (Default)
    • -1=Deleted
    For details, see “About the Hub State Indicator” on page 207.
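
When reading these indicator columns in reports or scripts, named constants are easier to follow than bare integers. The following Python sketch encodes the valid values listed above. The enumeration names, member names, and the helper needs_tokenization are hypothetical conveniences, not Siperian Hub identifiers.

```python
from enum import IntEnum


class ConsolidationInd(IntEnum):
    """Valid values of CONSOLIDATION_IND."""
    CONSOLIDATED = 1
    READY_FOR_MERGE = 2
    UNDERGOING_MATCH = 3
    READY_FOR_MATCH = 4
    ON_HOLD = 9


class HubStateInd(IntEnum):
    """Valid values of HUB_STATE_IND (state-enabled base objects only)."""
    PENDING = 0
    ACTIVE = 1    # default
    DELETED = -1


class DirtyInd(IntEnum):
    """Valid values of DIRTY_IND."""
    UP_TO_DATE = 0
    NEEDS_TOKENIZATION = 1


def needs_tokenization(row: dict) -> bool:
    """Return True if the row is flagged as new or updated since tokenization."""
    return DirtyInd(row["DIRTY_IND"]) is DirtyInd.NEEDS_TOKENIZATION


# A row as it might be read from a base object (illustrative values).
row = {"ROWID_OBJECT": "42", "CONSOLIDATION_IND": 4, "DIRTY_IND": 1}
status = ConsolidationInd(row["CONSOLIDATION_IND"])
```

Converting an unexpected integer, such as ConsolidationInd(5), raises ValueError, which surfaces values outside the documented set.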

About Cross-Reference Tables
Each base object has one associated cross-reference table (or XREF table), which is used for tracking the lineage (origin) of records in the base object. Siperian Hub automatically creates a cross-reference table when you create a base object. Siperian Hub uses cross-reference tables to translate all source system identifiers into the appropriate ROWID_OBJECT values.

Note: Cross-reference tables are not created or needed for dependent objects, as dependent objects are not matched and consolidated.

Records in Cross-Reference Tables
Each row in the cross-reference table represents a separate record from a source system. If multiple sources provide data for a single column (for example, the phone number comes from both the CRM and ERP systems), then the cross-reference table contains separate records from each source system. Each base object record will have one or more associated cross-reference records. The cross-reference record contains:
• an identifier for the source system that provided the record
• the primary key value of that record in the source system
• the most recent cell value(s) provided by that system

Load Process and Cross-Reference Tables
The load process populates cross-reference tables. During load inserts, new records are added to the cross-reference table. During load updates, changes are written to the affected cross-reference record(s).
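
The behavior described above can be pictured with a small in-memory model. The Python sketch below is an assumption-laden illustration, not the Siperian Hub load process: the class CrossReferenceTable and its methods are hypothetical, and it keys each cross-reference record on the source system identifier together with the primary key value from that source system.

```python
class CrossReferenceTable:
    """Toy model of a cross-reference table for one base object."""

    def __init__(self):
        # (source system id, source primary key) -> cross-reference record
        self.rows = {}

    def load(self, rowid_system: str, pkey_src_object: str,
             rowid_object: str, cells: dict) -> str:
        """Insert a new cross-reference record or update an existing one."""
        key = (rowid_system, pkey_src_object)
        if key in self.rows:
            # Load update: write the changed cells to the existing record.
            self.rows[key]["cells"].update(cells)
            return "update"
        # Load insert: add a new record for this source record.
        self.rows[key] = {"rowid_object": rowid_object, "cells": dict(cells)}
        return "insert"

    def rowid_object_for(self, rowid_system: str, pkey_src_object: str) -> str:
        """Translate a source system identifier into a ROWID_OBJECT value."""
        return self.rows[(rowid_system, pkey_src_object)]["rowid_object"]


xref = CrossReferenceTable()
first = xref.load("CRM", "1001", "42", {"PHONE": "555-0100"})
second = xref.load("ERP", "A-7", "42", {"PHONE": "555-0199"})
third = xref.load("CRM", "1001", "42", {"PHONE": "555-0111"})
```

Here one base object record (ROWID_OBJECT 42) ends up with two cross-reference records, one per source system, and the third load changes only the CRM record.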

Data Steward Tools and Cross-Reference Tables
Cross-reference records are visible in the Merge Manager and can be modified using the Data Manager. For more information, see the Siperian Hub Data Steward Guide.

Relationships Between Base Objects and Cross-Reference Tables
The following figure shows an example of the relationships between base objects, cross-reference tables, and C_REPOS_SYSTEM.

Columns in Cross-Reference Tables
Cross-reference tables have the following system columns. Note that cross-reference tables have a unique key representing the combination of the PKEY_SRC_OBJECT and ROWID_SYSTEM columns.
Physical Name    Data Type (Size)
    Description

ROWID_XREF    NUMBER (38)
    Primary key that uniquely identifies this record in the cross-reference table.

PKEY_SRC_OBJECT    VARCHAR2 (255)
    Primary key value from the source system. Multi-field/multi-column keys from source systems must be concatenated into a single key value using the Siperian Hub internal cleanse process (see “About Data Cleansing in Siperian Hub” on page 406) or external cleanse process (an ETL tool or some other data loading utility).

ROWID_SYSTEM    CHAR (14)
    Foreign key to C_REPOS_SYSTEM, which is the Siperian Hub repository table that stores a Siperian Hub identifier and description of each source system that can populate the ORS. For more information, see “Configuring Source Systems” on page 348.

ROWID_OBJECT    CHAR (14)
    Foreign key to the base object. Unique value assigned by Siperian to the associated record in the base object.

SRC_LUD    DATE
    Last source update date. Updated only when an update is received from the source system.

CREATOR    VARCHAR2 (50)
    User or process responsible for creating the cross-reference record.

CREATE_DATE    DATE
    Date on which the cross-reference record was created.

UPDATED_BY    VARCHAR2 (50)
    User or process responsible for the most recent update to the cross-reference record.

LAST_UPDATE_DATE    DATE
    Date of the most recent update to any cell in the cross-reference record. Can be updated as applicable during the load and consolidation processes.

DELETED_IND    NUMBER (38)
    Reserved for future use.

DELETED_BY    VARCHAR2 (50)
    Reserved for future use.

DELETED_DATE    DATE
    Reserved for future use.

PUT_UPDATE_MERGE_IND    NUMBER (38)
    Indicates whether a record has been edited using the Data Manager.

INTERACTION_ID    NUMBER (38)
    For state-enabled base objects only. Interaction identifier that is used to protect a pending cross-reference record from updates that are not part of the same process as the original cross-reference record. For more information, see “Protecting Pending Records Using the Interaction ID” on page 208.

HUB_STATE_IND    NUMBER (38)
    For state-enabled base objects only. Integer value indicating the state of this record. Valid values are:
    • 0=Pending
    • 1=Active (Default)
    • -1=Deleted
    For more information, see “About the Hub State Indicator” on page 207.

PROMOTE_IND    NUMBER (38)
    For state-enabled base objects only. Integer value indicating the promotion status. Used by the Promote job to determine whether to promote the record to an ACTIVE state. Valid values are:
    • 0=Do not promote this record
    • 1=Promote this record to ACTIVE
    This value is not changed to 0 during the Promote job if the record is not promoted. For more information, see “Promoting Records Using the Promote Batch Job” on page 218.
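
The requirement that multi-column source keys be concatenated into a single PKEY_SRC_OBJECT value can be illustrated as follows. This Python sketch shows what an external step (for example, part of an ETL job) might do. The function build_pkey_src_object and the "|" delimiter are assumptions, and the length check uses the column size of 255 from the table above, counted in characters.

```python
MAX_PKEY_SRC_OBJECT_LENGTH = 255  # from the VARCHAR2 (255) column definition


def build_pkey_src_object(parts, delimiter: str = "|") -> str:
    """Concatenate the parts of a multi-column source key into one value.

    The delimiter is an assumption: choose one that cannot occur in the
    key parts, so that different part combinations never collide.
    """
    key = delimiter.join(str(part).strip() for part in parts)
    if len(key) > MAX_PKEY_SRC_OBJECT_LENGTH:
        raise ValueError(
            f"Key is {len(key)} characters; the limit is {MAX_PKEY_SRC_OBJECT_LENGTH}."
        )
    return key


# Hypothetical three-column source key: country code, account number, suffix.
key = build_pkey_src_object(["US", 1001, "A"])
```

The same source record must always produce the same concatenated value, because the cross-reference table identifies a record by the combination of PKEY_SRC_OBJECT and ROWID_SYSTEM.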

History Tables
This section describes history tables in the Hub Store. If history is enabled for a base object (see “Enable History” on page 102), then Siperian Hub maintains history tables for base objects and cross-reference tables. History tables are used by Siperian Hub to provide detailed change-tracking options, including merge and unmerge history, history of the pre-cleansed data, history of the base object, the cross-reference history, and so on.


Base Object History Tables
A history-enabled base object has a single history table (named C_baseObjectName_HIST) that contains historical information about data changes in the base object. Whenever a record is added or updated in the base object, a new record is inserted into the base object history table to capture the event.

Cross-Reference History Tables
A history-enabled base object has a single cross-reference history table (named C_baseObjectName_HXRF) that contains historical information about data changes in the cross-reference table. Whenever a record changes in the cross-reference table, a new record is inserted into the cross-reference history table to capture the event.

Basic Base Object Properties
This section describes the basic base object properties.

Item Type
The type of table that you are adding. Select Base Object.

Display Name
The name of this base object as it will be displayed in the Hub Console. Enter a descriptive name.

Physical Name
The actual name of the table in the database. Siperian Hub will suggest a physical name for the table based on the display name that you enter. Make sure that you do not use any reserved name suffixes, as described in “Rules for Database Object Names” on page 88.

Data Tablespace
The name of the data tablespace. Read-only. For more information, see the Siperian Hub Installation Guide for your platform.

Index Tablespace
The name of the index tablespace. Read-only. For more information, see the Siperian Hub Installation Guide for your platform.

Description
A brief description of this base object.

Enable History
Specifies whether history is enabled for this base object. If enabled, Siperian Hub keeps a log of records that are inserted, updated, or deleted for this base object. You can use the information in history tables for audit purposes. For more information, see “History Tables” on page 100.

Advanced Base Object Properties
This section describes the advanced base object properties.

Complete Tokenize Ratio
When the percentage of the records that have changed is higher than this value, a complete re-tokenization is performed. If the number of records to be tokenized does not exceed this threshold, then Siperian Hub deletes the records requiring re-tokenization from the match key table, calculates the tokens for those records, and then reinserts them into the match key table. The default value is 60. For more information, see “Match Keys and the Tokenization Process” on page 322.
Note: Deleting can be a slow process. However, if your Cleanse Match Server is fast and the network connection between Cleanse Match Server and the database server is also fast, then you may test with a much lower tokenization threshold (such as 10%). This will enable you to determine whether there are any gains in performance.

Allow constraints to be disabled
During the initial load or updates, or if there is no real-time, concurrent access, you can disable the referential integrity constraints on the base object to improve performance. The default value is 1, signifying that constraints are disabled. For more information, see “Load Process” on page 299 and Chapter 13, “Configuring the Load Process.”

Duplicate Match Threshold
This parameter is used only with the Match for Duplicate Data job for initial data loads. The default value is 0. To enable this functionality, this value must be set to 2 or above. For more information, see “Match for Duplicate Data Jobs” on page 740 and the Siperian Hub Data Steward Guide.

Load Batch Size
The load process inserts and updates batches of records in the base object. The load batch size specifies the number of records to load per batch cycle (default is 1000000). For more information, see “Loading Records by Batch” on page 305, and Chapter 13, “Configuring the Load Process.”

Max Elapsed Match Minutes
This specifies the execution timeout (in minutes) when executing a match rule. If this time limit is reached, then the match process (whenever a match rule is executed, either manually or via a batch job) will exit. If a match process is executed as part of a batch job, the system should move on to the next match. It will stop if this is a single match process. The default value is 20. Increase this value only if the match rule and data are very complex. Generally, rules are able to complete within 20 minutes (the default). For more information, see “Match Process” on page 317 and Chapter 14, “Configuring the Match Process.”


Parallel Degree
Oracle only. This specifies the degree of parallelism set on the base object table and its related tables. It does not take effect for all batch processes, but can have a beneficial effect on performance when it is used. However, its use is constrained by the number of CPUs on the database server machine, as well as the amount of memory available. The default value is 1.

Requeue On Parent Merge
If this value is greater than zero, then when parents are merged, the related child records are set as unconsolidated: they are flagged as New again (consolidation indicator is 4, see “Consolidation Status for Base Object Records” on page 289) so that they can be matched. The default value is 0. For more information, see “Consolidation Indicator” on page 289 and “Immutable Rowid Object” on page 594.

Generate Match Tokens on Load
If selected (checked), then the tokenization process executes after the completion of the load process. Leaving this option unchecked is useful for intertable match scenarios in which the parent must be loaded first, followed by the child match/merge. By not tokenizing the parent, the child match/merge will not need to update any of the parent records in the match key table. Once the child match/merge is complete, you can run the match process on the parent to force it to tokenize. It is also useful in cases where you have a limited window in which to perform the load process. Not tokenizing will save time in the load process, at the cost of tokenizing the data later. You must tokenize before you match your data. For more information, see “Load Process” on page 299, “Generating Match Tokens (Optional)” on page 316, and “Generating Match Tokens During Load Jobs” on page 730.


Generate Match Tokens on Put
You can PUT data into a base object using the Data Manager (see the Siperian Hub Data Steward Guide). If you are using the Data Manager to PUT data, you can leave this option disabled (unchecked) to tokenize your data later. Performing this operation later allows you to process PUT requests faster. Defer tokenization only when you know that the data will not be matched immediately. For more information, see “Match Keys and the Tokenization Process” on page 322.
Note: Do not use the Generate Match Tokens on Put option if you are using the SIF API. If you have this parameter enabled, your SIF Put and CleansePut requests will fail. Use the Tokenize request instead. Enable Generate Match Tokens on Put only if you are not using the SIF API and you want data steward updates from the Hub Console to be tokenized immediately. For more information, see “Editing Base Object Properties” on page 108.

Enable Row Locking During Batch
If checked (selected), this feature enables locking of the data during updates, which allows for a higher degree of concurrent access. The default value is 0, signifying that row locking is disabled during batch.

Match Flag Audit Table
Specifies whether a match flag audit table is created.
• If checked (selected), then an audit table (BusinessObjectName_FMHA) is created and populated with the userID of the user who, in Merge Manager, queued a manual match record for automerging. For more information about the Merge Manager tool, see the Siperian Hub Data Steward Guide.
• If unchecked (not selected), then the Updated_By column is set to the userID of the person who executed the Automerge batch job.
For more information, see “Match Process” on page 317 and Chapter 14, “Configuring the Match Process.”


Enable State Management
Specifies whether Siperian Hub manages the system state for records in this base object. By default, state management is disabled. Select (check) this check box to enable state management for this base object in support of approval workflows. If enabled, this base object is referred to in this document as a state-enabled base object. For more information, see Chapter 7, “State Management,” and “Enabling State Management” on page 211.

Enable History of Cross-Reference Promotion
For state-enabled base objects, specifies whether Siperian Hub maintains the promotion history for cross-reference records that undergo a state transition from PENDING (0) to ACTIVE (1). By default, this option is disabled. For more information, see Chapter 7, “State Management,” and “Enabling the History of Cross-Reference Promotion” on page 213.

Base Object Style
Select the style (merge or link) for this base object.
• A merge-style base object (the default) is used with Siperian Hub’s match and merge capabilities.
• A link-style base object is used with Siperian Hub’s match and link capabilities. If selected, Siperian Hub creates a LINK table for this base object.
If you change a link-style base object back to a merge-style base object, the Schema Manager prompts you to confirm whether you want to drop the LINK table.
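The Complete Tokenize Ratio rule described earlier in this section amounts to a simple threshold comparison. The following Python sketch illustrates that comparison only; the function name tokenization_strategy and its return labels are hypothetical, and the sketch does not reflect how Siperian Hub actually counts changed records internally.

```python
def tokenization_strategy(changed_rows: int, total_rows: int,
                          complete_tokenize_ratio: float = 60.0) -> str:
    """Return "complete" when the percentage of changed records is higher
    than the Complete Tokenize Ratio, otherwise "incremental".

    "incremental" stands for the documented delete/recalculate/reinsert
    path for only the records that require re-tokenization.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    changed_pct = 100.0 * changed_rows / total_rows
    return "complete" if changed_pct > complete_tokenize_ratio else "incremental"


if __name__ == "__main__":
    # Default threshold of 60, then the lower test threshold of 10
    # mentioned in the note on fast Cleanse Match Server setups.
    print(tokenization_strategy(700_000, 1_000_000))
    print(tokenization_strategy(150_000, 1_000_000, complete_tokenize_ratio=10.0))
```

Lowering the threshold makes a complete re-tokenization more likely, which avoids the potentially slow delete step at the cost of recalculating every token.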


Creating Base Objects
To create each base object in your schema:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click in the left pane of the Schema Manager and choose Add Item from the popup menu. The Schema Manager displays the Add Table dialog box.
4. Specify the basic base object properties. For more information, see “Basic Base Object Properties” on page 101.
5. Click OK. The Schema Manager creates the new base table in the Operational Record Store (ORS), along with any support tables, and then adds the new base object table to the schema tree.


Editing Base Object Properties
To edit the properties of an existing base object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, select the base object that you want to modify. The Schema Manager displays the Basic tab of the Base Object Properties page.
4. For each property that you want to edit on the Basic tab, click the Edit button next to it, and specify the new value. For more information, see “Basic Base Object Properties” on page 101.
5. If you want, check (select) the Enable History check box to have Siperian Hub keep a log of records that are inserted, updated, or deleted. You can use a history table for audit purposes.
6. To modify other base object properties, click the Advanced tab.
7. Specify the advanced properties for this base object. For more information, see “Advanced Base Object Properties” on page 102.
8. In the left pane, click Match/Merge Setup beneath the base object’s name.
9. Specify the match/merge object properties. At a minimum, consider configuring the following properties:
   • maximum number of matches for manual consolidation (see “Maximum Matches for Manual Consolidation” on page 490)
   • number of rows per match job batch cycle (see “Number of Rows per Match Job Batch Cycle” on page 491)
   To edit a property, click the Edit button and enter a new value.
10. Click the Save button to save your changes.

For more information about setting the properties for matching and merging, see “Configuring Match Properties for a Base Object” on page 488.

Configuring Custom Indexes for Base Objects
This section describes how to configure custom indexes for a base object.

About Custom Indexes
When you configure columns for a base object, system indexes are created automatically for primary keys and unique columns. In addition, Siperian Hub automatically drops and creates system indexes as needed when executing batch jobs or stored procedures.

A custom index is an optional, supplemental index for a base object that you can define and have Siperian Hub maintain automatically. Custom indexes are non-unique. You might want to add a custom index to a base object for performance reasons. For example, suppose an external application calls the SIF SearchQuery request to search a base object by last name. If the base object has a custom index on the last name column, the last name search is processed more quickly. Custom indexes that are registered in Siperian Hub are automatically dropped and recreated during batch execution to improve performance.

You have the option to manually define indexes outside the Hub Console using a database utility for your database platform. For example, you could create a function-based index (such as one with Upper(Last_Name) in the index expression) in support of some specialized operation. However, if you add a user-defined index that is not supported by the Schema Manager, then the index is not registered with Siperian Hub, and you are responsible for maintaining it; Siperian Hub will not maintain it for you. If you do not properly maintain the index, you risk affecting batch processing performance.
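To make the idea of a function-based index concrete, the following Python sketch builds one in an in-memory SQLite database. SQLite is used here only as a stand-in so that the example is runnable; it is not the database platform of a Hub Store, and the table name c_customer, the index name ix_upper_last_name, and the sample data are all hypothetical. On your own platform, the index syntax and the way you inspect the query plan will differ.

```python
import sqlite3

# In-memory stand-in for a base object table (hypothetical names and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c_customer (rowid_object TEXT PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO c_customer VALUES (?, ?)",
    [("1", "Smith"), ("2", "SMITH"), ("3", "Jones")],
)

# Function-based (expression) index on the upper-cased last name.
conn.execute("CREATE INDEX ix_upper_last_name ON c_customer (upper(last_name))")

# A case-insensitive search that repeats the indexed expression exactly.
query = "SELECT rowid_object FROM c_customer WHERE upper(last_name) = ?"
matches = sorted(row[0] for row in conn.execute(query, ("SMITH",)))

# The last column of each EXPLAIN QUERY PLAN row is the plan description.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("SMITH",)))

print(matches)
print(plan)
```

Because an index like this is created outside the Schema Manager, it falls into the category of indexes that you would have to maintain yourself.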


Navigating to the Custom Index Setup Node
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the tree beneath the base object you want to work with.
4. Click the Custom Index Setup node. The Schema Manager displays the Custom Index Setup page.

Creating a Custom Index
To add a new custom index:
1. In the Schema Manager, navigate to the Custom Index Setup node for the base object that you want to work with, as described in “Navigating to the Custom Index Setup Node” on page 112.
2. Click the Add button. The Schema Manager creates a new custom index (NI_C_BaseObjectName_inc, where inc is an incremented number) and displays the list of columns in the base object.
3. Select the column(s) that you want in the custom index.
4. Click the Save button to save your changes.

If an index already exists for the selected column(s), the Schema Manager displays an error message and does not create the index. Click OK to close the dialog box.

Editing a Custom Index
To change a custom index, you must delete the existing custom index and add a new custom index with the columns that you want.

Deleting a Custom Index
To delete a custom index:
1. In the Schema Manager, navigate to the Custom Index Setup node for the base object that you want to work with, as described in “Navigating to the Custom Index Setup Node” on page 112.
2. In the Indexes list, select the custom index that you want to delete.
3. Click the Delete button. The Schema Manager prompts you to confirm deletion.
4. Click Yes.


Viewing the Impact Analysis of a Base Object
The Schema Manager allows you to view all of the tables, packages, and queries associated with a base object. You would typically do this before deleting a base object to ensure that you do not delete other associated objects by mistake.

To view the impact analysis for a base object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, select the base object that you want to view.
4. Right-click the mouse and choose Impact Analysis. The Schema Manager displays the Table Impact Analysis dialog box.
5. Click Close.


Deleting Base Objects
To delete a base object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, select the base object that you want to delete.
4. Right-click the mouse and choose Remove. The Schema Manager prompts you to confirm deletion.
5. Choose Yes. The Schema Manager asks you whether you want to view the impact analysis before deleting the base object.
6. Choose No if you want to delete the base object without viewing the impact analysis. The Schema Manager removes the deleted base object from the schema tree.

About Dependent Objects
A dependent object is used to store supplemental information about the records in a base object (for example, in a header-detail relationship). A Customer base object might, for instance, have a dependent object called Notes that contains free-form notes about each customer. One record in a base object table can map to multiple records in a dependent object table.

In the schema hierarchy, dependent objects are wholly subordinate to the parent base object. As such, dependent objects require less functionality than base objects: they do not support such features as match and consolidation, history, or trust. For more information, see “Types of Tables in an Operational Record Store” on page 83.

Important: You must use the Schema Manager to define dependent objects; you cannot configure them directly in the database. For more information, see “Requirements for Defining Schema Objects” on page 87.


Configuring Dependent Objects

How Dependent Objects Are Related to Base Objects and Cross-reference Tables
The following figure shows how dependent objects are related to base objects and cross-reference tables.

2. Create the dependent object table according to the instructions in “Creating Dependent Objects” on page 121.
3. Configure user-defined columns for this dependent object according to the instructions in “Configuring Columns in Tables” on page 125.
4. Create staging tables for the base object table and the dependent object table according to the instructions in “Configuring Staging Tables” on page 364.
5. Create landing tables for the source systems, if they do not already exist, according to the instructions in “Configuring Landing Tables” on page 355.
6. Map the landing tables to the staging tables. Map the column that contains the source system primary key for the base object to the ROWID_OBJECT column in the dependent object’s staging table. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.
7. Populate the landing tables.

When data is loaded, Siperian Hub copies the appropriate primary key value from the base object table into the dependent object table. The same record in a base object table can correspond to multiple records in a dependent object table.


Dependent Object Columns
Dependent objects have two types of columns:
Column Type             Description
system columns          Columns that are automatically created and maintained by the Schema Manager.
user-defined columns    Columns that have been added by users according to the instructions in “Configuring Columns in Tables” on page 125.

Dependent objects have the following system columns.
Physical Name           Data Type (Size)   Description
ROWID_XREF              INT                Cross-reference key from the parent base object’s cross-reference table.
ROWID_OBJECT            CHAR (14)          Foreign key that points to the primary key of the base object record associated with this dependent object record.
DEP_ROWID_SYSTEM        CHAR (14)          Identifier of the source system dependent object.
DEP_PKEY_SRC_OBJECT     VARCHAR (255)      Primary key of the dependent object in the source system. The combination of this column and ROWID_XREF must be unique. If the source system does not provide a single unique column for the dependent object, it is recommended that DEP_PKEY_SRC_OBJECT contain the concatenated values from the columns that actually make up a unique combination.
INTERACTION_ID          INT                For state-enabled base objects only. Interaction identifier that is used to protect a pending cross-reference record from updates that are not part of the same process as the original cross-reference record. For details, see “Protecting Pending Records Using the Interaction ID” on page 208.
CREATOR                 VARCHAR (50)       User or process responsible for creating the record.
CREATE_DATE             DATE               Date on which the record was created.
UPDATED_BY              VARCHAR (50)       User or process responsible for the most recent update.
LAST_UPDATE_DATE        DATE               Date of the most recent update.
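The recommendation to concatenate source columns into DEP_PKEY_SRC_OBJECT can be sketched as follows. This is an illustration only, typically something that would live in an ETL step before the landing table is populated. The function name build_src_key, the "|" separator, and the sample order-line values are assumptions; the 255-character limit comes from the VARCHAR (255) size listed above.

```python
from typing import Sequence


def build_src_key(parts: Sequence[object], sep: str = "|", max_len: int = 255) -> str:
    """Concatenate the source columns that together identify a record
    into a single key value.

    The separator is a hypothetical choice; pick one that cannot occur
    in the source values, or two different records could produce the same key.
    """
    if not parts or any(p is None for p in parts):
        raise ValueError("every key part must be present")
    cleaned = [str(p).strip() for p in parts]
    if any(c == "" for c in cleaned):
        raise ValueError("key parts must not be empty")
    key = sep.join(cleaned)
    if len(key) > max_len:
        raise ValueError(f"key is longer than {max_len} characters")
    return key


if __name__ == "__main__":
    # Hypothetical order-line record identified by order number plus line number.
    print(build_src_key(["ORD-1001", 3]))
```

Rejecting missing or empty parts up front avoids producing keys that look unique but collapse distinct source records into one.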

Creating Dependent Objects
To create a dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the schema tree for the base object on which the new object will depend.

Specify the following information:
Property           Description
Item Type          Type of table that you are adding (Dependent Object).
Display Name       Name for this dependent object as it will be displayed in the Hub Console.
Physical Name      Actual name of the table in the database. Siperian Hub will suggest a physical name for the table based on the display name that you enter.
Data Tablespace    Name of the data tablespace. For more information, see the Siperian Hub Installation Guide for your platform.
Index Tablespace   Name of the index tablespace. For more information, see the Siperian Hub Installation Guide for your platform.
Description        Description of this dependent object.

6. Click OK. The Schema Manager creates the new dependent object table in the Operational Record Store (ORS) and then adds the new dependent object table to the schema tree.

Editing Dependent Objects
To edit an existing dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the schema tree for the base object associated with the dependent object.
4. Expand the Dependent Objects list.
5. Select the dependent object that you want to edit.
6. For each property that you want to edit, click the Edit button next to it, and specify the new value.
7. Expand the tree below the dependent object.
   Note: Dependent objects do not have all of the nodes that are available to base objects.
   • To modify columns, select Columns and follow the instructions in “Configuring Columns in Tables” on page 125.
   • To modify the message trigger configuration, select Message Trigger Setup and follow the instructions in “Adding Message Triggers” on page 615.
   • To modify staging tables, select Staging Tables and follow the instructions in “Configuring Staging Tables” on page 364.


Deleting Dependent Objects
To delete a dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, select the base object associated with the dependent table that you want to delete.
4. Expand the Dependent Objects list.
5. Select the dependent object.
6. Right-click the mouse and choose Remove. The Schema Manager prompts you to confirm deletion.
7. Choose Yes. The Schema Manager removes the deleted dependent object from the schema tree.

Configuring Columns in Tables
After you have created a table (base object, dependent object, or landing table), you use the Schema Manager to define the columns for that table according to the “Requirements for Defining Schema Objects” on page 87. You must use the Schema Manager to define columns in tables—you cannot configure them directly in the database. Note: In the Schema Manager, you can also view the columns for cross-reference tables and history tables, but you cannot edit them.


About Columns
This section provides general information about table columns.

Types of Columns in ORS Tables
Tables in the Hub Store contain two types of columns:
Column                  Description
system columns          A column that Siperian Hub automatically creates and maintains. System columns contain metadata.
user-defined columns    Any column in a table that is not a system column. User-defined columns are added in the Schema Manager and usually contain business data.

Column Properties

Property           Description
Display Name       Name for this column as it will be displayed in the Hub Console.
Physical Name      Actual name of the column in the table. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
                   Note: For physical names of columns, do not use:
                   • any reserved column names, as described in “Reserved Column Names” on page 89
                   • the dollar sign ($) character
Nullable           Enable (check) this option if the column can be empty (null).
                   • If null values are allowed, you do not need to specify a default value.
                   • If null values are not allowed, then you must specify a default value.
Data Type          For character data types, you can specify the length. For certain numeric data types, you can specify the precision and scale. For more information, see “Data Types for Columns” on page 126.
Has Default        Enable (check) this option if this column has a default value.
Default            Used if no value is provided for the column but the column cannot be null.
Trust              Enable (check) this option if this column will contain values from more than one source system, and you want to use trust to determine the most reliable value. If you do not enable trust for the column, then the most recent value will always be used. For more information, see “Enabling Trust for a Column” on page 461 and “Configuring Trust for Source Systems” on page 455.
Unique             Enable (check) this option to enforce unique column constraints on data from a staging table. Most organizations use the primary key from the source system for the lookup value. A record with a duplicate value in this column will be rejected.
                   Warning: Avoid enabling the Unique option on base objects that might be consolidated. If you have a base object with a unique column and then load the same key from different systems, the insert into this base object fails. To use this feature, you must have unique keys across all systems.
Validate           Enable (check) this option if validation rule(s) will be configured for this column. Validation rules are applied during the load process to downgrade trust scores for cell values in this column. For more information, see “Enabling Validation Rules for a Column” on page 470.
Null Value Merge   Determines the survivorship of null values during the consolidation process.
                   • By default, this option is disabled. Trust scores for cells containing null values are automatically downgraded so that, during consolidation, null values are unlikely to win over non-null values. Instead, non-null values from the next available trusted source would survive.
                   • If enabled (checked), trust scores for cells containing null values are calculated normally, and null values might overwrite non-null values during consolidation. If you want to reduce trust on cells containing null data, you must write validation rules to do so.
GBID               Enable (check) this option if you want to define this column as the Global Business Identifier (GBID) for this object. Examples include a social security number, a driver’s license number, and so on. Doing so eliminates the need to custom-define identifiers. You can configure any number of GBID columns for API access and batch loads. For more information, see “Global Identifier (GBID) Columns” on page 129.
                   Note: To be configured as a GBID column, the column must be an INT data type, or it must be exactly 255 characters in length and use one of the following data types: CHAR, NCHAR, VARCHAR, and NVARCHAR2.
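The GBID eligibility rule in the note above can be expressed as a small check. The Python sketch below is illustrative only and is not how the Schema Manager validates columns; the function name can_be_gbid is hypothetical. Because this guide names the national variable-length type as both NVARCHAR and NVARCHAR2, the sketch accepts either spelling, which is an assumption.

```python
from typing import Optional

# Character types named in the guide as GBID-eligible. Both NVARCHAR and
# NVARCHAR2 are accepted here because the guide uses both spellings.
GBID_CHAR_TYPES = {"CHAR", "NCHAR", "VARCHAR", "NVARCHAR", "NVARCHAR2"}


def can_be_gbid(data_type: str, length: Optional[int] = None) -> bool:
    """Return True if a column with this data type and length meets the
    documented rule: INT, or a listed character type of exactly 255 characters."""
    normalized = data_type.strip().upper()
    if normalized == "INT":
        return True
    return normalized in GBID_CHAR_TYPES and length == 255


if __name__ == "__main__":
    for column in (("INT", None), ("VARCHAR", 255), ("VARCHAR", 50), ("DATE", None)):
        print(column, can_be_gbid(*column))
```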


Global Identifier (GBID) Columns
A Global Business Identifier (GBID) column contains common identifiers (key values) that allow you to uniquely and globally identify a record based on your business needs. Examples include: • Identifiers defined by applications external to Siperian Hub, such as ERP (SAP or Siebel customer numbers) or CRM systems. • Identifiers defined by external organizations, such as industry-specific codes (AMA numbers, DEA numbers. and so on), or government-issued identifiers (social security number, tax ID number, driver’s license number, and so on).

Note: To be configured as a GBID column, the column must be an integer, CHAR, VARCHAR, NCHAR, or NVARCHAR column type. A non-integer column must be exactly 255 characters in length.

In the Schema Manager, you can define multiple GBID columns in a base object. For example, an employee table might have columns for social security number and driver’s license number, or a vendor table might have a tax ID number.

A Master Identifier (MID) is a common identifier that is generated by a system of reference or system of record and that is used by others (for example, CIF, legacy hubs, CDI/MDM Hub, counterparty hub, and so on). In Siperian Hub, the MID is the ROWID_OBJECT, which uniquely identifies individual records from various source systems.

GBIDs do not replace the ROWID_OBJECT. GBIDs provide additional ways to help you integrate your Siperian Hub implementation with external systems, allowing you to query and access data through unique identifiers of your own choosing (using SIF requests, as described in the Siperian Services Integration Framework Guide). In addition, by configuring GBID columns using already-defined identifiers, you can avoid the need to custom-define identifiers.

GBIDs help with the traceability of your data. Traceability means keeping track of the data so that you can determine its lineage: which systems, and which records from those systems, contributed to consolidated records.

When you define GBID columns in a base object, the Schema Manager creates a separate table for this base object (the table name ends with _HUID) that tracks the old and new values (current/obsolete value pairs). For example, suppose two of your customers (both of which had different tax ID numbers) merged into a single company, and one tax ID number survived while the other one became obsolete. If you defined the tax ID number column as a GBID, Siperian Hub could help you track both the current and historical tax ID numbers so that you could access data (via SIF APIs) using the historical value.

Note: Siperian Hub does not perform any data verification or error detection on GBID columns. If the source system has duplicate GBID values, then those duplicate values will be passed into Siperian Hub.
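The current/obsolete tracking described above can be pictured with a small in-memory model. The following Python sketch is illustrative only: the class GbidHistory, its method names, and the sample identifiers are hypothetical, and it does not reproduce the actual layout of the _HUID table or any SIF API.

```python
class GbidHistory:
    """Toy model of current/obsolete GBID pairs (hypothetical, not the real _HUID layout)."""

    def __init__(self):
        self._owner = {}     # GBID value (current or obsolete) -> ROWID_OBJECT
        self._current = {}   # ROWID_OBJECT -> current GBID value

    def register(self, rowid_object, gbid):
        """Record the GBID value that a base object record currently carries."""
        self._owner[gbid] = rowid_object
        self._current[rowid_object] = gbid

    def merge(self, surviving_rowid, obsolete_rowid):
        """Merge two records: the obsolete record's GBID now resolves to the survivor."""
        obsolete_gbid = self._current.pop(obsolete_rowid)
        self._owner[obsolete_gbid] = surviving_rowid

    def lookup(self, gbid):
        """Return (ROWID_OBJECT, current GBID) for a current or historical value."""
        rowid_object = self._owner[gbid]
        return rowid_object, self._current[rowid_object]


history = GbidHistory()
history.register("1001", "TAX-111")   # first customer
history.register("1002", "TAX-222")   # second customer
history.merge(surviving_rowid="1001", obsolete_rowid="1002")

# The historical tax ID still resolves, now to the surviving record.
print(history.lookup("TAX-222"))
```

In this sketch the obsolete value is kept rather than deleted, which is what makes it possible to find the consolidated record from the historical identifier.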

Columns in Staging Tables
The columns for staging tables cannot be defined using the column editor. Staging table columns are a special case, as they are based on some or all columns in the staging table’s target object. You use the Add/Edit Staging Table window to select the columns on the target table that can be populated by the staging table. Siperian Hub then creates each staging table column with the same data types as the corresponding column in the target table. See “Configuring Staging Tables” on page 364 for more information on choosing the columns for staging tables.

Maximum Number of Columns for Base Objects
A base object cannot have more than 200 user-defined columns if it will have match rules that are configured for automatic consolidation. For more information, see “Flagging Matched Records for Automatic or Manual Consolidation” on page 333 and “Specifying Consolidation Options for Matched Records” on page 543.


Navigating to the Column Editor
To configure columns for base objects, dependent objects, and landing tables:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the schema tree for the object to which you want to add columns.
4. Select Columns. The Schema Manager displays column definitions in the Properties pane.

Note: In the above example, the schema shows ANSI SQL data types that Oracle converts to its own data types. For more information, see “Data Types for Columns” on page 126.

The Column Editor displays a “locked” icon next to system columns.


Command Buttons in the Column Editor
The Properties pane in the Column Editor contains the following command buttons:
Button Name     Description
Add             Add new columns. For more information, see “Adding Columns” on page 134.
Delete          Remove existing columns. For more information, see “Deleting Columns” on page 139.
Move Up         Move the selected column up in the display order. For more information, see “Changing the Column Display Order” on page 139.
Move Down       Move the selected column down in the display order. For more information, see “Changing the Column Display Order” on page 139.
Import          Add new columns by importing column definitions from another table. For more information, see “Importing Column Definitions From Another Table” on page 135.
Expand View     Expand the table columns view. For more information, see “Expanding the Table Columns View” on page 133.
Restore View    Restore the table columns view. For more information, see “Expanding the Table Columns View” on page 133.
Save            Saves changes to the column definitions.

Showing or Hiding System Columns
You can toggle the Show System Columns check box to show or hide system columns. For more information, see “Types of Columns in ORS Tables” on page 126.


Expanding the Table Columns View
You can expand the properties pane to display all the column properties in a single pane. By default, the Schema Manager displays column definitions in a contracted view.

To show the expanded table columns view:
• Click the Expand View button. The Schema Manager displays the expanded table columns view.

To show the default table columns view:
• Click the Restore View button. The Schema Manager displays the default table columns view.


Adding Columns
To add a column:
1. Navigate to the column editor for the table that you want to configure. For more information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Add button. The Schema Manager displays an empty row.
4. For each column, specify its properties. For more information, see “Column Properties” on page 127.
5. Click the Save button to save the columns you have added.


Importing Column Definitions From Another Table
To import some of the column definitions from another table:
1. Navigate to the column editor for the table that you want to configure. For more information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Import Schema button. The Import Schema dialog is displayed.
4. Specify the connection properties for the schema that you want to import. If you need more information about the connection information to specify here, contact your database administrator. The settings for the User name / Password fields depend on whether proxy users are configured for your Siperian Hub implementation.
• If proxy users are not configured (the default), then the user name will be the same as the schema name.
• If proxy users are configured, then you must specify the custom user name / password so that Siperian Hub can use those credentials to access the schema.


For more information about proxy user support, see the Siperian Hub Installation Guide for your platform.
5. Click Next. The Schema Manager displays a list of the tables that are available for import.
Note: The database you enter does not need to be the same as the Siperian ORS that you’re currently working in, nor does it need to be a Siperian ORS. The only restriction is that you cannot import from a relational database that is a different type from the one in which you are currently working. For example, if your database is an Oracle database, then you can import columns only from another Oracle database.
6. Select the table that you want to import.
7. Click Next.


The Schema Manager displays a list of columns for the selected table.

8. Select the column(s) you want to import.
9. Click Finish.
10. Click the Save button to save the column(s) that you have added.

Editing Column Properties
Once columns have been added and saved, you can change certain column properties. Before you make any changes, however, bear in mind that once a table has been defined and saved, you cannot:
• reduce the length of a CHAR, VARCHAR, NCHAR, or NVARCHAR2 field
• change the scale or precision of a NUMBER field
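The two restrictions above can be expressed as a simple pre-check. The following Python sketch is an illustration under stated assumptions, not part of Siperian Hub: the ColumnDef class, the check_edit function, and the sample column names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Character types named in this section.
CHARACTER_TYPES = {"CHAR", "VARCHAR", "NCHAR", "NVARCHAR2"}


@dataclass(frozen=True)
class ColumnDef:
    """Hypothetical column definition; not a Siperian Hub class."""
    name: str
    data_type: str
    length: Optional[int] = None      # used by character types
    precision: Optional[int] = None   # used by NUMBER
    scale: Optional[int] = None       # used by NUMBER


def check_edit(old: ColumnDef, new: ColumnDef) -> List[str]:
    """Return the reasons an edit to a saved column would be refused (empty if none)."""
    problems = []
    if (old.data_type in CHARACTER_TYPES
            and old.length is not None and new.length is not None
            and new.length < old.length):
        problems.append(
            f"{old.name}: length cannot be reduced from {old.length} to {new.length}")
    if (old.data_type == "NUMBER"
            and (new.precision, new.scale) != (old.precision, old.scale)):
        problems.append(
            f"{old.name}: scale or precision of a NUMBER field cannot be changed")
    return problems


name_col = ColumnDef("FULL_NAME", "VARCHAR", length=100)
amount_col = ColumnDef("AMOUNT", "NUMBER", precision=10, scale=2)

# Widening a character column is allowed; the other two edits are refused.
print(check_edit(name_col, ColumnDef("FULL_NAME", "VARCHAR", length=200)))
print(check_edit(name_col, ColumnDef("FULL_NAME", "VARCHAR", length=50)))
print(check_edit(amount_col, ColumnDef("AMOUNT", "NUMBER", precision=12, scale=2)))
```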

Important: As with any schema changes that are attempted after the tables have been populated with data, manage changes to columns in a planned and controlled fashion, and ensure that the appropriate database backups are done before making changes. To change column properties:

1. Navigate to the column editor for the table that you want to configure. For more information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. For each column, you can change the following properties. Be sure to read about the implications of changing a property before you make the change. For more information about each property, see “Column Properties” on page 127.

Property        Notes for Editing Values in This Column
Display Name    Name for this column as it will be displayed in the Hub Console.
Length          You can only increase the length of a CHAR, VARCHAR, NCHAR, or NVARCHAR2 field.
Default         Used if no value is provided for the column but the column cannot be null.
Trust           Note: You need to synchronize metadata if you enable trust. If you enable trust for a column on a table that already contains data, you will be warned that your trust settings have changed and that you need to run the trust Synchronization batch job in the Batch Viewer tool before doing any further loads to the table (see “Running Synchronize Batch Jobs After Changes to Trust Settings” on page 467). Siperian Hub will automatically make sure that the Synchronization job is available in the Batch Viewer tool. For more information, see Chapter 17, “Using Batch Jobs”.
                Warning: You must execute the synchronization process before you run any more Load jobs. Otherwise, the trusted values used to populate the column will be incorrect.
                Warning: Be very careful about disabling (unchecking) trust for columns that already contain data. Disabling trust results in the removal of columns from some of the underlying metadata tables and the resultant loss of data. If you inadvertently disable trust and save that change, you should correct your error by enabling trust again and immediately running the Synchronization job to recreate the metadata.
Unique          Enabling the Unique indicator will fail if the column already contains duplicate values. As noted before, it is recommended that you avoid using the Unique option, particularly on base objects that might be merged.
Validate        Warning: Be careful when disabling validation, which results in the loss of metadata for the associated column. Approach this with caution, and disable validation only when you are certain that it is required.

4. Click the Save button to save your changes.

Changing the Column Display Order
You can move columns up or down in the display order. Changing the display order does not affect the physical table in the database.

To change the column display order:
1. Navigate to the column editor for the table that you want to configure. For more information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the column that you want to move.
4. Do one of the following:
• Click the Move Up button to move the selected column up in the display order.
• Click the Move Down button to move the selected column down in the display order.
5. Click the Save button to save your changes.

Deleting Columns
Removing columns should be approached with extreme caution. Any data that has already been loaded into a column will be lost when the column is removed. It can also be a slow process due to the number of underlying tables that could be affected. You must save the changes immediately after removing the existing columns.

To delete a column from base objects, dependent objects, and landing tables:
1. Navigate to the column editor for the table that you want to configure. For more information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Scroll the column definitions in the Properties pane and select a column that you want to delete.
4. Click the Delete button. The Schema Manager prompts you to confirm deletion.
5. Click Yes. The Schema Manager removes the deleted column definition from the list.
6. Click the Save button to save your changes.

Configuring Foreign-Key Relationships Between Base Objects
This section describes how to configure foreign key relationships between base objects in your Siperian Hub implementation. For a general overview of foreign key relationships, see “Process Overview for Defining Foreign-Key Relationships” on page 143. For more information about parent-child relationships, see “Configuring Match Paths for Related Records” on page 497.

About Foreign Key Relationships
In Siperian Hub, a foreign key relationship establishes an association between two base objects via matching columns. In a foreign-key relationship, one base object (the child) contains a foreign key column, which contains values that match values in the primary key column of another base object (the parent).
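The parent/child arrangement described above is the ordinary relational foreign key pattern. The following Python sketch shows that pattern with the standard sqlite3 module rather than with a Hub Store: the consumer and address tables, their columns, and the sample rows are hypothetical, and the DDL is not what the Schema Manager generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent: the primary key column plays the role of ROWID_OBJECT.
conn.execute("CREATE TABLE consumer (rowid_object TEXT PRIMARY KEY, full_name TEXT)")

# Child: the foreign key column holds values that match the parent's primary key.
conn.execute(
    "CREATE TABLE address ("
    " rowid_object TEXT PRIMARY KEY,"
    " consumer_fk TEXT NOT NULL REFERENCES consumer (rowid_object),"
    " city TEXT)"
)

conn.execute("INSERT INTO consumer VALUES ('1', 'Pat Lee')")
conn.execute("INSERT INTO address VALUES ('10', '1', 'Austin')")


def insert_orphan():
    """Try to insert a child row whose parent does not exist."""
    try:
        conn.execute("INSERT INTO address VALUES ('11', '999', 'Boston')")
    except sqlite3.IntegrityError:
        return False   # rejected by the enforced foreign key
    return True


orphan_accepted = insert_orphan()
rows = conn.execute(
    "SELECT c.full_name, a.city"
    " FROM address a JOIN consumer c ON a.consumer_fk = c.rowid_object"
).fetchall()
print(orphan_accepted, rows)
```

The rejected insert shows what enforcement by the database means in practice: a child row cannot refer to a parent record that does not exist.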


Types of Foreign Key Relationships in ORS Tables
There are two types of foreign-key relationships in Hub Store tables.
Type                                      Description
system foreign key relationships          Automatically defined and enforced by Siperian Hub to protect the referential integrity of your schema.
user-defined foreign key relationships    Custom foreign key relationships that are manually defined according to the instructions later in this section.

Foreign Key Relationships and Dependent Objects
Foreign-key relationships are implicit between a dependent object and its parent base object. This relationship is defined according to the instructions in “Configuring Dependent Objects” on page 117.


Parent and Child Base Objects
The following diagram shows a foreign key relationship between parent and child base objects. The foreign key column in the child base object points to the ROWID_OBJECT column in the parent base object.

Create the child table. For more information, see “Deleting Base Objects” on page 116. Define the foreign key relationship between them according to the instructions in “Adding Foreign-Key Relationships” on page 143.

3.

If the child table contains generated keys from the parent table, the load process copies the appropriate primary key value from the parent table into the child table.

Adding Foreign-Key Relationships
To add a foreign-key relationship between two base objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand a base object (the base object that will be the child in the relationship).
4. Right-click Relationships. The Schema Manager displays the Properties tab of the Relationships page.


5. Click the Add button. The Schema Manager displays the Add Relationship dialog.
6. Define the new relationship by selecting:
• a column in the Relate from tree, and
• a column in the Relate to tree
7. If you want, check (select) the Virtual relationship check box to create a foreign key relationship that is not enforced by the database. Metadata defined in the ORS indicates that an implicit relationship exists.
Note: You cannot select a display column for foreign key relationships that Siperian Hub automatically creates.
8. Click OK.


9. Click the Diagram tab to view the foreign-key relationship diagram.
10. Click the Save button to save your changes.

Note: After you have created a relationship, if you go back and try to create another relationship, the column is not displayed because it is in use. When you delete the relationship, the column will be displayed.

Editing Foreign-Key Relationships
You can change only the Lookup Display Name in a foreign key relationship. To change any other properties, you need to delete the relationship, add it again, and specify the properties you want.

To edit the lookup display name for a foreign-key relationship between two base objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand a base object and right-click Relationships.


The Schema Manager displays the Properties tab of the Relationships page.

4. On the Properties tab, click the foreign-key relationship whose properties you want to view. The Schema Manager displays the relationship details.
5. Click the Edit button next to the Lookup Display Name and specify the new value.
6. Click the Save button to save your changes.


Configuring Lookups for Foreign-Key Relationships
After you have created a foreign key relationship, you can configure a lookup for the column. A lookup causes Siperian Hub to retrieve a data value from a parent table during the stage process. For example, if an Address staging table includes a CONSUMER_CODE_FK column, you could have Siperian Hub perform a lookup to the ROWID_OBJECT column in the Consumer base object and retrieve the ROWID_OBJECT value of the associated parent record in the Consumer table. For more information, see “Configuring Lookups For Foreign Key Columns” on page 376.
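Conceptually, a lookup replaces the key value carried by a staging row with the ROWID_OBJECT of the matching parent record. The following Python sketch illustrates that idea only. The function resolve_lookups, the in-memory index, and the sample data are hypothetical, and setting aside rows whose parent cannot be found is an assumption of this sketch rather than documented Siperian Hub behavior.

```python
def resolve_lookups(staging_rows, parent_index, fk_column="CONSUMER_CODE_FK"):
    """Replace the source key in fk_column with the parent's ROWID_OBJECT.

    parent_index maps a source key to the ROWID_OBJECT of the parent record.
    Rows with no matching parent are returned separately (an assumption here).
    """
    resolved, unresolved = [], []
    for row in staging_rows:
        parent_rowid = parent_index.get(row[fk_column])
        if parent_rowid is None:
            unresolved.append(row)
        else:
            # Copy the row so that the caller's staging data is not modified.
            resolved.append({**row, fk_column: parent_rowid})
    return resolved, unresolved


# Hypothetical Consumer records: source key -> ROWID_OBJECT
consumer_index = {"CUST-A": "1", "CUST-B": "2"}

address_staging = [
    {"STREET": "1 Main St", "CONSUMER_CODE_FK": "CUST-B"},
    {"STREET": "9 Elm Ave", "CONSUMER_CODE_FK": "CUST-Z"},
]

resolved, unresolved = resolve_lookups(address_staging, consumer_index)
print(resolved)
print(unresolved)
```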

Deleting Foreign-Key Relationships
You can delete any user-defined foreign-key relationship that has been added according to the instructions in “Adding Foreign-Key Relationships” on page 143. You cannot delete the system foreign key relationships that Siperian Hub automatically defines and enforces to protect the referential integrity of your schema.

To delete a foreign-key relationship between two base objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand a base object and right-click Relationships.
4. On the Properties tab, click the foreign-key relationship that you want to delete.
5. Click the Delete button. The Schema Manager prompts you to confirm deletion.
6. Click Yes. The Schema Manager deletes the foreign key relationship.
7. Click the Save button to save your changes.


Viewing Your Schema
You can use the Schema Viewer tool in the Hub Console to visualize the schema in an ORS. The Schema Viewer is particularly helpful for visualizing a complex schema.

Starting the Schema Viewer
Note: The Schema Viewer can also be launched from within the Metadata Manager, as described in the Siperian Hub Metadata Manager Guide. Once started, however, the instructions for using the Schema Viewer are the same, regardless of where it was launched from.

To start the Schema Viewer tool:
• In the Hub Console, expand the Model workbench, and then click Schema Viewer. The Hub Console starts the Schema Viewer and loads the data model, showing a progress dialog.


The Hub Console displays the Schema Viewer tool, as shown in the following example.

[Figure: the Schema Viewer, with the Diagram Pane and the Overview Pane labeled]

Panes in the Schema Viewer
The Schema Viewer is divided into two panes.
Pane             Description
Diagram pane     Shows a detailed diagram of your schema.
Overview pane    Shows an abstract overview of your schema. The gray box highlights the portion of the overall schema diagram that is currently displayed in the diagram pane. Drag the gray box to move the display area over a particular portion of your schema.


Command Buttons in the Schema Viewer
The Diagram Pane in the Schema Viewer contains the following command buttons:
Button Name    Description
Zoom In        Zooms in and magnifies a smaller area of the schema diagram, as described in “Zooming In” on page 150.
Zoom Out       Zooms out and displays a larger area of the schema diagram, as described in “Zooming Out” on page 151.
Zoom All       Zooms out to display the entire schema diagram, as described in “Zooming All” on page 152.
Layout         Toggles between a hierarchic and orthogonal view, as described in “Switching Views of the Schema Diagram” on page 152.
Options        Shows or hides column names and controls the orientation of the hierarchic view, as described in “Configuring Schema Viewer Options” on page 156.
Save           Saves the schema diagram as a JPG file, as described in “Saving the Schema Diagram as a JPG Image” on page 157.
Print          Prints the schema diagram, as described in “Printing the Schema Diagram” on page 158.

Zooming In and Out of the Schema Diagram
You can zoom in and out of the schema diagram.

Zooming In
To zoom into a portion of the schema diagram:
• Click the Zoom In button.


The Schema Viewer magnifies a portion of the screen.

Note that the gray highlight box in the Overview Pane has grown smaller to indicate the portion of the schema that is displayed in the diagram pane.

Zooming Out
To zoom out of the schema diagram:
• Click the Zoom Out button. The Schema Viewer zooms out of the schema diagram.

Note that the gray box in the Overview Pane has grown larger to indicate a larger viewing area.


Zooming All
To zoom all of the schema diagram, so that the entire schema diagram is displayed in the Diagram Pane:
• Click the Zoom All button. The Schema Viewer zooms out to display the entire schema diagram.

Switching Views of the Schema Diagram
The Schema Viewer displays the schema diagram in two different views.


Hierarchic View
The following figure shows an example of the hierarchic view (the default).


Orthogonal View
The following figure shows the same schema in the orthogonal view.

Toggling Views
To switch between the hierarchic and orthogonal views:
• Click the Layout button. The Schema Viewer displays the other view.


Navigating to Related Design Objects and Batch Jobs
Right-clicking on an object in the Schema Viewer displays a context menu.

The context menu displays the following commands.
Command               Description
Go to BaseObject      Launches the Schema Manager and displays this base object with an expanded base object node.
Go to Staging Table   Launches the Schema Manager and displays the selected staging table under the associated base object.
Go to Mapping         Launches the Mappings tool and displays the properties for the selected mapping.
Go to Job             Launches the Batch Viewer and displays the properties for the selected batch job.
Go to Batch Groups    Launches the Batch Group tool.

Controls the orientation of the schema hierarchy. One of the following values:
• Top to Bottom (default): Hierarchy goes from top to bottom, with the highest-level node at the top.
• Bottom to Top: Hierarchy goes from bottom to top, with the highest-level node at the bottom.
• Left to Right: Hierarchy goes from left to right, with the highest-level node at the left.
• Right to Left: Hierarchy goes from right to left, with the highest-level node at the right.


In the following example, column names are hidden.

3. Click OK.

Saving the Schema Diagram as a JPG Image
To save the schema diagram as a JPG image:
1. Click the Save button.


The Schema Viewer displays the Save dialog.

2. Navigate to the location on the file system where you want to save the JPG file.
3. Specify a descriptive name for the JPG file.
4. Click Save. The Schema Viewer saves the file.

Printing the Schema Diagram
To print the schema diagram:
1. Click the Print button. The Schema Viewer displays the Print dialog.


2. Select the print options that you want.

Pane                Description
Print Area          Scope of what to print:
                    • Print All: Print the entire schema diagram.
                    • Print viewable: Print only the portion of the schema diagram that is currently visible in the Diagram Pane.
Page Settings       Page output options, such as media, orientation, and margins.
Printer Settings    Printer options based on available printers in your environment.

3. Click Print. The Schema Viewer sends the schema diagram to the printer.


Chapter 6: Configuring Queries and Packages
This chapter describes how to configure Siperian Hub to provide queries and packages that data stewards and applications can use to access data in the Hub Store.

Before You Begin
Before you begin to define queries and packages, you must have:
• installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide for your platform
• built the schema according to the instructions in Chapter 5, “Building the Schema”


About Queries and Packages
In Siperian Hub, a query is a request to retrieve data from the Hub Store. A package is a public view of one or more underlying tables in Siperian Hub. A package is based on a query, which can select records from a table or from another package. Queries and packages go together. Queries define the criteria for selecting data, and packages are views that users use to operate on that data. A query can be used in multiple packages. For more information, see:
• “Configuring Queries” on page 162
• “Configuring Packages” on page 196
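The relationship between queries and packages resembles the relationship between a SELECT statement and a database view. The following Python sketch uses the standard sqlite3 module to show that analogy. The table c_consumer, the view names, and the sample rows are hypothetical, and the sketch does not show how Siperian Hub itself stores or generates packages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_consumer (rowid_object TEXT PRIMARY KEY, full_name TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO c_consumer VALUES (?, ?, ?)",
    [("1", "Pat Lee", "TX"), ("2", "Sam Roy", "CA"), ("3", "Ana Diaz", "TX")],
)

# The "query": the criteria for selecting data.
query_sql = "SELECT rowid_object, full_name FROM c_consumer WHERE state = 'TX'"

# Two "packages" (views) based on the same query.
conn.execute("CREATE VIEW pkg_tx_consumers AS " + query_sql)
conn.execute("CREATE VIEW pkg_tx_summary AS " + query_sql)

# A "package" whose query selects from another package instead of a table.
conn.execute("CREATE VIEW pkg_tx_names AS SELECT full_name FROM pkg_tx_consumers")

tx_consumers = conn.execute(
    "SELECT * FROM pkg_tx_consumers ORDER BY rowid_object").fetchall()
tx_names = conn.execute(
    "SELECT full_name FROM pkg_tx_names ORDER BY full_name").fetchall()
print(tx_consumers)
print(tx_names)
```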

Configuring Queries
This section describes how to create and modify queries using the Queries tool in the Hub Console. The Queries tool allows you to create simple, advanced, and custom queries.

About Queries
In Siperian Hub, a query is a request to retrieve data from the Hub Store. Just like any SQL-based query statement, Siperian Hub queries allow you to specify, via the Hub Console, the criteria used to retrieve that data—tables and columns to include, conditions for filtering records, and sorting and grouping the results. Queries that you save in the Queries tool can be used in packages, and data stewards can use them in the Data Manager and Merge Manager tools.

Query Capabilities
You can define a query to:
• return selected columns
• filter the result set with a WHERE clause
• use complex query syntax, such as GROUP BY, ORDER BY, and HAVING clauses
• use aggregate functions, such as SUM, COUNT, and AVG
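The capabilities listed above map onto standard SQL. The following Python sketch runs one statement that uses all of them against an in-memory SQLite database. The orders table and its rows are invented for illustration and are not part of any Hub Store schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("acme", "west", 100.0),
        ("acme", "west", 300.0),
        ("bolt", "west", 50.0),
        ("bolt", "west", 70.0),
        ("core", "east", 500.0),
        ("dyna", "west", 200.0),
    ],
)

rows = conn.execute(
    """
    SELECT customer,                 -- selected columns
           COUNT(*)    AS order_count,
           SUM(amount) AS total,
           AVG(amount) AS average    -- aggregate functions
    FROM orders
    WHERE region = 'west'            -- filter individual records
    GROUP BY customer                -- one result row per customer
    HAVING SUM(amount) > 150         -- filter the groups
    ORDER BY total DESC              -- sort the results
    """
).fetchall()

for row in rows:
    print(row)
```

Note the difference between the two filters: WHERE removes individual records before grouping, while HAVING removes whole groups after the aggregates are computed.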


Types of Queries
You can create the following types of queries:
Type            Description
query           Created by selecting tables and columns, and configuring query conditions, sort by, and group by options, according to the instructions in “Configuring Queries” on page 166.
custom query    Created by specifying a SQL statement according to the instructions in “Configuring Custom Queries” on page 190.

How Schema Changes Affect Queries
Queries are dependent on the base object columns from which they retrieve data. If changes are made to the column configuration in the base object associated with a query, then the queries, including custom queries, are updated automatically. For example, if a column is renamed, then the name is updated in any dependent queries. If a column is deleted in the base object, then the consequences depend on the type of query:
• For a custom query, the query becomes invalid and must be manually fixed in the Queries tool or the Packages tool. Otherwise, if executed, an invalid query will return an error.
• For all other queries, the column is removed from the query, as well as from any packages that depend on the query.
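The rules above can be summarized as a small model. The following Python sketch is a hypothetical illustration: the Query class and the rename_column and delete_column functions are invented names that restate the documented outcomes, not Siperian Hub internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    """Hypothetical stand-in for a saved query."""
    name: str
    columns: List[str]
    custom: bool = False   # True for a custom (hand-written SQL) query
    valid: bool = True


def rename_column(queries, old, new):
    """A renamed column is updated in every dependent query, custom or not."""
    for query in queries:
        query.columns = [new if col == old else col for col in query.columns]


def delete_column(queries, column):
    """A deleted column invalidates custom queries and is dropped from the others."""
    for query in queries:
        if column not in query.columns:
            continue
        if query.custom:
            query.valid = False            # must be fixed manually
        else:
            query.columns.remove(column)   # removed automatically


standard = Query("Customers", ["ROWID_OBJECT", "NAME", "FAX"])
custom = Query("Customers by fax", ["NAME", "FAX"], custom=True)
queries = [standard, custom]

rename_column(queries, "NAME", "FULL_NAME")
delete_column(queries, "FAX")
print(standard)
print(custom)
```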


Starting the Queries Tool
To start the Queries tool:
• Expand the Model workbench and then click Queries. The Hub Console displays the Queries tool, as shown in the following example.

[Figure: the Queries tool, with the Navigation Pane and the Properties Pane labeled]

The Queries tool is divided into two panes:
Pane               Description
navigation pane    Displays a hierarchical list of configured queries and query groups.
properties pane    Displays the properties of the selected query or query group.

About Query Groups
A query group is a logical group of queries. A query group is simply a mechanism for organizing queries in the Queries tool.


Adding Query Groups
To add a query group:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click in the navigation pane and choose New Query Group. The Queries tool displays the Add Query Group window.
4. Enter a descriptive name for this query group.
5. Enter a description for this query group.
6. Click OK. The Queries tool adds the new query group to the tree.

Editing Query Group Properties
To edit query group properties:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.


3. In the navigation pane, select the query group that you want to configure.
4. For each property that you want to edit, click the Edit button next to it, and specify the new value.
5. Click the Save button to save your changes.

Deleting Query Groups
You can delete an empty query group but not a query group that contains queries.

To delete a query group:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, right-click the empty query group that you want to delete, and choose Delete Query Group. The Queries tool prompts you to confirm deletion.
4. Click Yes.

Configuring Queries
This section describes how to configure queries.

Adding Queries
To add a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Query.


The Queries tool displays the New Query Wizard.
5. If you see a Welcome screen, click Next.
6. Specify the following query properties:

Property                Description
Query name              Descriptive name for this query.
Description             Optional description of this query.
Query Group             Select the query group to which this query belongs.
Select primary table    Primary table from which this query retrieves data.

7. Do one of the following:
• If you want the query to retrieve all columns and all records from the primary table, click Finish to complete the process of creating the query.
• If you want to specify selection criteria, click Next and continue.


The Queries tool displays the Select query columns window.

8. Select the query columns from which you want the query to retrieve data.
Note: PUT-enabled packages require the Rowid Object column in the query.
9. Click Finish. The Queries tool adds the new query to the tree.
10. Refine the query criteria by proceeding to the instructions in “Editing Query Properties” on page 168.

Editing Query Properties
Once you have created a query, you can modify its properties to refine the criteria it uses to retrieve data from the ORS.


To modify the query properties:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation tree, select the query that you want to modify. The current query properties are displayed in the properties pane.

The properties pane displays the following set of tabs:
Tab Tables Description Tables associated with this query. Corresponds to the SQL FROM clause. For more information, see “Configuring the Table(s) in a Query” on page 170.

Configuring Queries and Packages 169

Configuring Queries

Tab Select

Description Columns associated with this query. Corresponds to the SQL SELECT clause. For more information, see “Configuring the Column(s) in a Query” on page 174. Conditions associated with this query. Determines selection criteria for individual records. Corresponds to the SQL WHERE clause. For more information, see “Configuring Conditions for Selecting Records of Data” on page 178. Sort order for the results of this query. Corresponds to the SQL ORDER BY clause. For more information, see “Specifying the Sort Order for Query Results” on page 183. Grouping for the results of this query. Corresponds to the SQL GROUP BY clause. “Specifying the Grouping for Query Results” on page 186. Displays the SQL associated with the selected query settings. “Viewing the SQL for a Query” on page 190.

Conditions

Sort

Grouping SQL

4. 5.

Make the changes you want. Click the Save button. The Queries tool validates your query settings and prompts you if it finds errors.

Configuring the Table(s) in a Query
The Tables tab displays the table(s) from which the query will retrieve information. The information in this tab corresponds to the SQL FROM clause.

Adding a Table to a Query
To add a table to a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Tables tab.
4. Click the button to add a table. The Queries tool prompts you to select the table you want to add.
5. Select a table and then click OK. If one or more other tables exist on the Tables tab, the Queries tool might prompt you to select a foreign key relationship between the table you just added and another table, as shown in the following example.
6. If prompted, select a foreign key relationship (if you want), and then click OK. The Queries tool displays the added table in the Tables tab. For multiple tables, the Queries tool displays all added tables in the Tables tab.
[Figure: Tables tab with callouts for Foreign Key Relationship and Join Type]
If you specified a foreign key between tables, the corresponding key columns are linked. Also, if tables are linked by foreign key relationships, then the Queries tool allows you to select the type of join for this query.
7. Click the Save button.

Deleting a Table from a Query
A query must have multiple tables in order for you to remove a table. You cannot remove the last table in a query.
To remove a table from a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Tables tab.
4. Select the table that you want to delete.
5. Click the button to remove the selected table.
6. Click the Save button. The Queries tool removes the selected table from the query.

Configuring the Column(s) in a Query
The Select tab displays the list of column(s) in one or more source tables from which the query will retrieve information, as shown in the following example. The information in this tab corresponds to the SQL SELECT clause.

Adding Table Column(s) to a Query
To add a table column to a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Select tab.
4. Click the button to add a column. The Queries tool prompts you to select from a list of one or more tables.
5. Expand the list for the table containing the column that you want to add. The Queries tool displays the list of columns for the selected table.
6. Select the column(s) you want to include in the query.
7. Click OK. The Queries tool adds the selected column(s) to the list of columns on the Select tab.
8. Click the Save button.

Removing Table Column(s) from a Query
To remove a table column from the query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Select tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
6. Click the Save button. The Queries tool removes the selected column(s) from the query.

Changing the Column Order
To change the order in which the columns will appear in the result set (if the list contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Select tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the up button.
• To move the selected column down the list, click the down button.
The Queries tool moves the selected column up or down.
6. Click the Save button.

Adding Functions
You can add aggregate functions to your queries (such as COUNT, MIN, or MAX). At run time, these aggregate functions appear in the usual syntax for the SQL statement used to execute the query, such as:
select col1, count(col2) as c1 from table_name group by col1
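
To see what a statement of this shape returns, the following minimal sketch runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for the Hub Store database here, and the sample rows and the added ORDER BY clause (used only to make the output order predictable) are assumptions made for illustration.

```python
import sqlite3

# In-memory database used only for illustration; not a Hub Store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table_name (col1, col2) VALUES (?, ?)",
    [("east", "a"), ("east", "b"), ("west", "c"), ("west", None)],
)

# Same shape as the statement above: one grouping column and one
# aggregate function applied to another column.
rows = conn.execute(
    "SELECT col1, COUNT(col2) AS c1 "
    "FROM table_name GROUP BY col1 ORDER BY col1"
).fetchall()

for col1, c1 in rows:
    print(col1, c1)
```

Note that COUNT(col2) counts only non-null values, so the group containing a NULL in col2 reports one fewer row than it holds.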

To add a function to a table column:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Select tab.
4. Click the button to add a function. The Queries tool prompts you to select the function you want to add.
5. If you want, select a different column.
6. Select the function that you want to use on the selected column.
7. Click OK.
8. Click the Save button.

Adding Constants
To add a constant to a table column:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Select tab.
4. Click the button to add a constant. The Queries tool prompts you to specify the constant that you want to add.
5. Select the data type from the list.
6. Enter a value that is compatible with the selected data type.
7. Click OK.
8. Click the Save button.

Configuring Conditions for Selecting Records of Data
The Conditions tab displays a list of condition(s) that the query will use to select records from the table. A comparison is a query condition that involves one column, one operator, and either another column or a constant value. The information in this tab corresponds to the SQL WHERE clause.

Operators
For an operator, you can select one of the following values.

Operator  Description
=         Equals.
<>        Does not equal.
IS        Is NULL.
IS NOT    Is not NULL.
LIKE      Value in the comparison column must be like the search value (includes column values that match the search value). For example, if the search value is %JO% for the last_name column, then the parameter will match column values like “Johnson”, “Vallejo”, “Major”, and so on.
NOT LIKE  Value in the comparison column must not be like the search value (excludes column values that match the search value). For example, if the search value is %JO% for the last_name column, then the parameter will omit column values like “Johnson”, “Vallejo”, “Major”, and so on.
<         Less than.
<=        Less than or equal to.
>         Greater than.
>=        Greater than or equal to.
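
The LIKE and NOT LIKE examples above can be tried out with the following minimal sketch, which uses an in-memory SQLite database through Python's sqlite3 module. The person table, the extra “Smith” row, and the select_names helper are hypothetical. SQLite compares ASCII letters case-insensitively in LIKE by default, which is why %JO% also matches “Vallejo” and “Major” here; case sensitivity on your database platform might differ.

```python
import sqlite3

# In-memory database used only for illustration; not a Hub Store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last_name TEXT)")
conn.executemany(
    "INSERT INTO person (last_name) VALUES (?)",
    [("Johnson",), ("Vallejo",), ("Major",), ("Smith",)],
)


def select_names(operator):
    """Return last names selected by LIKE or NOT LIKE with the value %JO%."""
    # Only these two operators are accepted, so interpolating the
    # operator into the statement text is safe in this sketch.
    if operator not in ("LIKE", "NOT LIKE"):
        raise ValueError("unsupported operator: " + operator)
    sql = (
        "SELECT last_name FROM person "
        f"WHERE last_name {operator} ? ORDER BY last_name"
    )
    return [row[0] for row in conn.execute(sql, ("%JO%",))]


like_matches = select_names("LIKE")          # names that contain "jo"
not_like_matches = select_names("NOT LIKE")  # all remaining names
print(like_matches, not_like_matches)
```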

Adding a Comparison
To add a comparison to this query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Conditions tab.
4. Click the button to add a comparison. The Queries tool prompts you to add a comparison.
5. If you want, select a different column.
6. Select the operator that you want to use on the selected column.
7. Select the type of comparison (Constant or Column).
• If you select Column, then select a column from the Edit Column drop-down list.
• If you select Constant, then click the button to add a constant, specify the constant that you want to add, and then click OK.
8. Click OK. The Queries tool adds the comparison to the list on the Conditions tab.
9. Click the Save button.

Editing a Comparison
To edit a comparison in this query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Conditions tab.
4. Select the comparison that you want to edit.
5. Click the Edit button. The Queries tool prompts you to edit the comparison.
6. Change the settings you want according to the instructions in “Adding a Comparison” on page 180.
7. Click OK. The Queries tool updates the comparison in the list on the Conditions tab.
8. Click the Save button.

Removing a Comparison
To remove a comparison from this query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Conditions tab.
4. Select the comparison that you want to remove.
5. Click the button to remove the selected comparison.
6. Click the Save button. The Queries tool removes the selected comparison from the query.

Specifying the Sort Order for Query Results
The Sort By tab displays a list of column(s) containing the values that the query will use to sort the query results at run time. The information in this tab corresponds to the SQL ORDER BY clause.

Selecting the Sort Columns
To select the sort columns in this query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Sort tab.
4. Click the button to add sort columns. The Queries tool prompts you to select sort columns.
5. Expand the list for the table containing the column(s) that you want to select for sorting. The Queries tool displays the list of columns for the selected table.
6. Select the column(s) you want to use for sorting.
7. Click OK. The Queries tool adds the selected column(s) to the list of columns on the Sort By tab.
8. Do one of the following:
• Enable (check) the Ascending check box to sort records in ascending order for the specified column.
• Disable (uncheck) the Ascending check box to sort records in descending order for the specified column.
9. Click the Save button.
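
The effect of the sort column list and the Ascending check box on the resulting ORDER BY clause can be sketched as follows. This is an illustration only: the build_order_by helper, the person table, and its rows are hypothetical, and SQLite (through Python's sqlite3 module) stands in for the Hub Store database.

```python
import sqlite3


def build_order_by(sort_columns):
    """Build an ORDER BY clause from (column_name, ascending) pairs.

    The pairs are listed in sort priority order, mirroring the order of
    the columns on the Sort By tab. Column names are assumed to be
    trusted identifiers, so they are interpolated directly.
    """
    parts = [
        f"{name} {'ASC' if ascending else 'DESC'}"
        for name, ascending in sort_columns
    ]
    return "ORDER BY " + ", ".join(parts)


# In-memory database used only for illustration; not a Hub Store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last_name TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO person (last_name, first_name) VALUES (?, ?)",
    [("Smith", "Ann"), ("Jones", "Bob"), ("Smith", "Carl")],
)

# Ascending checked for last_name, unchecked for first_name.
clause = build_order_by([("last_name", True), ("first_name", False)])
rows = conn.execute(
    f"SELECT last_name, first_name FROM person {clause}"
).fetchall()
print(clause)
print(rows)
```

Because the first column in the list has the highest sort priority, changing the column order in the list changes the order of the result set.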

Removing Table Column(s) from a Sort Order
To remove a table column from the sort by list:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Sort tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
6. Click the Save button. The Queries tool removes the selected column(s) from the sort by list.

Changing the Column Order
To change the order in which the columns will appear in the result set (if the list contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Sort tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the up button.
• To move the selected column down the list, click the down button.
The Queries tool moves the selected column up or down a record.
6. Click the Save button.

Specifying the Grouping for Query Results
The Grouping tab displays a list of column(s) containing the values that the query will use for grouping the query results at run time. The information in this tab corresponds to the SQL GROUP BY clause.

Selecting the Grouping Columns
To select the grouping columns in this query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Grouping tab.
4. Click the button to add grouping columns. The Queries tool prompts you to select grouping columns.
5. Expand the list for the table containing the column(s) that you want to select for grouping. The Queries tool displays the list of columns for the selected table.
6. Select the column(s) you want to use for grouping.
7. Click OK. The Queries tool adds the selected column(s) to the list of columns on the Grouping tab.
8. Click the Save button.

Removing Table Column(s) from a Grouping Order
To remove a table column from the grouping list:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Grouping tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
6. Click the Save button. The Queries tool removes the selected column(s) from the grouping list.

Changing the Column Order
To change the order in which the columns will be grouped in the result set (if the list contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Grouping tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the up button.
• To move the selected column down the list, click the down button.
The Queries tool moves the selected column up or down a record.
6. Click the Save button.

Viewing the SQL for a Query
The SQL tab displays the SQL statement that corresponds to the query options you have specified for the selected query, as shown in the following example.

Configuring Custom Queries
This section describes how to configure custom queries in the Queries tool.

About Custom Queries
A custom query is simply a query for which you supply the SQL statement directly, rather than building it according to the instructions in “Configuring Queries” on page 166. Custom queries can be used in packages and in the data steward tools.

Adding Custom Queries
To add a custom query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Custom Query. The Queries tool displays the New Custom Query Wizard.
5. If you see a Welcome screen, click Next.
6. Specify the following custom query properties:

Property     Description
Query name   Descriptive name for this query.
Description  Optional description of this query.
Query Group  Select the query group to which this query belongs.

7. Click Finish. The Queries tool displays the newly-added custom query.
8. Click the Edit button next to the SQL field.
9. Enter the SQL query according to the syntax rules for your database platform.
10. Click the Save button. If an error occurs when the query is submitted to the database, then the Queries tool displays the database error message, as shown in the following example.
Fix any errors and save your changes.
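
Before pasting a statement into the SQL field, it can help to submit it to a scratch database and read the error text that comes back. The following minimal sketch shows the idea with Python's sqlite3 module. The check_custom_query helper and the person table are hypothetical, SQLite stands in for your database platform, and the error wording differs between platforms.

```python
import sqlite3


def check_custom_query(conn, sql):
    """Submit a statement and return the database error text, if any.

    Returns None when the database accepts the statement.
    """
    try:
        conn.execute(sql)
    except sqlite3.Error as exc:
        # The message comes from the database engine itself.
        return str(exc)
    return None


# In-memory scratch database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last_name TEXT)")

ok_result = check_custom_query(conn, "SELECT last_name FROM person")
bad_result = check_custom_query(conn, "SELEC last_name FROM person")
print(ok_result)
print(bad_result)
```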

Editing a Custom Query
Once you have created a custom query, you can modify its properties to refine the criteria it uses to retrieve data from the ORS.
To modify the custom query properties:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation tree, select the custom query that you want to modify.
4. Edit the property settings that you want to change, clicking the Edit button next to the field if applicable.
5. Click the Save button. The Queries tool validates your query settings and prompts you if it finds errors.

Deleting a Custom Query
You delete a custom query in the same way in which you delete a regular query. For more information, see “Removing Queries” on page 195.

Viewing the Results of Your Query
To view the results of your query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. In the navigation tree, expand the query for which you want to view the results.
3. Click View. The Queries tool displays the results of your query, as shown in the following example.

Viewing the Query Impact Analysis
The Queries tool allows you to view the packages based on a given query, along with any tables and columns used by the query.
To view the impact analysis of a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Expand the query group associated with the query you want to select.
3. Right-click the query and choose Impact Analysis from the pop-up menu. The Queries tool displays the Impact Analysis dialog.
4. Expand the list next to a table to display the columns associated with the query, if you want.
5. Click Close.

Removing Queries
If a query has multiple packages based on it, remove those packages first before attempting to remove the query.
To remove a query:
1. In the Hub Console, start the Queries tool according to the instructions in “Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the query group associated with the query you want to remove.
4. Select the query you want to remove.
5. Right-click the query and choose Delete Query from the pop-up menu. The Queries tool prompts you to confirm deletion.
6. Click Yes. The Queries tool removes the query from the list.

Configuring Packages
This section describes how to create and modify PUT and display packages. You use the Packages tool in the Hub Console to define packages.

About Packages
A package is a public view of one or more underlying tables in Siperian Hub. Packages represent subsets of the columns in those tables, along with any other tables that are joined to the tables. A package is based on a query. The underlying query can select a subset of records from the table or from another package. For more information, see “Configuring Queries” on page 162.

What Packages Are Used For
Packages are used for:
• defining user views of the underlying data
• updating data via the Hub Console or applications that invoke Services Integration Framework (SIF) requests. Some, but not all, of the SIF requests use packages. For more information, see the Siperian Services Integration Framework Guide.

How Packages Are Used
Packages are used in the following ways:
• The Siperian Hub security model uses packages to control access to data for third-party applications that access Siperian Hub functionality and resources using the Services Integration Framework (SIF). To learn more, see “About Setting Up Security” on page 832 and the Siperian Services Integration Framework Guide.
• The Merge Manager and Data Manager tools use packages to determine the ways in which data stewards can view data. For more information, see the Siperian Hub Data Steward Guide.
• Hierarchy Manager uses packages. For more information, see Chapter 8, “Configuring Hierarchies,” and “Using the Hierarchy Manager” in the Siperian Hub Data Steward Guide.

Packages and SECURE Resources
Packages are configured as either SECURE or PRIVATE resources. For more information, see “Securing Siperian Hub Resources” on page 841.

When to Create a Package
You must create a package if you want your Siperian Hub implementation to:
• Merge and update records in the Hub Store using the Merge Manager and Data Manager tools. For more information, see the Siperian Hub Data Steward Guide.
• Allow an external application user to access Siperian Hub functionality using Services Integration Framework (SIF) requests. For more information, see the Siperian Services Integration Framework Guide.

In most cases, you create one set of packages for the Merge Manager and Data Manager tools, and a different set of packages for external application users.

PUT-Enabled and Display Packages
There are two types of packages:
• PUT-enabled packages can be used to update data.
• Display packages cannot be used to update data.

You must use PUT-enabled packages when you:
• execute the SIF put request, which inserts or updates records
• use the Merge Manager and Data Manager tools

PUT-enabled packages:
• cannot include joins to other tables
• cannot be based on system tables or other packages
• cannot be based on queries that have constant columns, aggregate functions, or group by settings
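
The restrictions above, together with the rules stated elsewhere in this chapter that a PUT-enabled package requires the Rowid Object column and cannot use a custom query, can be collected into a simple checklist. The following sketch is only a way to reason about a query before you enable PUT. The QueryDefinition class, its fields, and the put_enable_problems function are hypothetical and are not part of Siperian Hub.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryDefinition:
    """Hypothetical description of a query, for illustration only."""
    primary_table: str
    columns: List[str]
    joined_tables: List[str] = field(default_factory=list)
    based_on_system_table: bool = False
    based_on_package: bool = False
    has_constant_columns: bool = False
    has_aggregate_functions: bool = False
    has_group_by: bool = False
    is_custom_query: bool = False


def put_enable_problems(query):
    """Return the reasons why a package on this query cannot be PUT-enabled."""
    checks = [
        (bool(query.joined_tables), "includes joins to other tables"),
        (query.based_on_system_table, "is based on a system table"),
        (query.based_on_package, "is based on another package"),
        (query.has_constant_columns, "has constant columns"),
        (query.has_aggregate_functions, "has aggregate functions"),
        (query.has_group_by, "has group by settings"),
        (query.is_custom_query, "is a custom query"),
        ("ROWID_OBJECT" not in query.columns, "lacks the ROWID_OBJECT column"),
    ]
    return [reason for failed, reason in checks if failed]


simple = QueryDefinition("C_CUSTOMER", ["ROWID_OBJECT", "FULL_NAME"])
joined = QueryDefinition(
    "C_CUSTOMER",
    ["ROWID_OBJECT", "CITY"],
    joined_tables=["C_ADDRESS"],
    has_group_by=True,
)
print(put_enable_problems(simple))
print(put_enable_problems(joined))
```

An empty list means that none of the listed restrictions apply; any other result names the settings to change, or indicates that a display package is the better fit.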


Note: In the Merge Manager Setup screen, a PUT-enabled package is referred to as a merge package. The Merge Manager also allows you to choose a display package.

Starting the Packages Tool
To start the Packages tool:
1. Select the Packages tool in the Model workbench. The Packages tool is displayed.
2. Select a package in the list. The Packages tool displays properties for the selected package.
[Figure: Packages tool with callouts for the Navigation Pane and the Properties Pane]

The Packages tool is divided into two panes:

Pane             Description
navigation pane  Displays a hierarchical list of configured packages.
properties pane  Displays the properties of the selected package.

Adding Packages
To add a new package:
1. In the Hub Console, start the Packages tool according to the instructions in “Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click in the Packages pane and choose New Package. The Packages tool displays the New Package Wizard.
Note: If the welcome screen is displayed, click Next.
4. Specify the following information.

Field            Description
Display Name     Name of this package as it will be displayed in the Hub Console.
Physical Name    Actual name of the package in the database. Siperian Hub will suggest a physical name for the package based on the display name that you enter.
Description      Description of this package.
Enable PUT       To create a PUT package, check (select) to insert or update records into base object tables. Note: Every package that you use for merging data or updating data must be PUT-enabled. If you do not enable PUT, you create a display (read-only) package.
Secure Resource  Check (enable) to make this package a secure resource, which allows you to control access to this package. Once a package is designated as a secure resource, you can assign privileges to it in the Roles tool. For more information, see “Securing Siperian Hub Resources” on page 841, and “Assigning Resource Privileges to Roles” on page 859.

5. Click Next. The New Package Wizard displays the Select Query dialog.
6. If you want, click New Query Group to add a new query group, as described in “Configuring Query Groups” on page 164.
7. If you want, click New Query to add a new query, as described in “Configuring Queries” on page 166.
8. Select a query.
Note: For PUT-enabled packages:
• only queries with ROWID_OBJECT can be used
• custom queries cannot be used
9. Click Finish. The Packages tool adds the new package to the list.

Modifying Package Properties
To edit the package properties:
1. In the Hub Console, start the Packages tool according to the instructions in “Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the package to configure.
4. In the properties panel, change any of the package properties that have an edit button to the right.
5. If you want, expand the package in the packages list.
6. To change the query, select Query beneath the package and modify the query as described in “Editing Query Properties” on page 168.
7. To display the package view, select View beneath the package.

Refreshing Packages After Changing Queries
If a query has been changed, then any packages based on that query must be refreshed.
To refresh a package:
1. In the Hub Console, start the Packages tool according to the instructions in “Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the package that you want to refresh.
4. From the Packages menu, choose Refresh.

Note: If after a refresh the query remains out of sync with the package, then check (select) or uncheck (clear) any columns for this query. For more information, see “Configuring the Column(s) in a Query” on page 174.

Specifying Join Queries
You can choose to allow data stewards to view base object information, along with information from the other tables, in the Data Manager or Merge Manager.
To expose this information:
1. Create a PUT-enabled base object package.
2. Create a query to join the PUT-enabled base object package with the other tables.
3. Create a display package based on the query you just created.
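
Since a package is a public view of underlying tables, the three steps above can be pictured as two layered database views. The following sketch models that idea in an in-memory SQLite database through Python's sqlite3 module. All table, column, and view names are hypothetical, and in a real implementation you create the packages and the join query with the Packages and Queries tools rather than with hand-written DDL.

```python
import sqlite3

# In-memory database used only for illustration; not a Hub Store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE c_customer (rowid_object INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE c_address (customer_id INTEGER, city TEXT);
    INSERT INTO c_customer VALUES (1, 'Ann Smith'), (2, 'Bob Jones');
    INSERT INTO c_address VALUES (1, 'Austin'), (2, 'Boston');

    -- Step 1: the PUT-enabled package, modeled as a view over a single
    -- base object that includes the Rowid Object column and no joins.
    CREATE VIEW pkg_customer_put AS
        SELECT rowid_object, full_name FROM c_customer;

    -- Steps 2 and 3: a query that joins the PUT-enabled package with
    -- another table, exposed as a display (read-only) package.
    CREATE VIEW pkg_customer_display AS
        SELECT p.rowid_object, p.full_name, a.city
        FROM pkg_customer_put p
        JOIN c_address a ON a.customer_id = p.rowid_object;
    """
)

rows = conn.execute(
    "SELECT rowid_object, full_name, city "
    "FROM pkg_customer_display ORDER BY rowid_object"
).fetchall()
print(rows)
```

Keeping the join in the display package leaves the PUT-enabled package free of joins, which is what allows it to remain updatable.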

Removing Packages
To remove a package:
1. In the Hub Console, start the Packages tool according to the instructions in “Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the package to remove.
4. Right-click the package and choose Delete Package. The Packages tool prompts you to confirm deletion.
5. Click Yes. The Packages tool removes the package from the list.

7
State Management
This chapter describes how to configure state management in your Siperian Hub implementation.

Chapter Contents
• Before You Begin
• About State Management in Siperian Hub
• State Transition Rules for State Management
• Configuring State Management for Base Objects
• Modifying the State of Records
• Rules for Loading Data

Before You Begin
Before you begin to use state management, you must have:
• installed Siperian Hub and created the Hub Store according to the instructions in Siperian Hub Installation Guide for your platform
• built a schema; for more information, see “About the Schema” on page 82.

About State Management in Siperian Hub
Siperian Hub supports workflow tools by storing pre-defined system states for base object and XREF records. By enabling state management on your data, Siperian Hub offers the following additional flexibility:
• Allows integration with workflow integration processes and tools
• Supports a “change approval” process
• Tracks intermediate stages of the process (pending records)

About System States
System state describes how base object records are supported by Siperian Hub. The following table describes the supported system states:

State    Description
ACTIVE   Default state. Record has been reviewed and approved. Active records participate in Hub processes by default. This is a state associated with a base object or cross reference record. A base object record is active if at least one of its cross reference records is active. A cross reference record contributes to the consolidated base object only if it is active. These are the records that are available to participate in any operation. If records are required to go through an approval process, then these records have been through that process and have been approved. Note that Siperian Hub allows matches to and from PENDING and ACTIVE records.
PENDING  Pending records are records that have not yet been approved for general usage in the Hub. These records can have most operations performed on them, but operations have to specifically request pending records. If records are required to go through an approval process, then these records have not yet been approved and are in the midst of an approval process. If there are only pending XREF records, then the Best Version of the Truth (BVT) on the base object is determined through trust on the PENDING records. Note that Siperian Hub allows matches to and from PENDING and ACTIVE records.
DELETED  Deleted records are records that are no longer desired to be part of the Hub’s data. These records are not used in processes (unless specifically requested). Records can only be deleted explicitly and once deleted can be restored if desired. When a record that is pending is deleted, it is physically deleted, does not enter the DELETED state, and cannot be restored. In order for a record to be deleted, it must be in either the ACTIVE state for soft delete or the PENDING state for hard delete. Note that Siperian Hub does not include records in the DELETED state for trust and validation rules.

About the Hub State Indicator
All base objects and cross-reference tables have a system column, HUB_STATE_IND, that indicates the system state for records in those tables. This column contains the following values associated with system states:
System State        Value
ACTIVE (Default)    1
PENDING             0
DELETED             -1
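
The mapping between system states and HUB_STATE_IND values can be sketched in code. The following is a minimal illustration only; the class and function names are hypothetical and are not part of Siperian Hub.

```python
from enum import IntEnum


class HubState(IntEnum):
    """System states and their HUB_STATE_IND values, as listed above."""
    ACTIVE = 1     # default state
    PENDING = 0
    DELETED = -1


def describe_hub_state(hub_state_ind: int) -> str:
    """Translate a raw HUB_STATE_IND value into its system state name.

    Raises ValueError for any value other than 1, 0, or -1.
    """
    return HubState(hub_state_ind).name
```

A helper like this could make query results easier to read when HUB_STATE_IND is returned as a number.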


Protecting Pending Records Using the Interaction ID
You cannot use the tools in the Hub Console to change the state of a base object or XREF record from PENDING to ACTIVE if the Interaction ID is set. Use one of the state management SIF API requests instead. For more information, see the Siperian Services Integration Framework Guide. The Interaction ID column is used to protect a pending XREF record from updates that are not part of the same process as the original XREF record.

Note: The Interaction ID can be specified through any API; however, it cannot be specified when performing batch processing. For example, records that are protected by an Interaction ID cannot be updated by the Load batch process.

The protection provided by Interaction IDs is outlined in the following table. Version A and Version B stand for two different Interaction ID values, so the table covers the cases where the incoming and existing Interaction IDs do and do not match:
                           Existing Interaction ID
Incoming Interaction ID    Version A    Version B    Null
Version A                  OK           Error        OK
Version B                  Error        OK           OK
Null                       Error        Error        OK
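
The protection rule can be expressed as a short check. This is a sketch under one reading of the table above: an unprotected record (no existing Interaction ID) accepts any update, and a protected record accepts only an update with a matching Interaction ID. The function name is hypothetical and is not a Siperian Hub API.

```python
from typing import Optional


def interaction_id_allows_update(incoming: Optional[str],
                                 existing: Optional[str]) -> bool:
    """Return True if an update would be accepted (OK), False if it is an Error."""
    # A record with no existing Interaction ID is not protected.
    if existing is None:
        return True
    # A protected record accepts only updates carrying the same Interaction ID.
    # A null incoming ID (for example, from a batch process) never matches.
    return incoming == existing
```

Under this reading, a Load batch job, which cannot specify an Interaction ID, is rejected for any protected record.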

State Transition Rules for State Management
This section describes transition rules for state management.

About State Transition Rules
State transition rules determine whether and when a record can change from one state to another. State transition for base object and XREF records can be enabled using the following methods:
• Using the Data Manager or Merge Manager tools in the Hub Console; for more information, see the Siperian Hub Data Steward Guide.
• Promote batch job; for more information, see “Promote Jobs” on page 741.
• SiperianClient API; for more information, see the Siperian Services Integration Framework Guide.

State transition rules differ for base object and cross-reference records.


Transition Rules for Base Object Records
State: ACTIVE
• Can transition to DELETED state.
• Can transition to PENDING state only if the base object record becomes DELETED and a pending XREF record is added.

State: PENDING
• Can transition to ACTIVE state. This transition is called promotion. To learn more, see “Modifying the State of Records” on page 216.
• Cannot transition to DELETED state. Instead, a PENDING record is physically removed from the Hub.

State: DELETED
• Can transition to ACTIVE state only if XREF records are restored.
• Cannot transition to PENDING state.

Note: In order for a record to be deleted, it must be in either the ACTIVE state for soft delete or the PENDING state for hard delete.

Transition Rules for Cross-reference (XREF) Records
State: ACTIVE
• Can transition to DELETED state.
• Cannot transition to PENDING state.

State: PENDING
• Can transition to ACTIVE state. This transition is called promotion. To learn more, see “Modifying the State of Records” on page 216.
• Cannot transition to DELETED state. Instead, a PENDING record is physically removed from the Hub.

State: DELETED
• Can transition to ACTIVE state. This transition is called restore.
• Cannot transition to PENDING state.

Note: In order for a record to be deleted, it must be in either the ACTIVE state for soft delete or the PENDING state for hard delete.
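
The XREF transition rules can be captured in a small lookup. The sketch below is illustrative only: the names are hypothetical, and it models just the cross-reference rules stated above, not the conditional base object rules.

```python
# Allowed XREF state transitions, read from the rules above.
# These names are illustrative and are not part of Siperian Hub.
XREF_TRANSITIONS = {
    ("PENDING", "ACTIVE"): "promote",
    ("ACTIVE", "DELETED"): "soft delete",
    ("DELETED", "ACTIVE"): "restore",
}


def xref_transition_name(current: str, target: str) -> str:
    """Return the name of an allowed transition, or raise ValueError."""
    try:
        return XREF_TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(
            f"XREF cannot transition from {current} to {target}"
        ) from None


def delete_kind(current: str) -> str:
    """Describe what deleting a record in the given state does."""
    if current == "ACTIVE":
        return "soft delete"   # record enters the DELETED state and can be restored
    if current == "PENDING":
        return "hard delete"   # record is physically removed and cannot be restored
    raise ValueError(f"a {current} record cannot be deleted")
```

Note that PENDING to DELETED is deliberately absent from the lookup: deleting a pending record removes it physically instead of changing its state.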


Hub States and Base Object Record Value Survivorship
When there are active and pending (or deleted) cross-references together in the same base object record, whether after a merge, put, or load, the values on the base object record reflect only the values from the active cross-reference records. As such:
• ACTIVE values always prevail over PENDING and DELETED values.
• PENDING values always prevail over DELETED values.
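
The precedence between states can be illustrated with a short sketch. It is a simplification: it only selects which XREF records contribute, and ignores trust and validation rules entirely. The record layout and the source names in the usage example are hypothetical.

```python
# States in order of precedence, per the rules above.
STATE_PRIORITY = ("ACTIVE", "PENDING")


def contributing_xrefs(xrefs):
    """Return the XREF records whose values the base object would reflect.

    ACTIVE records win when any exist; otherwise PENDING records are used.
    The text does not describe the DELETED-only case, so this sketch
    returns an empty list for it.
    """
    for state in STATE_PRIORITY:
        selected = [x for x in xrefs if x["state"] == state]
        if selected:
            return selected
    return []
```

For example, given XREFs from hypothetical sources CRM (ACTIVE), ERP (PENDING), and WEB (DELETED), only the CRM record is returned.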

Configuring State Management for Base Objects
You can configure state management for base objects using the Schema tool. How you configure the base object depends on your focus. Once you enable state management for a base object, you can also configure the following options for the base object:
• Enable the history of cross-reference promotion; for more information, see “Enabling the History of Cross-Reference Promotion” on page 213.
• Include pending records in the match process; for more information, see “Enabling Match on Pending Records” on page 214.
• Enable message queue triggers for a state-enabled base object record; for more information, see “Enabling Message Queue Triggers for State Changes” on page 215.

Enabling State Management
State management is configured per base object and is disabled by default; it must be explicitly enabled.
To enable state management for a base object:
1. Open the Model workbench and click Schema.
2. In the Schema tool, select the desired base object.
3. Click the Enable State Management checkbox on the Advanced tab of the Base Object properties.

[Figure: Enable State Management check box]

Enabling the History of Cross-Reference Promotion
When the History of Cross-Reference Promotion option is enabled, the Hub creates and stores history information in the _HXPR table for a base object each time an XREF belonging to a record in this base object undergoes a state transition from PENDING (0) to ACTIVE (1).
To enable the history of cross-reference promotion for a base object:
1. Open the Model workbench and click on the Schema tool.
2. In the Schema tool, select the desired base object.
3. Click the Enable State Management checkbox on the Advanced tab of the Base Object properties.
4. Click the History of Cross-Reference Promotion checkbox on the Advanced tab of Base Object properties.

[Figure: History of Cross-Reference Promotion check box]

Enabling Match on Pending Records
By default, the match process includes only active records and ignores pending records. For base objects with state management enabled, you must explicitly enable match on pending records to include pending records in the match process.
To enable match on pending records for a base object:
1. Open the Model workbench and click on the Schema tool.
2. In the Schema tool, select the desired base object.
3. Click the Enable State Management checkbox on the Advanced tab of the Base Object properties.
4. Select Match/Merge Setup for the base object.
5. Click the Enable Match on Pending Records checkbox on the Properties tab of the Match/Merge Setup Details panel.

[Figure: Enable Match on Pending Records check box]

Enabling Message Queue Triggers for State Changes
Siperian Hub uses message triggers to identify which actions are communicated to outside applications using messages in message queues. When an action occurs for which a rule is defined, a message is placed in the message queue. A message trigger specifies the queue in which messages are placed. Siperian Hub enables you to trigger message events for base object records when a pending update occurs. The following message triggers are available for state changes to base object or XREF records:
Event Trigger                        Action
Add new pending data                 A new pending record is created.
Update existing pending data         A pending base object record is updated.
Pending update; only XREF changed    A pending XREF record is updated. This event includes the promotion of a record.
Delete base object data              A base object record is soft deleted.
Delete XREF data                     An XREF record is soft deleted.
Delete pending base object data      A base object record is hard deleted.
Delete pending XREF data             An XREF record is hard deleted.

To enable the message queue triggers on a pending update for a base object:
1. Open the Model workbench and click on Schema.
2. In the Schema tool, click the Trigger on Pending Updates checkbox for message queues in the Message Queues tool.

To learn more about message queues and message triggers, including how to enable message queue triggers for state changes to base object and XREF records, see “Configuring Message Triggers” on page 612.


Modifying the State of Records
Promotion of a record is the process of changing the system state of individual records in Siperian Hub from PENDING state to the ACTIVE state. You can set a record for promotion immediately using the Data Steward tools, or you can flag records to be promoted at a later time using the Promote batch process.

Promoting Records in the Data Steward Tools
You can immediately promote PENDING base object or XREF records to ACTIVE state using the tools in the Data Steward workbench (that is, the Data Manager or Merge Manager). You can also flag these records for promotion at a later time using either tool. To learn more about using the Hub Console to perform these tasks, see the Siperian Hub Data Steward Guide.

Flagging Base Object or XREF Records for Promotion at a Later Time
To flag base object or XREF records for promotion at a later time using the Data Manager:
1. Open the Data Steward workbench and click on the Data Manager tool.
2. In the Data Manager tool, click on the desired base object or XREF record.
3. Click on the Flag for Promote button on the associated panel.

   [Figure: Flag for Promote buttons]

   Note: If HUB_STATE_IND is set to read-only for a package, the Set Record State button is disabled (greyed-out) in the Data Manager and Merge Manager Hub Console tools for the associated records. However, the Flag for Promote button remains active because it doesn’t directly alter the HUB_STATE_IND column for the record(s).
4. Run a batch job to promote records that are flagged for promotion. For more information, see “Promoting Records Using the Promote Batch Job”.

Promoting Matched Records Using the Merge Manager
To flag matched records for promotion at a later time using the Merge Manager:
1. Open the Data Steward workbench and click on the Merge Manager tool.
2. In the Merge Manager tool, click on the desired matched record.
3. Click on the Flag for Promote button on the Matched Records panel.

You can now promote these PENDING XREF records using the Promote batch job.

Promoting Records Using the Promote Batch Job
You can run a batch job to promote records that are flagged for promotion using the Batch Viewer or Batch Group tool.

Setting Up a Promote Batch Job Using the Batch Viewer
To set up a batch job using the Batch Viewer to promote records flagged for promotion:
1. Flag the desired PENDING records for promotion. For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Viewer tool.
3. Click on the Promote batch job under the Base Object node displayed in the Batch Viewer.
4. Select Promote flagged records abc, where abc represents the associated records that you have previously flagged for promotion.

Setting Up a Promote Batch Job Using the Batch Group Tool
To add a Promote batch job using the Batch Group tool to promote records flagged for promotion:
1. Flag the desired PENDING records for promotion. For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Group tool.
3. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
4. Right-click the Batch Groups node in the Batch Group tree and choose Add Batch Group from the pop-up menu (or select Add Batch Group from the Batch Group menu). For more information, see “Adding Batch Groups” on page 691.
5. In the batch groups tree, right-click on any level, and choose the desired option to add a new level to the batch group. The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog. For more information, see “Adding Levels to a Batch Group” on page 694.
6. Expand the base object(s) for the job(s) that you want to add.
7. Select the Promote flagged records in [XREF table] job.
8. Click OK.
   The Batch Group tool adds the selected job(s) to the batch group.
9. Click the button to save your changes.

You can now execute the batch group job. For more information, see “Executing a Batch Group” on page 704.


Rules for Loading Data
The load batch process loads records in any state. The state is specified as an input column on the staging table. The input state can be specified in the mapping as a landing table column or it can be derived. If an input state is not specified in the mapping, then the state is assumed to be ACTIVE (for Load inserts). When a record is updated through a Load batch job and the incoming state is null, the existing state of the record to update will remain unchanged. The following table describes how input states affect the states of existing XREF records.
                       Existing XREF State:
Incoming XREF State:   ACTIVE             PENDING             DELETED                     No XREF (Load by rowid)    No Base Object Record
ACTIVE                 Update             Update + Promote    Update + Restore            Insert                     Insert
PENDING                Pending Update     Pending Update      Pending Update + Restore    Pending Update             Pending Insert
DELETED                Soft Delete        Hard Delete         Hard Delete                 Error                      Error
Undefined              Treat as ACTIVE    Treat as PENDING    Treat as DELETED            Treat as ACTIVE            Treat as ACTIVE
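
The load rules lend themselves to a lookup table. The sketch below encodes one reading of the table above (incoming state by row, existing state by column); the dictionary keys, such as NO_XREF and NO_BASE_OBJECT, and the function name are hypothetical labels chosen for this illustration.

```python
from typing import Optional

# Outcome of a Load for each (incoming state, existing state) pair.
# Keys are illustrative labels, not Siperian Hub identifiers.
LOAD_OUTCOMES = {
    "ACTIVE": {
        "ACTIVE": "Update", "PENDING": "Update + Promote",
        "DELETED": "Update + Restore", "NO_XREF": "Insert",
        "NO_BASE_OBJECT": "Insert",
    },
    "PENDING": {
        "ACTIVE": "Pending Update", "PENDING": "Pending Update",
        "DELETED": "Pending Update + Restore", "NO_XREF": "Pending Update",
        "NO_BASE_OBJECT": "Pending Insert",
    },
    "DELETED": {
        "ACTIVE": "Soft Delete", "PENDING": "Hard Delete",
        "DELETED": "Hard Delete", "NO_XREF": "Error",
        "NO_BASE_OBJECT": "Error",
    },
    "UNDEFINED": {
        "ACTIVE": "Treat as ACTIVE", "PENDING": "Treat as PENDING",
        "DELETED": "Treat as DELETED", "NO_XREF": "Treat as ACTIVE",
        "NO_BASE_OBJECT": "Treat as ACTIVE",
    },
}


def load_outcome(incoming_state: Optional[str], existing: str) -> str:
    """Look up the outcome; a null incoming state maps to the Undefined row."""
    row = LOAD_OUTCOMES["UNDEFINED" if incoming_state is None else incoming_state]
    return row[existing]
```

The Undefined row is returned as-is and is not chained into another row, because the text states only that a null incoming state leaves the existing state unchanged on update and defaults to ACTIVE on insert.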


Chapter 8: Configuring Hierarchies
This chapter explains how to configure Siperian Hierarchy Manager (HM) using the Siperian Hierarchies tool in the Hub Console. The chapter describes how to set up your data and how to configure the components needed by Hierarchy Manager for your Siperian Hub implementation, including entity types, hierarchies, relationship types, packages, and profiles. For instructions on using the Hierarchy Manager, see the Siperian Hub Data Steward Guide. This chapter is recommended for Siperian Hub administrators and implementers.

About Configuring Hierarchies
Siperian Hub administrators use the Hierarchies tool to set up the structures required to view and manipulate data relationships in Hierarchy Manager. Use the Hierarchies tool to define Hierarchy Manager components (such as entity types, hierarchies, relationship types, packages, and profiles) for your Siperian Hub implementation. When you have finished defining Hierarchy Manager components, you can use the package or query manager tools to update the query criteria.
To understand the concepts in this chapter, you must be familiar with the concepts in the following chapters in this guide (Siperian Hub Administrator Guide):
• Chapter 5, “Building the Schema”
• Chapter 6, “Configuring Queries and Packages”
• Chapter 15, “Configuring the Consolidate Process”
• Chapter 20, “Setting Up Security”

Before You Begin
Before you begin to configure your Hierarchy Manager (HM) system, you must have completed the following tasks:
• Start with a blank ORS or a valid ORS and register the database in CMX_SYSTEM, as described in “Registering an ORS” on page 62.
• Verify that you have a license for Hierarchy Manager. For details, consult your Siperian Hub administrator.
• Perform data analysis, as described in “Preparing Your Data for Hierarchy Manager”.

Overview of Configuration Steps
To configure Hierarchy Manager, complete the following steps:
1. Start the Hub Console, as described in “Starting the Hub Console” on page 19.
2. Launch the Hierarchies tool, as described in “Starting the Hierarchies Tool” on page 234. If you have not already created the Repository Base Object (RBO) tables, Hub Console walks you through the process, as described in “Creating the HM Repository Base Objects” on page 235.
3. Create entity objects and types, as described in “Configuring Entity Objects and Entity Types” on page 240.
4. Create hierarchies, as described in “Configuring Hierarchies” on page 253.
5. Create relationship objects and types, as described in “Configuring Relationship Base Objects and Relationship Types” on page 255.
6. Create packages, as described in “Configuring Packages for Use by HM” on page 269.
7. Configure profiles, as described in “Deleting Relationship Types from a Profile” on page 284.
8. Validate the profile, as described in “Validating Profiles” on page 280.

Note: The same options you see on the right-click menu in the Hierarchy Manager are also available on the Hierarchies menu.

Preparing Your Data for Hierarchy Manager
To make the best use of HM, you should analyze your information and make sure you have done the following:
• Verified that your data sources are valid and trustworthy. For more information on security issues, see Chapter 20, “Setting Up Security”.
• Created valid schema to work with Siperian Hub and the HM. For more information on schemas and how to create them, see Chapter 5, “Building the Schema”.
• Created all relationships between your entities, including:
  • Hierarchical relationships:
    • All child entities must have a valid parent entity related to them.
    • Your data cannot have any ‘orphan’ child entities when it enters HM.
    • All hierarchies must be validated (see Chapter 9, “Siperian Hub Processes”).
  • Foreign key relationships. For a general overview of foreign key relationships, see “Process Overview for Defining Foreign-Key Relationships” on page 143. For more information about parent-child relationships, see “Configuring Match Paths for Related Records” on page 497.
  • One-hop and multi-hop relationships (direct and indirect relationships between entities). For more information on these kinds of relationships, see the Siperian Hub Data Steward Guide.
  • Derived HM types.
• Consolidated duplicate entities from multiple source systems. For example, a group of entities (Source A) might be the same as another group of entities (Source B), but the two groups of entities might have different group names. Once the entities are identified as being identical, the two groups can be consolidated. For more information on consolidation, see Chapter 9, “Siperian Hub Processes”.
• Grouped your entities into logical categories, such as physician’s names into the “Physician” category. For more information on how to group your data, see Chapter 4, “Configuring Operational Record Stores and Datasources”.

For more information on these database concepts, see a database reference text.
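
The rule that no child entity may be an orphan can be checked before data enters HM. The following is a minimal sketch that assumes each record carries a hypothetical id and parent_id field; it is a pre-load sanity check written for illustration, not a Siperian Hub feature.

```python
def find_orphans(records):
    """Return the ids of records whose parent_id refers to no known record.

    Records with parent_id set to None are treated as top-level entities,
    not as orphans.
    """
    known_ids = {r["id"] for r in records}
    return [
        r["id"]
        for r in records
        if r["parent_id"] is not None and r["parent_id"] not in known_ids
    ]


# Sample data; the last row is a made-up orphan because its parent
# "Keyboards" is not present in the data set.
records = [
    {"id": "Mice + Pointers", "parent_id": None},
    {"id": "Mice", "parent_id": "Mice + Pointers"},
    {"id": "Laser Mouse", "parent_id": "Mice"},
    {"id": "Gaming Keyboard", "parent_id": "Keyboards"},
]
```

Calling find_orphans(records) on this sample reports the Gaming Keyboard row, which would need a parent entity added before the data is loaded.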

Use Case Example of How to Prepare Data for HM
This section contains an example of how to manipulate your data before it enters Siperian Hub and before it is viewed in Hierarchy Manager. Typically, a company’s data would be much larger than the example given here.

Scenario
John has been tasked with manipulating his company’s data so that it can be viewed and used within Hierarchy Manager in the most efficient way. To simplify the example, we are describing a subset of the data that involves product types and products of the company, which sells computer components. The company sells three types of products: mice, trackballs, and keyboards. Each of these product types includes several vendors and different levels of products, such as the Gaming keyboard and the TrackMan trackball.

Methodology
This section describes the method of data simplification.

Step 1 - Organizing Data into the Hierarchy

In this step you organize the data into the Hierarchy that will then be translated into the HM configuration. John begins by analyzing the product and product group hierarchy. He organizes the products by their product group and product groups by their parent product group.

The sheer volume of data and the relationships contained within the data are difficult to visualize, so John lists the categories and sees if there are relationships between them. The following table (which contains data from the Marketing department) shows an example of how John might organize his data.

Note: Most data sets will have many more items.

The table shows the data that will be stored in the Products BO. This is the BO to convert (or create) in HM. The table shows Entities, such as Mice or Laser Mouse. The relationships are shown by the grouping; that is, there is a relationship between Mice and Laser Mouse. The heading values are the Entity Types: Mice is a Product Group and Laser Mouse is a Product. This Type is stored in a field on the Product table.

Organizing the data in this manner allows John to clearly see how many entities and entity types are part of the data, and what relationships those entities have. The major category is ProdGroup, which can include a product group (such as mice and pointers), the category Product, and the products themselves (such as the Trackman Wheel). The relationships between these items can be encapsulated in a relationship object, which John calls Product Rel. In the information for the Product Rel, John has explained the relationships: Product Group is the parent of both Product and Product Group.

Step 2 - Creating Relationship Base Object Tables

Having analyzed the data, John comes to the following conclusions:
• Product (the BO) should be converted to an Entity Object.
• Product Group and Product are the Entity Types.
• Product Rel is the Relationship Object to be created.
• The following relationship types (not all shown in the table) need to be created:
  • Product is the parent of Product (not shown).
  • Product Group is the parent of Product (such as with the Mice to Laser Mouse example).
  • Product Group is the parent of Product Group (such as with Mice + Pointers being the parent of Mice).
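
John's conclusions can be written down as a small data model to check that each relationship in the data uses one of the planned relationship types. This is an illustrative sketch only; the variable and function names are hypothetical and do not correspond to HM configuration objects.

```python
# Entity types and relationship types from John's analysis.
ENTITY_TYPES = {"Product Group", "Product"}

# Each relationship type is a (parent entity type, child entity type) pair.
RELATIONSHIP_TYPES = {
    ("Product", "Product"),
    ("Product Group", "Product"),
    ("Product Group", "Product Group"),
}

# A few entities from the example, mapped to their entity type.
ENTITIES = {
    "Mice + Pointers": "Product Group",
    "Mice": "Product Group",
    "Laser Mouse": "Product",
}


def relationship_is_allowed(parent: str, child: str) -> bool:
    """Check that the parent/child pair matches a planned relationship type."""
    return (ENTITIES[parent], ENTITIES[child]) in RELATIONSHIP_TYPES
```

With this model, Mice as the parent of Laser Mouse is accepted, while Laser Mouse as the parent of Mice is rejected because no relationship type has Product as the parent of Product Group.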

John begins by accessing the Hierarchy Tool. When he accesses the tool, the system creates the Relationship Base Object Tables (RBO tables). RBO tables are essentially system base objects: they are required base objects that contain specific columns. They store the HM configuration data, such as the data that you see in the table in Step 1.

The Siperian Hub Administrator Guide explains how to create base objects in detail. This section describes the choices you would make when you create the example base objects in the Schema tool. You must create and configure a base object for each entity object and relationship object that you identified in the previous step. In the example, you would create a base object for Product and convert it to an HM Entity Object. The Product Rel BO should be created in HM directly (an easier process) instead of converting. Each new base object is displayed in the Schema panel under the category Base Objects. Repeat this process to create all your base objects. In the next section, you configure the base objects so that they are optimized for HM use.

Step 3 - Configuring Base Objects

You created the two base objects (Product and Product Rel) in the previous section. This section describes how to configure them. Configuring a base object involves filling in the criteria for the object’s properties, such as the number and type of columns, the content of the staging tables, the name of the cross-reference tables (if any), and so on. You might also enable the history function, set up validation rules and message triggers, create a custom index, and configure the external match table (if any). Whether or not you choose these options and how you configure them depends on your data model and base object choices. In the example, John configures his base objects as the following sections explain.

Note: Not all components of the base-object creation are addressed here, only the ones that have specific significance for data that will be used in the HM. For more information on the components not discussed here, see the Schema chapter in this Guide.

Columns

This table shows the Product BO after conversion to an HM entity object. In this list, only the Product Type field is an HM field. Every base object has system columns and user-defined columns. System columns are created automatically, and include the required column: Rowid Object. This is the primary key for each base object table and contains a unique, Hub-generated value. This value cannot be null because it is the HM lookup for the class code. HM makes a foreign key constraint in the database so a ROWID_OBJECT value is required and cannot be null.

For the user-defined columns, John chose logical names that would effectively include information about the products, such as Product Number, Product Type, and Product Description. These same columns and column values must appear in the staging tables.

Staging Tables

John makes sure that all the user-defined columns from the staging tables are added as columns in the base object, as the graphic above shows. The Lookup column shows the HM-added lookup value. Notice that several columns in the Staging Table (Status Cd, Product Type, and Product Type Cd) have references to lookup tables. You can set these references up when you create the Staging Table. You would use lookups if you do not want to hardcode a value in your staging table, but would rather have the server look up a value in the parent table. Most of the lookups are unrelated to HM and are part of the data model. The Rbo Bo Class lookup is the exception because it was added by HM. HM adds the lookup on the Product Type column.

Note: When you are converting entities to entity base objects (entities that are configured to be used in HM), you must have lookup tables to check the values for the Status Cd, Product Type, and Product Type Cd.

Warning: HM Entity objects do not require start and end dates. Any start and end dates would be user defined. However, Rel Objects do use these. Do not create new Rel Objects with different names for start and end dates. These are already provided.

Each entity type has a code that derives from the data analysis and the design. In this example, John chose to use Product as one type, and Product Group as another. This code must be referenced in the corresponding RBO base object table. In this example, the code Product is referenced in the C_RBO_BO_CLASS table. The value of the BO_CLASS_CODE is ‘Product’.


The following graphic shows the relationship between the HM entity objects and HM relationship objects to the RBO tables:

When John has completed all the steps in this section, he will be ready to create other HM components, such as packages, and to view his data in the HM. For example, the following graphic shows the relationships that John has set up in the Hierarchies Tool, displayed in the Hierarchy Manager. This example shows the hierarchy involving Mice devices fully. For more information on how to use HM, see the Data Steward Guide.

Starting the Hierarchies Tool
To start the Hierarchies tool:
• In the Hub Console, do one of the following:
  • Expand the Model workbench, and then click Hierarchies.
  OR
  • In the Hub Console toolbar, click the Hierarchies tool quick launch button.

The Hub Console displays the Hierarchies tool, as shown in the following example:

[Figure: Hierarchies tool, showing the Navigation Pane and the Properties Pane]

If you are setting up the Hierarchies tool, see “Creating the HM Repository Base Objects” on page 235. If you already have RBO tables set up, see “Configuring Entity Icons” on page 238.

Creating the HM Repository Base Objects
To use the Hierarchies tool with an ORS, the system must first create the Repository Base Objects (RBO tables) for the ORS. RBO tables are essentially system base objects. They are required base objects that must contain specific columns. Queries and MRM packages (and their associated queries) will also be created for these RBO tables. Warning: Never modify these RBO tables, queries, or packages.


To create the RBOs:
1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
2. Start the Hierarchies tool. Expand the Model workbench and click Hierarchies. To learn more, see “Starting the Hierarchies Tool” on page 234.

Note: Any option that you can select by right-clicking in the navigation panel, you can also choose from the Hierarchies tool menu.

After you start the Hierarchies tool, if an ORS does not have the necessary RBO tables, then the Hierarchies tool walks you through the process of creating them. The following steps explain what to select in the dialog boxes that the Hierarchies tool displays:
1. Choose Yes in the Siperian Hub Console dialog to create the metadata (RBO tables) for HM in the ORS.
2. Select the tablespace names in the Create RBO tables dialog, and then click OK.


Uploading Default Entity Icons
The Hierarchies tool prompts you to upload default entity icons. These icons will be useful to you when you are creating entities.
1. Click Yes.
   The Hub Console displays the Hierarchies tool with the default metadata, as shown in the following example.

Upgrading From Previous Versions of Hierarchy Manager
After you upgrade a pre-XU schema to XU, you will be prompted to upgrade the XU-specific Hierarchy Manager (HM) metadata when you open the Hierarchies tool in the Hub Console.
To upgrade the HM metadata:
1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
2. Start the Hub Console. To learn more, see “Starting the Hub Console” on page 19.
3. Launch the Hierarchies tool in the Hub Console.
4. Click Yes to add additional columns.

After you upgrade a pre-XU schema to XU, you will be reminded to remove obsolete HM metadata when you open the Hierarchies tool.
1. Start the Hub Console. To learn more, see “Starting the Hub Console” on page 19.
2. Launch the Hierarchies tool in the Hub Console.
3. Click Yes to delete a base object.

Note: If the Rbo Rel Type Usage base object is being used by some other non-HM base object, you will be told to manually delete the table by going to the schema manager.

Siperian Hub XU shows relationship and entity types under the base object with which they are associated. If a type is not associated with a base object, for example it does not have packages assigned, it is not displayed in the GUI, but does remain in the database. During the ORS upgrade process, the migration script skips over the orphan entity and relationship types, displays a related warning message, then continues. After the ORS upgrade, you can delete the orphan types or associate entities and relationship types with them. If you want to associate orphan types but you have not created the corresponding base objects, create the objects, then press refresh. The software prompts you to create the association.

Configuring Entity Icons
Using the Hierarchies tool, you can add or configure your own entity icons that you can subsequently use when configuring your entity types. These entity icons are used to represent your entities in graphic displays within Hierarchy Manager. Entity icons must be stored in a JAR or ZIP file.


Adding Entity Icons
To import your own icons, create a ZIP or JAR file containing your icons. For each icon, create a 16 x 16 icon for the small icon and a 48 x 48 icon for the large icon.
To add new entity icons:
1. Acquire a write lock.
2. Start the Hierarchies tool.
3. Right-click anywhere in the navigation pane and choose Add Entity Icons. Note: You must acquire a lock to display the right-click menu. A browse files window opens.
4. Browse for the JAR or ZIP file containing your icons.
5. Click Open to add the icons.
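The packaging step that precedes this procedure can also be scripted. The following is a minimal sketch in Python, assuming that the icons already exist as image files in a single folder and that storing them at the top level of the archive is acceptable. This guide does not specify a folder layout or file naming convention inside the ZIP file, so the function name and all file names shown here are hypothetical.

```python
import zipfile
from pathlib import Path


def package_icons(icon_dir, archive_path):
    """Bundle every file in icon_dir into a ZIP archive.

    Returns the list of names stored in the archive so that the
    result can be checked before it is uploaded in the Hub Console.
    """
    icon_files = sorted(p for p in Path(icon_dir).iterdir() if p.is_file())
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for icon in icon_files:
            # Assumption: icons are stored flat, without sub-folders.
            archive.write(icon, arcname=icon.name)
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

For example, package_icons("icons", "entity_icons.zip") would bundle a hypothetical icons folder. Write the archive outside the icon folder so that it is not packaged into itself on a later run.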

Modifying Entity Icons
You cannot modify icons directly from the console. Instead, you can download a ZIP or JAR file, modify its contents, then upload it again into the console.
You can either delete icon groups or make them inactive. If an icon is already associated with an entity, or if you might use a group of icons in the future, consider inactivating the group instead of deleting it. You inactivate a group of icons by marking the icon package Inactive. Inactive icons are not displayed in the UI and cannot be assigned to an entity type. To reactivate the icon package, mark it Active.
Warning: Siperian Hub does not validate icon assignments before deleting. If you delete an icon that is currently assigned to an Entity Type, you will get an error when you try to save the edit.

Configuring Hierarchies 239


Deleting Entity Icons
You cannot delete individual icons from a ZIP or JAR file from the console; you can only delete them as a group or package. To delete a group of entity icons:
1. Acquire a write lock.
2. Start the Hierarchies tool. To learn more, see “Starting the Hierarchies Tool” on page 234.
3. Right-click the icon collections in the navigation pane and choose Delete Entity Icons.

Configuring Entity Objects and Entity Types
This section describes how to define entity objects and entity types using the Hierarchies tool.

About Entities, Entity Objects, and Entity Types
This section describes entities, entity objects, and entity types in Hierarchy Manager.

Entities
In Hierarchy Manager, an entity is any object, person, place, organization, or other thing that has a business meaning and can be acted upon in your database. Examples include a specific person’s name, a specific checking account number, a specific company, a specific address, and so on.

Entity Base Objects
An entity base object is a base object that has been configured in HM, and that is used to store HM entities. When you create an entity base object using the Hierarchies tool (instead of the Schema Manager), the Hierarchies tool automatically creates the columns required for Hierarchy Manager. You can also convert an existing MRM base object to an entity base object by using the options in the Hierarchies tool.


After adding an entity base object, you use the Schema Manager to view, edit, or delete it. To learn more, see “Configuring Base Objects” on page 92.

Entity Types
In Hierarchy Manager, an entity type is a logical classification of one or more entities. Examples include doctors, checking accounts, banks, and so on. An Entity Base Object must have a Foreign Key to the Entity Type table (Rbo BO Class). The foreign key can be defined as either a ROWID or a predefined Code value. All entities with the same entity type are stored in the same entity object. In the Hierarchies tool, entity types are displayed in the navigation tree under the Entity Object with which the Type is associated.
Well-defined entity types have the following characteristics:
• They effectively segment the data in a way that reflects the real-world nature of the entities.
• They are disjoint. That is, no entity can have more than one entity type.
• Taken collectively, they cover the entire set of entities. That is, every entity has one and only one entity type.
• They are granular enough so that you can easily define the types of relationships that each entity type can have. For example, an entity type of “doctor” can have the relationships: “member of” with a medical group, “staff” (or “non-staff with admitting privileges”) with a hospital, and so on. A more general entity type, such as “care provider” (which encompasses nurses, nurse practitioners, doctors, and others) is not granular enough. In this case, the types of relationships that such a general entity type will have will depend on something beyond just the entity type. Therefore, you need to define more-granular entity types.
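The disjoint and covering characteristics listed above can be checked against a proposed classification before you configure it. The following Python sketch is illustrative only: the function name and the sample data are hypothetical, and it operates on a plain list of (entity, entity type) pairs rather than on any Siperian Hub table.

```python
from collections import defaultdict


def check_entity_types(assignments, all_entities):
    """Check a proposed classification of entities into entity types.

    assignments  -- iterable of (entity_id, entity_type) pairs
    all_entities -- iterable of every entity_id that must be classified

    Returns (multi_typed, untyped):
      multi_typed -- entities with more than one type (not disjoint)
      untyped     -- entities with no type at all (not covering)
    """
    types_by_entity = defaultdict(set)
    for entity_id, entity_type in assignments:
        types_by_entity[entity_id].add(entity_type)

    multi_typed = {
        entity_id: sorted(types)
        for entity_id, types in types_by_entity.items()
        if len(types) > 1
    }
    untyped = sorted(set(all_entities) - set(types_by_entity))
    return multi_typed, untyped
```

A classification is well defined in the sense described above only when both returned collections are empty.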

2. Right-click anywhere in the navigation pane and choose Create New Entity/Relationship Object. You can also choose this option from the Hierarchies tool menu.
3. In the Create New Entity/Relationship Base Object, select Create New Entity Base Object and click OK.
4. Click OK. The Hierarchies tool prompts you to enter information about the new base object.
5. Specify the following properties for this new entity type.

Field                                 Description
Item Type                             Read-only. Already specified.
Display name                          Name of this base object as it will be displayed in the Hub Console.
Physical name                         Actual name of the table in the database. Siperian Hub will suggest a physical name for the table based on the display name that you enter. The RowId is generated and assigned by the system, but the BO Class Code is created by the user, making it easier to remember.
Data tablespace                       Name of the data tablespace. To learn more, see the Siperian Hub Installation Guide for your platform.
Index tablespace                      Name of the index tablespace. To learn more, see the Siperian Hub Installation Guide for your platform.
Description                           Description of this base object.
Foreign Key column for Entity Types   Column used as the Foreign Key for this entity type; can be either ROWID or CODE. The ability to choose a BO Class CODE column reduces the complexity by allowing you to define the foreign key relationship based on a predefined code, rather than the Siperian generated ROWID.
Display name                          Descriptive name of the column of the Entity Type Foreign Key that is displayed in Hierarchy Manager.
Physical name                         Actual name of the FK column in the table. Siperian Hub will suggest a physical name for the FK column based on the display name that you enter.

6. Click OK to save the new base object.

The base object you created has the columns required by Hierarchy Manager. You probably require additional columns in the base object, which you can add using the Schema Manager, as described in “Configuring Columns in Tables” on page 125.


Important: When you modify the base object using the Schema Manager, do not change any of the columns added by Hierarchy Manager. Modifying any of these Hierarchy Manager columns will result in unpredictable behavior and possible data loss.
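The difference between a ROWID and a CODE foreign key, as described for the Foreign Key column for Entity Types field, can be illustrated outside Siperian Hub. The following Python sketch uses an in-memory SQLite database as a stand-in. All table and column names are hypothetical and do not reflect the columns that the Hierarchies tool actually generates.

```python
import sqlite3


def build_demo():
    """Create a toy schema where an entity table references its type by code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE entity_type (
            rowid_object INTEGER PRIMARY KEY,  -- generated by the system
            code         TEXT NOT NULL UNIQUE  -- chosen by the user
        );
        CREATE TABLE party (
            name             TEXT NOT NULL,
            entity_type_code TEXT NOT NULL REFERENCES entity_type (code)
        );
        INSERT INTO entity_type (code) VALUES ('DOCTOR'), ('BANK');
        INSERT INTO party VALUES ('Dr. Ana Ruiz', 'DOCTOR');
        """
    )
    return conn


def entity_type_of(conn, name):
    """Return the entity type code stored on the named party record."""
    row = conn.execute(
        "SELECT entity_type_code FROM party WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

With a code-based key, the value stored on each entity record ('DOCTOR') is readable and is known before the type row exists. With a ROWID-based key, the stored value is whatever number the system assigned.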

Converting Base Objects to Entity Base Objects
You must convert base objects to entity base objects before you can use them in HM. Base objects created in MRM do not have the metadata required by Hierarchy Manager. In order to use these MRM base objects with HM, you must add this metadata via a conversion process. Once you have done this, you can use these converted base objects with both MRM and HM.
To convert an existing MRM base object to work with HM:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click anywhere in the navigation pane and choose Convert BO to Entity/Relationship Object. Note: The same options you see on the right-click menu are also available on the Hierarchies menu.

Note: If you do not see any choices in the Modify Base Object field, then there are no non-hierarchy base objects available. You must create one in the Schema tool.
4. Click OK.


If the base object already has HM metadata, the Hierarchies tool will display a message indicating the HM metadata that exists.

5. In the Foreign Key Column for Entity Types field, select the column to be added: RowId Object or BO Class Code. This is the descriptive name of the column of the Entity Type Foreign Key that is displayed in Hierarchy Manager. The ability to choose a BO Class Code column reduces the complexity by allowing you to define the foreign key relationship based on a predefined code, rather than the Siperian generated ROWID.
6. In the Existing BO Column to use, select an existing column or select the Create New Column option. If no BO columns exist, only the Create New Column option is available.
7. In the Display Name and Physical Name fields, create display and physical names for the column, and click OK.


The base object will now have the columns that Hierarchy Manager requires. To add additional columns, use the Schema Manager (see “Configuring Columns in Tables” on page 125). Important: When you modify the base object using the Schema Manager tool, do not change any of the columns added using the Hierarchies tool. Modifying any of these columns will result in unpredictable behavior and possible data loss.

Adding Entity Types
To add a new entity type:
1. In the Hierarchies tool, right-click on the entity object in which you want to store the new entity type you are creating and select Add Entity Type. The Hierarchies tool displays a new entity type (called New Entity Type) in the navigation tree under the Entity Object you selected.
2. In the properties panel, specify the following properties for this new entity type.

Field          Description
Code           Unique code name of the Entity Type. Can be used as a foreign key from HM entity base objects.
Display name   Name of this entity type as it will be displayed in the Hub Console. Specify a unique, descriptive name.
Description    Description of this entity type.
Color          Color of the entities associated with this entity type as they will be displayed in the Hub Console in the Hierarchy Manager Console and Business Data Director.
Small Icon     Small icon for entities associated with this entity type as they will be displayed in the Hub Console in the Hierarchy Manager Console and Business Data Director.
Large Icon     Large icon for entities associated with this entity type as they will be displayed in the Hub Console in the Hierarchy Manager Console and Business Data Director.

3. To designate a color for this entity type, click the button next to Color.


The color choices window is displayed. The color you choose determines how entities of this type are displayed in the Hierarchy Manager. Select a color and click OK.
4. To select a small icon for the new entity type, click the button next to Small Icon. The Choose Small Icon window is displayed. Small icons determine how entities of this type are displayed when the graphic view window shows many entities. To learn more about adding icon graphics for your entity types, see “Configuring Entity Icons” on page 238. Select a small icon and click OK.
5. To select a large icon for the new entity type, click the button next to Large Icon.


The Choose Large Icon window is displayed. Large icons determine how entities of this type are displayed when the graphic view window shows few entities. To learn more about adding icon graphics for your entity types, see “Configuring Entity Icons” on page 238. Select a large icon and click OK.
6. Click the Save button to save the new entity type.

Editing Entity Types
To edit an entity type:
1. In the Hierarchies tool, in the navigation tree, click the entity type to edit.
2. For each field that you want to edit, click the edit button and make the change that you want. For more information about these fields, see “Adding Entity Types” on page 246.
3. When you have finished making changes, click the Save button to save your changes.

Warning: If your entity object uses the code column, you probably do not want to modify the entity type code if you already have records for that entity type.


Deleting Entity Types
You can delete any entity type that is not used by any relationship types. If the entity type is being used by one or more relationship types, attempting to delete it will generate an error. To delete an entity type:
1. Acquire a write lock.
2. In the Hierarchies tool, in the navigation tree, right-click the entity type that you want to delete, and choose Delete Entity Type. If the entity type is not used by any relationship types, then the Hierarchies tool prompts you to confirm deletion.
3. Choose Yes. The Hierarchies tool removes the selected entity type from the list.

Warning: You probably do not want to delete an entity type if you already have entity records that use that type. If your entity object uses the code column instead of the rowid column and you have records in that entity object for the entity type you are trying to delete, you will get an error.

Display Options for Entities
In addition to configuring color and icons for entities, you can also configure the font size and maximum width. While color and icons can be specified for each entity type, the font size and width apply to entities of all types. To change the font size in HM, use the HM Font Size and Entity Box Size. The default entity font size (38 pts) and max entity box width (600 pixels) can be overridden by settings in the cmxserver.properties file. The settings to use are:
sip.hm.entity.font.size=fontSize
sip.hm.entity.max.width=maxWidth

The value for fontSize can be from 6 to 100, and the value for maxWidth can be from 20 to 5000. If a specified value is outside its range, the minimum or maximum value is used instead. If a specified value is not a number, the default value is used.
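The range and default rules above can be summarized in a short Python sketch. This is an illustration of the documented behavior, not the code that Siperian Hub runs. It assumes that an out-of-range value is replaced by the nearer limit and that only whole numbers count as numbers; the function name is hypothetical.

```python
# Defaults and allowed ranges as stated in this section.
DEFAULTS = {
    "sip.hm.entity.font.size": 38,
    "sip.hm.entity.max.width": 600,
}
RANGES = {
    "sip.hm.entity.font.size": (6, 100),
    "sip.hm.entity.max.width": (20, 5000),
}


def effective_value(key, raw):
    """Return the value that would take effect for one property setting."""
    low, high = RANGES[key]
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # Not a number (or missing): fall back to the default.
        return DEFAULTS[key]
    # Outside the range: use the minimum or the maximum instead.
    return max(low, min(high, value))
```

For example, a font size setting of 150 would take effect as 100, and a setting of "large" would take effect as the default of 38.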


Reverting Entity Base Objects to Base Objects
If you inadvertently converted a base object to an entity object, or if you no longer want to work with an entity object in Hierarchy Manager, you can revert the entity object to a base object. In doing so, you are removing the HM metadata from the object. To revert an entity base object to a base object:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click on an entity base object and choose Revert Entity/Relationship Object to BO.
3. If the following Siperian Hub Console dialog box is displayed, click OK:
   Note that when you revert the entity object, you are also reverting its corresponding relationship objects.
4. In the Revert Entity/Relationship Object dialog box, click OK.
5. A dialog is displayed when the entity is reverted.


Configuring Hierarchies
This section describes how to define hierarchies using the Hierarchies tool.

About Hierarchies
A hierarchy is a set of relationship types (as described in “About Relationships, Relationship Objects, and Relationship Types” on page 255). These relationship types are not ranked, nor are they necessarily related to each other. They are merely relationship types that are grouped together for ease of classification and identification. The same relationship type can be associated with multiple hierarchies. A hierarchy type is a logical classification of hierarchies.
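Because a hierarchy is an unranked grouping, it can be pictured as a plain set of relationship types. The following Python sketch is a conceptual model only, and the hierarchy names are hypothetical. It shows that one relationship type can belong to several hierarchies.

```python
# Each hierarchy is modeled as an unordered set of relationship types.
HIERARCHIES = {
    "Org Chart": {"reports to"},
    "Care Network": {"member of", "staff"},
    "Legal": {"reports to", "member of"},
}


def hierarchies_for(rel_type, hierarchies=HIERARCHIES):
    """Return the names of all hierarchies that include rel_type."""
    return sorted(
        name for name, rel_types in hierarchies.items() if rel_type in rel_types
    )
```

Here hierarchies_for("reports to") returns both "Legal" and "Org Chart", because the same relationship type is associated with two hierarchies.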

Adding Hierarchies
To add a new hierarchy:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click an entity object in the navigation pane and choose Add Hierarchy. The Hierarchies tool displays a new hierarchy (called New Hierarchy) in the navigation tree under the Hierarchies node. The default properties are displayed in the properties pane.
3. Specify the following properties for this new hierarchy.

Field          Description
Code           Unique code name of the hierarchy. Can be used as a foreign key from HM relationship base objects.
Display name   Name of this hierarchy as it will be displayed in the Hub Console. Specify a unique, descriptive name.
Description    Description of this hierarchy.

In the navigation tree, click the hierarchy to edit. Click the edit button and edit the name. Click the Save button to save your changes.

Warning: If your relationship object uses the hierarchy code column (instead of the rowid column), you probably do not want to modify the hierarchy code if you already have records for that hierarchy in the relationship object.

Deleting Hierarchies
Warning: You do not want to delete a hierarchy if you already have relationship records that use the hierarchy. If your relationship object uses the hierarchy code column instead of the rowid column and you have records in that relationship object for the hierarchy you are trying to delete, you will get an error.


To delete a hierarchy:
1. In the Hierarchies tool, acquire a write lock.
2. In the navigation tree, right-click the hierarchy that you want to delete, and choose Delete Hierarchy. The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes. The Hierarchies tool removes the selected hierarchy from the list.

Note: You are allowed to delete a hierarchy that has relationship types associated with it. There will be a warning with the list of associated relationship types. If you elect to delete the hierarchy, all references to it will automatically be removed.

Configuring Relationship Base Objects and Relationship Types
This section describes how to define relationship types using the Hierarchies tool.

Relationships
A relationship describes the affiliation between two specific entities. Hierarchy Manager relationships are defined by specifying the relationship type, hierarchy type, attributes of the relationship, and dates for when the relationship is active.

Relationship Base Objects
A relationship base object is a base object used to store HM relationships.


Relationship Types
A relationship type describes classes of relationships and defines the types of entities that a relationship of this type can include, the direction of the relationship (if any), and how the relationship is displayed in the Hub Console.
Note: Relationship Type is a physical construct and can be configuration heavy, while Hierarchy Type is more of a logical construct and is typically configuration light. Therefore, it is often easier to have many Hierarchy Types than to have many Relationship Types. Be sure to understand your data and hierarchy management requirements prior to defining Hierarchy Types and Relationship Types within Siperian.
A well-defined set of Hierarchy Manager relationship types has the following characteristics:
• It reflects the real-world relationships between your entity types.
• It supports multiple relationship types for each relationship.

Creating Relationship Base Objects
A relationship base object is used to store HM relationships. To add a new relationship base object:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click anywhere in the navigation pane and choose Create New Entity/Relationship Object... The Hierarchies tool prompts you to select the type of base object to create.
3. Select Create New Relationship Base Object.
4. Click OK. The Hierarchies tool prompts you to enter information about the new relationship base object.


5. Specify the following properties for this new relationship base object.

Field                        Description
Item Type                    Read-only. Already specified.
Display name                 Name of this base object as it will be displayed in the Hub Console.
Physical name                Actual name of the table in the database. Siperian Hub will suggest a physical name for the table based on the display name that you enter.
Data tablespace              Name of the data tablespace. To learn more, see the Siperian Hub Installation Guide for your platform.
Index tablespace             Name of the index tablespace. To learn more, see the Siperian Hub Installation Guide for your platform.
Description                  Description of this base object.
Entity Base Object 1         Entity base object to be linked via this relationship base object.
Display name                 Name of the column that is a FK to the entity base object 1.
Physical name                Actual name of the column in the database. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Entity Base Object 2         Entity base object to be linked via this relationship base object.
Display name                 Name of the column that is a FK to the entity base object 2.
Physical name                Actual name of the column in the database. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Hierarchy FK Column          Column used as the foreign key for the hierarchy; can be either ROWID or CODE. The ability to choose a BO Class CODE column reduces the complexity by allowing you to define the foreign key relationship based on a predefined code, rather than the Siperian generated ROWID.
Hierarchy FK Display Name    Name of this FK column as it will be displayed in the Hub Console.
Hierarchy FK Physical Name   Actual name of the hierarchy foreign key column in the table. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Rel Type FK Column           Column used as the foreign key for the relationship; can be either ROWID or CODE.
Rel Type Display Name        Name of the column that is used to store the Rel Type CODE or ROWID.
Rel Type Physical Name       Actual name of the relationship type FK column in the table. Siperian Hub will suggest a physical name for the column based on the display name that you enter.

6. Click OK to save the new base object.

The relationship base object you created has the columns required by Hierarchy Manager. You may require additional columns in the base object, which you can add using the Schema Manager, as described in “Configuring Columns in Tables” on page 125. Important: When you modify the base object using the Schema Manager, do not change any of the columns added by Hierarchy Manager. Modifying any of these columns will result in unpredictable behavior and possible data loss.

Specify the base object and the number of Foreign Key columns, then click OK. The Hierarchies tool displays the Convert to FK Relationship Base Object dialog.

4. Specify the following properties for this new FK relationship object.

Field                          Description
FK Constraint Entity BO 1      Select FK entity base object from list.
Existing BO Column to Use      Name of existing base object column used for FK, or choose to create a new column.
FK Column Display Name 1       Name of FK column as it will be displayed in the Hub Console.
FK Column Physical Name 1      Actual name of FK column in the database. Siperian Hub will suggest a physical name for the table based on the display name that you enter.
FK Column Represents           Choose Entity1 or Entity2, depending on what the FK Column represents in the relationship.

5. Click OK to save the new FK relationship object.

The base object you created has the columns required by Hierarchy Manager. You may require additional columns in the base object, which you can add using the Schema Manager, as described in “Configuring Columns in Tables” on page 125. Important: When you modify the base object using the Schema Manager tool, do not change any of the columns added by the Hierarchies tool. Modifying any of these columns will result in unpredictable behavior and possible data loss. For more information about foreign key relationships, see Chapter 5, “Building the Schema.”

Converting Base Objects to Relationship Base Objects
Relationship base objects are tables that contain information about two entity base objects. Base objects created in MRM do not have the metadata required by Hierarchy Manager for relationship information. In order to use these MRM base objects with Hierarchy Manager, you must add this metadata via a conversion process. Once you have done this, you can use these converted base objects with both MRM and HM.


To convert a base object to a relationship object for use with HM:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click in the navigation pane and choose Convert BO to Entity/Relationship Object.

Specify the following properties for this base object.

Field                        Description
Entity Base Object 1         Entity base object to be linked via this relationship base object.
Display name                 Name of the column that is a FK to the entity base object 1.
Physical name                Actual name of the column in the database. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Entity Base Object 2         Entity base object to be linked via this relationship base object.
Display name                 Name of the column that is a FK to the entity base object 2.
Physical name                Actual name of the column in the database. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Hierarchy FK Column          Column used as the foreign key for the hierarchy; can be either ROWID or CODE. The ability to choose a BO Class CODE column reduces the complexity by allowing you to define the foreign key relationship based on a predefined code, rather than the Siperian generated ROWID.
Existing BO Column to Use    Actual column in the existing BO to use.
Hierarchy FK Display Name    Name of this FK column as it will be displayed in the Hub Console.
Hierarchy FK Physical Name   Actual name of the hierarchy foreign key column in the table. Siperian Hub will suggest a physical name for the column based on the display name that you enter.
Rel Type FK Column           Column used as the foreign key for the relationship; can be either ROWID or CODE.
Existing BO Column to Use    Actual column in the existing BO to use.
Rel Type FK Display Name     Name of the FK column that is used to store the Rel Type CODE or ROWID.
Rel Type FK Physical Name    Actual name of the relationship type FK column in the table. Siperian Hub will suggest a physical name for the column based on the display name that you enter.

5. Click OK.

Warning: When you modify the base object using the Schema Manager tool, do not change any of the columns added by HM. Modifying any of these HM columns will result in unpredictable behavior and possible data loss.

Reverting Relationship Base Objects to Base Objects
This removes HM metadata from the relationship object. The relationship object remains as a base object, but is no longer displayed in the Hierarchy Manager.
To revert a relationship object to a base object:
1. In the Hierarchies tool, acquire a write lock.
2.

Configuring Relationship Types
This section describes how to configure relationship types in the Hierarchies tool.

Adding Relationship Types
To add a new relationship type:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click on a relationship object and choose Add Relationship Type. The Hierarchies tool displays a new relationship type (called New Rel Type) in the navigation tree under the Relationship Types node. The default properties are displayed in the properties pane.

Note: You can only save a relationship type if you associate it with a hierarchy.
A Foreign Key Relationship Base Object is an Entity Base Object containing a foreign key to another Entity Base Object. A Relationship Base Object is a table that relates the two Entity Base Objects.
Note: FK relationship types can only be associated with a single hierarchy.


3. The properties panel displays the properties you must enter to create the relationship.
4. In the properties panel, specify the following properties for this new relationship type.

Field               Description
Code                Unique code name of the rel type. Can be used as a foreign key from HM relationship base objects.
Display name        Name of this relationship type as it will be displayed in the Hub Console. Specify a unique, descriptive name.
Description         Description of this relationship type.
Color               Color of the relationships associated with this relationship type as they will be displayed in the Hub Console in the Hierarchy Manager Console and Business Data Director.
Entity Type 1       First entity type associated with this new relationship type. Any entities of this type will be able to have relationships of this relationship type.
Entity Type 2       Second entity type associated with this new relationship type. Any entities of this type will be able to have relationships of this relationship type.
Direction           Select a direction for the new relationship type to allow a directed hierarchy. The possible directions are:
                    • Entity 1 to Entity 2
                    • Entity 2 to Entity 1
                    • Undirected
                    • Bi-Directional
                    • Unknown
                    An example of a directed hierarchy is an organizational chart, with the relationship “reports to” being directed from employee to supervisor, and so on, up to the head of the organization.
FK Rel Start Date   The start date of the foreign key relationship.
FK Rel End Date     The end date of the foreign key relationship.
Hierarchies         Check the check box next to any hierarchy that you want associated with this new relationship type. Any selected hierarchies can contain relationships of this relationship type.

5. Click the button next to Color to designate a color for this relationship type. The color choices window is displayed. The color you choose determines how relationships of this type are displayed in the Hierarchy Manager. Select a color and click OK.
6. Click the Calendar button to designate a start and end date for a foreign key relationship. All relationships of this FK relationship type will have the same start and end date. If you do not specify these dates, the default values are automatically added.
7. Select a hierarchy.
8. Click the Save button to save the new relationship type.
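The Direction property and the organizational chart example given for it can be sketched in Python. This is a conceptual model and not a Siperian Hub API. The enumeration mirrors the five direction choices listed for the Direction field, while the function names, the sample data, and the decision to treat only the two one-way directions as traversable are assumptions made for illustration.

```python
from enum import Enum


class Direction(Enum):
    ENTITY_1_TO_ENTITY_2 = "Entity 1 to Entity 2"
    ENTITY_2_TO_ENTITY_1 = "Entity 2 to Entity 1"
    UNDIRECTED = "Undirected"
    BI_DIRECTIONAL = "Bi-Directional"
    UNKNOWN = "Unknown"


def directed_edge(entity_1, entity_2, direction):
    """Return (source, target) for a one-way direction, otherwise None."""
    if direction is Direction.ENTITY_1_TO_ENTITY_2:
        return (entity_1, entity_2)
    if direction is Direction.ENTITY_2_TO_ENTITY_1:
        return (entity_2, entity_1)
    return None


def chain_of_command(employee, relationships):
    """Follow one-way "reports to" relationships up to the top of the chart.

    relationships -- iterable of (entity_1, entity_2, Direction) tuples
    """
    supervisor_of = {}
    for entity_1, entity_2, direction in relationships:
        edge = directed_edge(entity_1, entity_2, direction)
        if edge is not None:
            source, target = edge
            supervisor_of[source] = target

    chain = [employee]
    while chain[-1] in supervisor_of:
        next_up = supervisor_of[chain[-1]]
        if next_up in chain:
            raise ValueError("cycle detected in reporting relationships")
        chain.append(next_up)
    return chain
```

Starting from any employee, the function walks the directed relationships until it reaches an entity that reports to no one, which corresponds to the head of the organization in the example.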

2. In the navigation tree, click the relationship type that you want to edit.
3. For each field that you want to edit, click the edit button and make the change that you want. To learn more about these fields, see “Adding Relationship Types” on page 265.
4. When you have finished making changes, click the Save button to save your changes.

Warning: If your relationship object uses the code column, you probably do not want to modify the relationship type code if you already have records for that relationship type. This warning does not apply to FK relationship types.

Deleting Relationship Types
Warning: You probably do not want to delete a relationship type if you already have relationship records that use the relationship type. If your relationship object uses the relationship type code column instead of the rowid column and you have records in that relationship object for the relationship type you are trying to delete, you will get an error.


The above warnings are not applicable to FK relationship types. You can delete relationship types that are associated with hierarchies. The confirmation dialog displays the hierarchies associated with the relationship type being deleted. To delete a relationship type:
1. In the Hierarchies tool, acquire a write lock.
2. In the navigation tree, right-click the relationship type that you want to delete, and choose Delete Relationship Type. The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes. The Hierarchies tool removes the selected relationship type from the list.

Configuring Packages for Use by HM
This section describes how to add MRM packages to your schema using the Hierarchies tool. You can create MRM packages for entity base objects, relationship base objects, and foreign key relationship base objects. If records will be inserted or changed in the package, be sure to enable the Put option.

About Packages
As described in Chapter 6, “Configuring Queries and Packages,” a package is a public view of one or more underlying tables in Siperian Hub. Packages represent subsets of the columns in those tables, along with any other tables that are joined to the tables. A package is based on a query. The underlying query can select a subset of records from the table or from another package. Packages are used for configuring user views of the underlying data. For more information, see “Configuring Queries and Packages” on page 161. You must first create a package to use with Hierarchy Manager, and then associate it with Entity Types or Relationship Types.

Creating Packages
This section describes how to create HM and Relationship packages.

In the Hierarchies tool, right-click anywhere in the navigation pane and choose Create New Package. The Hierarchies tool starts the Create New Package wizard and displays the first dialog box.

3. Specify the following information for this new package.
Field | Description
Type of Package | One of the following types: Entity Object, Relationship Object, or FK Relationship Object.
Query Group | Select an existing query group or choose to create a new one. In Siperian Hub, query groups are logical groups of queries. For more information, see “Configuring Query Groups” on page 164.
Query group name | Name of the new query group. Needed only if you chose to create a new group above.
Description | Optional description for the new query group you are creating.

4. Click Next. The Create New Package wizard displays the next dialog box.

5. Specify the following information for this new package.
Field | Description
Query Name | Name of the query. In Siperian Hub, a query is a request to retrieve data from the Hub Store. For more information, see “Configuring Queries” on page 166.
Description | Optional description.
Select Primary Table | Primary table for this query.

6. Click Next. The Create New Package wizard displays the next dialog box.

7. Specify the following information for this new package.
Field | Description
Display Name | Display name for this package, which will be used to display this package in the Hub Console.
Physical Name | Physical name for this package. The Hub Console will suggest a physical name based on the display name you entered.
Description | Optional description.
Enable PUT | (Optional) Select to enable records to be inserted or changed. If you do not choose this, your package will be read only. If you are creating a foreign key relationship object package, you have additional steps in Step 9 of this procedure. Note: You must have both a PUT and a non-PUT package for every Foreign Key relationship. Both Put and non-Put packages that you create for the same foreign key relationship object must have the same columns.
Secure Resource | (Optional) Select to create a secure resource.

8. Click Next. The Create New Package wizard displays a final dialog box. The dialog box you see depends on the type of package you are creating.
• If you selected to create either a package for entities or relationships or a PUT package for FK relationships, a dialog box similar to the following dialog box is displayed. The required columns (shown in grey) are automatically selected; you cannot deselect them. Deselect the columns that are irrelevant to your package.
Note: You must have both a PUT and a non-PUT package for every Foreign Key relationship. Both Put and non-Put packages that you create for the same foreign key relationship object must have the same columns.
• If you selected to create a non-Put enabled package for foreign key relationships (see Step 7 of this procedure - do not check the Put check box), the following dialog box is displayed:

9. If you are creating a non-Put enabled package for foreign key relationships, specify the following information for this new package.
Field | Description
Hierarchy | Hierarchy associated with this package. For more information, see “Configuring Hierarchies” on page 253.
Relationship Type | Relationship type associated with this package. For more information, see “Configuring Relationship Base Objects and Relationship Types” on page 255.

Note: You must have both a PUT and a non-PUT package for every Foreign Key relationship. Both Put and non-Put packages that you create for the same foreign key relationship object must have the same columns.
10. Select the columns for this new package.
11. Click Finish to create the package.

Use the Packages tool to view, edit, or delete this newly-created package, as described in “Configuring Packages” on page 196. You should not remove columns that are needed by Hierarchy Manager. These columns are automatically selected (and greyed out) when the user creates packages using the Hierarchies tool.

After You Create a Package
After creating a package, assign that package to an entity or relationship type.

Assigning Packages to Entity or Relationship Types
After you create a profile, and a package for each of the entity/relationship types in a profile, you must assign the packages. This defines what fields are displayed when an entity is displayed in HM. To learn more, see “Customizing the Hub Console Interface” on page 45. You can also assign a package for relationship types and entity types.

To assign a package to an entity/relationship type:
1. Acquire a write lock.
2. In the Hierarchies tool, select the Entity/Relationship Type. The Hierarchy Manager displays the properties for the Package for that type if they exist, or the same properties pane with empty fields. When you make the display and Put package selections, the HM package column information is displayed in the lower panel. The numbers in the cells define the sequence in which the attributes are displayed.
3. Configure the package for your entity or relationship type.

Field | Description
Label | Columns used to display the label of the entity/relationship you are viewing in the HM graphical console. These columns are used to create the Label Pattern in the Hierarchy Manager Console and Business Data Director. To edit a label, click the label value to the right of the label. In the Edit Pattern dialog, enter a new label or double-click a column to use it in a pattern.
Tooltip | Columns used to display the description or comment that appears when you scroll over the entity/relationship. Used to create the tooltip pattern in the Hierarchy Manager Console and Business Data Director. To edit a tooltip, click the tooltip pattern value to the right of the Tooltip Pattern label. In the Edit Pattern dialog, enter a new tooltip pattern or double-click a column to use it in a pattern.
Common | Columns used when entities/relationships of different types are displayed in the same list. The selected columns must be in packages associated with all Entity/Relationship Types in the Profile.
Search | Columns that can be used with the search tool.
List | Columns to be displayed in a search result.
Detail | Columns used for the detailed view of an entity/relationship displayed at the bottom of the screen.
Put | Columns that are displayed when you want to edit a record.
Add | Columns that are displayed when you want to create a new record.

4. When you have finished making changes, click the Save button to save your changes.

Configuring Profiles
This section describes how to configure profiles using the Hierarchies tool.

About Profiles
In Hierarchy Manager, a profile is used to define user access to HM objects—what users can view and what the HM objects look like to those users. A profile determines what fields and records an HM user may display, edit, or add. For example, one profile can allow full read/write access to all entities and relationships, while another profile can be read-only (no add or edit operations allowed). Once you define a profile, you can configure it as a secure resource, as described in “Securing Siperian Hub Resources” on page 841.

Adding Profiles
A new profile (called Default) is created automatically for you before you access the HM. The default profile can be maintained, and you can also add additional profiles.
Note: The Business Data Director uses the Default Profile to define how Entity Labels as well as Relationship and Entity Tooltips are displayed. Additional Profiles, as well as the additional information defined within Profiles, are used only within the Hierarchy Manager Console and not the Business Data Director.
To add a new profile:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click anywhere in the navigation pane and choose Add Profiles.

The Hierarchies tool displays a new profile (called New Profile) in the navigation tree under the Profiles node. The default properties are displayed in the properties pane.

When you select these relationship types and click Save, the tree below the Profile will be populated with Entity Objects, Entity Types, Rel Objects and Rel Types. When you deselect a Rel type, only the Rel types will be removed from the tree, not the Entity Types.
3. Specify the following information for this new profile.
Field | Description
Name | Unique, descriptive name for this profile.
Description | Description of this profile.
Relationship Types | Select one or more relationship types associated with this profile.

4. Click the Save button to save the new profile.

The Hierarchies tool displays information about the relationship types you selected in the References section of the screen. Entity types are also displayed. This information is derived from the relationship types you selected.

Editing Profiles
To edit a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, in the navigation tree, click the profile that you want to edit.
3. Configure the profile as needed (specifying the appropriate profile name, description, and relationship types and assigning packages), according to the instructions in “Adding Profiles” on page 278 and “Configuring Packages for Use by HM” on page 269.
4. When you have finished making changes, click the Save button to save your changes.

In the Hierarchies tool, in the navigation pane, select the profile to validate.

3. In the properties pane, click the Validate tab.
Note: Profiles can be successfully validated only after the packages are assigned to Entity Types and Relationship Types.

The Hierarchies tool displays the Validate tab.

4. Select a sandbox to use. For information about creating and configuring sandboxes, see the Siperian Hub Data Steward Guide.
5. To validate the data, check Validate Data. This may take a long time if you have a lot of records.
6. To start the validation process, click Validate HM Configuration.

The Hierarchies tool displays a progress window during the validation process. The results of the validation appear in the window below the buttons.

7. When the validation is finished, click Save.
8. Choose the directory where the validation report will be saved.
9. Click Clear to clear the box containing the description of the validation results.

Copying Profiles
To copy a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click the profile that you want to copy, and then choose Copy Profile. The Hierarchies tool displays a new profile (called New Profile) in the navigation tree under the Profiles node. This new profile is an exact copy (with a different name) of the profile that you selected to copy. The default properties are displayed in the properties pane.
3. Configure the profile as needed (specifying the appropriate profile name, description, and relationship types, and assigning packages), according to the instructions in “Adding Profiles” on page 278.
4. Click the Save button to save the new profile.

Deleting Profiles
To delete a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click the profile that you want to delete, and choose Delete Profile. The Hierarchies tool displays a window that warns that packages will be removed when you delete this profile.

In the Hierarchies tool, right-click the relationship type and choose Delete Entity Type/Relationship Type From Profile. If the profile contains relationship types that use the entity/relationship type that you want to delete, you will not be able to delete it unless you delete the relationship type from the profile first.

In the Hierarchies tool, right-click the entity type and choose Delete Entity Type/Relationship Type From Profile. If the profile contains relationship types that use the entity type that you want to delete, you will not be able to delete it unless you delete the relationship type from the profile first.

Assigning Packages to Entity and Relationship Types
After you create a profile, you must:
• Assign packages to the entity types and relationship types associated with the profile. To learn more, see “Assigning Packages to Entity or Relationship Types” on page 275.
• Configure the package as a secure resource. To learn more, see “Securing Siperian Hub Resources” on page 841.

Sandboxes
To learn about sandboxes, see the Hierarchy Manager chapter in the Siperian Hub Data Steward Guide.

Chapter 9: Siperian Hub Processes
This chapter provides an overview of the processes associated with batch processing in Siperian Hub, including key concepts, tasks, and references to related topics in the Siperian Hub documentation.

Before You Begin
Before you begin, you should be thoroughly familiar with the concepts of reconciliation, distribution, best version of the truth (BVT), and batch processing that are described in Chapter 3, “Key Concepts,” in the Siperian Hub Overview.

About Siperian Hub Processes
With batch processing in Siperian Hub, data flows through Siperian Hub in a sequence of individual processes.

Overall Data Flow for Batch Processes
The following figure provides a detailed perspective on the overall flow of data through the Siperian Hub using batch processes, including individual processes, source systems, base objects, and support tables.

Note: The publish process is not shown in this figure because it is not a batch process.

Consolidation Status for Base Object Records
This section describes the consolidation status of records in a base object.

Consolidation Indicator
All base objects have a system column named CONSOLIDATION_IND. This consolidation indicator represents the consolidation status of individual records as they progress through various processes in Siperian Hub.

The consolidation indicator is one of the following values:
Indicator Value | State Name | Description
1 | CONSOLIDATED | Indicates the record has been through the match and merge process.
2 | UNMERGED | Indicates that the record has gone through the match process.
3 | QUEUED_FOR_MATCH | Indicates that the record is ready to be put through the match process against the rest of the records in the base object.
4 | NEWLY_LOADED | Indicates that the record has been newly loaded into the base object and has not gone through the match process.
9 | ON_HOLD | Indicates that the Data Steward has put the record on hold, to deal with later.

How the Consolidation Indicator Changes
Siperian Hub updates the consolidation indicator for base object records in the following sequence.
1. During the load process, when a new or updated record is loaded into a base object, Siperian Hub assigns the record a consolidation indicator of 4, indicating that the record needs to be matched.
2. Near the start of the match process, when a record is selected as a match candidate, the match process changes its consolidation indicator to 3.
Note: Any change to the match or merge configuration settings will trigger a reset match dialog, asking whether you want to reset the records in the base object (change the consolidation indicator to 4, ready for match). For more information, see Chapter 14, “Configuring the Match Process,” and Chapter 15, “Configuring the Consolidate Process.”
3. Before completing, the match process changes the consolidation indicator of match candidate records to 2 (ready for consolidation).
Note: The match process may or may not have found matches for the record. A record with a consolidation indicator of 2 or 4 is visible in Merge Manager. For more information, see the Siperian Hub Data Steward Guide.
4. If Accept All Unmatched Rows as Unique is enabled, and a record has undergone the match process but no matches were found, then Siperian Hub automatically changes its consolidation indicator to 1 (unique). For more information, see “Accept All Unmatched Rows as Unique” on page 492.
5. If Accept All Unmatched Rows as Unique is enabled, after the record has undergone the consolidate process, and once a record has no more duplicates to merge with, Siperian Hub changes its consolidation indicator to 1, meaning that this record is unique in the base object, and that it represents the master record (best version of the truth) for that entity in the base object.

Note: Once a record has its consolidation indicator set to 1, Siperian Hub will never directly match it against any other record. New or updated records (with a consolidation indicator of 4) can be matched against consolidated records.
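
The indicator values and the first transitions described above can be modeled in a few lines of Python. This is only an illustrative sketch: the enum mirrors the CONSOLIDATION_IND values listed above, but the function names and boolean parameters are hypothetical and are not part of any Siperian Hub API.

```python
from enum import IntEnum


class ConsolidationInd(IntEnum):
    """Values of the CONSOLIDATION_IND system column, as listed above."""
    CONSOLIDATED = 1
    UNMERGED = 2
    QUEUED_FOR_MATCH = 3
    NEWLY_LOADED = 4
    ON_HOLD = 9


def indicator_after_load() -> ConsolidationInd:
    # Step 1: a new or updated record is loaded into the base object.
    return ConsolidationInd.NEWLY_LOADED


def indicator_when_selected_for_match() -> ConsolidationInd:
    # Step 2: the record is selected as a match candidate.
    return ConsolidationInd.QUEUED_FOR_MATCH


def indicator_after_match(matches_found: bool,
                          accept_unmatched_as_unique: bool) -> ConsolidationInd:
    # Step 4: with "Accept All Unmatched Rows as Unique" enabled, a record
    # for which no matches were found is marked as unique.
    if accept_unmatched_as_unique and not matches_found:
        return ConsolidationInd.CONSOLIDATED
    # Step 3: otherwise the record is ready for consolidation.
    return ConsolidationInd.UNMERGED
```

For example, `indicator_after_match(matches_found=False, accept_unmatched_as_unique=True)` yields `ConsolidationInd.CONSOLIDATED`, which corresponds to a stored value of 1.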

Survivorship and Order of Precedence
When evaluating cells to merge from two records, Siperian Hub determines which cell data should survive and which one should be discarded. The surviving cell data (or winning cell) is considered to represent the better version of the truth between the two cells. Ultimately, a single, consolidated record contains the best surviving cell data and represents the best version of the truth. Survivorship applies to both trust-enabled columns and columns that are not trust enabled. When comparing cells from two different records, Siperian Hub determines survivorship based on the following factors, in order of precedence:
1. If the two columns are trust-enabled, then the data with the highest trust score wins.
2. If there are no trust scores, then the data with the more recent LAST_UPDATE_DATE wins.
3. If trust scores are the same from both systems, then the data with the more recent cross-reference SRC_LUD wins.
4. If the SRC_LUD values are equal, then Siperian Hub compares whether the record is an incoming load update (applies to the load process only).
5. If both records are incoming load updates, then Siperian Hub compares the LAST_UPDATE_DATE values in the associated cross-reference records and the one with the more recent LAST_UPDATE_DATE wins.
6. If the LAST_UPDATE_DATE values are equal, then Siperian Hub compares the ROWID_OBJECT, in numeric descending order. The highest ROWID_OBJECT has the winning values.
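
The order of precedence above can be read as a chain of tie-breakers. The following Python sketch illustrates that reading only; it is not Siperian Hub code. The `Cell` class and `pick_winner` function are hypothetical names, and two behaviors are assumptions that the text does not state: that an incoming load update wins over a record that is not one (factor 4), and that ties on any factor fall through to the next one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Cell:
    value: str
    trust: Optional[float]           # None if the column is not trust-enabled
    last_update_date: datetime       # LAST_UPDATE_DATE of the record
    src_lud: datetime                # SRC_LUD of the cross-reference
    is_load_update: bool             # incoming load update (load process only)
    xref_last_update_date: datetime  # LAST_UPDATE_DATE of the cross-reference
    rowid_object: int                # ROWID_OBJECT


def pick_winner(a: Cell, b: Cell) -> Cell:
    """Return the surviving cell, applying the six factors in order."""
    if a.trust is not None and b.trust is not None:
        if a.trust != b.trust:                       # 1. highest trust wins
            return a if a.trust > b.trust else b
    elif a.last_update_date != b.last_update_date:   # 2. no trust scores
        return a if a.last_update_date > b.last_update_date else b
    if a.src_lud != b.src_lud:                       # 3. more recent SRC_LUD
        return a if a.src_lud > b.src_lud else b
    if a.is_load_update != b.is_load_update:         # 4. ASSUMPTION: update wins
        return a if a.is_load_update else b
    if a.is_load_update and a.xref_last_update_date != b.xref_last_update_date:
        # 5. both records are incoming load updates
        return a if a.xref_last_update_date > b.xref_last_update_date else b
    # 6. final tie-breaker: highest ROWID_OBJECT
    return a if a.rowid_object >= b.rowid_object else b
```

Writing the factors as early returns keeps each rule independent, so a different interpretation of factor 4 would change only one branch.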

Land Process

This section describes concepts and tasks associated with the land process in Siperian Hub.

About the Land Process
Landing data is the initial step for loading data into Siperian Hub.

Source Systems and Landing Tables
Landing data involves the transfer of data from one or more source systems to Siperian Hub landing tables.

• A source system is an external system that provides data to Siperian Hub. Source systems can be applications, data stores, and other systems that are internal to your organization, or obtained or purchased from external sources. For more information, see “About Source Systems” on page 348.
• A landing table is a table in the Hub Store that contains the data that is initially loaded from a source system. For more information, see “About Landing Tables” on page 355.

Data Flow of the Land Process
The following figure shows the land process in relation to other Siperian Hub processes.

Land Process is External to Siperian Hub
The land process is external to Siperian Hub and is executed using an external batch process or an external application that directly populates landing tables in the Hub Store. Subsequent processes for managing data are internal to Siperian Hub.

Ways to Populate Landing Tables
Landing tables can be populated in the following ways:
Load Method | Description
external batch process | An ETL (Extract-Transform-Load) tool or other external process copies data from a source system to Siperian Hub. Batch loads are external to Siperian Hub. Only the results of the batch load are visible to Siperian Hub in the form of populated landing tables. Note: This process is handled by a separate ETL tool of your choice. This ETL tool is not part of the Siperian Hub suite of products.
real-time processing | External applications can populate landing tables in on-line, real-time mode. Such applications are not part of the Siperian Hub suite of products.

For any given source system, the approach used depends on whether it is the most efficient (or perhaps the only) way to obtain data from that source system. In addition, batch processing is often used for the initial data load (the first time that business data is loaded into the Hub Store), as it can be the most efficient way to populate the landing table with a large number of records. For more information, see “Initial Data Loads and Incremental Loads” on page 302. Note: Data in the landing tables cannot be deleted until after the load process for the base object has been executed and completed successfully.

Managing the Land Process
To manage the land process, refer to the following topics in this documentation:
Task | Topic(s)
Configuration | Chapter 10, “Configuring the Land Process”: “Configuring Source Systems” on page 348 and “Configuring Landing Tables” on page 355.
Execution | Execution of the land process is external to Siperian Hub and depends on the approach you are using to populate landing tables, as described in “Ways to Populate Landing Tables” on page 294.
Application Development | If you are using external application(s) to populate landing tables, see the developer documentation for the API used by your application(s).

Stage Process

This section describes concepts and tasks associated with the stage process in Siperian Hub.

About the Stage Process
The stage process transfers data from a populated landing table to the staging table associated with a particular base object or dependent object.

Data is transferred according to mappings that link a source column in the landing table with a target column in the staging table. Mappings also define data cleansing, if any, to perform on the data before it is saved in the target table. If delta detection is enabled (see “Configuring Delta Detection for a Staging Table” on page 401), Siperian Hub detects which records in the landing table are new or updated and then copies only these records, unchanged, to the corresponding RAW table. Otherwise, all records are copied to the target table. Records with obvious problems in the data are rejected and stored in a corresponding reject table, which can be inspected after running the stage process (see “Viewing Rejected Records” on page 685).

Data from landing tables can be distributed to multiple staging tables. However, each staging table receives data from only one landing table. The stage process prepares data for the load process, described in “Load Process” on page 299, which subsequently loads data from the staging table into a target table (either a base object or a dependent object).

Data Flow of the Stage Process
The following figure shows the stage process in relation to other Siperian Hub processes.

Tables Associated With the Stage Process
The following tables in the Hub Store are associated with the stage process.
Type of Table | Description
landing table | Contains data that is copied from a source system. For more information, see “About the Land Process” on page 292 and “About Landing Tables” on page 355.
staging table | Contains data that was accepted and copied from the landing table during the stage process. For more information, see “About Staging Tables” on page 364.
raw table | Contains data that was archived from landing tables. Raw data can be configured to get archived based on the number of loads or the duration (specific time interval). For more information, see “Configuring the Audit Trail for a Staging Table” on page 399 and “Configuring Delta Detection for a Staging Table” on page 401.
reject table | Contains records that Siperian Hub has rejected for a specific reason. Records in these tables will not be loaded into base objects and dependent objects. Data gets rejected automatically during Stage jobs for the following reasons: future date or NULL date in the LAST_UPDATE_DATE column; NULL value mapped to the PKEY_SRC_OBJECT of the staging table; duplicates found in PKEY_SRC_OBJECT; invalid value in the HUB_STATE_IND field (for state-enabled base objects only); duplicate value found in a unique column. The rejects table is associated with the staging table (called stagingTableName_REJ). Rejected records can be inspected after running Stage jobs (see “Viewing Rejected Records” on page 685).
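
To make the reject reasons more concrete, the following Python sketch checks a list of candidate records for three of the conditions listed above (a NULL or future LAST_UPDATE_DATE, a NULL PKEY_SRC_OBJECT, and duplicate PKEY_SRC_OBJECT values). It is a simplified illustration, not the Stage job itself: the function name and the dictionary-based record layout are hypothetical, the HUB_STATE_IND and unique-column checks are omitted, and flagging every record that shares a duplicated key is an assumption.

```python
from collections import Counter
from datetime import datetime
from typing import Optional


def stage_reject_reasons(records: list[dict],
                         now: Optional[datetime] = None) -> dict[int, list[str]]:
    """Map the index of each rejected record to its reject reasons."""
    now = now or datetime.now()
    key_counts = Counter(r.get("PKEY_SRC_OBJECT") for r in records)
    rejects: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        reasons = []
        lud = rec.get("LAST_UPDATE_DATE")
        if lud is None:
            reasons.append("NULL LAST_UPDATE_DATE")
        elif lud > now:
            reasons.append("future LAST_UPDATE_DATE")
        pkey = rec.get("PKEY_SRC_OBJECT")
        if pkey is None:
            reasons.append("NULL PKEY_SRC_OBJECT")
        elif key_counts[pkey] > 1:
            # ASSUMPTION: every record that shares a key is flagged.
            reasons.append("duplicate PKEY_SRC_OBJECT")
        if reasons:
            rejects[i] = reasons
    return rejects
```

Records that do not appear in the returned dictionary would be the ones accepted into the staging table under these simplified rules.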

Load Process

This section describes concepts and tasks associated with the load process in Siperian Hub. For related tasks, see “Managing the Load Process” on page 316.

About the Load Process
In Siperian Hub, the load process moves data from a staging table to the corresponding target table (the base object or dependent object to which the staging table belongs) in the Hub Store.

The load process determines what to do with the data in the staging table based on:
• whether the target table is a base object or dependent object
• whether a corresponding record already exists in the target table and, if so, whether the record in the staging table has been updated since the load process was last run
• whether trust is enabled for certain columns (base objects only); if so, the load process calculates trust scores for the cell data
• whether the data is valid to load; if not, the load process rejects the record instead
• other configuration settings

Data Flow for the Load Process
The following figure shows the load process in relation to other Siperian Hub processes.

Tables Associated with the Load Process
In addition to base objects and dependent objects, the following tables in the Hub Store are associated with the load process.
Type of Table | Description
staging table | Contains the data that was accepted and copied from the landing table during the stage process. For more information, see “Stage Process” on page 295 and “About Staging Tables” on page 364.
cross-reference table | Used for tracking the lineage of data, that is, the source system for each record in the base object. For each source system record that is loaded into the base object, Siperian Hub maintains a record in the cross-reference table that includes an identifier for the system that provided the record, the primary key value of that record in the source system, and the most recent cell values provided by that system. Each base object record will have one or more cross-reference records. For more information, see “Cross-Reference Tables” on page 97.
history tables | If history is enabled for the base object, and records are updated or inserted, then the load process writes this information into two tables: the base object history table and the cross-reference history table. For more information, see “History Tables” on page 100.
reject table | Contains records from the staging table that the load process has rejected for a specific reason. Rejected records will not be loaded into base objects or dependent objects. The reject table is associated with the staging table (called stagingTableName_REJ). For more information, see “Rejected Records in Load Jobs” on page 314. Rejected records can be inspected after running Load jobs (see “Viewing Rejected Records” on page 685).

Initial Data Loads and Incremental Loads
The initial data load (IDL) is the very first time that data is loaded into a newly-created, empty base object.

During the initial data load, all records in the staging table are inserted into the base object as new records. For more information, see “Load Inserts” on page 306. Once the initial data load has occurred for a base object, any subsequent load processes are called incremental loads because only new or updated data is loaded into the base object.

Duplicate data is ignored. For more information, see “Run-time Execution Flow of the Load Process” on page 304.

Trust Settings and Validation Rules
Siperian Hub uses trust and validation rules to help determine the most reliable data.

Trust Settings
If a column in a base object derives its data from multiple source systems, Siperian Hub uses trust to help with comparing the relative reliability of column data from different source systems. For example, the Orders system might be a more reliable source of billing addresses than the Direct Marketing system. Trust is enabled and configured at the column level. For example, you can specify a higher trust level for Customer Name in the Orders system and for Phone Number in the Billing system.

Trust provides a mechanism for measuring the relative confidence factor associated with each cell based on its source system, change history, and other business rules.

Trust takes into account the quality and age of the cell data, and how its reliability decays (decreases) over time. Trust is used to determine survivorship (when two records are consolidated) and whether updates from a source system are sufficiently reliable to update the master record. For more information, see “Survivorship and Order of Precedence” on page 291 and “Configuring Trust for Source Systems” on page 455. Data stewards can manually override a calculated trust setting if they have direct knowledge that a particular value is correct. Data stewards can also enter a value directly into a record in a base object. For more information, see the Siperian Hub Data Steward Guide.

Validation Rules
Trust is often used in conjunction with validation rules, which might downgrade (reduce) trust scores according to configured conditions and actions. For more information, see “Configuring Validation Rules” on page 468. When data meets the criterion specified by the validation rule, then the trust value for that data is downgraded by the percentage specified in the validation rule. For example:
Downgrade trust on First_Name by 50% if Length < 3
Downgrade trust on Address Line 1, City, State, Zip and Valid_address_ind if Valid_address_ind = ‘False’

If the Reserve Minimum Trust flag is enabled (checked) for a column, then the trust cannot be downgraded below the column’s minimum trust setting.
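The interaction between a validation rule downgrade and the Reserve Minimum Trust flag can be illustrated with a minimal sketch. It assumes the downgrade is applied as a percentage of the current trust score; the function names, the parameter names, and the minimum trust value of 40 are hypothetical and are not part of Siperian Hub.

```python
def downgrade_trust(trust, downgrade_pct, min_trust=0.0, reserve_minimum_trust=False):
    """Return the trust score after a validation rule fires.

    Assumption: the downgrade is a percentage of the current score.
    """
    downgraded = trust * (1 - downgrade_pct / 100.0)
    if reserve_minimum_trust:
        # Reserve Minimum Trust: the score cannot be downgraded below the
        # column's minimum trust setting (and is never raised above its start).
        downgraded = max(downgraded, min(trust, min_trust))
    return downgraded


def apply_first_name_rule(record, trust):
    """Illustrative rule from the text: downgrade First_Name by 50% if Length < 3."""
    if len(record.get("First_Name") or "") < 3:
        # min_trust=40 is a made-up column setting used only for this example.
        return downgrade_trust(trust, 50, min_trust=40, reserve_minimum_trust=True)
    return trust
```

With these assumptions, a score of 60 on a two-character first name would be halved to 30 and then held at the reserved minimum of 40.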

Run-time Execution Flow of the Load Process
This section provides a detailed explanation of what can occur during the load process based on configured settings as well as characteristics of the data being processed. This section describes the default behavior of the Siperian Hub load process. Alternatively, for incremental loads, you can streamline load, match, and merge processing by loading by RowID, as described in “Loading by RowID” on page 394.


Loading Records by Batch
The load process handles staging table records in batches. For each base object, the load batch size setting (see “Load Batch Size” on page 103) specifies the number of records to load per batch cycle (default is 1000000). During execution of the load process for a base object, Siperian Hub creates a temporary table (_TLL) for each batch as it cycles through records in the staging table. For example, suppose the staging table contained 250 records to load, and the load batch size were set to 100. During execution, the load process would:
• create a TLL table and process the first 100 records
• drop and create the TLL table and process the second 100 records
• drop and create the TLL table and process the remaining 50 records
• drop and create the TLL table and stop executing because the TLL table contained no records
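The batch cycling described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and a Python list slice stands in for the temporary TLL table.

```python
def load_in_batches(staging_records, load_batch_size):
    """Yield successive batches, one per simulated TLL table."""
    position = 0
    while True:
        # "Drop and create" the TLL table with the next slice of records.
        tll = staging_records[position:position + load_batch_size]
        if not tll:
            # An empty TLL table ends the load process.
            break
        yield tll
        position += len(tll)


# 250 staging records with a load batch size of 100.
batch_sizes = [len(batch) for batch in load_in_batches(list(range(250)), 100)]
```

Here `batch_sizes` holds three entries (100, 100, and 50); the fourth cycle finds an empty TLL table and stops.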

Determining Whether Records Already Exist
During the load process, Siperian Hub first checks to see whether the record has the same primary key as an existing record from the same source system. It compares each record in the staging table with records in the target table to determine whether it already exists in the target table. What occurs next depends on the results of this comparison.


Load Operation    Description
load insert    If a record in the staging table does not already exist in the target table, then Siperian Hub inserts that new record in the target table.
load update    If a record in the staging table already exists in the target table, then Siperian Hub takes the appropriate action. A load update occurs if the target table (base object or dependent object) gets updated with data in a record from the staging table. The load process updates a record only if it has changed since the record was last supplied by the source system. Load updates are governed by current Siperian Hub configuration settings and characteristics of the data in each record in the staging table. For example, if Force Update is enabled (see “Forcing Updates in Load Jobs” on page 730), the records will be updated regardless of whether they have already been loaded.

During the load process, load updates are executed first, followed by load inserts.
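As a rough illustration of this decision, the sketch below classifies staging records by checking whether the same source system has already supplied a record with the same primary key. The dictionary keys `source_system` and the set `xref_keys` are hypothetical stand-ins for the staging and cross-reference data; this is not the actual Siperian Hub implementation.

```python
def classify_load_operations(staging_records, xref_keys):
    """Split staging records into (load updates, load inserts).

    xref_keys: set of (source_system, PKEY_SRC_OBJECT) pairs already loaded.
    """
    updates, inserts = [], []
    for record in staging_records:
        key = (record["source_system"], record["PKEY_SRC_OBJECT"])
        if key in xref_keys:
            updates.append(record)   # record already exists: load update
        else:
            inserts.append(record)   # new record: load insert
    # Load updates are executed first, followed by load inserts.
    return updates, inserts
```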

Load Inserts

What happens during a load insert depends on the target table (base object or dependent object) and other factors.


Load Inserts and Target Base Objects

To perform a load insert for a record in the staging table:
• The load process generates a unique ROWID_OBJECT value for the new record.
• The load process performs foreign key lookups and substitutes any foreign key value(s) required to maintain referential integrity. For more information, see “Performing Lookups Needed to Maintain Referential Integrity” on page 312.
• The load process inserts the record into the base object, and copies into this new record the generated ROWID_OBJECT value (as the primary key for this record in the base object), any foreign key lookup values, and all of the column data from the staging table (except PKEY_SRC_OBJECT), including null values. The base object may have multiple records for the same object (for example, one record from source system A and another from source system B). Siperian Hub flags both new records as new.
• For each new record in the base object, the load process sets its DIRTY_IND to 1 so that match keys can be regenerated during the tokenization process, as described in “Base Object Records Flagged for Tokenization” on page 323.
• For each new record in the base object, the load process sets its CONSOLIDATION_IND to 4 (ready for match) so that the new record can be matched to other records in the base object. For more information, see “Consolidation Status for Base Object Records” on page 289.
• The load process inserts a record into the cross-reference table associated with the base object. The load process generates a primary key value for the cross-reference table, then copies into this new record the generated key, an identifier for the source system, and the columns in the staging table (including PKEY_SRC_OBJECT). For more information, see “Cross-Reference Tables” on page 97.
  Note: The base object does not contain the primary key value from the source system. Instead, the base object’s primary key is the generated ROWID_OBJECT value. The primary key from the source system (PKEY_SRC_OBJECT) is stored in the cross-reference table instead.
• If history is enabled for the base object (see “History Tables” on page 100), then the load process inserts a record into its history and cross-reference history tables.
• If trust is enabled for one or more columns in the base object, then the load process also inserts records into control tables that support the trust algorithms, populating the elements of trust and validation rules for each trusted cell with the values used for trust calculations. This information can be used subsequently to calculate trust when needed. For more information, see “Configuring Trust for Source Systems” on page 455 and “Control Tables for Trust-Enabled Columns” on page 457.
• If Generate Match Tokens on Load is enabled for a base object (see “Generate Match Tokens on Load” on page 104), then the tokenization process is automatically started after the load process completes.
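The core of a load insert into a base object can be sketched as follows. This is a simplified illustration, not the product's implementation: tables are modeled as Python lists of dictionaries, the ROWID_OBJECT format is made up, the `xref_key` and `source_system` column names are hypothetical, and linking the cross-reference row to the base object through ROWID_OBJECT is an assumption. Foreign key lookups, history tables, and trust control tables are omitted.

```python
import itertools


def make_rowid_generator():
    """Return a callable producing unique key values (format is illustrative)."""
    counter = itertools.count(1)
    return lambda: str(next(counter))


def load_insert(staging_record, source_system, base_object, xref, next_rowid):
    """Insert one staging record into the base object and its cross-reference table."""
    rowid_object = next_rowid()

    # Base object row: every staging column except PKEY_SRC_OBJECT, nulls included.
    bo_row = {col: val for col, val in staging_record.items() if col != "PKEY_SRC_OBJECT"}
    bo_row["ROWID_OBJECT"] = rowid_object
    bo_row["DIRTY_IND"] = 1          # match keys must be generated
    bo_row["CONSOLIDATION_IND"] = 4  # ready for match
    base_object.append(bo_row)

    # Cross-reference row: generated key, source system identifier, and the
    # staging columns, including PKEY_SRC_OBJECT.
    xref_row = dict(staging_record)
    xref_row["xref_key"] = next_rowid()       # hypothetical column name
    xref_row["source_system"] = source_system  # hypothetical column name
    xref_row["ROWID_OBJECT"] = rowid_object    # assumed link to the base object
    xref.append(xref_row)
    return rowid_object
```

Keeping PKEY_SRC_OBJECT only in the cross-reference row mirrors the note above: the base object is keyed by the generated ROWID_OBJECT, while the source system's key stays traceable through the cross-reference table.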

Load Inserts and Target Dependent Objects

For load inserts into target dependent objects, the load process:
• inserts the new record into the dependent object
• substitutes any foreign keys required to maintain referential integrity


Load Updates

What happens during a load update depends on the target table (base object or dependent object) and other factors.

Load Updates and Target Base Objects

For load updates on target base objects:
• By default, for each record in the staging table, the load process compares the value in the LAST_UPDATE_DATE column with the source last update date (SRC_LUD) in the associated cross-reference table.
  • If the record in the staging table has been updated since the last time the record was supplied by the source system, then the load process proceeds with the load update.
  • If the record in the staging table is unchanged since the last time the record was supplied by the source system, then the load process ignores the record (no action is taken) if the dates are the same and trust is not enabled, or rejects the record if it is a duplicate.
  Administrators can change the default behavior so that the load process bypasses this LAST_UPDATE_DATE check and forces an update of the records regardless of whether the records might have already been loaded. For more information, see “Forcing Updates in Load Jobs” on page 730.
• The load process performs foreign key lookups and substitutes any foreign key value(s) required to maintain referential integrity. For more information, see “Performing Lookups Needed to Maintain Referential Integrity” on page 312.
• If the target base object has trust-enabled columns, then the load process:
  • calculates the trust score for each trust-enabled column in the record to be updated, based on the configured trust settings for this trusted column (as described in “Configuring Trust for Source Systems” on page 455)
  • applies validation rules, if defined, to downgrade trust scores where applicable (see “Configuring Validation Rules” on page 468)
• The load process updates the target record in the base object according to the following rules:
  • If the trust score for the cell in the staging table record is higher than the trust score in the corresponding cell in the target base object record, then the load process updates the cell in the target record.
  • If the trust score for the cell in the staging table record is lower than the trust score in the corresponding cell in the target base object record, then the load process does not update the cell in the target record.
  • If the trust score for the cell in the staging table record is the same as the trust score in the corresponding cell in the target base object record, or if trust is not enabled for the column, then the cell value in the record with the most recent LAST_UPDATE_DATE wins.
    • If the staging table record has a more recent LAST_UPDATE_DATE, then the corresponding cell in the target base object record is updated.
    • If the target record in the base object has a more recent LAST_UPDATE_DATE, then the cell is not updated.
  For more information, see “Survivorship and Order of Precedence” on page 291.
• For each updated record in the base object, the load process sets its DIRTY_IND to 1 so that match keys can be regenerated during the tokenization process. For more information, see “Base Object Records Flagged for Tokenization” on page 323.
• For each updated record in the base object, the load process sets its CONSOLIDATION_IND to 4 so that the updated record can be matched to other records in the base object. For more information, see “Consolidation Status for Base Object Records” on page 289.
• Whenever the load process updates a record in the base object, it also updates the associated record in the cross-reference table (“Cross-Reference Tables” on page 97), history tables (if history is enabled, see “History Tables” on page 100), and other control tables as applicable.
• If Generate Match Tokens on Load is enabled for a base object (see “Generate Match Tokens on Load” on page 104), then the tokenization process is automatically started after the load process completes.
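The two decisions at the heart of a load update, the LAST_UPDATE_DATE check and the cell-level trust comparison, can be sketched as follows. The function and parameter names are hypothetical. The text does not say what happens when trust scores and dates are both equal, so leaving the cell unchanged in that case is an assumption of this sketch.

```python
from datetime import datetime


def passes_delta_check(staging_lud, src_lud, force_update=False):
    """Return True if the load update should proceed for this record.

    staging_lud: LAST_UPDATE_DATE of the staging record.
    src_lud: source last update date (SRC_LUD) from the cross-reference table.
    """
    # Force Update bypasses the LAST_UPDATE_DATE comparison entirely.
    return force_update or staging_lud > src_lud


def staging_cell_wins(staging_lud, target_lud, staging_trust=None, target_trust=None):
    """Return True if the staging cell should overwrite the target cell.

    Pass None for both trust scores when trust is not enabled for the column.
    """
    if staging_trust is not None and target_trust is not None:
        if staging_trust > target_trust:
            return True
        if staging_trust < target_trust:
            return False
    # Equal trust, or trust not enabled: the most recent LAST_UPDATE_DATE wins.
    # Assumption: on an exact tie the target cell is left unchanged.
    return staging_lud > target_lud
```

For example, a staging cell with trust 90 overwrites a target cell with trust 80 even if the staging record is older, whereas with equal trust only a newer staging record would win.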


Load Updates and Target Dependent Objects

For load updates with target dependent objects, the load process updates the records in the target dependent object with the values in the staging table without checking the last update date.

Note: Data in staging tables from different source systems must have unique keys in order to be loaded into a dependent object. Records coming from different source systems each have their own key that uniquely identifies the record in that source system. Siperian Hub considers any records from the same source system with the same key values to be the same record. Therefore, if a record in the staging table has the same key value as an existing cross-reference record, Siperian Hub performs a load update because the record is considered to exist already in the base object.

Performing Lookups Needed to Maintain Referential Integrity
Regardless of whether the load process is inserting or updating a record, it performs any lookups needed to translate source system foreign keys into Siperian Hub foreign key values using the lookup settings configured for the staging table. For more information, see “Configuring Lookups For Foreign Key Columns” on page 376.

Disabling Referential Integrity Constraints

During the initial load/updates, or if there is no real-time, concurrent access, you can disable the referential integrity constraints on the base object to improve performance. For more information, see “Allow constraints to be disabled” on page 103.

Undefined Lookups

If a lookup on a child object is not defined (the lookup table and column were not populated), then you must repeat the stage process for the child object before executing the load process in order to load data successfully. For more information, see “Stage Jobs” on page 745 and “Load Jobs” on page 727.

If Allow Null Foreign Key is enabled (selected), then the load process:

The load process permits load inserts and load updates for accepted records only. Rejected records are inserted into the reject table rather than being loaded into the target table. Note: During the initial data load only, when the target base object is empty, the load process allows null foreign keys. For more information, see “Initial Data Loads and Incremental Loads” on page 302.


Rejected Records in Load Jobs
During the load process, records in the staging table might be rejected for the following reasons:
• future date or NULL date in the LAST_UPDATE_DATE column
• NULL value mapped to the PKEY_SRC_OBJECT of the staging table
• duplicates found in PKEY_SRC_OBJECT
• invalid value in the HUB_STATE_IND field (for state-enabled base objects only)
• invalid or NULL foreign keys, as described in “Allowing Null Foreign Keys” on page 313

Rejected records will not be loaded into base objects or dependent objects. Rejected records can be inspected after running Load jobs (see “Viewing Rejected Records” on page 685). For more information about configuring the behavior of delta detection for duplicates and the retention of records in the REJ and RAW tables for a staging table, see “Using Audit Trail and Delta Detection” on page 398.

Note: To reject records, the load process requires traceability back to the landing table. If you are loading a record from a staging table and its corresponding record in the associated landing table has been deleted, then the load process does not insert it into the reject table.
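Most of the rejection reasons listed in this section can be expressed as simple record-level checks. The sketch below is illustrative only: the function names and reason strings are hypothetical, the set of valid HUB_STATE_IND values is supplied by the caller rather than assumed, and the foreign key check is omitted because it depends on the configured lookups.

```python
from collections import Counter
from datetime import datetime


def find_duplicate_keys(records):
    """Return PKEY_SRC_OBJECT values that occur more than once in the staging data."""
    counts = Counter(r.get("PKEY_SRC_OBJECT") for r in records)
    return {key for key, n in counts.items() if key is not None and n > 1}


def rejection_reason(record, now, duplicate_keys, valid_states=None):
    """Return a reason string if the record would be rejected, else None.

    valid_states: allowed HUB_STATE_IND values; pass None for base objects
    that are not state-enabled.
    """
    lud = record.get("LAST_UPDATE_DATE")
    if lud is None:
        return "NULL LAST_UPDATE_DATE"
    if lud > now:
        return "future LAST_UPDATE_DATE"
    pkey = record.get("PKEY_SRC_OBJECT")
    if pkey is None:
        return "NULL PKEY_SRC_OBJECT"
    if pkey in duplicate_keys:
        return "duplicate PKEY_SRC_OBJECT"
    if valid_states is not None and record.get("HUB_STATE_IND") not in valid_states:
        return "invalid HUB_STATE_IND"
    return None
```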


Other Considerations for the Load Process
This section describes other considerations for the load process.

How the Load Process Handles Parent-Child Records
If the child table contains generated keys from the parent table, the load process copies the appropriate primary key value from the parent table into the child table. For example, suppose you had the following data.

PARENT TABLE:
PARENT_ID    FNAME    LNAME
101    Joe    Smith
102    Jane    Smith

CHILD TABLE: has a relationship to the PARENT's PKEY_SRC_OBJECT
ADDRESS    CITY    STATE    FKEY_PARENT
1893    my city    CA    101
1893    my city    CA    102

In this example, you can have a relationship pointing to the ROWID_OBJECT, to PKEY_SRC_OBJECT, or to a unique column for table lookup.

Loading State-Enabled Base Objects
The load process has special considerations when processing records for state-enabled base objects. For more information, see “Rules for Loading Data” on page 221. Note: The load process rejects any record from the staging table that has an invalid value in the HUB_STATE_IND column. For more information, see “About the Hub State Indicator” on page 207.


Generating Match Tokens (Optional)
Tokenizing data prepares it for the match process. In the Schema Manager, when configuring a base object, you can specify whether to generate match tokens immediately after the Load job completes, or to delay tokenizing data until the Match job runs. The setting of the Generate Match Tokens on Load check box determines when tokenization occurs. For more information, see “Match Process” on page 317 and “Generate Match Tokens on Load” on page 104.

Match Process

This section describes concepts and tasks associated with the match process in Siperian Hub.

About the Match Process
Before records in a base object can be consolidated, Siperian Hub must determine which records are likely duplicates (matches) of each other. The match process uses match rules to:
• identify which records in the base object are likely duplicates (identical or similar)
• determine which records are sufficiently similar to be consolidated automatically, and which records should be reviewed manually by a data steward prior to consolidation

In Siperian Hub, the match process provides you with two main ways in which to compare records and determine duplicates:
• Fuzzy matching is the most common means used in Siperian Hub to match records in base objects. Fuzzy matching looks for sufficient points of similarity between records and makes probabilistic match determinations that consider likely variations in data patterns, such as misspellings, transpositions, the combining or splitting of words, omissions, truncation, phonetic variations, and so on.
• Exact matching is less commonly used because it matches records with identical values in the match column(s). An exact strategy is faster, but an exact match might miss some matches if the data is imperfect.

The best option to choose depends on the characteristics of the data, your knowledge of the data, and your particular match and consolidation requirements. For more information, see “Exact-match and Fuzzy-match Base Objects” on page 320.


Match Process

During the match process, Siperian Hub compares records in the base object for points of similarity. If the match process finds sufficient points of similarity (identical or similar matches) between two records, indicating that the two records probably are duplicates of each other, then the match process:
• populates a match table with ROWID_OBJECT references to matched record pairs, along with the match rule that identified the match, and whether the matched records qualify for automatic consolidation
• flags those records for consolidation by changing their consolidation indicator to 2 (ready for consolidation), as described in “Consolidation Status for Base Object Records” on page 289


Match Data Flow
The following figure shows the match process in relation to other Siperian Hub processes.


Key Concepts for the Match Process
This section describes key concepts that apply to the match process.

Match Rules
A match rule defines the criteria by which Siperian Hub determines whether two records in the base object might be duplicates. Siperian Hub supports two types of match rules:
Type    Description
Match column rules    Used to match base object records based on the values in columns you have defined as match columns, such as last name, first name, address1, and address2. This is the most commonly used method for identifying matches. For more information, see “Configuring Match Columns” on page 515.
Primary key match rules    Used to match records from two systems that use the same primary keys for records. It is uncommon for two different source systems to use identical primary keys. However, when this does occur, primary key matches are quick and very accurate. For more information, see “Configuring Primary Key Match Rules” on page 578.

Both kinds of match rules can be used together for the same base object.

Exact-match and Fuzzy-match Base Objects
A base object is configured to use one of the following types of matching:
Type of Base Object    Description
exact-match base object    Can have only exact match columns. For more information, see “Match Column Types” on page 515.
fuzzy-match base object    Can have both fuzzy match and exact match columns:
    • fuzzy match only
    • exact match only, or
    • some combination of fuzzy and exact match


The type of base object determines the type of match and the type of match columns you can define. The base object type is determined by the selected match / search strategy for the base object. For more information, see “Match/Search Strategy” on page 493.

Support Tables Used in the Match Process
The match process uses the following support tables:
Table    Description
match key table    Contains the match keys that were generated for all base object records. A match key table uses the following naming convention: C_baseObjectName_STRP where baseObjectName is the root name of the base object. Example: C_PARTY_STRP. For more information, see “Columns in Match Key Tables” on page 325.
match table    Contains the pairs of matched records in the base object resulting from the execution of the match process on this base object. Match tables use the following naming convention: C_baseObjectName_MTCH where baseObjectName is the root name of the base object. Example: C_PARTY_MTCH. For more information, see “Populating the Match Table with Match Pairs” on page 330. Note: Link-style base objects use a link table (*_LNK) instead.
match flag audit table    Contains the userID of the user who, in Merge Manager, queued a manual match record for automerging. Match flag audit tables use the following naming convention: C_baseObjectName_FHMA where baseObjectName is the root name of the base object. Used only if Match Flag Audit Table is enabled for this base object, as described in “Match Flag Audit Table” on page 105.
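The naming conventions for these support tables can be captured in a small helper, shown here as a convenience sketch. The function name and the `kind` labels are hypothetical; only the C_baseObjectName_SUFFIX pattern and the three suffixes come from the conventions described above.

```python
# Suffixes taken from the naming conventions for match support tables.
SUPPORT_TABLE_SUFFIXES = {
    "match_key": "STRP",
    "match": "MTCH",
    "match_flag_audit": "FHMA",
}


def support_table_name(base_object_name, kind):
    """Build a support table name from the root name of the base object."""
    return f"C_{base_object_name}_{SUPPORT_TABLE_SUFFIXES[kind]}"


party_tables = {kind: support_table_name("PARTY", kind) for kind in SUPPORT_TABLE_SUFFIXES}
```

For the PARTY base object, this yields C_PARTY_STRP, C_PARTY_MTCH, and C_PARTY_FHMA.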


Match Keys and the Tokenization Process
Match keys are strings that encode data in the columns used to identify candidates for matching. Match keys are fixed length, compressed, and encoded values built from a combination of the words and numbers in a name or address such that relevant variations have the same match key value. Match tokens are strings consisting of match keys plus the flattened data from the match columns. The process of generating match tokens is called tokenization. Match tokens are stored in the match key table associated with the base object. For each record in the base object, tokenization stores one or more generated match keys in the match key table. In the match key table, match keys are stored in the SSA_KEY column, and match tokens are the combination of the data stored in the SSA_KEY and SSA_DATA columns. For more information, see “Columns in Match Key Tables” on page 325.

When to Generate Match Tokens

Match keys are maintained independently of the match process. The match process depends on the match keys in the match key table being current. Updating match keys can occur:
• after the load process (see “Generate Match Tokens on Load” on page 104), when load inserts and load updates occur
• when data is put into the base object using SIF Put or CleansePut requests (see “Generate Match Tokens on Load” on page 104, as well as the Siperian Services Integration Framework Guide and the Siperian Hub Javadoc)
• when you run the Generate Match Tokens job (see “Generate Match Tokens Jobs” on page 725)
• at the start of a match job, as described in “Regenerating Match Keys If Needed” on page 329
• after consolidating data, as described in “Consolidate Process” on page 335


Base Object Records Flagged for Tokenization

All base objects have a system column named DIRTY_IND. This dirty indicator identifies when match keys need to be generated for the base object record. Match keys are stored in the match key table. The dirty indicator is one of the following values:

Value    Meaning    Description
0    Record is up to date    Record does not need to be tokenized.
1    Record needs to be tokenized    This flag is set to 1 when a record has been:
    • added (load insert)
    • updated (load update)
    • consolidated
    • edited in the Data Manager

For each record in the base object whose DIRTY_IND is 1, the tokenization process generates match keys, and then resets the DIRTY_IND to 0.
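The DIRTY_IND cycle can be sketched as a loop over the base object. In this illustration, `generate_keys` is a hypothetical stand-in for the key generation logic, tables are modeled as lists of dictionaries, and replacing a record's previously stored keys before writing new ones is an assumption rather than documented behavior.

```python
def tokenize_dirty_records(base_object, match_key_table, generate_keys):
    """Generate match keys for every record whose DIRTY_IND is 1, then reset the flag.

    Returns the number of records that were tokenized.
    """
    tokenized = 0
    for row in base_object:
        if row["DIRTY_IND"] != 1:
            continue  # record is up to date
        rowid = row["ROWID_OBJECT"]
        # Assumption: stale keys for this record are replaced, not accumulated.
        match_key_table[:] = [k for k in match_key_table if k["ROWID_OBJECT"] != rowid]
        for key in generate_keys(row):
            match_key_table.append({"ROWID_OBJECT": rowid, "SSA_KEY": key})
        row["DIRTY_IND"] = 0
        tokenized += 1
    return tokenized
```

Running the function a second time without intervening changes tokenizes nothing, because every DIRTY_IND has already been reset to 0.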

The following figure shows how the DIRTY_IND flag changes during various batch processes:

Key Types and Key Widths in Fuzzy-Match Base Objects

For fuzzy-match base objects, match keys are generated based on the following settings:

Property    Description
key type    Identifies the primary type of information being tokenized (Person_Name, Organization_Name, or Address_Part1) for this base object. The match process uses its intelligence about name and address characteristics to generate match keys and conduct searches. Available key types depend on the population set being used, as described in “Population Sets” on page 326. For more information, see “Key Types” on page 521.
key width    Determines the thoroughness and speed of the search, the number of possible match candidates returned, and how much disk space the keys consume. Available key widths are Limited, Standard, Extended, and Preferred. For more information, see “Key Widths” on page 522.

Because match keys must be able to overcome errors, variations, and word transpositions in the data, Siperian Hub generates multiple match tokens for each name, address, or organization. The number of keys generated per base object record varies, depending on your data and the match key width.

Match Key Distribution and Hot Spots

The Match Keys Distribution tab in the Match / Merge Setup Details pane of the Schema Manager allows you to investigate the distribution of match keys in the match key table. This tool can assist you with identifying potential hot spots in your data (high concentrations of match keys that could result in overmatching), where the match process generates too many matches, including matches that are not relevant. For more information, see “Investigating the Distribution of Match Keys” on page 583.

Example Match Keys

The match keys that are generated depend on your configured match settings and characteristics of the data in the base object. The following example shows match keys generated from strings using a fuzzy match / search strategy:
String in Record BETH O'BRIEN BETH O'BRIEN BETH O'BRIEN LIZ O'BRIEN LIZ O'BRIEN LIZ O'BRIEN Generated Match Key MMU$?/$PCOG$$$$ VL/IEFLM PCOG$$$$ SXOG$$$VL/IEFLM

In this example, the strings BETH O'BRIEN and LIZ O'BRIEN (keys #3 and 5 in the example) have the same match token values. The match process would therefore consider these records to be match candidates.

Columns in Match Key Tables

The match key table has the following system columns.

Column Name    Data Type (Size)    Description
ROWID_OBJECT    CHAR (14)    Identifies the record for which this match key was generated.
SSA_KEY    CHAR (8)    Generated match key for this record.
SSA_DATA    VARCHAR2 (500)    Concatenated, plain text string representing the source data from all of the match columns defined in the base object, not just the match key stored in the SSA_KEY column.

Tokenization Ratio

You can configure the match process to repeat the tokenization process whenever the percentage of changed records exceeds the specified ratio, which is configured as an advanced property in the base object. For more information, see “Complete Tokenize Ratio” on page 102.

Population Sets
For base objects with the fuzzy match/search strategy, the match process uses standard population sets to account for national, regional, and language differences. The population set affects how the match process handles tokenization, the match / search strategy, and match purposes. For more information, see “Fuzzy Population” on page 494. A population set encapsulates intelligence about name, address, and other identification information that is typical for a given population. For example, different countries use different address formats, such as the placement of street numbers and street names, location of postal codes, and so on. Similarly, different regions have different distributions for surnames—the surname “Smith” is quite common in the United States population, for example, but not so common for other parts of the world. Population sets improve match accuracy by accommodating for the variations and errors that are likely to appear in data for a particular population. For more information, see “Configuring Match Settings for Non-US Populations” on page 941.

Matching for Duplicate Data
The match for duplicate data functionality is used to generate matches for duplicates of all non-system base object columns. These matches are generated when there are more than a set number of occurrences of complete duplicates on the base object columns (see “Duplicate Match Threshold” on page 103). For most data, the optimal value is 2. Although the matches are generated, the consolidation indicator (see “Consolidation Indicator” on page 289) remains at 4 (unconsolidated) for those records, so that they can be later matched using the standard match rules.

Note: The Match for Duplicate Data job is visible in the Batch Viewer if the threshold is set above 1 and there are no NON_EQUAL match rules defined on the corresponding base object. For more information, see “Match for Duplicate Data Jobs” on page 740.

Build Match Groups and Transitive Matches
The Build Match Group (BMG) process removes redundant matching in advance of the consolidate process. For example, suppose a base object had the following match pairs:
• record 1 matches to record 2
• record 2 matches to record 3
• record 3 matches to record 4

After running the match process and creating build match groups, and before running the consolidation process, you might see the following records:
• record 2 matches to record 1
• record 3 matches to record 1
• record 4 matches to record 1

In this example, there was no explicit rule that matched 4 to 1. Instead, the match was made indirectly due to the behavior of other matches (record 1 matched to 2, 2 matched to 3, and 3 matched to 4). An indirect matching is also known as a transitive match. In the Merge Manager and Data Manager, you can display the complete match history to expose the details of transitive matches.
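The effect of building match groups can be reproduced with a small union-find (disjoint-set) sketch. Union-find is used here purely as an illustration of transitive grouping and is not claimed to be the algorithm that Siperian Hub uses; choosing the lowest record identifier as the group representative is likewise an assumption made to mirror the example.

```python
def build_match_groups(pairs):
    """Collapse match pairs into (record, representative) pairs.

    pairs: iterable of (record_a, record_b) matches.
    Returns a sorted list mapping every non-representative record to the
    representative of its group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            low, high = sorted((root_a, root_b))
            parent[high] = low  # assumption: lowest id represents the group

    return sorted((rec, find(rec)) for rec in list(parent) if find(rec) != rec)


groups = build_match_groups([(1, 2), (2, 3), (3, 4)])
```

For the pairs in the example, `groups` contains 2, 3, and 4 each mapped to record 1, including the transitive match between records 4 and 1 that no single rule produced directly.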


Maximum Matches for Manual Consolidation
You can configure the maximum number of manual matches to process during batch jobs. Setting a limit helps prevent data stewards from being overwhelmed with thousands of manual consolidations to process. Once this limit is reached, the match process stops running until the number of records ready for manual consolidation has been reduced. For more information, see “Maximum Matches for Manual Consolidation” on page 490 and “Consolidate Process” on page 335.

External Match Jobs
Siperian Hub provides a way to match new data with an existing base object without actually loading the data into the base object. Rather than run an entire Match job, you can run the External Match job instead to test for matches and inspect the results. For more information, see “External Match Jobs” on page 719.

Distributed Cleanse Match Servers
For your Siperian Hub implementation, you can increase the throughput of the match process by running multiple Cleanse Match Servers in parallel. For more information, see “Configuring Cleanse Match Servers” on page 407 and the material about distributed Cleanse Match Servers in the Siperian Hub Installation Guide for your platform.

Handling Application Server or Database Server Failures
When running very large Match jobs with large match batch sizes, if there is a failure of the application server or the database, you must re-run the entire batch. Match batches are a unit. There are no incremental checkpoints. To address this, if you think there might be a database or application server failure, set your match batch sizes smaller to reduce the amount of time that will be spent re-running your match batches. For more information, see “Number of Rows per Match Job Batch Cycle” on page 491 and “Match Jobs” on page 734.

328 Siperian Hub Administrator Guide


Run-Time Execution Flow of the Match Process
This section describes the overall sequence of activities that occur during the execution of the match process. The following figure provides an overview of the flow, which is determined by the configured match/search strategy for the base object:

Cycles for Match and Auto Match and Merge Jobs
The Match job executes the match process for a single match batch (see “Flagging the Match Batch” on page 329). The Auto Match and Merge job cycles repeatedly until there are no more records to match (no more base object records with CONSOLIDATION_IND = 4).

Base Object Records Excluded from the Match Process
The following base object records are ignored during the match process:
• Records with a CONSOLIDATION_IND of 9 (on hold).
• Records with a PENDING or DELETED status. PENDING records can be included if explicitly enabled according to the instructions in “Enabling Match on Pending Records” on page 214.

Regenerating Match Keys If Needed
When the match process (such as a Match or Auto Match and Merge job) executes, it first checks to determine whether match keys need to be generated for any records in the base object and, if so, generates the match keys and updates the match key table. Match keys will be generated if the c_repos_table.STRIP_INCOMPLETE_IND flag for the base object is 1, or if any base object records have a DIRTY_IND=1 (see “Base Object Records Flagged for Tokenization” on page 323). For more information, see “Match Keys and the Tokenization Process” on page 322.
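The condition that triggers match key generation can be written as a single predicate. The sketch below is illustrative only: the function name is hypothetical, and the two inputs stand in for the c_repos_table.STRIP_INCOMPLETE_IND flag and the DIRTY_IND values described above.

```python
def needs_match_key_generation(strip_incomplete_ind, dirty_inds):
    """Return True when the match process should generate match keys first.

    strip_incomplete_ind: STRIP_INCOMPLETE_IND flag for the base object.
    dirty_inds: DIRTY_IND values of the base object records.
    """
    return strip_incomplete_ind == 1 or any(ind == 1 for ind in dirty_inds)


print(needs_match_key_generation(0, [0, 0, 1]))
```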

Flagging the Match Batch
The match process cycles through a series of batches until there are no more base object records to process. It matches a subset of base object records (the match batch) against all the records available for matching in the base object (the match pool). The size of the match batch is determined by the Number of Rows per Match Job Batch Cycle setting (“Number of Rows per Match Job Batch Cycle” on page 491).

For the match batch, the match process retrieves, in no specific order, base object records that meet the following conditions:
• The record has a CONSOLIDATION_IND value of 4 (ready for match). The load process sets CONSOLIDATION_IND to 4 for any record that is new (load insert) or updated (load update).
• The record qualifies based on rule set filtering, if configured (see “Enable Filtering” on page 536 and “Filtering SQL” on page 536).

Internally, the match process sets CONSOLIDATION_IND to 3 for every record in the match batch. When matching for the batch finishes, the match process sets CONSOLIDATION_IND to 2 (match is complete).
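The CONSOLIDATION_IND transitions for a match batch (4, then 3, then 2) can be sketched as follows. This is a simplified illustration, not the product's implementation: records are modeled as dictionaries, the function names are hypothetical, rule set filtering is omitted, and records are taken in list order even though the match process retrieves them in no specific order.

```python
READY_FOR_MATCH = 4  # set by the load process for new or updated records
IN_MATCH_BATCH = 3   # set internally while the batch is being matched
MATCH_COMPLETE = 2   # set when the batch has been processed


def flag_match_batch(records, batch_size):
    """Pick up to batch_size records that are ready for match and flag them."""
    batch = [r for r in records if r["CONSOLIDATION_IND"] == READY_FOR_MATCH]
    batch = batch[:batch_size]
    for record in batch:
        record["CONSOLIDATION_IND"] = IN_MATCH_BATCH
    return batch


def complete_match_batch(batch):
    """Mark every record in the batch as matched."""
    for record in batch:
        record["CONSOLIDATION_IND"] = MATCH_COMPLETE


records = [
    {"ROWID_OBJECT": "1", "CONSOLIDATION_IND": 4},
    {"ROWID_OBJECT": "2", "CONSOLIDATION_IND": 9},  # on hold, never selected
    {"ROWID_OBJECT": "3", "CONSOLIDATION_IND": 4},
    {"ROWID_OBJECT": "4", "CONSOLIDATION_IND": 4},
]
batch = flag_match_batch(records, batch_size=2)
flagged = [r["CONSOLIDATION_IND"] for r in records]
complete_match_batch(batch)
completed = [r["CONSOLIDATION_IND"] for r in records]
print(flagged, completed)
```

The record that was not selected keeps CONSOLIDATION_IND = 4 and remains available for a later batch.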

Applying Match Rules and Generating Matches
In this step, the match process applies the configured match rules to the match candidates. The match process executes the match rules one at a time, in the configured order. It executes exact-match rules and exact match-column rules first, and then fuzzy-match rules. For a match to be declared:
• all match columns in a match rule must pass
• only one match rule needs to pass

The match process continues executing the match rules until there is a match or there are no more rules to execute.
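The rule evaluation logic described in this step can be sketched as shown below. Everything in the sketch is an assumption made for illustration: the MatchRule class, the rule identifiers, and the column checks (including the crude prefix comparison that stands in for a fuzzy match) are hypothetical and do not reflect how Siperian Hub represents or evaluates match rules internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Record = dict
ColumnCheck = Callable[[Record, Record], bool]


@dataclass
class MatchRule:
    rule_id: str
    exact: bool  # True for exact-match rules, False for fuzzy-match rules
    column_checks: List[ColumnCheck]


def first_matching_rule(rules: List[MatchRule], a: Record, b: Record) -> Optional[str]:
    """Return the ID of the first rule whose columns all pass, or None."""
    # Exact rules run first, then fuzzy rules; configured order is kept within each group.
    ordered = [r for r in rules if r.exact] + [r for r in rules if not r.exact]
    for rule in ordered:
        if all(check(a, b) for check in rule.column_checks):
            return rule.rule_id  # one passing rule is enough
    return None


def same_ssn(a, b):
    return a["ssn"] == b["ssn"]


def same_zip(a, b):
    return a["zip"] == b["zip"]


def similar_name(a, b):
    # Stand-in for a fuzzy comparison: same first three letters.
    return a["name"].lower()[:3] == b["name"].lower()[:3]


rules = [
    MatchRule("fuzzy_name_zip", exact=False, column_checks=[similar_name, same_zip]),
    MatchRule("exact_ssn", exact=True, column_checks=[same_ssn]),
]
rec_a = {"name": "Jonathan Smith", "ssn": "111", "zip": "94000"}
rec_b = {"name": "Jon Smith", "ssn": "222", "zip": "94000"}
rec_c = {"name": "J. Smith", "ssn": "111", "zip": "10001"}

print(first_matching_rule(rules, rec_a, rec_b))
print(first_matching_rule(rules, rec_a, rec_c))
print(first_matching_rule(rules, rec_b, rec_c))
```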

Populating the Match Table with Match Pairs
When all of the records in the match batch have been processed, the match process adds all of the matches for that batch to the match table and sets CONSOLIDATION_IND to 2 for the records in the match batch.


Match Pairs
The match process populates a match table for that base object. Each row in the match table represents a pair of matched records in the base object. The match table stores the ROWID_OBJECT values for each pair of matched records, as well as the identifier for the match rule that resulted in the match, an automerge indicator, and other information.

Columns in the Match Table
Match (_MTCH) tables have the following columns:

Column Name | Data Type | Description
ROWID_OBJECT | | Identifies one of the records in the matched pair.
ROWID_OBJECT_MATCHED | | Identifies the record that matched the record specified in ROWID_OBJECT.
 | | Identifies the original record that was matched to (prior to merge).
 | | Indicates the direction of the original match. One of the following values: Zero (0): ROWID_OBJECT matched ROWID_OBJECT_MATCHED. One (1): ROWID_OBJECT_MATCHED matched ROWID_OBJECT.
ROWID_USER | CHAR (14) | User who executed the match process.
ROWID_MATCH_RULE | CHAR (14) | Identifies the match rule that was used to match the two records.
AUTOMERGE_IND | NUMBER (38) | Specifies whether a record qualifies for automatic consolidation during the consolidate process. One of the following values: Zero (0): Record does not qualify for automatic consolidation. One (1): Record does qualify for automatic consolidation. The Automerge and Autolink jobs process any records with an AUTOMERGE_IND of 1. For more information, see “Automerge Jobs” on page 717 and “Autolink Jobs” on page 715.
CREATOR | VARCHAR2 (50) | User or process responsible for creating the record.
CREATE_DATE | DATE | Date on which the record was created.
UPDATED_BY | VARCHAR2 (50) | User or process responsible for the most recent update to the record.
LAST_UPDATE_DATE | DATE | Date on which the record was last updated.


Flagging Matched Records for Automatic or Manual Consolidation
Match rules also determine how matched records are consolidated: automatically or manually.

Type of Consolidation | Description
automatic consolidation | Identifies records in the base object that can be consolidated automatically, without manual intervention. For more information, see “Automerge Jobs” on page 717.
manual consolidation | Identifies records in the base object that have enough points of similarity to warrant attention from a data steward, but not enough points of similarity to automatically consolidate them. The data steward uses the Merge Manager to review and manually merge records. For more information, see the Siperian Hub Data Steward Guide.

For more information, see “Specifying Consolidation Options for Matched Records” on page 543.
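As an illustration of how the AUTOMERGE_IND flag separates the two kinds of consolidation, the following sketch partitions match pairs into an automatic set and a manual set. The MatchPair class, the function name, and all sample values are hypothetical; only the meaning of AUTOMERGE_IND (1 qualifies for automatic consolidation, 0 does not) comes from this guide.

```python
from dataclasses import dataclass


@dataclass
class MatchPair:
    rowid_object: str
    rowid_object_matched: str
    rowid_match_rule: str
    automerge_ind: int  # 1 = qualifies for automatic consolidation, 0 = does not


def split_by_consolidation_type(pairs):
    """Return (automatic, manual) lists of match pairs."""
    automatic = [p for p in pairs if p.automerge_ind == 1]
    manual = [p for p in pairs if p.automerge_ind == 0]
    return automatic, manual


pairs = [
    MatchPair("10", "11", "RULE_A", 1),
    MatchPair("10", "12", "RULE_B", 0),
    MatchPair("20", "21", "RULE_A", 1),
]
automatic, manual = split_by_consolidation_type(pairs)
print(len(automatic), len(manual))
```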

This section describes concepts and tasks associated with the consolidate process in Siperian Hub.

About the Consolidate Process
After the match process has identified match pairs, the consolidate process combines data from the matched records into a single master record.


Consolidate Process

The following figure shows cell data in records from three different source systems being consolidated into a single master record.

Consolidating Records Automatically or Manually
As described in “Flagging Matched Records for Automatic or Manual Consolidation” on page 333, match rules set the AUTOMERGE_IND column in the match table to specify how matched records are consolidated: automatically or manually.
• Records flagged for manual consolidation are reviewed by a data steward using the Merge Manager tool. For more information, see the Siperian Hub Data Steward Guide.
• Records flagged for automatic consolidation are automatically merged (see “Automerge Jobs” on page 717). Alternatively, you can run the Auto Match and Merge job (see “Auto Match and Merge Jobs” on page 716) for a base object, which calls the match and then automerge jobs repeatedly, until either all records in the base object have been checked for matches, or the maximum number of records for manual consolidation is reached.
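The two stopping conditions of the Auto Match and Merge job can be sketched as a simple loop. This is a simulation under stated assumptions, not the job's actual logic: each batch is represented only by the AUTOMERGE_IND values of the match pairs it produces, the function name is hypothetical, and the manual consolidation limit is assumed to be checked between cycles.

```python
def auto_match_and_merge(batches, max_manual):
    """Simulate match/automerge cycles; return (cycles, merged, manual_queue)."""
    cycles = merged = manual_queue = 0
    for batch in batches:
        if manual_queue >= max_manual:
            break  # limit reached: stop until data stewards reduce the queue
        cycles += 1
        merged += batch.count(1)        # pairs consumed by the automerge step
        manual_queue += batch.count(0)  # pairs left for manual consolidation
    return cycles, merged, manual_queue


batches = [[1, 0], [0, 0, 1], [1, 1], [0]]
limited = auto_match_and_merge(batches, max_manual=3)
unlimited = auto_match_and_merge(batches, max_manual=100)
print(limited, unlimited)
```

With a limit of 3, the loop stops after two cycles; with a high limit, it runs until every batch has been checked.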


Consolidate Data Flow
The following figure shows the consolidate process in relation to other Siperian Hub processes.

Traceability
The goal in Siperian Hub is to identify all duplicate records and to merge or link them into a single, consolidated record while maintaining full traceability. Traceability is Siperian Hub functionality that maintains knowledge about which systems, and which records from those systems, contributed to consolidated records. Siperian Hub maintains traceability using cross-reference and history tables.


Key Configuration Settings for the Consolidate Process
The following configurable settings affect the consolidate process.
Option | Description
base object style | Determines whether the consolidate process uses merging or linking. For more information, see “Base Object Style” on page 106 and “Consolidation Options” on page 339.
 | Allows you to specify source systems as immutable, meaning that records from that source system will be accepted as unique and, once a record from that source has been fully consolidated, it will not be changed subsequently. For more information, see “Immutable Rowid Object” on page 594.
 | Allows you to specify source systems as distinct, meaning that the data from that system gets inserted into the base object without being consolidated. For more information, see “Distinct Systems” on page 595.
 | Allows you to enable cascade unmerging for child base objects and to specify what happens if records in the parent base object are unmerged. For more information, see “Unmerge Child When Parent Unmerges (Cascade Unmerge)” on page 597.
 | For two base objects in a parent-child relationship, if enabled on the child base object, child records are resubmitted for the match process if parent records are consolidated. For more information, see “Requeue On Parent Merge” on page 104.

By default, base object consolidation is physically saved, so merging is the default behavior. For more information, see “Base Object Style” on page 106.

Merging combines two or more records in a base object table. Depending on the degree of similarity between the two records, merging is done automatically or manually.
• Records that are definite matches are automatically merged (automerge process). For more information, see “Automerge Jobs” on page 717.
• Records that are close but not definite matches are queued for manual review (manual merge process) by a data steward in the Merge Manager tool. The data steward inspects the candidate matches and selectively chooses matches that should be merged. Manual merge match rules are configured to identify close matches. For more information, see “Manual Merge Jobs” on page 732 and, for the Merge Manager, see the Siperian Hub Data Steward Guide.
• Siperian Hub queues all other records for manual review by a data steward in the Merge Manager tool.

Match rules are configured to identify definite matches for automerging and close matches for manual merging. To allow Siperian Hub to automatically change the state of the remaining unmatched records to Consolidated (thus removing them from the data steward’s queue), you can select the Accept all other unmatched rows as unique check box. For more information, see “Accept All Unmatched Rows as Unique” on page 492.


Best Version of the Truth
For a base object, the best version of the truth (sometimes abbreviated as BVT) is a record that has been consolidated with the best cells of data from the source records. The precise definition depends on the base object style:
• For merge-style base objects, the base object record is the BVT record, and is built by consolidating with the most-trustworthy cell values from the corresponding source records.
• For link-style base objects, the BVT Snapshot job will build the BVT record(s) by consolidating with the most-trustworthy cell values from the corresponding linked base object records and return to the requestor a snapshot for consumption.
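The idea of building a BVT record from the most-trustworthy cell values can be sketched as follows. This is a deliberately simplified illustration: the function name, the source system names, and the fixed numeric trust scores are all hypothetical, and the sketch does not model how Siperian Hub actually calculates trust.

```python
def build_bvt(source_records, trust):
    """Pick, for each column, the value from the source with the highest trust.

    source_records: list of (source_system, {column: value}) tuples.
    trust: {(source_system, column): score}; missing entries count as 0.
    """
    bvt = {}
    best_score = {}
    for system, cells in source_records:
        for column, value in cells.items():
            score = trust.get((system, column), 0)
            if column not in best_score or score > best_score[column]:
                best_score[column] = score
                bvt[column] = value
    return bvt


source_records = [
    ("CRM", {"name": "Jon Smith", "phone": "555-0100"}),
    ("ERP", {"name": "Jonathan Smith"}),
    ("WEB", {"name": "J Smith", "phone": "555-0199"}),
]
trust = {
    ("CRM", "name"): 60, ("ERP", "name"): 90, ("WEB", "name"): 30,
    ("CRM", "phone"): 40, ("WEB", "phone"): 70,
}
bvt = build_bvt(source_records, trust)
print(bvt)
```

Each cell of the resulting record can come from a different source system, which is the point of cell-level survivorship.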

Consolidation and Workflow Integration
For state-enabled base objects, consolidation behavior is affected by the current system state of records in the base object. For example, only ACTIVE records can be automatically consolidated; records with a PENDING or DELETED system state cannot be. To understand the implications of system states during consolidation, refer to the following topics:
• Chapter 7, “State Management,” especially “State Transition Rules for State Management” on page 208 and “Hub States and Base Object Record Value Survivorship” on page 211
• “Consolidating Data” in the Siperian Hub Data Steward Guide

This section describes concepts and tasks associated with the publish process in Siperian Hub.

About the Publish Process
This section describes how Siperian Hub integrates with external systems by generating XML messages about data changes in the Hub Store and publishing these messages to an outbound Java Message Service (JMS) queue, also known as a message queue in the Hub Console.

Other external systems, processes, or applications can listen on the JMS message queue, retrieve the XML messages, and process them accordingly. Siperian Hub supports two JMS models:
• point-to-point: specific destination for a target external system
• publish/subscribe: point-to-point to an Enterprise Service Bus (ESB), then publish/subscribe from the ESB to other systems


Publish Process

Using the Publish Process is Optional
Siperian Hub implementations use the publish process in support of stated business and technical requirements. However, not all organizations will take advantage of this functionality, and its use in Siperian Hub implementations is optional.

Publish Process is Part of the Siperian Hub Distribution Flow
The processes previously described in this chapter—land, stage, load, match, and consolidate—are all associated with reconciliation, which is the main inbound flow for Siperian Hub. With reconciliation, Siperian Hub receives data from one or more source systems, cleanses the data if applicable, and then reconciles “multiple versions of the truth” to arrive at the master record—the best version of the truth—for that entity. In contrast, the publish process belongs to the main Siperian Hub outbound flow—distribution. Once the master record is established or updated for a given entity, Siperian Hub can then (optionally) distribute the master record data to other applications or databases. For an introduction to reconciliation and distribution, see the Siperian Hub Overview. In another scenario, data changes can be sent to the Activity Manager Rules queue so that the data change can be evaluated against user-defined rules.

Publish Process Executes By Message Triggers
The land, stage, load, match, and consolidate processes work with batches of records and are executed as batch jobs or stored procedures. In contrast, the publish process is executed as the result of a message trigger that executes when a data change occurs in the Hub Store. The message trigger creates an XML message that gets published on a JMS message queue.


Outbound JMS Message Queues
Siperian Hub uses an outbound message queue as a communication channel to feed data changes back to external systems. Siperian supports embedded message queues, which use the JMS providers that come with application servers. An embedded message queue connects using the JNDI name of the ConnectionFactory and the name of the JMS queue. These JNDI names must already be set up in the application server. The Hub Console allows you to register message queue servers and message queues that have already been configured in the application server environment.

ORS-specific XML Message Schemas
XML messages are created using an ORS-specific schema file (<ors-name>-siperian-mrm-event.xsd) that is based on a common XML schema (siperian-mrm-events.xsd). You use the JMS Event Schema Manager to generate this ORS-specific schema. This is a required task for setting up the publish process. For more information, see “Generating and Deploying ORS-specific Schemas” on page 827.


Run-time Flow of the Publish Process
The following figure shows the run-time flow of the publish process.


In this scenario:
1. A batch load or a real-time SIF API request (SIF put or cleanse_put request) may result in an insert or update on a base object. You can configure a message rule to control data going to the C_REPOS_MQ_DATA_CHANGE table.
2. Hub Server polls data from the C_REPOS_MQ_DATA_CHANGE table at regular intervals. For data that has not been sent, Hub Server constructs an XML message based on the data and sends it to the outbound queue configured for the message queue. It is the external application's responsibility to retrieve the message from the outbound queue and process it.
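The polling behavior in this scenario can be sketched in a few lines. The sketch is purely illustrative and makes several assumptions: the row fields (table, rowid_object, event, sent), the XML element names, and the in-memory list used as the outbound queue are all hypothetical. The actual columns of C_REPOS_MQ_DATA_CHANGE and the ORS-specific message schema are not shown in this section, and a real deployment publishes to a JMS queue.

```python
import xml.etree.ElementTree as ET


def build_message(change):
    """Build an XML message for one data change (element names are hypothetical)."""
    root = ET.Element("dataChange")
    for key in ("table", "rowid_object", "event"):
        ET.SubElement(root, key).text = str(change[key])
    return ET.tostring(root, encoding="unicode")


def poll_once(data_change_rows, outbound_queue):
    """Send a message for every row not yet sent; return how many were sent."""
    sent = 0
    for row in data_change_rows:
        if row["sent"]:
            continue
        outbound_queue.append(build_message(row))
        row["sent"] = True  # never publish the same change twice
        sent += 1
    return sent


changes = [
    {"table": "C_CUSTOMER", "rowid_object": "1001", "event": "UPDATE", "sent": False},
    {"table": "C_CUSTOMER", "rowid_object": "1002", "event": "INSERT", "sent": True},
]
outbound_queue = []
first_poll = poll_once(changes, outbound_queue)
second_poll = poll_once(changes, outbound_queue)
print(first_poll, second_poll, outbound_queue)
```

Marking each row as sent is what keeps a later polling interval from publishing the same change again.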

Siperian Hub publishes an XML message to an outbound message queue whenever a message trigger is fired. You do not need to explicitly execute a batch job from the Batch Viewer or Batch Group tool. To monitor run-time activity for message queues using the Audit Manager tool in the Hub Console, see “Auditing Message Queues” on page 928.

Application Development

Siperian Services Integration Framework Guide


Chapter 10: Configuring the Land Process

This chapter explains how to configure the land process for your Siperian Hub implementation. For an introduction, see “Land Process” on page 292.

Before You Begin
Before you begin to configure the land process, you must have completed the following tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide
• Built the schema, including defining base objects, according to the instructions in Chapter 5, “Building the Schema”
• Learned about the land process described in “Land Process” on page 292

Configuration Tasks for the Land Process
To set up the land process for your Siperian Hub implementation, you must complete the following tasks in the Hub Console:
• “Configuring Source Systems” on page 348
• “Configuring Landing Tables” on page 355

Configuring Source Systems
This section describes how to define source systems for your Siperian Hub implementation. For an introduction, see “Land Process” on page 292.

About Source Systems
Source systems are external applications or systems that provide data to Siperian Hub. In order to manage input from various source systems, Siperian Hub requires a unique internal name for each source system. You use the Systems and Trust tool in the Model workbench to define source systems for your Siperian Hub implementation.

Configuring Trust for Source Systems
If multiple source systems contribute data for the same column in a base object, you can configure trust on a column-by-column basis to specify which source system(s) are more reliable providers of data (relative to other source systems) for that column. Trust is used to determine survivorship when two records are consolidated, and whether updates from a source system are sufficiently reliable to update the “best version of the truth” record. For more information, see “Configuring Trust for Source Systems” on page 455.

Administration Source System
Siperian Hub uses an administration source system for manual trust overrides and data edits from the Data Manager or Merge Manager tools, which are described in the Siperian Hub Data Steward Guide. This administration source system can contribute data to any trust-enabled column. The administration source system is named Admin by default, but you can optionally change its name according to the instructions in “Editing Source System Properties” on page 353.

Siperian System Repository Table
The source systems that you define in the Systems and Trust tool are stored in a special public Siperian Hub repository table (C_REPOS_SYSTEM, with a display name of MRM System). This table is visible in the Schema Manager if the Show System Tables option is selected (for more information, see “Changing the Item View” on page 39). C_REPOS_SYSTEM can also be used in packages, as described in “Configuring Packages” on page 196.

Warning: The C_REPOS_SYSTEM table contains Siperian Hub metadata. As with any Siperian Hub system table, you should never alter the structure of, or data in, the C_REPOS_SYSTEM table. Doing so causes Siperian Hub to behave unpredictably and can result in data loss.


Starting the Systems and Trust Tool
To start the Systems and Trust tool:
• In the Hub Console, expand the Model workbench, and then click Systems and Trust.
The Hub Console displays the Systems and Trust tool, as shown in the following example.


The Systems and Trust tool displays the following panes:
Pane | Description
Navigation | Systems: List of every source system that contributes data to Siperian Hub, including the administration source system described in “Administration Source System” on page 349. Trust: Expand the tree to display base objects containing one or more trust-enabled columns, and trust-enabled columns (only). For more information about configuring trust for base object columns, see “Configuring Trust for Source Systems” on page 455.
Properties | Properties for the selected source system. Trust settings for the base object column if the base object column is selected.

Source System Properties
A source system definition in Siperian Hub has the following properties.

Property | Description
Name | Unique, descriptive name for this source system.
Primary Key | Primary key for this source system. Unique identifier for this system in the ROWID_SYSTEM column of C_REPOS_SYSTEM. Read only.
Description | Optional description for this source system.


Adding Source Systems
Using the Systems and Trust tool, you need to define each source system that will contribute data to your Siperian Hub implementation.

To add a source system definition:
1. Start the Systems and Trust tool according to the instructions in “Starting the Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click in the list of source systems and choose Add System. The Systems and Trust tool displays the New System dialog.
4. Specify the source system properties. For more information, see “Source System Properties” on page 351.
5. Click OK. The Systems and Trust tool displays the newly-added source system in the list of source systems.

Note: When you add a source system, Hub Store uses the first 14 characters of the system name (in all uppercase letters) as its primary key (ROWID_SYSTEM value in C_REPOS_SYSTEM).
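The primary key rule in the note (first 14 characters of the system name, in uppercase) can be sketched as shown below. The function name is hypothetical, and the sketch assumes a plain character slice with no trimming or padding, which this guide does not specify.

```python
def derive_rowid_system(system_name):
    """Approximate the ROWID_SYSTEM value derived from a source system name."""
    return system_name[:14].upper()


print(derive_rowid_system("Customer Relationship Mgmt"))
print(derive_rowid_system("Admin"))
```

Under this sketch, two system names that share their first 14 characters would produce the same value, so it can be worth choosing names that differ early.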


Editing Source System Properties
You can rename any source system, including the administration system (see “Administration Source System” on page 349). Renaming changes only the display name used in the Hub Console to identify this source system; it has no effect outside of the Hub Console.

Note: If this source system has already contributed data to your Siperian Hub implementation, Siperian Hub continues to track the lineage (history) of data from this source system even after you have renamed it.

To edit source system properties:
1. Start the Systems and Trust tool according to the instructions in “Starting the Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the list of source systems, select the source system that you want to configure. The screen refreshes, showing the Edit button next to the name and description fields for the selected source system.
4. Change any of the editable properties. For more information, see “Source System Properties” on page 351.
5. To change trust settings for a source system, see “Configuring Trust for Source Systems” on page 455.
6. Click the button to save your changes.


Removing Source Systems
You can remove any source system except:
• the administration system (see “Administration Source System” on page 349)
• any source system that has contributed data to a staging table after the stage process has been run. You can remove a source system only before the stage process has copied data from an associated landing table to a staging table.
• any source system that is configured as a source for a base object (meaning that a staging table associated with a base object points to the source system)

Note: Removing a source system deletes only the source system definition in the Hub Console; it has no effect outside of Siperian Hub.

To remove a source system:
1. Start the Systems and Trust tool according to the instructions in “Starting the Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the list of source systems, right-click the source system that you want to remove, and choose Remove System. The Systems and Trust tool prompts you to confirm deletion.
4. Click Yes. The Systems and Trust tool removes the source system from the list, along with any metadata associated with this source system.


Configuring Landing Tables
This section describes how to configure landing tables in your Siperian Hub implementation. For an introduction, see “Land Process” on page 292.

About Landing Tables
A landing table provides intermediate storage in the flow of data from source systems into Siperian Hub. In effect, landing tables are “where data lands” from source systems into the Hub Store. You use the Schema Manager in the Model workbench to define landing tables.

The manner in which source systems populate landing tables with data is entirely external to Siperian Hub. The data model you use for collecting data in landing tables from various source systems is also external to Siperian Hub. One source system could populate multiple landing tables. A single landing table could receive data from different source systems. The data model you use is entirely up to your particular implementation requirements.

Inside Siperian Hub, however, landing tables are mapped to staging tables, as described in “Mapping Columns Between Landing and Staging Tables” on page 380. It is in the staging table, mapped to a landing table, where the source system supplying the data to the base object is identified. During the stage process, Siperian Hub copies data from a landing table to a target staging table, tags the data with the source system identification, and optionally cleanses data in the process. A landing table can be mapped to one or more staging tables. A staging table is mapped to only one landing table.

As described in “Ways to Populate Landing Tables” on page 294, landing tables are populated using batch or real-time approaches that are external to Siperian Hub. After a landing table is populated, the stage process pulls data from the landing tables, further cleanses the data if appropriate, and then populates the appropriate staging tables. For more information, see “Stage Process” on page 295.


Landing Table Columns
Landing tables have two types of columns:
Column Type | Description
system columns | Columns that are automatically created and maintained by the Schema Manager.
user-defined columns | Columns that have been added by users according to the instructions in “Configuring Columns in Tables” on page 125.

Landing tables have only one system column.
Physical Name | Data Type | Description
LAST_UPDATE_DATE | DATE | Date on which the record was last updated in the source system (for base objects, this will populate LAST_UPDATE_DATE and SRC_LUD in the cross-reference table, and may also populate LAST_UPDATE_DATE on the base object, depending on trust).

All other columns in the landing table are user-defined columns.

Note: If the source system table has a multiple-column key, concatenate these columns to produce a single unique VARCHAR value for the primary key column.
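The concatenation described in the note can be sketched as follows. The function name and the use of a separator character are assumptions, not requirements stated in this guide; a separator is shown because plain concatenation can make different keys collide (for example, 1 and 23 versus 12 and 3).

```python
def composite_key(*parts, sep="|"):
    """Join the columns of a multiple-column key into one string value."""
    return sep.join(str(part) for part in parts)


key = composite_key("US", 1042, "A")
print(key)
```

The separator should be a character that cannot occur in the key columns themselves.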


Landing Table Properties
Landing tables have the following properties.
Property | Description
Item Type | Type of table that you are adding. Select Landing Table.
Display Name | Name of this landing table as it will be displayed in the Hub Console.
Physical Name | Actual name of the landing table in the database. Siperian Hub will suggest a physical name for the landing table based on the display name that you enter.
 | Name of the data tablespace for this landing table. For more information, see the Siperian Hub Installation Guide for your platform.
 | Name of the index tablespace for this landing table. For more information, see the Siperian Hub Installation Guide for your platform.
 | Description of this landing table.
 | Date and time when this landing table was created.
Contains Full Data Set | Specifies whether this landing table contains the full data set from the source system, or only updates. If selected (default), indicates that this landing table contains the full set of data from the source system (such as for the initial data load). When this check box is selected, you can configure Siperian Hub’s delta detection feature (see “Configuring Delta Detection for a Staging Table” on page 401) so that, during the stage process, only changed records are copied to the staging table. If not selected, indicates that this landing table contains only changed data from the source system (such as for incremental loads). In this case, Siperian Hub assumes that you filtered out unchanged records before populating the landing table. Therefore, the stage process inserts all records from the landing table directly into the staging table. When this check box is not selected, Siperian Hub’s delta detection feature is not available.

Note: You can change the Contains Full Data Set property only when editing the landing table properties, as described in “Editing Landing Table Properties” on page 360.
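The interaction between a full data set and delta detection can be sketched as shown below. This is a simplified model under stated assumptions: the function name and the PKEY field are hypothetical, and delta detection is approximated by comparing each landing row with the row of the same key from the previous load, which may differ from how the feature is actually configured.

```python
def rows_to_stage(landing_rows, previous_rows, contains_full_data_set,
                  delta_detection=False):
    """Return the landing rows that the stage process would copy.

    previous_rows: {PKEY: row} from the previous load (hypothetical structure).
    """
    if delta_detection and not contains_full_data_set:
        raise ValueError("delta detection is available only for a full data set")
    if delta_detection:
        # Copy only rows that are new or differ from the previous load.
        return [row for row in landing_rows
                if previous_rows.get(row["PKEY"]) != row]
    # Without delta detection, every landing row is copied.
    return list(landing_rows)


previous = {
    "1": {"PKEY": "1", "NAME": "Ann"},
    "2": {"PKEY": "2", "NAME": "Bob"},
}
landing = [
    {"PKEY": "1", "NAME": "Ann"},     # unchanged
    {"PKEY": "2", "NAME": "Robert"},  # changed
    {"PKEY": "3", "NAME": "Cy"},      # new
]
changed = rows_to_stage(landing, previous, contains_full_data_set=True,
                        delta_detection=True)
everything = rows_to_stage(landing, previous, contains_full_data_set=False)
print([row["PKEY"] for row in changed], len(everything))
```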

Adding Landing Tables
To add a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the Landing Tables node.
4. Right-click the Landing Tables node and choose Add Item. The Schema Manager displays the Add Table dialog box.
5. Specify the properties (described in “Landing Table Properties” on page 357) for this new landing table.
6. Click OK. The Schema Manager creates the new landing table in the Operational Record Store (ORS), along with support tables, and then adds the new landing table to the schema tree.
7. Configure the columns for your landing table according to the instructions in “Configuring Columns in Tables” on page 125.
8. If you want to configure this landing table to contain only changed data from the source system (Contains Full Data Set not selected), edit the landing table properties according to the instructions in “Editing Landing Table Properties” on page 360.


Editing Landing Table Properties
To edit properties in a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the landing table that you want to edit. The Schema Manager displays the Landing Table Identity pane for the selected table.
4. Change the landing table properties you want. For more information, see “Landing Table Properties” on page 357.
5. Click the button to save your changes.
6. Change the column configuration for your landing table, if you want, according to the instructions in “Configuring Columns in Tables” on page 125.


Removing Landing Tables
To remove a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the Landing Tables node.
4. Right-click the landing table that you want to remove, and choose Remove. The Schema Manager prompts you to confirm deletion.
5. Choose Yes. The Schema Manager drops the landing table from the database, deletes any mappings between this landing table and any staging table (but does not delete the staging table), and removes the deleted landing table from the schema tree.


Chapter 11: Configuring the Stage Process

This chapter explains how to configure the data staging process for your Siperian Hub implementation. For an introduction, see “Stage Process” on page 295. In addition, to learn about cleansing data during the data staging process, see Chapter 12, “Configuring Data Cleansing.”

Before You Begin
Before you begin to configure staging data, you must have completed the following tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Learned about the stage process described in “Stage Process” on page 295

Configuration Tasks for the Stage Process
In addition to the prerequisites described in “Before You Begin” on page 364, to set up the process of staging data in your Siperian Hub implementation, you must complete the following tasks in the Hub Console:
• “Configuring Staging Tables” on page 364
• “Mapping Columns Between Landing and Staging Tables” on page 380
• “Configuring Data Cleansing” on page 405, if you plan to use Siperian Hub internal cleansing to normalize your data.

About Staging Tables
A staging table provides temporary, intermediate storage in the flow of data from landing tables into base objects and dependent objects via load jobs (see “Load Jobs” on page 727). Staging tables:
• contain data from one source system for one table in the Hub Store
• are populated from landing tables by stage jobs (see “Stage Jobs” on page 745)
• can be created for base objects and dependent objects


Configuring Staging Tables

The structure of a staging table is directly based on the structure of the target object that will contain the consolidated data. You use the Schema Manager in the Model workbench to configure staging tables. Note: You must have at least one source system defined before you can define a staging table. For more information, see “Configuring Source Systems” on page 348.

Staging Table Columns
Staging tables have two types of columns:
Column Type | Description
system columns | Columns that are automatically created and maintained by the Schema Manager.
user-defined columns | Columns that have been added by users. To add columns to a staging table, you select from a list of columns that are already defined in the base object or dependent object associated with the staging table. For more information, see “Adding Staging Tables” on page 371 and “Configuring Columns in Tables” on page 125.

Staging tables have the following system columns.
Physical Name | Data Type (Size) | Description
PKEY_SRC_OBJECT | VARCHAR (255) | Primary key from the source system. This must be unique. If the source record does not have a single unique column, then concatenate the values from multiple columns to uniquely identify the record. Display name is Pkey Src Object (or, in some places, Primary Key from Source System).
ROWID_OBJECT | CHAR (14) | Primary key. Unique value assigned by Siperian during the stage process.
DELETED_IND | INT | Reserved for future use.
DELETED_DATE | DATE | Reserved for future use.
DELETED_BY | VARCHAR (50) | Reserved for future use.
LAST_UPDATE_DATE | DATE | Date on which the record was last updated in the source system. For base objects, this will populate LAST_UPDATE_DATE and SRC_LUD in the cross-reference table, and (depending on trust settings) may also populate LAST_UPDATE_DATE on the base object.
UPDATED_BY | VARCHAR (50) | User or process responsible for the most recent update.
CREATE_DATE | DATE | Date on which the record was created.
CREATOR | VARCHAR (50) | User or process responsible for creating the record.
SRC_ROWID | VARCHAR (30) | Database internal Rowid column that is used to uniquely trace back records to the Landing table from Staging.
HUB_STATE_IND | INT | For state-enabled base objects only. Integer value indicating the state of this record. Valid values are: 0=Pending, 1=Active (Default), -1=Deleted. For details, see “About the Hub State Indicator” on page 207.

Staging tables must be based on the columns provided by the source system for the target base object or dependent object for which the staging table is defined, even if the landing tables are shared across multiple source systems. If you do not make the columns in staging tables source-specific, then you create unnecessary trust and validation requirements. Trust is a powerful mechanism, but it carries performance overhead. Use trust where it is appropriate and necessary, but not where the most recent cell value will suffice for the surviving record.


If you limit the columns in the staging tables to the columns actually provided by the source systems, then you can restrict the trust columns to those that come from two or more staging tables. Use this approach instead of treating every column as if it comes from every source, which would require adding trust for every column, and then validation rules to downgrade the trust on null values for all of the sources that do not provide values for the columns.

More trust columns and validation rules affect the performance of the load and merge processes. Also, the more trusted columns there are, the longer the update statements for the control table will be. Bear in mind that Oracle and DB2 have a 32K limit on the size of the SQL buffer for SQL statements. For this reason, more than 40 trust columns result in a horizontal split in the update of the control table: MRM will try to update only 40 columns at a time.
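The horizontal split described above can be pictured as simple batching. The following Python sketch is only an illustration, not Siperian Hub code: the function name and the column names are hypothetical, and only the batch size of 40 comes from the text.

```python
from typing import List

TRUST_COLUMNS_PER_UPDATE = 40  # batch size stated in the text above


def split_trust_columns(columns: List[str],
                        batch_size: int = TRUST_COLUMNS_PER_UPDATE) -> List[List[str]]:
    """Split a list of trust-enabled columns into batches of at most batch_size."""
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


# Hypothetical example: a base object with 95 trust-enabled columns.
trust_columns = [f"COL_{n}" for n in range(1, 96)]
batches = split_trust_columns(trust_columns)
print([len(batch) for batch in batches])  # [40, 40, 15]
```

In this model, 95 trust columns would need three separate update statements, which is one reason to keep the number of trust columns small.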

Staging Table Properties
Staging tables have the following properties.
Property | Description
Staging Identity
Display Name | Name of this staging table as it will be displayed in the Hub Console.
Physical Name | Actual name of the staging table in the database. Siperian Hub will suggest a physical name for the staging table based on the display name that you enter.
System | Select the source system for this data. For more information, see “Configuring Source Systems” on page 348.
Preserve Source System Keys | Copy key values from the source system rather than using Siperian Hub’s internally-generated key values. Applies to staging tables associated with base objects only (not with dependent objects). To learn more, see “Preserving Source System Keys” on page 368.
Highest Reserved Key | Specify the amount by which the key is increased after the first load. Visible only if the Preserve Source System Key checkbox is selected. To learn more, see “Specifying the Highest Reserved Key” on page 369.
Data Tablespace | Name of the data tablespace for this staging table. For more information, see the Siperian Hub Installation Guide for your platform.
Index Tablespace | Name of the index tablespace for this staging table. For more information, see the Siperian Hub Installation Guide for your platform.
Description | Description of this staging table.
Cell Update | Determines whether Siperian Hub updates the cell in the target table if the value in the incoming record from the staging table is the same. For more information, see “Enabling Cell Update” on page 369.
Columns | Columns in this staging table. For more information, see “Configuring Columns in Tables” on page 125.
Audit Trail and Delta Detection | Configurable after mappings between landing and staging tables have been defined. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.
Audit Trail | If enabled, retains the history of the data in the RAW table based on the number of loads and timestamps. For more information, see “Configuring the Audit Trail for a Staging Table” on page 399.
Delta Detection | If enabled, Siperian Hub processes only new or changed records and ignores unchanged records. For more information, see “Configuring Delta Detection for a Staging Table” on page 401.

Preserving Source System Keys
By default, this option is not enabled. During Siperian Hub stage jobs (see “Stage Jobs” on page 745), for each inbound record of data, Siperian Hub generates an internal key that it inserts in the ROWID_OBJECT column of the target base object. Enable this option when you want to use the value from the primary key column from the source system instead of Siperian Hub’s internally-generated key.

To enable this option, when adding a staging table to a base object (see “Adding Staging Tables” on page 371), check (select) the Preserve Source System Keys check box in the Add staging to Base Object dialog. Once enabled, during stage jobs, instead of generating an internal key, Siperian Hub takes the value in the PKEY_SRC_OBJECT column from the staging table and inserts it into the ROWID_OBJECT column in the target base object.

Note: Once a base object is created, you cannot change this setting.


Specifying the Highest Reserved Key
If the Preserve Source System Keys check box is enabled, then the Schema Manager displays the Highest Reserved Key field. If you want to insert a gap between the source key and Siperian Hub’s key, then enter the amount by which the key is increased after the first load.

Note: Set the Highest Reserved Key to the upper boundary of the source system keys. To allow a margin, set this number slightly higher, adding a buffer to the expected range of source system keys. Any records added to the base object that do not contain this key will be given a key by Siperian Hub that is above the highest reserved value you set.

Enabling this option has the following consequences when the base object is first loaded:
1. From the staging table, Siperian Hub takes the value in PKEY_SRC_OBJECT and inserts it into the base object’s ROWID_OBJECT, instead of generating Siperian Hub’s internal key.
2. Siperian Hub then resets the key's starting position to MAX (PKEY_SRC_OBJECT) + the GAP value.
3. On the next load for this staging table, Siperian Hub continues to use the PKEY_SRC_OBJECT. For loads from other staging tables, it uses the Siperian Hub-generated key.

Note: Only one staging table per base object can have this option enabled (even if it is from the same system). The reserved key range is set at the initial load only.
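The key arithmetic in the steps above can be traced with a toy model. This Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, keys are treated as plain integers, and the Highest Reserved Key is modeled as the gap added to the largest source key, as in step 2. It is not how Siperian Hub is implemented.

```python
from typing import Dict, List


class KeyAllocator:
    """Toy model of key assignment when Preserve Source System Keys is enabled."""

    def __init__(self, highest_reserved_key: int) -> None:
        self.gap = highest_reserved_key  # the "GAP value" in step 2 (assumption)
        self.next_generated = 1          # next hub-generated key

    def initial_load(self, source_keys: List[int]) -> Dict[int, int]:
        """First load: reuse each source key as the ROWID_OBJECT (step 1),
        then reset the starting position to MAX(source key) + gap (step 2)."""
        assigned = {key: key for key in source_keys}
        if source_keys:
            self.next_generated = max(source_keys) + self.gap
        return assigned

    def generate(self) -> int:
        """Key for a record loaded from another staging table (step 3)."""
        key = self.next_generated
        self.next_generated += 1
        return key


allocator = KeyAllocator(highest_reserved_key=1000)
print(allocator.initial_load([101, 102, 250]))  # {101: 101, 102: 102, 250: 250}
print(allocator.generate())                     # 1250
```

With source keys up to 250 and a gap of 1000, hub-generated keys start at 1250, which leaves room for later source keys below that value.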

Enabling Cell Update
By default, during the stage process (see “Stage Jobs” on page 745), for each inbound record of data, Siperian Hub replaces the cell value in the target base object whenever an incoming record has a higher trust level—even if the value it replaces is identical. Even though the value has not changed, Siperian Hub updates the last update date for the cell to the date associated with the incoming record, and assigns to the cell the same trust level as a new value. For more information, see “Configuring Trust for Source Systems” on page 455.


You can change this behavior by checking (selecting) the Cell Update check box when configuring a staging table. If cell update is enabled, then during Stage jobs, Siperian Hub will compare the cell value with the current contents of the cross-reference table before it updates the target record in the base object. If the cross-reference record for this system has an identical value in this cell, then Siperian Hub will not update the cell in the Hub Store. Enabling cell update can increase performance during Stage jobs if your Siperian Hub implementation does not require updates to the last update date and trust value in the target base object record.
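The effect of the Cell Update check box can be sketched as a single comparison. The Python example below is a simplified illustration with hypothetical names (XrefCell, apply_incoming). It deliberately ignores trust levels and models only the comparison against the cross-reference value described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class XrefCell:
    """One cell of a cross-reference record (hypothetical structure)."""
    value: Optional[str]
    last_update_date: date


def apply_incoming(cell: XrefCell, incoming_value: Optional[str],
                   incoming_date: date, cell_update_enabled: bool) -> bool:
    """Apply an incoming value to the cell. Return True if the cell was written."""
    if cell_update_enabled and cell.value == incoming_value:
        # Identical value: skip the update, so the last update date is unchanged.
        return False
    cell.value = incoming_value
    cell.last_update_date = incoming_date
    return True


cell = XrefCell(value="Oslo", last_update_date=date(2008, 1, 1))
print(apply_incoming(cell, "Oslo", date(2008, 6, 1), cell_update_enabled=True))   # False
print(apply_incoming(cell, "Oslo", date(2008, 6, 1), cell_update_enabled=False))  # True
```

With cell update enabled, the identical value leaves the last update date at its old value. With it disabled, the date moves forward even though the value did not change.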

Properties for Columns in Staging Tables
Columns in staging tables have the following properties:
Property | Description
Column | Name of this column as defined in the associated base object or dependent object.
Lookup System | Name of the lookup system if the Lookup Table is a cross-reference table.
Lookup Table | For foreign key columns in the staging table, the name of the table containing the lookup column.
Lookup Column | For foreign key columns in the staging table, the name of the lookup column in the lookup table. For more information, see “Configuring Lookups For Foreign Key Columns” on page 376.
Allow Null Update | Determines whether null updates are allowed when a Load job specifies a null value for a cell that already contains a non-null value.
• Check (select) this check box to have the Load job update the cell. Do this if you want Siperian Hub to update the cell value even though the new value would be null.
• Uncheck (clear, the default) this check box to prevent null updates and retain the existing non-null value.
Allow Null Foreign Key | Determines whether null foreign keys are allowed. Use this option only if null values are valid for the foreign key relationship, that is, if the foreign key is an optional relationship.
• Check (select) this check box to allow data to be loaded when you do not have a value for lookup.
• Uncheck (clear, the default) this check box to prevent null foreign keys. In this case, records with null values in the lookup column will be written to the rejects table instead of being loaded.
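The two null-handling options can be read as small decision rules. The following Python sketch uses hypothetical function names and models only the behavior stated in the property descriptions above. It is not the Load job's actual logic.

```python
from typing import Optional


def resolve_cell(existing: Optional[str], incoming: Optional[str],
                 allow_null_update: bool) -> Optional[str]:
    """Return the value the cell holds after the load (Allow Null Update)."""
    if incoming is None and existing is not None and not allow_null_update:
        return existing  # null update prevented: keep the existing non-null value
    return incoming


def accept_foreign_key(lookup_value: Optional[str],
                       allow_null_foreign_key: bool) -> bool:
    """Return True to load the record, False to write it to the rejects table."""
    return lookup_value is not None or allow_null_foreign_key


print(resolve_cell("Smith", None, allow_null_update=False))    # Smith
print(resolve_cell("Smith", None, allow_null_update=True))     # None
print(accept_foreign_key(None, allow_null_foreign_key=False))  # False
```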

Adding Staging Tables
To add a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the Base Objects node.
4. In the schema tree, expand the node for the base object associated with this staging table.
5. Do one of the following to identify the base object or dependent object that this staging table will populate.
• If you want to add a staging table to this base object, right-click the Staging Tables node and choose Add Staging Table.
• If you want to add a staging table to a dependent object, expand the Dependent Objects node under the base object, expand the dependent object node, right-click the Staging Tables node and choose Add Staging Table.
6. Specify the staging table properties. For more information, see “Staging Table Properties” on page 367.
Note: Some of these settings cannot be changed after the staging table has been added, so make sure that you specify the settings you want before closing this dialog.
7. From the list of the columns in the base object or dependent object, select all of the columns that this source system will provide. For more information, see “Staging Table Columns” on page 365.
• Click the Select All button to select all of the columns without needing to click each column individually.
• Click the Clear All button to unselect all selected columns.
These staging table columns inherit the properties of their corresponding columns in the base object or dependent object. You can select columns but you cannot change their inherited data types and column widths.
Note: The Rowid Object and the Last Update Date are automatically selected. You cannot uncheck these columns or change their properties.
8. Specify column properties. For more information, see “Properties for Columns in Staging Tables” on page 370.
9. For each column that has an associated foreign key relationship, select the row and click the button to define the lookup column. For more information, see “Configuring Lookups For Foreign Key Columns” on page 376.
Note: You will not be able to save this new staging table unless you complete this step.
10. Click OK. The Schema Manager creates the new staging table in the Operational Record Store (ORS), along with any support tables, and then adds the new staging table to the schema tree.
11. If you want, configure an Audit Trail and Delta Detection for this staging table. To learn more, see “Using Audit Trail and Delta Detection” on page 398.

Changing Properties in Staging Tables
To change properties in a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for the base object associated with this staging table.
• If the staging table is associated with the base object, then expand the Staging Tables node to display it.
• If the staging table is associated with a dependent object, expand the Dependent Objects node under the base object, then expand the Staging Tables node to display it.
4. Select the staging table that you want to configure. The Schema Manager displays the properties for the selected table.
5. Specify the staging table properties. For more information, see “Staging Table Properties” on page 367. For each property that you want to edit (Display Name and Description), click the Edit button next to it, and specify the new value.
6. From the list of the columns in the base object or dependent object, change the columns that this source system will provide.
• Click the Select All button to select all of the columns without needing to click each column individually.
• Click the Clear All button to unselect all selected columns.
Note: The Rowid Object and the Last Update Date are automatically selected. You cannot uncheck these columns or change their properties.
7. If you want, change column properties. For more information, see “Properties for Columns in Staging Tables” on page 370.
8. If you want, change lookups for foreign key columns. Select the column and click the button to configure the lookup column. For more information, see “Configuring Lookups For Foreign Key Columns” on page 376.
9. If you want to change cell updating (see “Enabling Cell Update” on page 369), click in the Cell update check box.
10. Change the column configuration for your staging table, if you want. For more information, see “Configuring Columns in Tables” on page 125.
11. If you want, configure an Audit Trail and Delta Detection for this staging table. To learn more, see “Using Audit Trail and Delta Detection” on page 398.
12. Click the button to save your changes.

Jumping to the Source System for a Staging Table
To view the source system associated with a staging table:
• Right-click the staging table and choose Jump to Source System.
The Hub Console launches the Systems and Trust tool and displays the source system associated with this staging table. For more information, see “Configuring Source Systems” on page 348.

About Lookups
A lookup is the process of retrieving a data value from a parent table during Load jobs. In Siperian Hub, when configuring a staging table associated with a base object, if a foreign key column in the staging table (as the child table) is related to the primary key in a parent table, you can configure a lookup to retrieve data from that parent table. The target column in the lookup table must be a unique column (such as the primary key). For more information, see “Performing Lookups Needed to Maintain Referential Integrity” on page 312. For example, suppose your Siperian Hub implementation had two base objects: a Consumer parent base object and an Address child base object, with the following relationship between them:
Consumer.Rowid_object = Address.Consumer_Fkey

In this case, the Consumer_Fkey will be included in the Address Staging table and it will look up data on some column.
Note: The Address.Consumer_Fkey must be the same as Consumer.Rowid_object.
In this example, you could configure three types of lookups:
• to the ROWID_OBJECT (primary key) of the Consumer base object (lookup table)
• to the PKEY_SRC_OBJECT column (primary key) of the cross-reference table for the Consumer base object. In this case, you must also define the lookup system. Configuring a lookup to the PKEY_SRC_OBJECT column of a cross-reference table allows you to point to parent tables associated with a source system that differs from the source system associated with this staging table.
• to any other unique column, if available, in the base object or its cross-reference table

Once defined, when the Load job runs on the base object, Siperian Hub looks up the source system’s Consumer code value in the primary key from source system column of the Consumer code cross-reference table, and returns the customer type ROWID_OBJECT value that corresponds to the source consumer type.
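A lookup against a cross-reference table can be thought of as a keyed search on the lookup system and the source primary key. The Python sketch below is illustrative only: the dictionary contents, the system names (CRM, BILLING), and the function name are hypothetical, and the real lookup happens inside the Load job.

```python
from typing import Dict, Optional, Tuple

# Hypothetical Consumer cross-reference data:
# (lookup system, PKEY_SRC_OBJECT) -> ROWID_OBJECT of the Consumer base object
consumer_xref: Dict[Tuple[str, str], str] = {
    ("CRM", "C-1001"): "1",
    ("CRM", "C-1002"): "2",
    ("BILLING", "9001"): "1",
}


def lookup_rowid(xref: Dict[Tuple[str, str], str],
                 lookup_system: str, pkey_src_object: str) -> Optional[str]:
    """Return the parent ROWID_OBJECT for a source key, or None if there is no match."""
    return xref.get((lookup_system, pkey_src_object))


# An Address staging record whose Consumer_Fkey carries the CRM source key.
address_record = {"PKEY_SRC_OBJECT": "A-77", "Consumer_Fkey": "C-1002"}
print(lookup_rowid(consumer_xref, "CRM", address_record["Consumer_Fkey"]))  # 2
```

Keying on the lookup system as well as the source key is what lets a staging table from one source system point at parent records that arrived from a different source system.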

Configuring Lookups For Foreign Key Columns
To configure a lookup for a foreign key column:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for the base object associated with this staging table.
4. Select the staging table that you want to configure.
5. Select the row of the foreign key column that you want to configure. The Edit Lookup button is enabled only for foreign key columns.
[Figure: staging table columns, with the Foreign Key Column and the Edit Lookup Button called out]
6. Click the Edit Lookup button. The Schema Manager displays the Define Lookup dialog. The Define Lookup dialog contains the parent base object and its cross-reference table, along with any unique columns (only).
7. Select the target column for the lookup.
• To define the lookup to a base object, expand the base object and select Rowid_Object (the primary key for this base object).
• To define the lookup to a cross-reference table, select PKey Src Object (the primary key for the source system in this cross-reference table).
• To define the lookup to any other unique column, simply select the column.
Note: When you delete a relationship, it clears the lookup.
8. If the lookup column is PKey Src Object in the relationship table, select the lookup system from the Lookup System drop-down list.
9. Click OK.
10. If you want, configure the Allow Null Update check box to specify what will happen if a Load job specifies a null value for a cell that already contains a non-null value. For more information, see “Properties for Columns in Staging Tables” on page 370.
11. For each column, configure the Allow Null Foreign Key option to specify what happens if the foreign key column contains a null value (no lookup value is available). For more information, see “Properties for Columns in Staging Tables” on page 370.
12. Click the button to save your changes.


Removing Staging Tables
To remove a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for the base object associated with this staging table.
4. Right-click the staging table that you want to remove, and then choose Remove. The Schema Manager prompts you to confirm deletion.
5. Choose Yes. The Schema Manager drops the staging table from the Operational Record Store (ORS), deletes associated control tables, and removes the deleted staging table from the schema tree.

Mapping Columns Between Landing and Staging Tables
This section describes how to configure the mapping between landing and staging tables. Mapping defines how the data is transferred from landing to staging tables via Stage jobs.

About Mapping Columns
To give Siperian Hub the ability to move data from a landing table to a staging table, you need to define a mapping from columns in the landing table to columns in the staging table. This mapping defines:
• which landing table column is used to populate a column in the staging table
• what standardization and verification (cleansing) must be done, if any, before the staging table is populated


Mappings are configured as either SECURE or PRIVATE resources. For more information, see “Securing Siperian Hub Resources” on page 841.

Relationships Between Landing and Staging Tables
You can map columns from one landing table to multiple staging tables. However, each staging table is mapped to only one landing table.

Data is Either Cleansed or Passed Through Unchanged
For each column of data in the staging table, the data comes from the landing column in one of two ways:
Copy Method | Description
passed through | Siperian Hub copies the data as is, without making any changes to it. Data comes directly from a column in the landing table.
cleansed | Siperian Hub standardizes and verifies data using cleanse functions. The output of the cleanse function becomes the input to the target column in the staging table. For more information about cleanse functions, see Chapter 12, “Configuring Data Cleansing.”

In the following figure, data in the Name column is cleansed via a cleanse function, while data from all other columns is passed directly to the corresponding target column in the staging table.

Note: A staging table does not need to use every column in the landing table or every output string from a cleanse function. The same landing table can provide input to multiple staging tables, and the same cleanse function can be reused for multiple columns in multiple landing tables.

Decomposition and Aggregation
Cleanse functions can also decompose and aggregate data. Either way, your mappings need to accommodate the required inputs and outputs.

Cleanse Functions that Decompose Data
In the following figure, the cleanse function decomposes the name field, breaking the data into smaller pieces.

This cleanse function has one input string and five output strings. In your mapping, you need to make sure that the input string is mapped to the cleanse function, and each output string is mapped to the correct target column in the staging table.


Cleanse Functions that Aggregate Data
In the following figure, the cleanse function aggregates data from five fields into a single string.

This cleanse function has five input strings and one output string. In your mapping, you need to make sure that the input strings are mapped to the cleanse function and the output string is mapped to the correct target column in the staging table.
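To make the input and output counts concrete, here is a small Python sketch of a decomposing function (one input string, five output strings) and an aggregating function (five input strings, one output string). It is not a Siperian Hub cleanse function. The five field names, the prefix and suffix lists, and the parsing rules are all assumptions chosen for illustration, since the original figures are not reproduced here.

```python
from typing import Dict

# Hypothetical field names and vocabularies, chosen only for this example.
FIELDS = ["prefix", "first", "middle", "last", "suffix"]
PREFIXES = {"Mr.", "Mrs.", "Ms.", "Dr."}
SUFFIXES = {"Jr.", "Sr.", "II", "III"}


def decompose_name(full_name: str) -> Dict[str, str]:
    """One input string -> five output strings (empty when a part is absent)."""
    tokens = full_name.split()
    parts = dict.fromkeys(FIELDS, "")
    if tokens and tokens[0] in PREFIXES:
        parts["prefix"] = tokens.pop(0)
    if tokens and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["first"] = tokens.pop(0)
    if tokens:
        parts["last"] = tokens.pop()
    parts["middle"] = " ".join(tokens)  # whatever remains between first and last
    return parts


def aggregate_name(parts: Dict[str, str]) -> str:
    """Five input strings -> one output string, skipping empty parts."""
    return " ".join(parts[field] for field in FIELDS if parts[field])


parts = decompose_name("Dr. Maria Elena Lopez Jr.")
print(parts["first"], "|", parts["last"])  # Maria | Lopez
print(aggregate_name(parts))               # Dr. Maria Elena Lopez Jr.
```

In a mapping, each of the five outputs of the decomposing function would be connected to its own staging column, while the aggregating function would feed a single staging column.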

Considerations for Column Mappings
When mapping columns, consider the following rules and guidelines:
• The source column must have the same data type as the target column, or it must be a data type that can be implicitly converted to the target column’s data type.
• For string (char or varchar) columns, the length does not need to be the same. When data is loaded from the landing table to the staging table, any data value that is too long for the target column will trigger Siperian Hub to place the entire record in a reject table.
• Although more than three columns from the landing table can be mapped to the Pkey Src Object column in the staging table, index creation is restricted to only three columns.
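The length rule for string columns can be illustrated with a short Python sketch. The function name, the record layout, and the column width below are hypothetical. The sketch shows only the stated behavior that one oversized value sends the entire record to the reject table.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def stage_records(records: List[Record],
                  max_lengths: Dict[str, int]) -> Tuple[List[Record], List[Record]]:
    """Split records into (staged, rejected) based on target column widths."""
    staged: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        too_long = any(
            record.get(column) is not None and len(record[column]) > limit
            for column, limit in max_lengths.items()
        )
        # A single oversized value rejects the whole record, not just the cell.
        (rejected if too_long else staged).append(record)
    return staged, rejected


landing = [
    {"PKEY_SRC_OBJECT": "A1", "CITY": "Oslo"},
    {"PKEY_SRC_OBJECT": "A2", "CITY": "San Francisco"},
]
staged, rejected = stage_records(landing, {"CITY": 10})  # hypothetical width of 10
print(len(staged), len(rejected))  # 1 1
```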


Starting the Mappings Tool
To start the Mappings tool:
• In the Hub Console, expand the Model workbench, and then click Mappings.
The Hub Console displays the Mappings tool, as shown in the following example.

[Figure: the Mappings tool, showing the Mappings List and the Properties Pane]

The Mappings tool displays the following panels:
Column | Description
Mappings List | List of every defined landing-to-staging mapping.
Properties | Properties for the selected mapping.

When you select a mapping in the mappings list, its properties are displayed.


Tabs in the Mappings Tool
When a mapping is selected, the Mappings tool displays the following tabs.
Column | Description
General | General properties for this mapping. For more information, see “Mapping Properties” on page 386.
Diagram | Interactive diagram that lets you define mappings between columns in the landing and staging tables. For more information, see “Mapping Columns Between Landing and Staging Table Columns” on page 389.
Query Parameters | Allows you to specify query parameters for this mapping. For more information, see “Configuring Query Parameters for Mappings” on page 392.
Test | Allows you to test the mapping.

Mapping Diagrams
When you click the Diagram tab for a mapping, the Mappings tool displays the current column mappings.

[Figure: mapping diagram showing the Landing Table (Source), the Mapping Lines, and the Staging Table (Target)]

Mapping lines show the mapping from source columns in the landing table to target columns in the staging table. Colors in the circles at either end of the mapping lines indicate data types.


Mapping Properties
Mappings have the following properties.
Field | Description
Name | Name of this mapping as it will be displayed in the Hub Console.
Description | Description of this mapping.
Landing Table | Select the landing table that will be the source of the mapping.
Staging Table | Select the staging table that will be the target of the mapping.
Secure Resource | Check (enable) to make this mapping a secure resource, which allows you to control access to this mapping. Once a mapping is designated as a secure resource, you can assign privileges to it in the Secure Resources tool. To learn more, see “Securing Siperian Hub Resources” on page 841, and “Assigning Resource Privileges to Roles” on page 859.

Adding Mappings
To create a new mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click in the area where the mappings are listed and choose Add Mapping. The Mappings tool displays the Mapping dialog.
4. Specify the mapping properties. For more information, see “Mapping Properties” on page 386.
5. Click OK. The Mappings tool displays the landing table and staging table on the workspace.
6. Using the workspace tools and the input and output nodes, connect the column in the landing table to the corresponding column in the staging table.
Tip: If you want to automatically map columns in the landing table to columns with the same name in the staging table, click the button.
7. Click OK.
8. When you are finished, click the button to save your changes.

Copying Mappings
To create a new mapping by copying an existing one:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click the mapping that you want to copy, and then choose Copy Mapping. The Mappings tool displays the Mapping dialog.
4. Specify the mapping properties. The landing table is already specified. For more information, see “Mapping Properties” on page 386.
5. Click OK.
6. Click the button to save your changes.

Editing Mapping Properties
To edit the properties of an existing mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the mapping that you want to edit.
4. Edit the mapping properties, diagram, and mapping settings as needed.
5. Click the button to save your changes.


Mapping Columns Between Landing and Staging Table Columns
You use the Diagrams tab in the Mappings tool to define the mappings between source columns in landing tables and target columns in staging tables. How you map depends on whether it is a pass-through mapping (directly between columns) or a cleansed mapping (data is processed by a cleanse function). For each mapping:
• inputs are columns from the landing table
• outputs are the columns in the staging table

The workspace and the methods of creating a mapping are the same as for creating cleanse functions. To learn how to use the workspace to define functions, inputs, and outputs, see “Configuring Graph Functions” on page 424.

Navigate to the Diagrams Tab
To navigate to the Diagrams tab:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the mapping that you want to configure.
4. Click the Diagram tab. The Mappings tool displays the Diagram tab for this mapping.

Mapping Columns Directly
To configure mappings directly between columns in landing and staging tables:
1. Navigate to the Diagrams tab according to the instructions in “Navigate to the Diagrams Tab” on page 389.
2. Mouse-over the output connector (circle) to the right of the column in the landing table (the circle outline turns red), drag the line to the input connector (circle) to the left of the column in the staging table, and then release the mouse button.
   Note: If you want to load by RowID, create a mapping between the primary key in the landing table and the Rowid object in the staging table. For more information, see “Loading by RowID” on page 394.
3. Click the button to save your changes.


Mapping Columns Using Cleanse Functions
To cleanse data during Stage jobs, you can include one or more cleanse functions in your mapping. This section provides brief instructions for configuring cleanse functions in mappings. To learn more, see “Using Cleanse Functions” on page 414.

To configure mappings between columns in landing and staging tables via cleanse functions:
1. Navigate to the Diagrams tab according to the instructions in “Navigate to the Diagrams Tab” on page 389.
2. Add the cleanse function(s) that you want to configure by right-clicking anywhere in the workspace and choosing the cleanse function that you want to add.
3. For each input connector on the cleanse function, mouse-over the output connector from the appropriate column in the landing table, drag the line to its corresponding input connector, and release the mouse button.
4. Similarly, for each output connector on the cleanse function, mouse-over the output connector, drag the line to its corresponding column in the staging table, and release the mouse button.
   In the following example, the Titlecase cleanse function will process data that comes from the Last Name column in the landing table and then populate the Last Name column in the staging table with the cleansed data.
5. Click the button to save your changes.


Configuring Query Parameters for Mappings
To configure query parameters for a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the mapping that you want to configure.
4. Click the Query Parameters tab. The Mappings tool displays the Query Parameters tab for this mapping.
5. If you want, check or uncheck the Enable Distinct check box, as appropriate, to configure distinct mapping. For more information, see “Distinct Mapping” on page 393.
6. If you want, check or uncheck the Enable Condition check box, as appropriate, to configure conditional mapping. For more information, see “Conditional Mapping” on page 394. If enabled, type the SQL WHERE clause (omitting the WHERE keyword), and then click Validate to validate the clause.
7. Click the button to save your changes.

Filtering Records in Mappings
By default, all records are retrieved from the landing table. Optionally, you can configure a mapping that filters records in the landing table. There are two types of filters: distinct and conditional. You configure these settings on the Query Parameters tab in the Mappings tool. For more information, see “Configuring Query Parameters for Mappings” on page 392.

Distinct Mapping
If you click the Enable Distinct check box on the Query Parameters tab, the Stage job selects only the distinct records from the landing table. Siperian Hub populates the staging table using the following SELECT statement:
select distinct * from landing_table

Using distinct mapping is useful in situations in which you have a single landing table feeding multiple staging tables and the landing table is denormalized (for example, it contains both customer and address data). A single customer could have three addresses. In this case, using distinct mapping prevents the two extra customer records from being written to the rejects table. In another example, suppose a landing table contained the following data:
LUD    CUST_ID   NAME   ADDR_ID   ADDR
7/24   1         JOHN   1         1 MAIN ST
7/24   1         JOHN   2         1 MAPLE ST

In the mapping to the customer table, check (select) Enable Distinct to avoid having duplicate records because only LUD, CUST_ID, and NAME are mapped to the Customer staging table. With Distinct enabled, only one record would populate your customer table and no rejects would occur.


Alternatively, for the address mapping, you map ADDR_ID and ADDR with Distinct disabled so that you get two records and no rejects.

Conditional Mapping
If you select the Enable Condition check box, you can apply a SQL WHERE clause to unload the data in cleanse. For example, suppose the data in your landing table is from all states in the US. You can use the WHERE clause to filter the data that is written to the staging tables to include only data from one state, such as California. To do this, type in a WHERE clause (but omit the WHERE keyword): STATE = 'CA'. When the cleanse job is run, it unloads and processes records as SELECT * FROM LANDING WHERE STATE = 'CA'. If you specify conditional mapping, click the Validate button to validate the SQL statement.
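The effect of the two filters can be tried out on a small in-memory table. The following is a minimal sketch using Python's built-in sqlite3 module, not Siperian Hub itself. The table name landing and the STATE column are assumptions added for illustration, and the queries only approximate what a Stage job issues.

```python
import sqlite3

# Hypothetical landing table mirroring the example rows above.
# The STATE column is an assumption, added to illustrate conditional mapping.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landing (LUD TEXT, CUST_ID INTEGER, NAME TEXT, "
    "ADDR_ID INTEGER, ADDR TEXT, STATE TEXT)"
)
conn.executemany(
    "INSERT INTO landing VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("7/24", 1, "JOHN", 1, "1 MAIN ST", "CA"),
        ("7/24", 1, "JOHN", 2, "1 MAPLE ST", "NY"),
    ],
)

# Customer mapping: only LUD, CUST_ID, and NAME are mapped.
without_distinct = conn.execute(
    "SELECT LUD, CUST_ID, NAME FROM landing"
).fetchall()
with_distinct = conn.execute(
    "SELECT DISTINCT LUD, CUST_ID, NAME FROM landing"
).fetchall()

# Conditional mapping: the clause is typed without the WHERE keyword.
condition = "STATE = 'CA'"
california_only = conn.execute(
    f"SELECT * FROM landing WHERE {condition}"
).fetchall()

print(len(without_distinct), len(with_distinct), len(california_only))
```

With Distinct enabled, the two identical customer rows collapse into one, while the condition keeps only the California row.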

Loading by RowID
You can streamline load, match, and merge processing by explicitly configuring Siperian Hub to load by RowID. Otherwise, Siperian Hub loads data according to its default behavior, which is described in “Run-time Execution Flow of the Load Process” on page 304.

Note: If you clean the BASE OBJECT using the stored procedure, and if you had set up the TAKE-ON GAP for the particular staging table, the ROWID sequences are reset to 1.

In the staging table, the Rowid Object column (a nullable column) has a specialized usage. You can streamline load, match, and merge processing by mapping any column in a landing table to the Rowid Object column in a staging table. In the following example, the Address Id column in the landing table is mapped to the Rowid Object column in the staging table.

Mapping to the Rowid Object column allows for the loading of records by present- or lineage-based ROWID_OBJECT. During the load, if an incoming record with a populated ROWID_OBJECT is new (the incoming PKEY_SRC_OBJECT + ROWID_SYSTEM is checked), then this record bypasses the match and merge process and gets added to the base object directly (a real-time API PUT(_XREF) by ROWID_OBJECT). Using this feature enhances lineage and unmerge support, enables closed-loop integration with downstream systems, and can increase throughput.

The initial data load for a base object inserts all records into the target base object. Therefore, enable loading by RowID for incremental loads that occur after the initial data load. For more information, see “Initial Data Loads and Incremental Loads” on page 302 and “Run-time Execution Flow of the Load Process” on page 304.
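The routing decision described above can be summarized in a few lines. This is an illustrative sketch only: the StagedRecord class, the load_route function, and the route labels are hypothetical names, and the real load process involves more checks than are shown here.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class StagedRecord:
    pkey_src_object: str
    rowid_system: str
    rowid_object: Optional[str] = None  # nullable Rowid Object column


def load_route(record: StagedRecord, known_keys: Set[Tuple[str, str]]) -> str:
    """Return a label for the path a staged record takes during the load.

    known_keys holds the (PKEY_SRC_OBJECT, ROWID_SYSTEM) pairs that
    already exist; this lookup structure is an assumption of the sketch.
    """
    key = (record.pkey_src_object, record.rowid_system)
    if record.rowid_object is not None and key not in known_keys:
        # New record with a populated ROWID_OBJECT: bypasses match and merge.
        return "insert-by-rowid"
    return "default-load-flow"


existing = {("A-100", "CRM")}
print(load_route(StagedRecord("A-200", "CRM", rowid_object="42"), existing))
print(load_route(StagedRecord("A-100", "CRM", rowid_object="42"), existing))
print(load_route(StagedRecord("A-300", "CRM"), existing))
```

Only the first record takes the direct path; the other two follow the default load flow.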

Jumping to a Schema
The Mappings tool allows you to quickly launch the Schema Manager and display the schema associated with the selected mapping.

Note: The Jump to Schema command is available only in the Workbenches view, not the Processes view.

To jump to the schema for a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Select the mapping whose schema you want to view.
3. In the View By list at the bottom of the navigation pane, choose one of the following options:
   • By Staging Table
   • By Landing Table
   • By Mapping
4. Right-click anywhere in the navigation pane, and then choose Jump to Schema. The Mappings tool displays the schema for the selected mapping.


Testing Mappings
To test a mapping that you have configured:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the mapping that you want to configure.
4. Click the Test tab. The Mappings tool displays the Test tab for this mapping.
5. Specify input values for the columns under Input Name.
6. Click Test. The Mappings tool tests the mapping and populates the columns under Output Name with the results.

Removing Mappings
To remove a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click the mapping that you want to remove, and choose Delete Mapping. The Mappings tool prompts you to confirm deletion.
4. Click Yes. The Mappings tool drops supporting tables, removes the mapping from the metadata, and updates the list of mappings.

Using Audit Trail and Delta Detection
After you have completed mapping columns between landing and staging tables, you can configure the audit trail and delta detection features for a staging table. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.


To configure audit trail and delta detection, click the Settings tab.

Configuring the Audit Trail for a Staging Table
Siperian Hub allows you to configure an audit trail that retains the history of the data in the RAW table based on the number of loads and timestamps. This audit trail is useful, for example, when using HDD (Hard Delete Detection). By default, audit trails are not enabled, and the RAW table is empty. If enabled, then records are kept in the RAW table for either the configured number of stage job executions or the specified retention period.

Note: The Audit Trail has very different functionality from, and is not to be confused with, the Audit Manager tool described in Chapter 22, “Auditing Siperian Hub Services and Events”.

To configure the audit trail for a staging table:
1. Start the Schema Manager according to the instructions in “Building the Schema” on page 81.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. If you have not already done so, add a mapping for the staging table. For more information, see “Adding Mappings” on page 386.
4. Select the staging table that you want to configure.
5. At the bottom of the properties panel, click Preserve an audit trail in the raw table to enable the raw data audit trail. The Schema Manager prompts you to select the retention period for the audit table.
6. Select one of the following options for the audit retention period:
   Option        Description
   Loads         Number of batch loads for which to retain data.
   Time Period   Period of time for which to retain data.
7. Click Save to save your changes.

Once configured, the audit trail keeps data for the retention period that you specified. For example, suppose you configured the audit trail for two loads (Stage job executions). In this case, the audit trail will retain data for the two most recent loads to the staging table. If there were ten records in each load in the landing table, then the total number of records in the RAW table would be 20.


If the Stage job is run multiple times, then the data in the RAW table will be retained for the most recent two sets based on the ROWID_JOB. Data for older ROWID_JOBs will be deleted. For example, suppose the value of the ROWID_JOB for the first Stage job is 1, for the second Stage job is 2, and so on. When you run the Stage job a third time, then the records in which ROWID_JOB=1 will be discarded.

Note: If the audit trail is enabled for a staging table and you click the Clear History button in the Batch Viewer while the associated stage job is selected, the records in the RAW and REJ tables will be cleared the next time the stage job is run.
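The load-based retention rule can be simulated to check the arithmetic in the example. The sketch below is an illustration in plain Python, not the product's implementation. The function name retain_recent_loads and the dictionary layout of a RAW row are assumptions, and ROWID_JOB is modeled as a simple integer.

```python
def retain_recent_loads(raw_rows, loads_to_keep):
    """Keep only the rows that belong to the most recent ROWID_JOB values."""
    if loads_to_keep < 1:
        raise ValueError("loads_to_keep must be at least 1")
    recent_jobs = sorted({row["ROWID_JOB"] for row in raw_rows})[-loads_to_keep:]
    return [row for row in raw_rows if row["ROWID_JOB"] in recent_jobs]


# Three Stage job runs with ten landing records each, audit trail set to 2 loads.
raw = []
for job in (1, 2, 3):
    raw.extend({"ROWID_JOB": job, "PKEY": n} for n in range(10))
    raw = retain_recent_loads(raw, loads_to_keep=2)

print(len(raw), sorted({row["ROWID_JOB"] for row in raw}))
```

After the third run, the RAW rows for ROWID_JOB 1 are gone and 20 rows remain, which matches the example above.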

Configuring Delta Detection for a Staging Table
If you enable delta detection for a staging table, Siperian Hub processes only new or changed records and ignores unchanged records.

Enabling Delta Detection for a Staging Table
To enable delta detection for a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Select the staging table that you want to configure.
3. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
4. Select (check) the Enable delta detection check box to enable delta detection for the table. You might need to scroll down to see this option.
5. Specify the manner in which you want to have deltas detected. You can choose:
   • Detect deltas by comparing all columns in mapping
   • Detect deltas via a date column (select the column)
6. Specify whether to allow staging if a prior duplicate was rejected during the stage process or load process.
   • Select (check) this option to allow the duplicate record being staged, during this next stage process execution, to bypass delta detection if its previously-staged duplicate was rejected.
     Note: If this option is enabled, and a user in the Batch Viewer clicks the Clear History button while the associated stage job is selected, then the history of the prior rejection (that this feature relies on) will be discarded because the records in the REJ table will be cleared the next time the stage job is run.
   • Clear (uncheck) this option (the default) to prevent the duplicate record being staged, during this next stage process execution, from bypassing delta detection if its previously-staged duplicate was rejected. Delta detection will filter out any corresponding duplicate landing record that is subsequently processed in the next stage process execution.


How Siperian Hub Handles Delta Detection
If delta detection is enabled, then the Stage job compares the contents of the landing table (which is mapped to the selected staging table) against the data set processed in the previous run of the stage job. This comparison is done to determine whether the data has changed since the previous run. Changed records, new records, and rejected records are put into the staging table. Duplicate records are ignored. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.

Note: Reject records move from cleanse to load after the second stage run.

Considerations for Using Delta Detection
When using delta detection, consider the following issues:
• Delta detection can be done either by comparing entire records or via a date column. Delta detection on last update date is the most efficient, as Siperian Hub can simply compare the last update date columns for each incoming record against the record’s previous last update date.
• When processing records by last update date, do not use the Now cleanse function to compare last update values (for example, testing whether the last update date in a source record occurred before the current system date). Using Now in this way can produce unpredictable results. For more information, see Chapter 12, “Configuring Data Cleansing.”
• Perform delta detection only on columns for those sources where the Last Update Date is not a true indicator of change. The Siperian Hub stage job will compare the entire source record against the most recent corresponding record in the raw table. If any cell is different, then the record is passed on to the staging table.
• If you update a record and set the Last Update Date to a different date, it will not get delta detected if you have a date in the landing table that is earlier than the date you entered. The new Last Update Date always needs to be earlier than the max date value in the RAW table.
• During delta detection, when you are checking for deltas on all columns, only records that have null primary keys are rejected. This is expected behavior. Any other records that fail the delta process are rejected on subsequent stage processes.
• When delta detection is based on the Last Update Date, any changes to the last update date or the primary key will be detected. Updates to any values that are not the last update date or part of the concatenated primary key will not be detected.
• Duplicate primary keys are not considered during subsequent stage processes when using delta detection by mapped columns.
• Reject handling allows you to:
  • View all reject records for a given staging table regardless of the batch job
  • View all reject records by day across all staging tables
  • Query reject tables based on query filters
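The difference between the two detection modes can be illustrated with a small comparison. This is a simplified sketch, not the Stage job's actual algorithm. The function names, the PKEY key, and the LAST_UPDATE_DATE column name are assumptions, and the previous run is modeled as a dictionary keyed by primary key.

```python
def delta_by_all_columns(landing, previous):
    """Pass records that are new or that differ in any mapped column."""
    return [rec for rec in landing if previous.get(rec["PKEY"]) != rec]


def delta_by_date(landing, previous, date_col="LAST_UPDATE_DATE"):
    """Pass records that are new or whose date column differs from the prior run."""
    changed = []
    for rec in landing:
        old = previous.get(rec["PKEY"])
        if old is None or old[date_col] != rec[date_col]:
            changed.append(rec)
    return changed


# Data set processed in the previous run, keyed by primary key.
previous = {
    "A": {"PKEY": "A", "NAME": "JOHN", "LAST_UPDATE_DATE": "2024-01-01"},
    "B": {"PKEY": "B", "NAME": "MARY", "LAST_UPDATE_DATE": "2024-01-01"},
}
landing = [
    {"PKEY": "A", "NAME": "JON", "LAST_UPDATE_DATE": "2024-01-01"},   # name changed, date not
    {"PKEY": "B", "NAME": "MARY", "LAST_UPDATE_DATE": "2024-01-01"},  # unchanged
    {"PKEY": "C", "NAME": "ALEX", "LAST_UPDATE_DATE": "2024-02-01"},  # new
]

print([r["PKEY"] for r in delta_by_all_columns(landing, previous)])
print([r["PKEY"] for r in delta_by_date(landing, previous)])
```

Record A changes its name without changing its date, so only the all-columns comparison picks it up. This is the case in which the Last Update Date is not a true indicator of change.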


Chapter 12: Configuring Data Cleansing

This chapter describes how to configure your Hub Store to cleanse data during the stage process. This chapter is a companion to the material provided in Chapter 11, “Configuring the Stage Process.”

Before You Begin
Before you begin, you must have completed the following tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide for your platform.
• Built the schema according to the instructions in Chapter 5, “Building the Schema.”
• Created staging tables and landing tables according to the instructions in Chapter 11, “Configuring the Stage Process.”
• Installed and configured your cleanse engine according to the documentation included in your cleanse engine distribution.

About Data Cleansing in Siperian Hub
Data cleansing is the process of standardizing data to optimize it for input into the match process. Matching cleansed data results in a greater number of reliable matches. This chapter describes internal cleansing—the data cleansing that occurs inside Siperian Hub, specifically during a Stage job, when data is copied from landing tables to the appropriate staging tables (see Chapter 11, “Configuring the Stage Process”). Note: Data cleansing that occurs prior to its arrival in the landing tables is outside the scope of this chapter.

About the Cleanse Match Server
The Cleanse Match Server is a servlet that handles cleanse requests. This servlet is deployed in an application server environment. The servlet contains two server components:
• a cleanse server handles data cleansing operations
• a match server handles match operations

The Cleanse Match Server is multi-threaded so that each instance can process multiple requests concurrently. It can be deployed on a variety of application servers. See the Siperian Hub Release Notes for a list of supported application servers. See the Siperian Hub Installation Guide for your platform for instructions on installing and configuring Cleanse Match Server(s). Siperian Hub supports running multiple Cleanse Match Servers for each Operational Record Store (ORS). The cleanse process is generally CPU-bound. This scalable architecture allows you to scale your Siperian Hub implementation as the volume of data increases. Deploying Cleanse Match Servers on multiple hosts distributes the processing load across multiple CPUs and permits the running of cleanse operations in parallel. In addition, some external adapters are inherently single-threaded, so this Siperian Hub architecture allows you to simulate multi-threaded operations by running one processing thread per application server instance.

Modes of Cleanse Operations
Cleanse operations can be classified according to the following modes:
• Online and Batch (default)
• Online Only
• Batch Only

The CLEANSE_TYPE can be used to specify which class(es) of operations a particular Cleanse Match Server will run. If you deploy two Cleanse Match Servers, you could make one batch-only and the other online-only, or you could make them both accept both classes of requests. Unless otherwise specified, a Cleanse Match Server will default to running both kinds of requests.
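The three modes amount to a simple acceptance rule per server. The following sketch models that rule in Python. The CleanseMode enum, the accepts function, and the "online"/"batch" request labels are hypothetical names for illustration and do not correspond to actual CLEANSE_TYPE values.

```python
from enum import Enum


class CleanseMode(Enum):
    ONLINE_AND_BATCH = "Online and Batch"  # default
    ONLINE_ONLY = "Online Only"
    BATCH_ONLY = "Batch Only"


def accepts(mode: CleanseMode, request_class: str) -> bool:
    """Return True if a server in this mode runs the given request class.

    request_class is "online" or "batch" (labels assumed for this sketch).
    """
    if request_class not in ("online", "batch"):
        raise ValueError(f"unknown request class: {request_class}")
    if mode is CleanseMode.ONLINE_AND_BATCH:
        return True
    if mode is CleanseMode.ONLINE_ONLY:
        return request_class == "online"
    return request_class == "batch"


# Two servers split by request class, as in the deployment described above.
servers = {"server1": CleanseMode.BATCH_ONLY, "server2": CleanseMode.ONLINE_ONLY}
print([name for name, mode in servers.items() if accepts(mode, "batch")])
```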

Distributed Cleanse Match Servers
For your Siperian Hub implementation, you can increase the throughput of the cleanse process by running multiple Cleanse Match Servers in parallel. To learn more about distributed Cleanse Match Servers, see the Siperian Hub Installation Guide.

Cleanse Match Servers and Proxy Users
If proxy users have been configured for your Siperian Hub implementation, and you created proxy_user and cmx_ors with different passwords, then you need to either:
• restart the application server and log in to the proxy user from the Hub Console
or
• register the Cleanse Match Server for the proxy user again

Otherwise, Stage jobs will fail.

Cleanse Requests
All requests for cleansing are issued by database stored procedures. These stored procedures package a cleanse request as an XML payload and transmit it to a Cleanse Match Server. When the Cleanse Match Server receives a request, it parses the XML and invokes the appropriate code:
Mode Type            Description
On-line operations   The result is packaged as an XML response and sent back via an HTTP POST connection.
Batch jobs           The Cleanse Match Server pulls the data to be processed into a flat file, processes it, and then uses a bulk loader to write the data back.
                     • For Oracle, it uses the Oracle loader (SQLLDR) utility.
                     • For DB2, it uses the DB2 Load utility.

The Cleanse Match Server is multi-threaded so that each instance can process multiple requests concurrently. The default timeout for batch requests from Oracle to a Cleanse Match Server is one year, and the default timeout for on-line requests is one minute. For DB2, the default timeout for batch requests or SIF requests is 600 seconds (10 minutes).

When running a stage/match job, if more than one Cleanse Match Server is registered and the total number of records to be staged or matched is more than 500, then the job is distributed in parallel among the available Cleanse Match Servers.
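The distribution rule can be expressed as a short predicate. The sketch below is illustrative only. The 500-record threshold comes from the text above, but the even split across servers and the choice of the first server for small jobs are assumptions, and the function name plan_job is hypothetical. The text does not describe how Siperian Hub actually apportions the work.

```python
DISTRIBUTION_THRESHOLD = 500  # record count quoted in the text above


def plan_job(total_records, servers):
    """Return a mapping of server name to the number of records it would process."""
    if not servers:
        raise ValueError("at least one Cleanse Match Server must be registered")
    if len(servers) > 1 and total_records > DISTRIBUTION_THRESHOLD:
        # Assumption: an even split; the real allocation policy is not documented here.
        base, extra = divmod(total_records, len(servers))
        return {name: base + (1 if i < extra else 0) for i, name in enumerate(servers)}
    # Not distributed: a single server handles the whole job (first one, by assumption).
    return {servers[0]: total_records}


print(plan_job(500, ["cms1", "cms2"]))
print(plan_job(1001, ["cms1", "cms2"]))
```

A job of exactly 500 records stays on one server, because the rule requires more than 500 records before the job is distributed.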

Cleanse Match Server Properties
When configuring Cleanse Match Servers, you can specify the following settings.
Property         Description
Server           Host or machine name of the application server on which you deployed Siperian Hub Cleanse Match Server.
Port             HTTP port of the application server on which you deployed the Cleanse Match Server.
Cleanse Server   Determines whether to use the Cleanse Match Server for cleansing data.
                 • Select (check) this check box to use the Cleanse Match Server for cleansing data.
                 • Clear (uncheck) this check box if you do not want to use the Cleanse Match Server for cleansing data.
                 If an ORS has multiple associated Cleanse Match Servers, you can enhance performance by configuring each Cleanse Match Server as either a match-only or a cleanse-only server. Use this option in conjunction with the Match Server check box to implement this configuration.
Cleanse Mode     Mode that the Cleanse Match Server uses for cleansing data. For details, see “Modes of Cleanse Operations” on page 407.
Match Server     Determines whether to use the Match Server for matching data.
                 • Check (select) this check box to use the Match Server for matching data.
                 • Uncheck (clear) this check box if you do not want to use the Match Server for matching data.
                 If an ORS has multiple associated Cleanse Match Servers, you can enhance performance by configuring each Cleanse Match Server as either a match-only or a cleanse-only server. Use this option in conjunction with the Cleanse Server check box to implement this configuration.
Match Mode       Mode that the Match Server uses for matching data. For details, see “Cleanse Requests” on page 408.
Offline          Determines whether the Cleanse Match Server is offline or online.
                 • Select (check) this check box to take the Cleanse Match Server offline, making it temporarily unavailable. Once offline, no cleanse jobs are sent to that Cleanse Match Server (servlet).
                 • Clear (uncheck) this check box to make an offline Cleanse Match Server available again so that Siperian Hub can once again send cleanse jobs to that Cleanse Match Server.
                 Note: Siperian Hub looks at this field but does not set it. Taking a Cleanse Match Server offline is an administrative action.
Thread Count     Overrides the default thread count. The default, recommended, value is 1 thread. Thread counts are defined in the Siperian Hub Console and can be changed without having to restart the server.
                 Note: You must change this value after migration from an earlier hub version or all values will default to 1 thread.
CPU Rating       Specifies a relative CPU performance rating for the host machine on which this Cleanse Match Server runs. This rating is relevant only in relation to CPU ratings for other host machines on which Cleanse Match Servers are also running.

Set the properties for this new Cleanse Match Server. To learn more, see “Cleanse Match Server Properties” on page 410. If proxy users have been configured for your Siperian Hub implementation, see “Cleanse Match Servers and Proxy Users” on page 408.

Change the properties you want for this Cleanse Match Server. To learn more, see “Cleanse Match Server Properties” on page 410. If proxy users have been configured for your Siperian Hub implementation, see “Cleanse Match Servers and Proxy Users” on page 408.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the Cleanse Match Server that you want to delete.
4. Click the button. The Cleanse Match Server tool prompts you to confirm deletion.
5. Click OK to delete the server.

Testing the Cleanse Match Server Configuration
Whenever you add or change your Cleanse Match Server information, it is recommended that you check the configuration to make sure that the connection works properly.

To test the Cleanse Match Server configuration:
1. Start the Cleanse Match Server tool. To learn more, see “Starting the Cleanse Match Server Tool” on page 409.
2. Select the Cleanse Match Server that you want to test.
3. Click the button to test the configuration.
   If the test succeeds, the Cleanse Match Server tool displays a window showing the connection information and a success message.
   If there was a problem, Siperian Hub will display a window with information about the connection problem.
4. Click OK.

Using Cleanse Functions
This section describes how to use cleanse functions to clean data in your Siperian Hub implementation. To learn more, see “About Data Cleansing in Siperian Hub” on page 406.

About Cleanse Functions
In Siperian Hub, you can build and execute cleanse functions that cleanse data. A cleanse function is a function that is applied to a data value in a record to standardize or verify it. For example, if your data has a column for salutation, you could use a cleanse function to standardize all instances of “Doctor” to “Dr.” You can apply cleanse functions successively, or simply assign the output value to a column in the staging table.
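As a conceptual illustration of applying cleanse functions successively, the sketch below chains two small Python functions. These are stand-ins written for this example, not Siperian Hub's cleanse functions: the names standardize_salutation, titlecase, and apply_successively are hypothetical, and real cleanse functions are configured in the Hub Console rather than in code.

```python
import re


def titlecase(value: str) -> str:
    """Stand-in for a title-casing function: capitalize each word."""
    return value.title()


def standardize_salutation(value: str) -> str:
    """Replace the whole word "Doctor" with the standard form "Dr."."""
    return re.sub(r"\bDoctor\b", "Dr.", value)


def apply_successively(value: str, functions) -> str:
    """Feed the output of each function into the next one, in order."""
    for function in functions:
        value = function(value)
    return value


cleansed = apply_successively("doctor jane SMITH", [titlecase, standardize_salutation])
print(cleansed)
```

The order matters in this sketch: title-casing runs first so that the salutation rule sees "Doctor" in a predictable form.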

Types of Cleanse Functions
In Siperian Hub, each cleanse function is one of the following types:
• a Siperian Hub-defined function
• a function defined by your cleanse engine
• a custom cleanse function you define

The pre-defined functions provide access to specialized cleansing functionality, such as name and address standardization, address decomposition, gender determination, and so on. To learn more, see “Using Cleanse Functions” on page 414.

Libraries
Functions are organized into libraries—Java libraries and user libraries, which are folders used to organize the functions that you can use in the Cleanse Functions tool in the Model workbench. To learn more, see “Configuring Cleanse Libraries” on page 418.

Cleanse Functions are Secure Resources
Cleanse functions can be configured as secure resources and made SECURE or PRIVATE. To learn more, see “Securing Siperian Hub Resources” on page 841.

Available Functions Subject to Cleanse Engine
The functions you see in the Hub Console depend on the cleanse engine that you are using. Siperian Hub shows the cleanse functions that your cleanse engine makes available. Regardless of which cleanse engine you use, the overall process of data cleansing in Siperian Hub is the same.

Starting the Cleanse Functions Tool
The Cleanse Functions tool provides the interface for defining how you cleanse your data. To start the Cleanse Functions tool: • In the Hub Console, expand the Model workbench and then click Cleanse Functions.

Configuring Data Cleansing

415

The Cleanse Functions tool is divided into two panes:

Pane              Description
Navigation pane   Shows the cleanse functions in a tree view. Clicking on any node in the tree shows you the appropriate properties page in the right-hand pane.
Properties pane   Shows the properties for the selected function. For any of the custom cleanse functions, you can edit properties in the right-hand pane.

The functions you see in the left pane depend on the cleanse engine you are using. Your functions may differ from the ones shown in the previous figure.

Cleanse Function Types
Cleanse functions are grouped in the tree according to their type. Cleanse function types are high-level categories that are used to group similar cleanse functions for easier management and access.

Cleanse Function Properties
If you expand the list of cleanse function types in the navigation pane, you can select a cleanse function to display its particular properties.

In addition to specific cleanse functions, the Misc Functions include Read Database and Reject functions that provide efficiencies in data management.

Field           Description
Read Database   Allows a map to look up records directly from a database table. Note: This function is designed to be used when there are many references to the same limited number of data items.
Reject          Allows the creator of a map to identify incorrect data and reject the record, noting the reason.

Overview of Configuring Cleanse Functions
To define cleanse functions, you complete the following tasks:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click Refresh to refresh your cleanse library.
4. Create your own cleanse library, which is simply a folder where you keep your custom cleanse functions. See “Configuring Cleanse Libraries” on page 418.
5. Define regular expression functions in the new library, if applicable. See “Configuring Regular Expression Functions” on page 422.
6. Define graph functions in the new library, if applicable. See “Configuring Graph Functions” on page 424.

Configuring Cleanse Libraries
You can configure either user libraries or Java libraries.

Configuring User Libraries
You can add a User Library when you want to create a customized cleanse function from existing internal or external Siperian cleanse functions. To add a user cleanse library:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click Refresh to refresh your cleanse library.
4. In the tree, select the Cleanse Functions node.
5. Right-click and choose Add User Library from the pop-up menu. The Cleanse Functions tool displays the Add User Library dialog.
6. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this library.
   Description   Optional description of this library.
7. Click OK. The Cleanse Functions tool displays the new library you added in the list under Cleanse libraries in the navigation pane.

Configuring Java Libraries
To add a Java cleanse library:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click Refresh to refresh your cleanse library.
4. In the tree, select the Cleanse Functions node.
5. Right-click and choose Add Java Library from the pop-up menu. The Cleanse Functions tool displays the Add Java Library dialog.
6. Specify the JAR file for this library. You can click the Browse button to look for the JAR file.
7. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this library.
   Description   Optional description of this library.
8. If applicable, click the Parameters button to specify any parameters for this library. The Cleanse Functions tool displays the parameters dialog. You can add as many parameters as needed for this library.
   • To add a parameter, click the add button. The Cleanse Functions tool displays the Add Value dialog. Type a name and value, and then click OK.
   • To import parameters, click the import button. The Cleanse Functions tool displays the Open dialog, prompting you to select a properties file containing the parameter(s) you want.
   The name, value pairs that are imported from the file will be available to the user-defined Java function at run time as elements of its Java properties. This allows you to provide customized values in a generic function, such as “userid” or “target URL”.
9. Click OK. The Cleanse Functions tool displays the new library in the list under Cleanse libraries in the navigation pane.
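
This guide does not show the interface through which Siperian Hub passes library parameters to a Java cleanse function, so the following is only a sketch of the general idea using the standard java.util.Properties class. The class name, the property keys (userid, target.url), and their values are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: reading name, value pairs in Java properties format.
class LibraryParametersSketch {

    // Parses name=value pairs written in standard Java properties format.
    static Properties parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // A generic function reads its customized values by name, with fallbacks.
    static String describeTarget(Properties props) {
        String user = props.getProperty("userid", "anonymous");
        String url = props.getProperty("target.url", "(not set)");
        return user + " -> " + url;
    }

    public static void main(String[] args) {
        // Assumed file content; in practice this would come from the imported file.
        String fileContent = "userid=cleanse_user\ntarget.url=http://localhost:8080/verify\n";
        System.out.println(describeTarget(parse(fileContent)));
    }
}
```

Keeping such values in a properties file lets the same generic function be reused with a different user ID or target URL without changing its code.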

To learn about adding graph functions to your library, see “Configuring Graph Functions” on page 424.

About Regular Expression Functions
In Siperian Hub, a regular expression function allows you to use regular expressions for cleanse operations. Regular expressions are computational expressions that are used to match and manipulate text data according to commonly-used syntactic conventions and symbolic patterns. To learn more about regular expressions, including syntax and patterns, refer to the Javadoc for java.util.regex.Pattern. Alternatively, to define a graph function instead, see “Configuring Graph Functions” on page 424.
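
For readers new to java.util.regex, the following standalone Java sketch shows the two operations mentioned above: matching text and manipulating it. It is not the Hub's internal implementation. The class and method names are hypothetical; the pattern (\S+$) is the last-name expression used later in this chapter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regular-expression-based cleansing.
class RegexCleanseSketch {

    // Matches the whole word "Doctor", ignoring letter case.
    private static final Pattern DOCTOR =
            Pattern.compile("\\bDoctor\\b", Pattern.CASE_INSENSITIVE);

    // Captures the last run of non-whitespace characters in the input.
    private static final Pattern LAST_TOKEN = Pattern.compile("(\\S+$)");

    // Manipulation: replaces every match with a standardized value.
    static String standardizeSalutation(String value) {
        return DOCTOR.matcher(value).replaceAll("Dr.");
    }

    // Matching: returns capture group 1, or an empty string if nothing matches.
    static String lastToken(String value) {
        Matcher m = LAST_TOKEN.matcher(value);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(standardizeSalutation("doctor Jane Smith"));
        System.out.println(lastToken("Jane Q. Smith"));
    }
}
```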

Adding Regular Expression Functions
To add a regular expression function:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click a User Library name and choose Add Regular Expression Function. The Cleanse Functions tool displays the Add Regular Expression dialog.
4. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this regular expression function.
   Description   Optional description of this regular expression function.
5. Click OK. The Cleanse Functions tool displays the new regular expression function under the user library in the list in the left pane, with the properties in the right pane.
6. Click the Details tab.
7. If you want, specify an input or output expression by clicking the icon to edit the field, entering a regular expression, and then clicking the icon to apply the change. Click the icon to save your changes.

About Graph Functions
In Siperian Hub, a graph function is a cleanse function that you can visualize and configure graphically using the Cleanse Functions tool in the Hub Console. You can add any pre-defined functions to a graph function. Alternatively, to define a regular expression function, see “Configuring Regular Expression Functions” on page 422.

Inputs and Outputs
Graph functions have:
• one or more inputs (input parameters)
• one or more outputs (output parameters)

For each graph function, you must configure all required inputs and outputs. Inputs and outputs have the following properties.

Field         Description
Name          Unique, descriptive name for this input or output.
Description   Optional description of this input or output.
Data Type     Data type. Must match exactly. One of the following values:
              • Boolean: accepts Boolean values only
              • Date: accepts date values only
              • Float: accepts float values only
              • Integer: accepts integer values only
              • String: accepts any data

Adding Graph Functions
To add a graph function:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click on a User Library name and choose Add Graph Function. The Cleanse Functions tool displays the Add Graph Function dialog.
4. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this graph function.
   Description   Optional description of this graph function.
5. Click OK. The Cleanse Functions tool displays the new graph function under the library in the list in the left pane, with the properties in the right pane.

This graph function is empty. To configure it and add functions, see “Adding Functions to a Graph Function” on page 427.

Adding Functions to a Graph Function
You can add as many functions as you want to a graph function. The example in this section shows adding only a single function. If you already have graph functions defined, you can treat them just like any other function in the cleanse libraries. This means that you can add a graph function inside another graph function. This approach allows you to reuse functions. To add functions to a graph function:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click your graph function, and then click the Details tab to see the function represented in graphical format.
   [Figure: the Details tab, with the Toolbar and the Workspace labeled]
   The area in this tab is referred to as the workspace. You might need to resize the window to see both the input and output on the workspace.
   By default, graph functions have one input and one output that are of type string (gray circle). The function that you are defining might require more inputs and/or outputs and different data types. To learn more, see “Configuring Inputs” on page 434 and “Configuring Outputs” on page 435.
4. Right-click on the workspace and choose Add Function from the pop-up menu. For more on the other commands on this pop-up menu, see “Workspace Commands” on page 432. You can also add or delete these functions using the toolbar buttons. The Cleanse Functions tool displays the Choose Function to Add dialog.
5. Expand the folder containing the function you want to add, select the function to add, and then click OK.
   Note: The functions that are available for you to add depend on your cleanse engine and its configuration. Therefore, the functions that you see might differ from the cleanse functions shown in the previous figure.
   The Cleanse Functions tool displays the added function in your workspace.
   Note: Although this example shows a single graph function on the workspace, you can add multiple functions to a cleanse function. To move a function, click it and drag it wherever you need it on the workspace.
6. Right-click on the function and choose Expanded Mode. The expanded mode shows the labels for all available inputs and outputs for this function.
   For more on the modes, see “Function Modes” on page 432. The color of the circle indicates the data type of the input or output. The data types must match. In the following example, for the Round function, the input is a Float value and the output is an Integer. Therefore, the Inputs and Outputs have been changed to reflect the corresponding data types.
   To learn more, see “Configuring Inputs” on page 434 and “Configuring Outputs” on page 435.
7. Mouse-over the input connector, which is the little circle on the right side of the input box. It turns red when ready for use.
8. Click the node and draw a line to one of the function input nodes.
9. Draw a line from one of the function output nodes to the output box node.
10. Click the button to save your changes. To learn about testing your new function, see “Testing Functions” on page 437.

Workspace Commands
There are several ways to complete common tasks on the workspace.
• One way is to use the buttons on the toolbar. To learn more about these buttons, see “Workspace Buttons” on page 433.
• Another method to access many of the same features is to right-click on the workspace. The right-click menu has the following commands:

Function Modes
Function modes determine how the function is displayed on the workspace. Each function has the following modes, which are accessible by right-clicking the function:
Option            Description
Compact           Displays the function as a small box, with just the function name.
Standard          Displays the function as a larger box, with the name and the nodes for the input and output, but the nodes are not labeled. This is the default mode.
Expanded          Displays the function as a large box, with the name, the input and output nodes, and the names of those nodes.
Logging Enabled   Used for debugging. Choosing this option generates a log file for this function when you run a Stage job (see “Stage Jobs” on page 745). The log file records the input and output for every time the function is called during the stage job. There is a new log file created for each stage job. The log file is named <jobID><graph function name>.log and is stored in: \Siperian\hub\cleanse\tmp\<ORS>
                  Note: Do not use this option in production, as it will consume disk space and require performance overhead associated with the disk I/O. To disable this logging, right-click on the function and uncheck Enable Logging.
Delete Object     Deletes the function from the graph function.

You can cycle through the display modes (compact, standard, and expanded) by double-clicking on the function.

Workspace Buttons
The toolbar on the right side of the workspace provides the following buttons.
• Save changes.
• Edit the function inputs.
• Edit the function outputs.
• Add a function. To learn more, see “Adding Functions to a Graph Function” on page 427.
• Add a constant. To learn more, see “Using Constants” on page 433.
• Add a conditional execution component. To learn more, see “Using Conditions in Cleanse Functions” on page 438.
• Edit the selected component.
• Delete the selected component.
• Expand the graph. This makes more room for the workspace on the screen by hiding the left pane.

Using Constants
Constants are useful in cases where you know that you have standardized input. For example, if you have a data set that you know consists entirely of doctors, then you can use a constant to put Dr. in the title. When you use constants in your graph function, they are differentiated visually from other functions by their grey background color.

Configuring Inputs
To add more inputs:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the cleanse function that you want to configure.
4. Click the Details tab.
5. Right-click on the input and choose Edit inputs. The Cleanse Functions tool displays the Inputs dialog.
   Note: Once you create an input, you cannot later edit the input to change its type. If you must change the type of an input, create a new one of the correct type and delete the old one.
6. Click the button to add another input. The Cleanse Functions tool displays the Add Parameter dialog.
7. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this parameter.
   Data Type     Data type of this parameter.
   Description   Optional description of this parameter.
8. Click OK. Add as many inputs as you need for your functions.

Configuring Outputs
To add more outputs:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the cleanse function that you want to configure.
4. Click the Details tab.
5. Right-click on the output and choose Edit outputs. The Cleanse Functions tool displays the Outputs dialog.
   Note: Once you create an output, you cannot later edit the output to change its type. If you must change the type of an output, create a new one of the correct type and delete the old one.
6. Click the button to add another output. The Cleanse Functions tool displays the Add Parameter dialog. Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this parameter.
   Data Type     Data type of this parameter.
   Description   Optional description of this parameter.
7. Click OK. Add as many outputs as you need for your functions.

Testing Functions
Once you have added and configured a graph or regular expression function, it is recommended that you test it to make sure it is behaving as expected. This test process mimics a single record coming into the function. To test your function:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the cleanse function that you want to test.
4. Click the Test tab. The Cleanse Functions tool displays the test screen.
5. For each input, specify the value that you want to test by clicking the cell in the Value column and typing a value that complies with the data type of the input.
   • For Boolean inputs, the Cleanse Functions tool displays a true/false drop-down list.
   • For Calendar inputs, the Cleanse Functions tool displays a Calendar button that you can click to select a date from the Date dialog.
6. Click Test. If the test completed successfully, the output is displayed in the output section.

Using Conditions in Cleanse Functions
This section describes how to add conditions to graph functions.

About Conditional Execution Components
Conditional execution components are similar to the construct of a case (or switch) statement in a programming language. The cleanse function evaluates the condition and, based on this evaluation, applies the appropriate graph function associated with the case that matches the condition. If no case matches the condition, then the default case is used—the case flagged with an asterisk (*).

When to Use Conditional Execution Components
Conditional execution components are useful when, for example, you have segmented data. Suppose a table has several distinct groups of data (such as customers and prospects). You could create a column that indicates the group of which the record is a member. Each group is called a segment. In this example, customers might have C in this column, while prospects would have P. You could use a conditional execution component to cleanse the data differently for each segment. If the conditional value does not meet any of the conditions you specify, then the default case will be executed.
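
The customers-and-prospects example can be pictured as an ordinary switch statement. The Java sketch below is an analogy only: in Siperian Hub you build this logic graphically, and the class, method names, and per-segment cleansing rules shown here are hypothetical.

```java
import java.util.Locale;

// Hypothetical analogy: a conditional execution component as a switch statement.
class SegmentConditionSketch {

    // Assumed cleansing for the customer segment: trim and upper-case.
    static String cleanseCustomer(String name) {
        return name.trim().toUpperCase(Locale.ROOT);
    }

    // Assumed cleansing for the prospect segment: trim only.
    static String cleanseProspect(String name) {
        return name.trim();
    }

    // Chooses the cleansing path from the segment column value.
    static String cleanse(String segment, String name) {
        switch (segment) {
            case "C":
                return cleanseCustomer(name);
            case "P":
                return cleanseProspect(name);
            default:
                // Plays the role of the default (*) case: no segment-specific cleansing.
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanse("C", " acme corp "));
        System.out.println(cleanse("P", " acme corp "));
        System.out.println(cleanse("X", " acme corp "));
    }
}
```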

Adding Conditional Execution Components
To add a conditional execution component:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the cleanse function that you want to configure.
4. Right-click on the workspace and choose Add Condition. The Cleanse Functions tool displays the Edit Condition dialog.
5. Click the button to add a value. The Cleanse Functions tool displays the Add Value dialog.
6. Enter a value for the condition. Using the customer and prospect example, you would enter C or P. Click OK. The Cleanse Functions tool displays the new condition in the list of conditions on the left, as well as in the input box.
   Add as many conditions as you require. You do not need to specify a default condition, because the default case is automatically created when you create a new conditional execution component. However, you can specify the default case with the asterisk (*). The default case will be executed for all cases that are not covered by the cases you specify.
7. Add as many functions as you require to process all of the conditions. To learn more, see “Adding Functions to a Graph Function” on page 427.
8. For each condition, including the default condition, draw a link from the input node to the input of the function. In addition, draw links between the outputs of the functions and the output of your cleanse function.

Note: You can specify nested processing logic in graph functions. For example, you can nest conditional components within other conditional components (such as nested case statements). In fact, you can define an entire complex process containing many conditional tests, each one of which contains any level of complexity as well.

About Cleanse Lists
A cleanse list is a logical grouping of string functions that are executed at run time in a predefined order.

Configuring Cleanse Lists

Adding Cleanse Lists
To add a new cleanse list:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click Refresh to refresh your cleanse library. Used with external cleanse engines.
   Important: You must choose Refresh after acquiring a write lock and before processing any records. Otherwise, your external cleanse engine will throw an error.

Specify the following properties:
   Field         Description
   Name          Unique, descriptive name for this cleanse list.
   Description   Optional description of this cleanse list.
6. Click OK. The Cleanse Functions tool displays the details pane for the new (empty) cleanse list on the right side of the screen.

Editing Cleanse List Properties
New cleanse lists are empty lists. You need to edit the cleanse list to add match and output strings. To edit your cleanse list to add match and output strings:
1. Start the Cleanse Functions tool according to the instructions in “Starting the Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the cleanse list that you want to configure. The Cleanse Functions tool displays information about the cleanse list in the right pane.
4. Change the display name and description in the right pane, if you want, by clicking the Edit button next to a value that you want to change.
5. Click the Details tab. The Cleanse Functions tool displays the details for the cleanse list.
6. Click the button in the right hand pane. The Cleanse Functions tool displays the Output String dialog.
7. Specify a search string, an output string, and a match type, and then click OK. The search string is the input that you want to cleanse, resulting in the output string.
   Important: Siperian Hub will search through the strings in the order in which they are entered. The order in which you specify the items can therefore affect the results obtained.
   To learn more about the types of matches available, see “Types of String Matches” on page 445.
   Note: As soon as you add strings to a cleanse list, the cleanse list is saved.
   The strings that you specified are shown in the Cleanse List Details section.
8. You can add and remove strings. You can also move strings forward or backward in the cleanse list, which affects their order in the run-time execution sequence and, therefore, the results obtained.
9. You can also specify the “Default value” for every input string that does not match any of the search strings. If you do not specify a default value, every input string that does not match a search string is passed to the output string with no changes.

Types of String Matches
For the output string, you can specify one of the following match types:
Match Type           Description
Exact Match          Text string (for example, “IBM”).
Regular Expression   Pattern using the syntax for regular expressions (for example, “I.M.*” would match “IBM”, “IBM Corp” and “IXM Inc.”). To parse a name field that consists of first, middle, and last names, you could use the regular expression (\S+$), which gives you the last name no matter what name you give it. The regular expression that is typed in as a parameter will be used against the string and the matched output will be sent to the outlet. You can also specify the group number to match an inner group of the regular expression. Refer to the Javadoc for java.util.regex.Pattern for the documentation on the regular expression construction and how groups work.
SQL Match            Pattern using the syntax for the LIKE operator in SQL (for example, “I_M%” would match “IBM”, “IBM Corp” and “IXM Inc.”).
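
To make the interaction between entry order, match type, and default value concrete, here is a minimal Java sketch of a cleanse list. It is an illustration under stated assumptions, not the Hub's implementation: the class and method names are hypothetical, matching is assumed to be case-sensitive and to cover the whole input string, and the SQL pattern is translated with only the % and _ wildcards (no escape character).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of an ordered cleanse list with three match types.
class CleanseListSketch {

    enum MatchType { EXACT, REGEX, SQL }

    // One entry: a compiled search pattern and the output string it produces.
    private static final class Entry {
        final Pattern pattern;
        final String output;

        Entry(Pattern pattern, String output) {
            this.pattern = pattern;
            this.output = output;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final String defaultValue; // null means "pass the input through unchanged"

    CleanseListSketch(String defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Translates a SQL LIKE pattern to a regex: % becomes .*, _ becomes ., the rest is literal.
    static Pattern likeToPattern(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Appends an entry; entries are evaluated in the order they were added.
    CleanseListSketch add(MatchType type, String search, String output) {
        Pattern p;
        switch (type) {
            case EXACT:
                p = Pattern.compile(Pattern.quote(search));
                break;
            case REGEX:
                p = Pattern.compile(search);
                break;
            default:
                p = likeToPattern(search);
                break;
        }
        entries.add(new Entry(p, output));
        return this;
    }

    // Returns the output of the first matching entry, else the default, else the input.
    String apply(String input) {
        for (Entry e : entries) {
            if (e.pattern.matcher(input).matches()) {
                return e.output;
            }
        }
        return defaultValue != null ? defaultValue : input;
    }

    public static void main(String[] args) {
        CleanseListSketch list = new CleanseListSketch(null)
                .add(MatchType.EXACT, "IBM", "International Business Machines")
                .add(MatchType.SQL, "I_M%", "IBM");
        System.out.println(list.apply("IBM"));
        System.out.println(list.apply("IXM Inc."));
        System.out.println(list.apply("Oracle"));
    }
}
```

Because the first matching entry wins, swapping the two entries in main would change the result for the input IBM, which is the ordering effect described in the editing steps.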

Importing Match Strings
To import match strings from a source (such as a file or a database table):
1. Click the button in the right hand pane. The Import Match Strings wizard opens.
2. Specify the connection properties for the source of the data and click Next. The Cleanse Functions tool displays a list of tables available for import.
3. Select the table you want to import and click Next. The Cleanse Functions tool displays a list of columns available for import.
4. Click the columns you want to import and click Next. The Cleanse Functions tool displays a list of match strings available for import.
   You can import the records of the sample data either as phrases (one entry for each record) or as words (one entry for each word in each record). Choose whether to import the match strings as words or phrases and then click Finish.
   The Cleanse List Details box is now populated with data from the specified source.
   Note: The imported match strings are not part of the match list. To add them to the match list, you need to move them to the Search Strings on the right hand side.
   • To add match strings to the match list with the match string value in both the Search String and Output String, select the strings in the Match Strings list, and click the button.
   • If you add match strings to the match list with an Output String value that you want to define, simply click the record you added and specify a new Search and Output String.
   • To add all Match Strings to the match list, click the button.
   • To clear all Match Strings from the match list, click the button.
   Repeat these steps until you have constructed a complete match list.
5. When you have finished changing the match list properties, click the button to save your changes.
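
The difference between importing as phrases and importing as words, described in step 4 above, can be sketched in a few lines of Java. This is an illustration only: the class and method names are hypothetical, and the sketch assumes that words are separated by whitespace and that blank records are skipped, which this guide does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two import modes for match strings.
class ImportModeSketch {

    // Phrases: one entry for each record.
    static List<String> asPhrases(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String trimmed = record.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Words: one entry for each whitespace-separated word in each record.
    static List<String> asWords(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            for (String word : record.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("IBM Corp", "Acme Widgets Inc");
        System.out.println(asPhrases(records));
        System.out.println(asWords(records));
    }
}
```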

Importing Match Output Strings
To import match output strings from a source (such as a file or a database table):
1. Click the button in the right hand pane. The Import Match Output Strings wizard opens.
2. Specify the connection properties for the source of the data.
3. Click Next. The Cleanse Functions tool displays a list of tables available for import.
4. Select the table that you want to import.
5. Click Next. The Cleanse Functions tool displays a list of columns available for import.
6. Select the columns that you want to import.
7. Click Next. The Cleanse Functions tool displays a list of match strings available for import.
8. Click Finish. The Cleanse List Details box is now populated with data from the specified source.
9. When you have finished changing the match list properties, click the button to save your changes.

Chapter 13: Configuring the Load Process

This chapter explains how to configure the load process in your Siperian Hub implementation. For an introduction, see “Load Process” on page 299.

Before You Begin
Before you begin to configure the load process, you must have completed the following tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide for your platform
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Defined source systems according to the instructions in “Configuring Source Systems” on page 348
• Created landing tables according to the instructions in “Configuring Landing Tables” on page 355
• Created staging tables according to the instructions in “Configuring Staging Tables” on page 364
• Learned about the load process described in “Load Process” on page 299

Configuration Tasks for Loading Data
In addition to the prerequisites described in “Before You Begin” on page 454, to set up the process of loading data in your Siperian Hub implementation, you must complete the following tasks in the Hub Console:
• “Configuring Trust for Source Systems” on page 455
• “Configuring Validation Rules” on page 468

Configuring Trust for Source Systems
This section describes how to configure trust in your Siperian Hub implementation. For an introduction, see “Trust Settings” on page 303.

About Trust
Several source systems may contain attributes that correspond to the same column in a base object table. For example, several systems may store a customer’s address. However, one system might be a more reliable source for that data than others. If these systems disagree, then Siperian Hub must decide which value is the best one to use. To help with comparing the relative reliability of column data from different source systems, Siperian Hub allows you to configure trust for a column. Trust is a designation of the confidence in the relative accuracy of a particular piece of data. For each column from each source, you can define a trust level represented by a number between 0 and 100, with zero being the least trustworthy and 100 being the most trustworthy. By itself, this number has no meaning. It becomes meaningful only when compared with another trust number to determine which is higher. Trust takes into account the age of data, how much its reliability has decayed over time, and the validity of the data. Trust is used to determine survivorship (when two records are consolidated), and whether updates from a source system are sufficiently reliable to update the master record.

Trust Levels
A trust level is a number between 0 and 100. By itself, this number has no meaning. It has meaning only when compared with another trust number.

Data Reliability Decays Over Time
The reliability of data from a given source system can decay (diminish) over time. In order to reflect this fact in trust calculations, Siperian Hub allows you to configure decay characteristics for trust-enabled columns. The decay period is the amount of time that it takes for the trust level to decay from the maximum trust level (see “Maximum Trust” on page 459) to the minimum trust level (see “Minimum Trust” on page 459). For more information, see “Units” on page 459, “Decay” on page 459, and “Graph Type” on page 460.

Trust Calculations
The load process calculates trust for trust-enabled columns in the base object. For records with trust-enabled columns, the load process assigns a trust score to cell data. This trust score is initially based on the configured trust settings for that column. The trust score may be subsequently downgraded when the load process applies validation rules (if configured for a trust-enabled column) after the trust calculations. For more information, see “Run-time Execution Flow of the Load Process” on page 304.

Trust Calculations for Load Update Operations
During the load process, if a record in the staging table will be used for a load update operation, and if that record contains a changed cell value in a trust-enabled column, the load process calculates trust scores for:
• the cell data in the source record in the staging table (which contains the updated information)
• the cell data in the target record in the base object (which contains the existing information)

If the cell data in the source record has a higher trust score than the cell data in the target record, then Siperian Hub updates the cell in the base object record with the cell data in the staging table record.

Trust Calculations When Consolidating Two Base Object Records
When two records in a base object are consolidated, Siperian Hub calculates the trust score for each trusted column in the two records being merged. Cells with the highest trust scores survive in the final consolidated record. If the trust scores are the same, then Siperian Hub compares records according to an order of precedence, as described in “Survivorship and Order of Precedence” on page 291.
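As an illustration of the load update comparison described above, the following Python sketch models the decision for a single trust-enabled cell. It is not Siperian Hub code: the Cell class and the load_update function are hypothetical names used only for this example, and keeping the existing value when the scores are equal is an assumption, because this section states only that a higher source score triggers the update.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A cell value together with its calculated trust score (0 to 100)."""
    value: str
    trust: float


def load_update(source: Cell, target: Cell) -> Cell:
    """Return the cell that the base object holds after a load update.

    The staging (source) value replaces the base object (target) value
    only when its trust score is strictly higher. Equal scores leave the
    existing value in place (an assumption made for this sketch).
    """
    return source if source.trust > target.trust else target


staged = Cell("555-4321", 80.0)    # updated value from the staging table
existing = Cell("555-1234", 65.0)  # current value in the base object
print(load_update(staged, existing).value)  # 555-4321
```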

Control Tables for Trust-Enabled Columns
The following figure shows control tables associated with trust-enabled columns in a base object.

For each trust-enabled column in a base object record, Siperian Hub maintains a record in a corresponding control table that contains the last update date and an identifier of the source system. Based on these settings, Siperian Hub can always calculate the current trust for the column value. If history is enabled for a base object, Siperian Hub also maintains a separate history table for the control table, in addition to history tables for the base object and its cross-reference table.

Cell Values in Base Object Records and Cross-Reference Records
The cross-reference table for a base object contains the most recent value from each source system. By default (without trust settings), the base object contains the most recent value no matter which source system it comes from. For trust-enabled columns, the cell value in a base object record might not have the same value as its corresponding record in the cross-reference table. Validation rules, which are run during the load process after trust calculations, can downgrade trust for a cell so that a source that had previously provided the cell value might not update the cell. For more information about validation rules, see “Configuring Validation Rules” on page 468.

Overriding Trust Scores
Data stewards can manually override a calculated trust setting if they have direct knowledge that a particular value is correct. Data stewards can also enter a value directly into a record in a base object. For more information, see the Siperian Hub Data Steward Guide.

Trust for State-Enabled Base Objects
For state-enabled base objects, trust is calculated for records with a PENDING or ACTIVE state, but records with a DELETE state are ignored. For more information, see Chapter 7, “State Management.”

Batch Job Constraints on Number of Trust-Enabled Columns
Synchronize batch jobs can fail for base objects with a large number of trust-enabled columns. Similarly, Automerge jobs can fail if there is a large number of trust-enabled or validation-enabled columns. The exact number of columns that causes the job to fail varies, and depends on the length of the column names and the number of trust-enabled columns (or, for Automerge jobs, validation-enabled columns as well). Long column names are those at, or close to, the maximum allowable length of 26 characters. To avoid this problem, keep the number of trust-enabled columns below 60, keep the column names short, or both. A workaround is to enable all trust/validation columns before saving the base object to avoid running the synchronization job.

Trust Properties
This section describes the trust properties that you can configure for trust-enabled columns. Trust properties are configured separately for each source system that could provide records for trust-enabled columns in a base object.

Maximum Trust
The maximum trust (starting trust) is the trust level that a data value will have if it has just been changed. For example, if source system X changes a phone number field from 555-1234 to 555-4321, the new value will be given system X’s maximum trust level for the phone number field. By setting the maximum trust level relatively high, you can ensure that changes in the source systems will usually be applied to the base object.

Minimum Trust
The minimum trust is the trust level that a data value will have when it is old (after the decay period has elapsed). This value must be less than or equal to the maximum trust. Note: If the maximum and minimum trust are equal, then the decay curve is a flat line and the decay period and decay type have no effect.

Units
Specifies the units used in calculating the decay period—day, week, month, quarter, or year.

Decay
Specifies the number (of days, weeks, months, quarters, or years) used in calculating the decay period.

Note: For the best graph view, limit the decay period you specify to between 1 and 100.

Graph Type
Decay follows a pattern in which the trust level decreases during the decay period. The graph type determines this decay pattern and can have any of the following settings.
Graph Type | Description
Linear | Simplest decay. Decay follows a straight line from the maximum trust to the minimum trust.
Rapid Initial Slow Later (RISL) | Most of the decrease occurs toward the beginning of the decay period. Decay follows a concave curve. If a source system has this graph type, then a new value from the system will probably be trusted, but this value will soon become much more likely to be overridden.
Slow Initial Rapid Later (SIRL) | Most of the decrease occurs toward the end of the decay period. Decay follows a convex curve. If a source system has this graph type, it will be relatively unlikely for any other system to override the value that it sets until the value is near the end of its decay period.
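The Linear graph type can be expressed as simple arithmetic. The following Python sketch is an illustration only, assuming that the age of the value and the decay period are measured in the same units. The function name linear_trust and its parameters are hypothetical. The RISL and SIRL curves are not sketched because this guide does not give their formulas.

```python
def linear_trust(max_trust, min_trust, decay_period, age):
    """Trust level of a value of the given age under Linear decay.

    Trust falls along a straight line from max_trust (age 0) to
    min_trust (age equal to the decay period) and then stays at
    min_trust. Age and decay_period must use the same units.
    """
    if min_trust > max_trust:
        raise ValueError("minimum trust must not exceed maximum trust")
    if age <= 0:
        return float(max_trust)
    if decay_period <= 0 or age >= decay_period:
        return float(min_trust)
    return max_trust - (max_trust - min_trust) * age / decay_period


# Maximum trust 90, minimum trust 30, decay period 12, value aged 6.
print(linear_trust(90, 30, 12, 6))  # 60.0
```

If the maximum and minimum trust are equal, the sketch returns the same value for every age, which corresponds to the flat decay line noted under “Minimum Trust.”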

Test Offset Date

By default, the start date for trust decay shown in the Trust Decay Graph is the current system date. To see the impact of trust decay based on a different start date for a given source system, specify a different test offset date according to the instructions in “Changing the Offset Date for a Trust-Enabled Column” on page 466.

Considerations for Setting Trust Values
Choosing the correct trust values can be a complex process. It is not enough to consider one system in isolation. You must ensure that the combinations of trust settings for all of the source systems that contribute to a particular column produce the behavior that you want. Trust levels for a source system are not absolute. They are meaningful only in relation to the trust levels of other source systems that contribute data for the trust-enabled column.

When determining trust, consider the following questions.
• Does the source system validate this data value? How reliably does it do this?
• How important is this data value to the users of the source system, as compared with other data values? Users are likely to put the most effort into validating the data that is central to their work.
• How frequently is the source system updated?
• How frequently is a particular attribute likely to be updated?

Enabling Trust for a Column
Trust is enabled and configured on a per-column basis for base objects in the Schema Manager. Trust does not apply to columns in dependent objects or any other tables in an ORS. For more information, see “Configuring Columns in Tables” on page 125.

Trust is disabled by default. When trust is disabled, Siperian Hub uses the value from the most recently-executed load process regardless of which source system it comes from. If column data for a base object comes from only one system, then trust should remain disabled for that column. Trust should be enabled, however, for columns in which data can come from multiple source systems. If you enable trust for a column, you also assign trust levels to specify the relative reliability of any source systems that could provide records that update the column.

Assigning Trust Levels to Trust-Enabled Columns
This section describes how to configure trust levels for trust-enabled columns.

Before You Configure Trust for Trust-Enabled Columns
Before you configure trust for trust-enabled columns, you must have:
• enabled trust for base object columns according to the instructions in “Enabling Trust for a Column” on page 461
• configured staging tables in the Schema Manager, including associated source systems and staging table columns that correspond to base object columns, according to the instructions in “Configuring Staging Tables” on page 364

Specifying Trust for the Administration Source System
At a minimum, you must specify trust settings for trust-enabled columns in the administration source system (called Admin by default). This source system represents manual updates that you make within Siperian Hub. This source system can contribute data to any trust-enabled column. Set the trust settings for this source system to high values (relative to other source systems) to ensure that manual updates override any existing values from other source systems. For more information, see “Administration Source System” on page 349.

Assigning Trust Levels to Trust-Enabled Columns in a Base Object
To assign trust levels to trust-enabled columns in a base object:

1. Start the Systems and Trust tool according to the instructions in “Starting the Systems and Trust Tool” on page 350.

[Figure callouts: Navigation Pane, Properties Pane]

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, expand the Trust node. The Systems and Trust tool displays all base objects with trust-enabled columns.
4. Select a base object. The Systems and Trust tool displays a read-only view of the trust-enabled columns in the selected base object, indicating with a check mark whether a given source system supplies data for that column.

[Figure callouts: Trust-Enabled Columns, Source Systems]

Note: The association between trust-enabled columns and source systems is specified in the staging tables for this base object. For more information, see “Configuring Staging Tables” on page 364.

5. Expand a base object to see its trust-enabled columns.
6. Select the trust-enabled column that you want to configure. For the selected trust-enabled column, the Systems and Trust tool displays the list of source systems associated with the column, along with editable trust settings to be configured per source system, and a trust decay graph.

[Figure callouts: Source Systems, Trust Settings, Trust Decay Graph]

7. Specify the trust properties for each column. For more information, see “Trust Properties” on page 459.
8. Optionally, you can change the offset date, as described in “Changing the Offset Date for a Trust-Enabled Column” on page 466.
9. Click the button to save your changes. The Systems and Trust tool refreshes the Trust Decay Graph based on the trust settings you specified for each source system for this trust-enabled column.

The X-axis is the trust score and the Y-axis is the time.

Changing the Offset Date for a Trust-Enabled Column
By default, the Trust Decay Graph shows the trust decay across all source systems from the current system date. You can specify a different date (such as a future date) to test your current trust settings and see how trust would decay from that date. Note that offset dates are not saved.

To change the offset date for a trust-enabled column:
1. In the Systems and Trust tool, select a trust-enabled column according to the instructions in “Assigning Trust Levels to Trust-Enabled Columns in a Base Object” on page 462.
2. Click the Calendar button next to the source system for which you want to specify a different offset date. The Systems and Trust tool prompts you to specify a date.
3. Select a different date.
4. Choose OK. The Systems and Trust tool updates the Trust Decay Graph based on your current trust settings and the Offset Date you specified.

To remove the Offset Date:
• Click the Delete button next to the source system for which you want to remove the Offset Date. The Systems and Trust tool updates the Trust Decay Graph based on your current trust settings and the current system date.

Running Synchronize Batch Jobs After Changes to Trust Settings
After records have been loaded into a base object, if you enable trust for any column, or if you change trust settings for any trust-enabled column(s) in that base object, then you must run the Synchronize batch job (see “Synchronize Jobs” on page 747) before running the consolidation process. If this batch job is not run, then errors will occur during the consolidation process.

Configuring Validation Rules
This section describes how to configure validation rules in your Siperian Hub implementation. For an introduction, see “Validation Rules” on page 304.

About Validation Rules
A validation rule downgrades trust for a cell value when the cell value matches a given condition. Each validation rule specifies:
• a condition that determines whether the cell value is valid
• an action to take if the condition is met (downgrade trust by a certain percentage)

For example, consider the following validation rule:
Downgrade trust on First_Name by 50% if Length < 3

If the Reserve Minimum Trust flag is set for the column, then the trust cannot be downgraded below the column’s minimum trust.

You use the Schema Manager to configure validation rules for a base object. Validation rules are executed during the load process, after trust has been calculated for trust-enabled columns in the base object. If validation rules have been defined, then the load process applies them to determine the final trust scores, and then uses the final trust values to determine whether to update records in the base object with cell data from the updated records. For more information, see “Run-time Execution Flow of the Load Process” on page 304.

Validation Checks
A validation check can be done on any column in a base object. The downgrade resulting from the validation check can be applied to the same column, as well as to any other columns that can be validated. Invalid data in one column can therefore result in trust downgrades on many columns.

For example, suppose you use an address verification flag in which the flag is OK if the address is complete and BAD if the address is not complete. You could configure a validation rule that downgrades the trust on all address fields if the verification flag is not OK. Note that, in this case, the verification flag should also be downgraded.

Required Columns
Validation rules are applied regardless of the source of the incoming data. However, validation rules are applied only if the staging table or the input (a Services Integration Framework (SIF) request) contains all of the required columns. If any required columns are missing, validation rules are not applied.

Recalculating Trust Scores After Changing Validation Rules
If a base object contains existing data and you change validation rules, you must run the Revalidate job to recalculate trust scores for new and existing data, as described in “Revalidate Jobs” on page 745.

Validation Rules and State-Enabled Base Objects
For state-enabled base objects, validation rules are applied to records with a PENDING or ACTIVE state, but records with a DELETE state are ignored. For more information, see Chapter 7, “State Management.”

Automerge Job Constraints on Number of Validation Columns
Automerge jobs can fail if there is a large number of validation-enabled columns. The exact number of columns that causes the job to fail varies, and depends on the length of the column names and the number of validation-enabled columns. Long column names are those at, or close to, the maximum allowable length of 26 characters. To avoid this problem, keep the number of validation-enabled columns below 60, keep the column names short, or both. A workaround is to enable all trust/validation columns before saving the base object to avoid running the synchronization job.

Enabling Validation Rules for a Column
A validation rule is enabled and configured on a per-column basis for base objects in the Schema Manager. Validation rules do not apply to columns in dependent objects or any other tables in an ORS. For more information, see “Configuring Columns in Tables” on page 125.

Validation rules are disabled by default. Validation rules should be enabled, however, for any trust-enabled columns that will use validation rules for trust downgrades.

How the Downgrade Percentage is Applied
Validation rules downgrade trust scores according to the following algorithm:
Final trust = Trust - (Trust * Validation_Downgrade / 100)
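As a worked example of the formula above, the following Python sketch applies a downgrade percentage to a trust score. The function name final_trust is hypothetical. Only the arithmetic comes from this guide.

```python
def final_trust(trust, validation_downgrade):
    """Final trust = Trust - (Trust * Validation_Downgrade / 100)."""
    return trust - (trust * validation_downgrade / 100)


# A trust score of 80 downgraded by 50 percent.
print(final_trust(80, 50))  # 40.0
```

A downgrade of 0 leaves the score unchanged, and a downgrade of 100 reduces it to zero.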

Execution Sequence of Validation Rules
Validation rules are executed in sequence. If multiple validation rules are configured for a column, only one validation rule (the rule with the greatest downgrade percentage) is applied to the column. Downgrade percentages are not cumulative. Instead, the “winning” validation rule overwrites any previously applied changes. Therefore, when configuring multiple validation rules for a column, specify an execution order of increasing downgrade percentage, starting with the validation rule that has the lowest impact (downgrade percentage) and ending with the validation rule that has the highest impact (downgrade percentage).

Note: The execution sequence for validation rules differs between the load process described in this chapter and PUT requests invoked by external applications using the Services Integration Framework (SIF). For PUT requests, validation rules are executed in order of decreasing downgrade percentage. For more information, see the Siperian Services Integration Framework Guide and the Siperian Hub Javadoc.
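The following Python sketch shows one way to read the ordering advice for the load process. It assumes that each rule whose condition is met recalculates the score from the original trust and overwrites the result of any earlier rule. That reading, the function name apply_rules_in_sequence, and the two sample rules are assumptions made for illustration, not a description of the product internals.

```python
def apply_rules_in_sequence(trust, rules, record):
    """Apply (condition, downgrade_pct) rules in list order.

    Downgrades are not cumulative: each rule whose condition is met
    recomputes the score from the original trust, so the last matching
    rule in the sequence determines the result.
    """
    final = trust
    for condition, downgrade_pct in rules:
        if condition(record):
            final = trust - trust * downgrade_pct / 100
    return final


# Hypothetical rules: (condition, downgrade percentage).
short_name = (lambda r: len(r["first_name"]) < 3, 25)
missing_zip = (lambda r: r["zip_code"] is None, 50)

record = {"first_name": "Al", "zip_code": None}  # meets both conditions

increasing = apply_rules_in_sequence(80, [short_name, missing_zip], record)
decreasing = apply_rules_in_sequence(80, [missing_zip, short_name], record)
print(increasing)  # 40.0
print(decreasing)  # 60.0
```

Under this assumption, only the increasing order ends with the greatest downgrade percentage in effect.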

Navigating to the Validation Rules Node
To configure validation rules, you navigate to the Validation Rules node for a base object in the Schema Manager:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the tree for the base object that you want to configure, and then click its Validation Rules Setup node. The Schema Manager displays the Validation Rules editor.

[Figure callouts: List of Validation Rules, Properties Pane]

The Validation Rules editor is divided into the following sections.
Pane | Description
Number of Rules | Number of configured validation rules for the selected base object.
Validation Rules | List of configured validation rules for the selected base object.
Properties Pane | Properties for the selected validation rule. For more information, see “Validation Rule Properties” on page 473.

Validation Rule Properties
Validation rules have the following properties.

Rule Name
A unique, descriptive name for this validation rule.

Rule Type
The type of validation rule. One of the following values.
Rule Type | Description
Existence Check | Trust will be downgraded if the cell has a null value (the cell value does not exist).
Domain Check | Trust will be downgraded if the cell value does not fall within a list or range of allowed values.
Referential Integrity | Trust will be downgraded if the value in a cell does not exist in the set of values in a column on a different table. This rule is for use in cases where an explicit foreign key has not been defined, and an incorrect cell value can be allowed if there is no correct cell value that has higher trust.
Pattern Validation | Trust will be downgraded if the value in a cell conforms (LIKE) or does not conform (NOT LIKE) to the specified pattern.
Custom | Used for entering complex validation rules. This rule type should only be used when SQL functions (such as LENGTH, ABS, etc.) might be required, or if a complex join is required. Note: Custom SQL code must conform with the SQL syntax for your database platform. SQL entered in this pane is not validated at design time. Invalid SQL syntax errors cause problems when the load process executes.

Rule Columns
For each column, you specify the downgrade percentage and whether to reserve minimum trust.

Downgrade Percentage
Percentage by which the trust level of the specified column will be decreased if this validation rule condition is met. The larger the percentage, the greater the downgrade. For example, 0% has no effect on the trust, while 100% downgrades the trust completely (unless Reserve Minimum Trust is specified, in which case 100% downgrades the trust so that it equals minimum trust). If trust is downgraded by 100% and you have not enabled Reserve Minimum Trust for the column, then the value of that column will not be populated into the base object.

Reserve Minimum Trust
Specifies what will happen if the downgrade causes the trust level to fall below the column’s minimum trust level. If this box is selected (checked), the minimum trust is retained, so that the trust level will be reduced to the minimum trust but no lower. If this box is cleared (unchecked), then the trust level will be reduced by the specified percentage even if this means going below the minimum trust.
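The interaction between the downgrade percentage and Reserve Minimum Trust can be sketched in a few lines of Python. This is an illustration under stated assumptions, not product code: the function name downgraded_trust and its parameters are hypothetical, and the sketch assumes that reserving minimum trust never raises a score above its starting value.

```python
def downgraded_trust(trust, downgrade_pct, min_trust, reserve_minimum_trust):
    """Apply a downgrade percentage, optionally stopping at minimum trust."""
    downgraded = trust - trust * downgrade_pct / 100
    if reserve_minimum_trust:
        # Do not fall below the column's minimum trust, and never end up
        # higher than the starting score (an assumption of this sketch).
        return min(trust, max(downgraded, min_trust))
    return downgraded


print(downgraded_trust(80, 100, 30, True))   # 30
print(downgraded_trust(80, 100, 30, False))  # 0.0
print(downgraded_trust(80, 50, 30, True))    # 40.0
```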

Rule SQL
Specifies the SQL WHERE clause representing the condition for this validation rule. During the load process, the validation rule is executed. If data meets the criteria specified in the Rule SQL field, then the trust value is downgraded by the downgrade percentage configured for this validation rule.

SQL WHERE Clause Based on the Rule Type
The Validation Rules editor prompts you to configure the SQL WHERE clause based on the selected Rule Type for this validation rule.

[Figure callouts: Expression, List of Table Columns]

During the load process, this query is used to check the validity of the data in the staging table.

Example SQL WHERE Clauses
The following table provides examples of SQL WHERE clauses based on the selected rule type.

Examples of WHERE Clause for Each Rule Type
Rule Type | WHERE clause | Examples | Result
Existence Check | WHERE S.ColumnName IS NULL | WHERE S.MIDDLE_NAME IS NULL | Affected columns will be downgraded for records with middle names that are null. The records that do not meet the condition will not be affected.
Domain Check | WHERE S.ColumnName IN ('?', '?', '?') | WHERE S.Gender NOT IN ('M', 'F', 'U') | Affected columns will be downgraded if the Gender is any value other than M, F, or U.
Referential Integrity | WHERE NOT EXISTS (SELECT <blank> 'a' FROM ? WHERE ?.? = S.<Column_Name>) | WHERE NOT EXISTS (SELECT <blank> 'a' FROM <Ref_Table> WHERE <Ref_Table>.<Ref_Column> = S.<Column_Name>) | Affected columns will be downgraded for records with Account Type values that are not on the Account Type table.
Pattern Validation | | | Downgrade will be applied if the e-mail address does not contain an @ character.
Custom | | | Downgrade will be applied if the length of the zip code column is less than 4.
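To see how the Existence Check and Domain Check examples select records, the following Python sketch runs the two example WHERE clauses against a small in-memory table. It uses SQLite purely as a stand-in for the Hub Store database. The staging table, its columns, the sample rows, and the helper function matching_rows are all hypothetical, and the query that the load process actually builds around the WHERE clause is not shown in this guide.

```python
import sqlite3

# In-memory stand-in for a staging table (hypothetical layout and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (pkey TEXT, middle_name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [("1", "Ann", "F"), ("2", None, "M"), ("3", "Lee", "X")],
)


def matching_rows(where_clause):
    """Return the keys of staging rows that meet the rule condition."""
    sql = f"SELECT S.pkey FROM staging S {where_clause} ORDER BY S.pkey"
    return [row[0] for row in conn.execute(sql)]


existence = matching_rows("WHERE S.MIDDLE_NAME IS NULL")
domain = matching_rows("WHERE S.Gender NOT IN ('M', 'F', 'U')")
print(existence)  # ['2']
print(domain)     # ['3']
```

Only the rows returned for a rule would have their affected columns downgraded. The other rows are not affected.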

Table Aliases and Wildcards
You can use the wildcard character (*) to reference tables via an alias.
• s.* aliases the staging table
• I.* aliases a temporary table and provides ROWID_OBJECT, PKEY_SRC_OBJECT, and ROWID_SYSTEM information for the records being updated.

Custom Rule Types and SQL WHERE Syntax
For Custom rule types, write SQL statements that are well formed and well tuned. If you need more information about SQL WHERE clause syntax and wildcard patterns, refer to the product documentation for the database platform used in your Siperian Hub implementation.

Note: Be sure to specify precedence correctly using parentheses according to the SQL syntax for your database platform. Incorrect or omitted parentheses can cause unexpected results and long-running queries. For example, the following statement is ambiguous and leaves it up to the database server to determine precedence:
WHERE conditionA AND conditionB OR conditionC

The following statements use parentheses to explicitly specify precedence:
WHERE (conditionA AND conditionB) OR conditionC
WHERE conditionA AND (conditionB OR conditionC)

These two statements will yield very different results when evaluating records.
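The difference between the two parenthesized forms can be checked on a small data set. The following Python sketch uses SQLite as a stand-in database and a hypothetical table t whose columns a, b, and c play the role of the three conditions, with one row for every combination of true (1) and false (0). It also counts an unparenthesized mix of AND and OR to show how SQLite resolves it. Consult the documentation for your own database platform rather than relying on this result.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# One row for each of the eight true/false combinations.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)", itertools.product([0, 1], repeat=3)
)


def count(where):
    """Number of rows in t that satisfy the given WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]


first = count("(a = 1 AND b = 1) OR c = 1")
second = count("a = 1 AND (b = 1 OR c = 1)")
bare = count("a = 1 AND b = 1 OR c = 1")
print(first, second, bare)  # 5 3 5
```

In SQLite the unparenthesized form behaves like the first statement, because AND binds more tightly than OR. Explicit parentheses make the intent clear to readers regardless of platform.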

Adding Validation Rules
To add a validation rule:
1. Navigate to the Validation Rules editor. For more information, see “Navigating to the Validation Rules Node” on page 471.
2. Click the button. The Schema Manager displays the Add Validation Rule dialog.
3. Specify the properties for this validation rule. For more information, see “Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the button. The Validation Rules editor displays the Select Rule Columns dialog. The available columns are those that have the Validate flag enabled (see “Column Properties” on page 127). For more information, see “Configuring Columns in Tables” on page 125. Select the column(s) for which the trust level will be downgraded if the condition specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click OK. The Schema Manager adds the new rule to the list of validation rules.

Note: If a base object contains existing data and you change validation rules, you must run the Revalidate job to recalculate trust scores for new and existing data, as described in “Revalidate Jobs” on page 745.

Editing Validation Rule Properties
To edit a validation rule:
1. Navigate to the Validation Rules editor in the Schema Manager. For more information, see “Navigating to the Validation Rules Node” on page 471.
2. In the Validation Rules list, select the validation rule that you want to configure. The Validation Rules editor displays the properties for the selected validation rule.
3. Specify the editable properties for this validation rule. You cannot change the rule type. For more information, see “Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the button. The Validation Rules editor displays the Select Rule Columns dialog. The available columns are those that have the Validate flag enabled (see “Column Properties” on page 127). For more information, see “Configuring Columns in Tables” on page 125. Select the column(s) for which the trust level will be downgraded if the condition specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click the button to save changes.

Note: If a base object contains existing data and you change validation rules, you must run the Revalidate job to recalculate trust scores for new and existing data, as described in “Revalidate Jobs” on page 745.

Changing the Sequence of Validation Rules
The execution order for validation rules is extremely important. For more information, see “Execution Sequence of Validation Rules” on page 471. Use the following buttons to change the sequence of validation rules in the list.
• Move the selected validation rule higher in the sequence.
• Move the selected validation rule further down in the sequence.

Removing Validation Rules
To remove a validation rule:
1. Navigate to the Validation Rules editor in the Schema Manager. For more information, see “Navigating to the Validation Rules Node” on page 471.
2. In the Validation Rules list, select the validation rule that you want to remove.
3. Click the button. The Schema Manager prompts you to confirm deletion.
4. Click Yes.

Note: If a base object contains existing data and you change validation rules, you must run the Revalidate job to recalculate trust scores for new and existing data, as described in “Revalidate Jobs” on page 745.

Chapter 14: Configuring the Match Process

This chapter describes how to configure your Hub Store to identify and handle potential duplicate records. For an introduction to the match process, see “Match Process” on page 317.

Before You Begin
Before you begin, you must have installed Siperian Hub, created the Hub Store according to the instructions in the Siperian Hub Installation Guide, and built the schema according to the instructions in Chapter 5, “Building the Schema.”

Configuration Tasks for the Match Process
This section provides an overview of the configuration tasks associated with the match process. For an introduction to the match process, see “Match Process” on page 317.

Understanding Your Data
Before you define match rules, you must be very familiar with your data and understand:
• the distribution of the values in the columns you intend to use to determine duplicate records, and
• the general proportion of the total number of records that are duplicates.

Base Object Properties Associated with the Match Process
The following base object properties affect the behavior of the match process.
Property | Description
Duplicate Match Threshold | Used only with the Match for Duplicate Data job for initial data loads. For more information, see “Duplicate Match Threshold” on page 103.
Max Elapsed Match Minutes | Timeout (in minutes) when executing a match rule. If exceeded, the match process exits. For more information, see “Max Elapsed Match Minutes” on page 103.
Match Flag audit table | If enabled, then an audit table (BusinessObjectName_FMHA) is created and populated with the userID of the user who, in Merge Manager, queued a manual match record for automerging. For more information, see “Match Flag Audit Table” on page 105 and the Siperian Hub Data Steward Guide.

Configuration Steps for Defining Match Rules
To define match rules:
1. Configure the match properties for the base object. For more information, see “Setting Match Properties” on page 488.
2. Define your match columns. For more information, see “Match Columns Depend on the Search Strategy” on page 515.
3. Define a match rule set for your match rules. For more information, see “Adding Match Rule Sets” on page 538.
4. Define your match rules for the rule set. For more information, see “Adding Match Column Rules” on page 565.
5. Repeat steps 3 and 4 until you are finished creating match rules.
6. Based on your knowledge of your data, determine whether you require matching based on primary keys. For more information, see “Configuring Primary Key Match Rules” on page 578.
7. If your data is appropriate for primary key matching, create your primary key match rules. For more information, see “Adding Primary Key Match Rules” on page 578.
8. Tune your rules. This is an iterative process by which you apply your match rules to a representative data set, analyze the results, and adjust your settings to optimize the match performance.

Configuring Base Objects with International Data
Siperian Hub supports matching for base objects that contain data from non-United States populations, as well as base objects that contain data from different populations (for example, the United States and China). For more information, see “Configuring Match Settings for Non-US Populations” on page 941.

Navigating to the Match/Merge Setup Details Dialog
To set up the match and merge process for a base object, begin by completing the following steps:
1. Start the Schema Manager. For more information, see “Starting the Schema Manager” on page 90.
2. In the schema navigation tree, expand the base object for which you want to define match properties.
3. In the schema navigation tree, select Match/Merge Setup. The Schema Manager displays the Match/Merge Setup Details dialog, as shown in the following example.

If you want to change settings, you need to acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.

The Match/Merge Setup Details dialog contains the following tabs:
Tab Name                   Description
Properties                 Summarizes the match/merge setup and provides various configurable match/merge settings. For more information, see “Configuring Match Properties for a Base Object” on page 488.
Paths                      Allows you to configure the match path for parent/child relationships for records in different base objects or in the same base object. For more information, see “Configuring Match Paths for Related Records” on page 497.
Match Columns              Allows you to configure match columns for match column rules. To learn more, see “Configuring Match Columns” on page 515 and “Configuring Match Column Rules for Match Rule Sets” on page 542.
Match Rule Sets            Allows you to define a search strategy and rules using match rule sets. For more information, see “Configuring Match Rule Sets” on page 531.
Primary Key Match Rules    Allows you to define primary key match rules. For more information, see “Configuring Primary Key Match Rules” on page 578.
Match Key Distribution     Shows the distribution of match keys. For more information, see “Investigating the Distribution of Match Keys” on page 583.
Merge Settings             Allows you to configure merge and link settings. For more information, see Chapter 15, “Configuring the Consolidate Process.”

Configuring Match Properties for a Base Object
You must set the match properties for a base object before you can configure other match features, such as match columns and match rules. These match properties apply to all rules for the base object.

Setting Match Properties
You configure match properties for each base object. These settings apply to all of its match rules and rule sets.
To configure match properties for a base object:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure according to the instructions in “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2.

Description
Number of match columns configured for this base object. Read-only.
Number of match rule sets configured for this base object. Read-only.
Number of match rules configured for this base object in the rule set currently selected as active. Read-only.
Number of primary key match rules configured for this base object. Read-only.

Maximum Matches for Manual Consolidation
This setting helps prevent data stewards from being overwhelmed with thousands of matches for manual consolidation. It limits the list of possible matches that must be decided upon by a data steward (default is 1000). Once this limit is reached, Siperian Hub stops the match process until the number of records for manual consolidation has been reduced. Siperian Hub enforces the limit by counting the records with CONSOLIDATION_IND=2. At the end of each automatch and merge cycle, this count is checked and, if it exceeds the maximum number of matches for manual consolidation, then the automatch-and-merge process exits.
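
The end-of-cycle check described above can be sketched as follows. This is a minimal illustration in Python, not Siperian Hub code: the function name and the in-memory list of indicator values are assumptions made for the example.

```python
# Hypothetical sketch: the function name and data layout are illustrative only.
QUEUED_FOR_MANUAL = 2  # CONSOLIDATION_IND value counted by the check


def should_stop_automatch(consolidation_inds, max_manual_matches=1000):
    """Return True when the manual-consolidation backlog exceeds the limit."""
    backlog = sum(1 for ind in consolidation_inds if ind == QUEUED_FOR_MANUAL)
    return backlog > max_manual_matches
```

With the default limit of 1000, a backlog of exactly 1000 records does not stop the process in this sketch; the 1001st record does.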

Number of Rows per Match Job Batch Cycle
This setting specifies an upper limit on the number of records that Siperian Hub will process for matching during match process execution (Match or Auto Match and Merge jobs). When the match process starts executing, it begins by flagging records to be included in the match job batch. From the pool of new/unconsolidated records that are ready for match (CONSOLIDATION_IND=4, as described in “Consolidation Indicator” on page 289), the match process changes CONSOLIDATION_IND to 3. The number of records flagged is determined by the Number of Rows per Match Job Batch Cycle. The match process then matches those records in the match job batch against all of the records in the base object.

The number of records in the match job batch affects how long the match process takes to execute. The value to specify depends on the size of your data set, the complexity of your match rules, and the length of the time window you have available to run the match process. The default match batch size is low (10). You increase this based on the number of records in the base object, as well as the number of matches generated for those records based on its match rules.
• The lower your match batch size, the more times you will need to run the match and consolidation processes.
• The higher your match batch size, the more work each match and consolidation process does.

For each base object, there is a medium ground where you reach the optimal match batch size. You need to identify this optimal batch size as part of performance tuning in your environment. Start with a match batch size of 10% of the volume of records to be matched and merged, run the match job only, see how many matches are generated by your match rules, and then adjust upwards or downwards accordingly.
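
The flagging step and the suggested 10% starting point can be sketched as follows. This is an illustrative Python model only: the function names and the dictionary-based record layout are assumptions, and the real match process operates on database tables.

```python
# Hypothetical sketch: names and the dict-based record layout are illustrative only.
READY_FOR_MATCH = 4  # new/unconsolidated records that are ready for match
IN_MATCH_BATCH = 3   # records flagged for the current match job batch


def starting_batch_size(records_to_match):
    """Suggested starting point: 10% of the volume to be matched (at least 1)."""
    return max(1, records_to_match // 10)


def flag_match_batch(records, rows_per_batch_cycle):
    """Flag up to rows_per_batch_cycle ready records; return their ROWID_OBJECT values."""
    flagged = []
    for record in records:
        if len(flagged) >= rows_per_batch_cycle:
            break
        if record["CONSOLIDATION_IND"] == READY_FOR_MATCH:
            record["CONSOLIDATION_IND"] = IN_MATCH_BATCH
            flagged.append(record["ROWID_OBJECT"])
    return flagged
```

For 50,000 records to be matched, `starting_batch_size` suggests 5,000; you would then adjust that value after reviewing how many matches the first run generates.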

Accept All Unmatched Rows as Unique
Enable (set to Yes) this feature to have Siperian Hub mark as unique (CONSOLIDATION_IND=1) any records that have been through the match process, but for which no matches were identified. If enabled, for such records, Siperian Hub automatically changes their state to consolidated (changes the consolidation indicator from 2 to 1). Consolidated records are removed from the data steward’s queue via the Automerge batch job. By default, this option is disabled.

In a development environment, you might want this option disabled, for example, while iteratively testing and tuning match rules to determine which records are found to be unique for a given set of match rules. This option should always be enabled in a production environment. Otherwise, you can end up with a large number of records with a consolidation indicator of 2. If this backlog of records exceeds the Maximum Matches for Manual Consolidation setting (see “Maximum Matches for Manual Consolidation” on page 490), then you will need to process these records first before you can continue matching and consolidating other records.

For more information, see:
• “Initial Data Loads and Incremental Loads” on page 302
• “Consolidation Indicator” on page 289
• “Accept Non-Matched Records As Unique” on page 715
• “Automerge Jobs” on page 717
• “Autolink Jobs” on page 715
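
The state change that this option triggers can be modeled roughly as follows. This Python sketch is an assumption-laden illustration (the function name, the record dictionaries, and the set of matched ROWID_OBJECT values are all hypothetical); it only mirrors the 2-to-1 indicator change described above.

```python
# Hypothetical sketch: names and data layout are illustrative only.
def accept_unmatched_as_unique(records, matched_rowids, enabled=True):
    """Mark processed records that found no match as consolidated (2 -> 1)."""
    if not enabled:
        return 0
    changed = 0
    for record in records:
        unmatched = record["ROWID_OBJECT"] not in matched_rowids
        if record["CONSOLIDATION_IND"] == 2 and unmatched:
            record["CONSOLIDATION_IND"] = 1
            changed += 1
    return changed
```

With the option disabled, the sketch leaves every record at indicator 2, which is how the backlog described above builds up.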

Match/Search Strategy
Select the match/search strategy to specify the reliability of the match versus the performance you require. Select one of the following options.
Strategy Option    Description
Fuzzy              Probabilistic match that takes into account spelling variations, possible misspellings, and other differences that can make matching records non-identical. This is the primary means of matching data in a base object. Referred to in this document as fuzzy-match base objects. Note: If you specify a Fuzzy match/search strategy, you must specify a fuzzy match key.
Exact              Matches only records with identical values in the match column(s). If you specify an exact match, you can define only exact-match columns for this base object (exact-match base objects cannot have fuzzy-match columns). Referred to in this document as exact-match base objects.

An exact strategy is faster, but an exact match will miss some matches if the data is imperfect. The best option to choose depends on the characteristics of the data, your knowledge of the data, and your particular match and consolidation requirements.

Certain configuration settings on the Match/Merge Setup tab apply to only one type of base object. In this document, such features are indicated with a graphic that shows whether the feature applies to fuzzy-match base objects only (as in the following example), or exact-match base objects only. No graphic means that the feature applies to both.

Note: The match / search strategy is configured at the base object level. For more information about the match / search strategy configured at the match rule level, see “Match / Search Strategy” on page 544.

Fuzzy Population

If the match/search strategy is Fuzzy, then you must select a population, which defines certain characteristics about the records that you are matching. Data characteristics can vary from country to country. By default, Siperian Hub comes with the US population, but Siperian provides standard populations per country. If you require another population, contact Siperian support. If you chose an exact match/search strategy, then this value is ignored.

Populations perform the following functions for matching:
• account for the inevitable variations and errors that are likely to exist in name, address, and other identification data. For example, the population for the US has some intelligence about the typical identification numbers used in US data, such as the social security number. Populations also have some intelligence about the distribution of common names. For example, the US population has a relatively high percentage of the surname Smith. But a population for a non-English-speaking country would not have Smith among the common names.
• specify how Siperian Hub builds match tokens, which are described in “Match Keys and the Tokenization Process” on page 322
• specify how search strategies and match purposes operate on the population of data to be matched

Match Only Previous Rowid Objects
If this setting is enabled (checked), then Siperian Hub matches the current records against records with lower ROWID_OBJECT values. For example, if the current record has a ROWID_OBJECT value of 100, then the record will be matched only against other records in the base object with a ROWID_OBJECT value that is less than 100 (ignoring all records with a ROWID_OBJECT value that is higher than 100). Using this feature can reduce the number of matches required and speed performance. However, if PUTs are executed, or if records are inserted out of rowid order, then records might not be fully matched. You must assess the trade-off between performance and match quantity based on the characteristics of your data and your particular match requirements. By default, this option is disabled (unchecked).
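
The effect of this setting on the set of comparison candidates can be sketched as follows. This is a simplified Python illustration: the function name is hypothetical, and ROWID_OBJECT values are treated as integers for readability even though the column is a character type.

```python
# Hypothetical sketch: ROWID_OBJECT values are modeled as integers for simplicity.
def match_candidates(current_rowid, all_rowids, previous_only=False):
    """Return the ROWID_OBJECT values that the current record is compared against."""
    if previous_only:
        return [r for r in all_rowids if r < current_rowid]
    return [r for r in all_rowids if r != current_rowid]
```

For a current record with ROWID_OBJECT 100, enabling the option removes every higher rowid from the candidate list, which is why records inserted out of rowid order can go unmatched.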

Match Only Once

Available only for fuzzy key matching and only if “Match Only Previous Rowid Objects” is checked (selected). If Match Only Once is enabled (checked), then once a record has found a match, Siperian Hub will not match it any further within this search range (the set of similar match key values). Using this feature can reduce duplicates and increase performance. Instead of finding every match for a record in a search range, Siperian Hub can find a single match for each. In subsequent match cycles, the merge process will put these into large groups of XREF records associated with the base object. By default, this option is unchecked (disabled).

If this feature is enabled, however, you can miss matches. For example, suppose record A matches record B, and record A matches record C, but record B and C do not match. You must assess the trade-off between performance and match quantity based on the characteristics of your data and your particular match requirements.
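
The trade-off can be illustrated with a small Python model. This is a sketch under stated assumptions, not the actual matching algorithm: it treats all records as one search range, compares each record only against lower ROWID_OBJECT values, and uses a hypothetical `is_match` callback in place of real match rules.

```python
# Hypothetical sketch: a single search range and a caller-supplied is_match function.
def find_matches(rowids, is_match, match_only_once=False):
    """Compare each record against lower ROWID_OBJECT values and collect match pairs."""
    pairs = []
    for current in sorted(rowids):
        for candidate in sorted(r for r in rowids if r < current):
            if is_match(current, candidate):
                pairs.append((current, candidate))
                if match_only_once:
                    break  # stop after the first match found for this record
    return pairs
```

If record 3 matches both record 1 and record 2, the sketch records both pairs when the option is off, but only the pair (3, 1) when it is on. Fewer comparisons are performed, at the cost of the unrecorded match.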

Dynamic Match Analysis Threshold
During the match process, dynamic match analysis determines whether the match process will take an unacceptably long period of time. This threshold value specifies the maximum acceptable number of comparisons. To enable the dynamic match threshold, specify a non-zero value. By default, this option is zero (disabled).

Enable this feature if you have data that is very similar (with high concentrations of matches) to reduce the amount of work expended for a hot spot in your data. A hot spot is a group of records representing overmatched data, that is, a large intersection of matches. If Dynamic Match Analysis Threshold is enabled, then records that produce more than the specified number of potential match candidates will be skipped during the match process.

Before conducting a match on a given search range, Siperian Hub calculates the number of search records (records being searched for matches), and multiplies it by the number of file records (the number of records returned from the match key table that need to be compared). If the result is greater than the specified Dynamic Match Analysis Threshold, then no comparisons are performed on that range of data, and the range is noted in the application server log for further investigation.
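
The calculation described above amounts to a single multiplication and comparison. The following Python sketch shows it under the assumption that a threshold of zero disables the check; the function name is hypothetical.

```python
# Hypothetical sketch of the dynamic match analysis check for one search range.
def should_skip_range(search_records, file_records, threshold):
    """Return True when the range should be skipped; a threshold of 0 disables the check."""
    if threshold == 0:
        return False
    return search_records * file_records > threshold
```

For example, 500 search records and 400 file records imply 200,000 comparisons, so a threshold of 100,000 would cause that range to be skipped and logged.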

Enable Match on Pending Records
By default, the match process includes only ACTIVE records and ignores PENDING records. For state management-enabled objects, select this check box to include PENDING records in the match process. Note that, regardless of this setting, DELETED records are ignored by the match process. For more information, see “Enabling Match on Pending Records” on page 214.

Reset Link Properties for Link-style Base Objects
For link-style base objects only, you can unlink consolidated records and requeue them for match. This can be configured to occur automatically on load update, or manually via the Reset Links batch job. For more information, see “Reset Links Jobs” on page 744.

For link-style base objects only, the Schema Manager displays the following properties.
Property    Description
Specifies whether manually-linked records are included by the reset links process. Autolinked records are always included. Note: This setting affects the scope of all other reset links settings.

Supporting Long ROWID_OBJECT Values
If a base object has such a large number of records that the ROWID_OBJECT values might exceed 12 digits, you need to explicitly enable support for longer values in the Cleanse Match Server. To enable the Cleanse Match Server to use long ROWID_OBJECT values, edit the cmxcleanse.properties file and configure the cmx.server.bmg.use_longs setting:
cmx.server.bmg.use_longs=1

By default, this option is disabled.

Configuring Match Paths for Related Records
This section describes how to configure match paths for related records, which are used for matching in your Siperian Hub implementation.

Match Paths
A match path allows you to traverse the hierarchy between records—whether that hierarchy exists between base objects (inter-table paths) or within a single base object (intra-table paths). Match paths are used for configuring match column rules involving related records in either separate tables or in the same table.

Foreign Key Relationships and Filters
Configuring match paths that point to other records involves two main components:
Component                    Description
foreign key relationships    Used to traverse the relationships to other records. Allows you to specify parent-to-child and child-to-parent relationships.
filters (optional)           Allow you to selectively include or exclude records based on values in a given column, such as ADDRESS_TYPE or PARTY_TYPE. For more information, see “Configuring Filters for Match Paths” on page 511.

Relationship Base Objects
In order to configure match rules for these kinds of relationships, particularly many-to-many relationships, you need to create a separate base object that serves as a relationship base object to describe to Siperian Hub the relationships between records. You populate this relationship base object with information about the relationships using a data management tool (outside of Siperian Hub) rather than using the Siperian Hub processes (land, stage, and load, as described in Chapter 9, “Siperian Hub Processes”). You configure a separate relationship base object for each type of relationship. You can include additional attributes of the relationship type, such as start date, end date, and other relationship details. The relationship base object defines a match path that enables you to configure match column rules.

Important: Do not run the match and consolidation processes on a base object that is used to define relationships between records in inter-table or intra-table match paths. Doing so will change the relationship data, resulting in the loss of the associations between records.

Inter-Table Paths
An inter-table path defines the relationship between records in two different base objects. In many cases, this relationship can be defined simply by configuring a foreign key relationship: a key column in the child base object points to the primary key of the parent base object. For more information, see “Configuring Foreign-Key Relationships Between Base Objects” on page 140. In some cases, however, the relationship between records can be more complex, requiring an intermediary base object that defines the relationship between records in the two tables.

Example Base Objects for Inter-Table Paths
Consider the following example in which a Siperian Hub implementation has two base objects:
Base Object    Description
Person         Contains any type of person, such as employees for your organization, employees for some other organizations (prospects, customers, vendors, or partners), contractors, and so on.
Address        Contains any type of address: mailing, shipping, home, work, and so on.

In this example, there is the potential for many-to-many relationships:
• A person could have multiple addresses, such as a home and work address.
• A single address could have multiple persons, such as a workplace or home.

In order to configure match rules for this kind of relationship between records in different base objects, you would create a separate base object (such as PersonAddrRel) that describes to Siperian Hub the relationships between records in the two base objects.

Columns in the Example Base Objects
Suppose the Person base object had the following columns:
Column          Type           Description
ROWID_OBJECT    CHAR(14)       Primary key. Uniquely identifies this person in the base object.
TYPE            CHAR(14)       Type of person, such as an employee or customer contact.
NAME            VARCHAR(50)    Person’s name (simplified for this example).
EMPLOYER        VARCHAR(50)    Person’s employer.
...             ...            ...

Suppose the Address base object had the following columns:
Column          Type           Description
ROWID_OBJECT    CHAR(14)       Primary key. Uniquely identifies this address.
TYPE            CHAR(14)       Type of address, such as their home, work, mailing, or shipping address.
NAME            VARCHAR(50)    Name of the individual or organization residing at this address.
ADDRESS_1       VARCHAR(50)    First address line.
ADDRESS_2       VARCHAR(50)    Second address line.
CITY            VARCHAR(50)    City
STATE_PROV      VARCHAR(50)    State or province
POSTAL_CODE     VARCHAR(50)    Postal code
...             ...            ...

To define the relationship between records in the two base objects, the PersonAddrRel base object could have the following columns:
Column          Type        Description
ROWID_OBJECT    CHAR(14)    Primary key. Uniquely identifies this relationship record.
PERS_FK         CHAR(14)    Foreign key to the ROWID_OBJECT column in the Person base object.
ADDR_FK         CHAR(14)    Foreign key to the ROWID_OBJECT column in the Address base object.

Note that the column type of the foreign key columns, CHAR(14), matches the primary key to which they point.

Example Configuration Steps
After you have configured the relationship base object (PersonAddrRel), you would complete the following tasks:
1. Configure foreign keys from this base object to the ROWID_OBJECT of the Person and Address base objects. For more information, see “Configuring Foreign-Key Relationships Between Base Objects” on page 140.

In this example, note that Person #786 has two addresses, and that Address #1028 has two persons.

3. Use the PersonAddrRel base object when configuring match column rules for the related records. For more information, see “Configuring Match Column Rules for Match Rule Sets” on page 542.
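
To make the many-to-many relationship concrete, the following Python sketch models a few PersonAddrRel rows as (PERS_FK, ADDR_FK) pairs and traverses them in both directions. Person 786 and Address 1028 come from the example above; the other identifiers and the function names are made up for illustration.

```python
# Hypothetical sketch: rows of PersonAddrRel as (PERS_FK, ADDR_FK) pairs.
# Person 786 and Address 1028 come from the example; 913 and 2045 are invented.
PERSON_ADDR_REL = [
    (786, 1028),
    (786, 2045),
    (913, 1028),
]


def addresses_for_person(pers_fk, rel=PERSON_ADDR_REL):
    """Traverse the relationship rows from a person to its addresses."""
    return [addr for pers, addr in rel if pers == pers_fk]


def persons_at_address(addr_fk, rel=PERSON_ADDR_REL):
    """Traverse the relationship rows from an address to its persons."""
    return [pers for pers, addr in rel if addr == addr_fk]
```

Each row carries one foreign key per side, so neither the Person nor the Address base object needs a column that points directly at the other.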

Intra-Table Paths
Within a base object, parent/child relationships can exist between individual records. Siperian Hub allows you to clarify relationships between records in the same base object, and then use those relationships when configuring match column rules.

Example Base Object for Intra-Table Paths
Consider the following example of an Employee base object in which reporting relationships exist between employees.

The relationships among employees are hierarchical. The CEO is at the top of the hierarchy, representing what is called the global ultimate parent record.

Create a Relationship Base Object
In order to configure match rules for this kind of object, you would create a separate base object to describe to Siperian Hub the relationships between records. For example, you could create and configure an EmplRepRel base object with the following columns:
Column           Type        Description
ROWID_OBJECT     CHAR(14)    Primary key. Uniquely identifies this relationship record.
EMPLOYEE_FK      CHAR(14)    Foreign key to the ROWID_OBJECT of the employee record.
REPORTS_TO_FK    CHAR(14)    Foreign key to the ROWID_OBJECT of a manager record.

Note that the column type of the foreign key columns, CHAR(14), matches the primary key to which they point.

Example Configuration Steps
After you have configured this base object, you must complete the following tasks:
1. Configure foreign keys from this base object to the ROWID_OBJECT of the Employee base object. For more information, see “Configuring Foreign-Key Relationships Between Base Objects” on page 140.

Note that you can define many-to-many relationships between records. For example, the employee whose ROWID_OBJECT is 31 reports to two different managers (ROWID_OBJECT=82 and ROWID_OBJECT=71), while this manager (ROWID_OBJECT=82) has three reports (ROWID_OBJECT=24, 29, and 31).

3. Use the EmplRepRel base object when configuring match column rules for the related records according to the instructions in “Configuring Match Column Rules for Match Rule Sets” on page 542. For example, you could create a match rule that takes into account the employee’s manager to produce more accurate matches.

Note: This example used a REPORTS_TO field to define the relationship, but you could use any piece of information to associate the records, even something more generic and flexible like RELATIONSHIP_TYPE.
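
The reporting relationships from this example can be modeled as EmplRepRel rows that point back into the same base object. The Python sketch below uses the ROWID_OBJECT values quoted in the example; the function names and the tuple layout are assumptions made for illustration.

```python
# Hypothetical sketch: EmplRepRel rows as (EMPLOYEE_FK, REPORTS_TO_FK) pairs,
# using the ROWID_OBJECT values quoted in the example.
EMPL_REP_REL = [
    (24, 82),
    (29, 82),
    (31, 82),
    (31, 71),
]


def managers_of(employee_fk, rel=EMPL_REP_REL):
    """Child-to-parent traversal within the same base object."""
    return [mgr for emp, mgr in rel if emp == employee_fk]


def reports_of(manager_fk, rel=EMPL_REP_REL):
    """Parent-to-child traversal within the same base object."""
    return [emp for emp, mgr in rel if mgr == manager_fk]
```

Because both foreign keys reference the Employee base object, the same relationship rows support traversal up to a manager and down to direct reports.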

Navigating to the Paths Tab
To navigate to the Paths tab for a base object:
1. In the Schema Manager, navigate to the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Click the Paths tab.

The Schema Manager displays the Paths tab.

Sections of the Paths Tab
The Paths tab has two sections:
Section            Description
Path Components    Configure the foreign keys used to traverse the relationships. For more information, see “Configuring Path Components” on page 507.
Filters            Configure filters used to include or exclude records for matching. For more information, see “Configuring Filters for Match Paths” on page 511.

Root Base Object
The root base object is displayed automatically in the Path Components section of the screen and is always available. The root base object represents an entity without child or parent relationships. If you want to configure match rules that involve parent or child records, you need to explicitly add path components to the root base object, and these relationships must have been configured beforehand (see “Configuring Foreign-Key Relationships Between Base Objects” on page 140).

Configuring Path Components
This section describes how to configure path components in the Schema Manager. Path components provide a way to define the connection between parent and child tables using foreign keys for the purpose of using columns from that table in a match column.

Properties of Path Components
This section describes properties of path components.

Display Name
The name of this path component as it will be displayed in the Hub Console.

Physical Name
Actual name of the path component in the database. Siperian Hub will suggest a physical name for the path component based on the display name that you enter.

Check For Missing Children
The Check for Missing Children check box instructs Siperian Hub to either allow for missing child records (enabled, the default) or to require all parent records to have child records.
Setting                 Description
Enabled (Checked)       If you might have some missing child records and you have rules that do not include columns in the tables that might be missing records.
Disabled (Unchecked)    If all of your rules use the child columns and do not have null match enabled. In this case, checking for missing children does not add any value, and it can have a negative impact on performance.

If you are certain that your data is complete (parent records have child records), and you include the parent in the child match rule, then inter-table matching works as expected. However, if your data tends to contain parent records that are missing child records, or if you do not include the parent column in the child match rule, you must check (select) the Check for Missing Children check box in the path component associated with this match column rule to ensure that an outer join occurs when Siperian Hub checks for records to match.
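
The difference between an outer join and an inner join for parents without children can be demonstrated with a small, self-contained example. The sketch below uses Python’s built-in sqlite3 module with made-up `parent` and `child` tables; it illustrates the join semantics only and does not reproduce the SQL that Siperian Hub generates.

```python
import sqlite3

# Hypothetical tables: one parent (Ann) has a child record, the other (Bob) has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (rowid_object INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (rowid_object INTEGER PRIMARY KEY, parent_fk INTEGER, city TEXT);
    INSERT INTO parent VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO child VALUES (10, 1, 'Boston');
""")


def parents_considered(check_for_missing_children):
    """Return parent names that survive the join used to find records to match."""
    join = "LEFT OUTER JOIN" if check_for_missing_children else "INNER JOIN"
    rows = conn.execute(
        f"SELECT p.name FROM parent p {join} child c "
        "ON c.parent_fk = p.rowid_object ORDER BY p.rowid_object"
    ).fetchall()
    return [name for (name,) in rows]
```

With the outer join, the parent without a child record stays in the result; with the inner join it drops out, which is why incomplete data calls for the check box to be selected.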

Note: If the Check for Missing Children option is enabled, Siperian Hub performs an outer join between the parent and child tables, which can have a performance impact. Therefore, when not needed, it is more efficient to disable this option.

Constraints
Property          Description
Table             List of tables in the schema.
Direction         Direction of the foreign key: Parent-to-Child, Child-to-Parent, or N/A.
Foreign Key On    Column to which the foreign key points. This column can be either in a different base object or the same base object.

Adding Path Components
To add a path component:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Path Components section, click the Add button. The Schema Manager displays the Add Path Component dialog.
4. Specify the properties for this path component. For more information, see “Properties of Path Components” on page 507.
5. Click OK.
6. Click the button to save your changes.

Editing Path Components
To edit a path component:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Path Components tree, select the path component that you want to edit.
4. In the Path Components section, click the button to edit the selected path component. The Schema Manager displays the Edit Path Component dialog.
5. Specify the properties for this path component. You can change the following values:
• Display Name (see “Display Name” on page 507)
• Check for Missing Children (see “Check For Missing Children” on page 507)
6. Click OK.
7. Click the button to save your changes.

Deleting Path Components
You can delete path components but not the root base object. To delete a path component:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Path Components tree, select the path component that you want to delete.
4. In the Path Components section, click the button to delete the selected path component. The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the button to save your changes.

Configuring Filters for Match Paths
This section describes how to configure filters for match paths in the Schema Manager.

About Filters
In match paths, a filter allows you to selectively determine whether to include or exclude records for matching based on values in a given column. When you define a filter for a column, you specify the filter condition with one or more values that determine which records qualify for match processing. For example, if you have an Address base object that contains both shipping and billing addresses, you might configure a filter that includes only billing addresses for matching and ignores the shipping addresses. During execution, the match process will match records in the match batch with billing address records only.

Filter Properties
In Siperian Hub, filters have the following properties.

Setting    Description
Column     Column to configure in the currently-selected base object.
Operator   Operator to use for this filter. One of the following values:
           • IN: Include columns that contain the specified values.
           • NOT IN: Exclude columns that contain the specified values.
Values     One or more values to use for this filter.

Example Filter
For example, if you wanted to match only on mailing addresses in an Address base object, you could specify:

Setting    Example Value
Column     ADDR_TYPE
Operator   IN
Values     MAILING

In this example, only mailing addresses would qualify for matching: records in which the ADDR_TYPE column contains “MAILING”. All other records would be ignored.

Adding Filters
If you add multiple filters, Siperian Hub evaluates the entire expression using the logical AND operator. For example,
xExpr AND yExpr AND zExpr
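
The filter behavior described above can be sketched in Python. This is an illustrative model only, not Siperian Hub code: the MatchPathFilter class, the qualifies function, the COUNTRY column, and the sample records are all hypothetical. Only the ADDR_TYPE example, the IN and NOT IN operators, and the AND combination come from this guide. How the Hub treats null column values is not stated here, so the handling below is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class MatchPathFilter:
    """Hypothetical model of a filter: Column, Operator, Values."""
    column: str
    operator: str  # "IN" or "NOT IN"
    values: frozenset

    def accepts(self, record: Mapping[str, Optional[str]]) -> bool:
        # Assumption: a missing or null column value is simply "not in" the list.
        present = record.get(self.column) in self.values
        if self.operator == "IN":
            return present
        if self.operator == "NOT IN":
            return not present
        raise ValueError(f"unsupported operator: {self.operator}")


def qualifies(record: Mapping[str, Optional[str]],
              filters: Sequence[MatchPathFilter]) -> bool:
    """Multiple filters are combined with logical AND."""
    return all(f.accepts(record) for f in filters)


filters = [
    MatchPathFilter("ADDR_TYPE", "IN", frozenset({"MAILING"})),
    MatchPathFilter("COUNTRY", "NOT IN", frozenset({"XX"})),  # hypothetical column
]
records = [
    {"ADDR_TYPE": "MAILING", "COUNTRY": "US"},
    {"ADDR_TYPE": "SHIPPING", "COUNTRY": "US"},
    {"ADDR_TYPE": "MAILING", "COUNTRY": "XX"},
]
qualifying = [r for r in records if qualifies(r, filters)]
```

Only the first sample record passes both filters, which mirrors the xExpr AND yExpr form: a record must satisfy every filter to qualify for match processing.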

To add a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Filters section, click the Add button. The Schema Manager displays the Add Filter dialog.
4. Specify the properties for this filter. For more information, see “Filter Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values for a Filter” on page 513.
6. Click the Save button to save your changes.

Editing Values for a Filter
To edit values for a filter:
1. Do one of the following:
• Add a filter. For more information, see “Adding Filters” on page 512.
• Edit filter properties. For more information, see “Editing Filter Properties” on page 513.
2. In either the Add Filter or Edit Filter dialog, click the button next to the Values field. The Schema Manager displays the Edit Values dialog.
3. Configure the values for this filter.
• To add a value, click the add button. When prompted, specify a value and then click OK.
• To delete a value, select it in the Edit Values dialog, click the delete button, and then click Yes when prompted to delete the value.
4. Click OK.
5. Click the Save button to save your changes.

Editing Filter Properties
To edit filter properties:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Filters section, select the filter that you want to edit, and then click the edit button. The Schema Manager displays the Edit Filter dialog.
4. Specify the properties for this filter. For more information, see “Filter Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values for a Filter” on page 513.
6. Click the Save button to save your changes.

Deleting Filters
To delete a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in “Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Filters section, select the filter that you want to delete, and then click the delete button. The Schema Manager prompts you to confirm deletion.
4. Click Yes.

Configuring Match Columns
This section describes how to configure match columns so that you can use them in match column rules (see “Configuring Match Column Rules for Match Rule Sets” on page 542). If you want to configure primary key match rules instead, see the instructions in “Configuring Primary Key Match Rules” on page 578.

About Match Columns
A match column is a column that you want to use in a match rule, such as name or address columns. Before you can use a column in rule definitions, you must first designate it as a column that can be used in match rules, and provide information about the data it contains. To learn more, see “Match Columns Depend on the Search Strategy” on page 515.

Match Column Types
There are two types of columns used in match rules:

Column Type   Description
Fuzzy         Probabilistic match. Suitable for columns containing data that varies in spelling, abbreviations, word sequence, completeness, reliability, and other inconsistencies. Examples include street addresses and names of people or organizations.
Exact         Deterministic match. Suitable for columns containing consistent and predictable patterns. Exact match columns match only on identical data. Examples include IDs, postal codes, industry codes, or any other well-defined piece of information.
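
To make the distinction concrete, the following Python sketch contrasts the two behaviors. It is not the algorithm that Siperian Hub uses: the similarity ratio from the Python standard library's difflib module is only a stand-in for a probabilistic match, and the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Deterministic comparison: only identical data matches."""
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Similarity-based comparison, a stand-in for a probabilistic match.

    The threshold is an arbitrary illustrative value.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


# A postal code is well-defined data, so an exact comparison is appropriate.
same_postal_code = exact_match("94105", "94105")

# A name with a spelling variation fails the exact test but passes the fuzzy one.
exact_names = exact_match("Jon Smith", "John Smith")
fuzzy_names = fuzzy_match("Jon Smith", "John Smith")
```

The spelling variation between "Jon" and "John" is the kind of inconsistency that makes name columns candidates for fuzzy matching, while identifiers and codes are usually better served by exact matching.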

Match Columns Depend on the Search Strategy
The types of match columns that you can configure depend on the type of the base object that you are configuring (see “Exact-match and Fuzzy-match Base Objects” on page 320). The type of base object is defined by the selected match/search strategy (see “Match/Search Strategy” on page 493).

Match Strategy             Description
Fuzzy-match base objects   Allows you to configure fuzzy-match columns as well as exact-match columns. For more information, see “Configuring Match Columns for Fuzzy-match Base Objects” on page 519.
Exact-match base objects   Allows you to configure exact-match columns but not fuzzy-match columns. For more information, see “Configuring Match Columns for Exact-match Base Objects” on page 527.
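
The rule in the table above can be written as a small lookup. This is a sketch for illustration only; the dictionary and function names are hypothetical and do not correspond to any Siperian Hub API.

```python
# Which match column types each kind of base object allows (from the table above).
ALLOWED_COLUMN_TYPES = {
    "fuzzy": {"fuzzy", "exact"},  # fuzzy-match base objects allow both types
    "exact": {"exact"},           # exact-match base objects allow exact-match columns only
}


def can_configure(base_object_type: str, column_type: str) -> bool:
    """Return True if this column type is allowed for this base object type."""
    return column_type in ALLOWED_COLUMN_TYPES[base_object_type]
```

For example, can_configure("exact", "fuzzy") returns False, which corresponds to the restriction that fuzzy-match columns are not allowed for exact-match base objects.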

Path Component
The path component is either the source table to use for a match column definition, or the match path used to navigate a hierarchy of records. Match paths are used for configuring match column rules involving related records in either separate tables or in the same table. Before you can specify a path component, the match path must be configured. For more information, see “Configuring Match Paths for Related Records” on page 497.
To specify a path component for a match column:
1. Click the button next to the Path Component field. The Schema Manager displays the Select Match Path Component dialog.
2. Select the match path component.
3. Click OK.

Field Types
For fuzzy-match columns, the field name drop-down list displays the following field types. For more information, see “Adding Exact-match Columns for Fuzzy-match Base Objects” on page 525.

Field Name               Description
Address_Part1            Includes the part of address up to, but not including, the locality last line. The position of the address components should be the normal word order used in your data population. Pass this data in one field. Depending on your base object, you may concatenate these attributes into one field before matching. For example, in the US, an Address_Part1 string includes the following fields: Care-of + Building Name + Street Number + Street Name + Street Type + Apartment Details. Address_Part1 uses methods and options designed specifically for addresses.
Address_Part2            Locality line in an address. For example, in the US, a typical Address_Part2 includes: City + State + Zip (+ Country). Matching on Address_Part2 uses methods and options designed specifically for addresses.
Attribute1, Attribute2   Two general purpose fields. These fields are matched using a general purpose, string matching algorithm that compensates for transpositions and missing characters or digits.
Date                     Matches any type of date, such as date of birth, expiry date, date of contract, date of change, creation date, and so on. It expects the date to be passed in Day+Month+Year format. It supports the use or absence of delimiters between the date components. Matching on dates uses methods and options designed specifically for dates. It overcomes the typical error and variation found in this data type.
ID                       Matches any type of ID number, such as: Account number, Customer number, Credit Card number, Driver's License number, Passport, Policy number, SSN or other identity code, VIN, and so on. It uses a string matching algorithm that compensates for transpositions and missing characters or digits.
Organization_Name        Matches the names of organizations, such as company names, business names, institution names, department names, agency names, trading names, and so on. This field supports matching on a single name or on a compound name (such as a legal name and its trading style). You may also use multiple names (for example, a legal name and a trading style) in a single Organization_Name column for the match.
Person_Name              Matches the names of people. Use the full person name. The position of the first name, middle names, and family names should be the normal word order used in your population. For example, in English-speaking countries, the normal order is: First Name + Middle Name(s) + Family Name(s). Depending on your base object design, you can concatenate these fields into one field before matching. This field supports matching on a single name, or an account name (such as JOHN & MARY SMITH). You may also use multiple names, such as a married name and a former name.
Postal_Area              Can be used to place more emphasis on the postal code than if it were included in the Address_Part2 field. It is for all types of postal codes, including Zip codes. It uses a string matching algorithm that compensates for transpositions and missing characters or digits.
Telephone_Number         Used to match telephone numbers. It uses a string matching algorithm that compensates for transpositions and missing digits or area codes.

Selecting Multiple Columns for Matching
If you specify more than one column for matching:
• Values are concatenated into the field used by the match purpose, with a space inserted between each value. For example, you can select first, middle, last, and suffix columns in your base object. The concatenated fields will look like this (a space follows the last word in the string):
first middle last suffix
For example:
Anna Maria Gonzales MD
• For data containing spaces or null data:
  • If there are spaces in the data, then the spaces remain and the field is not NULL.
  • If all the fields are null, then the combined value is null.
  • If any component on the combined field is null, then no extra space will be added to replace the null.

Note: Concatenating columns is not recommended for exact match columns.
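
The concatenation rules above can be modeled with a short Python function. This is a sketch of the documented behavior under stated assumptions, not the Hub's implementation: the function name is hypothetical, and the trailing space reflects one reading of the parenthetical remark that a space follows the last word in the string.

```python
from typing import Optional, Sequence


def concatenate_match_columns(values: Sequence[Optional[str]]) -> Optional[str]:
    """Combine the selected column values into one match field value."""
    # Null components are skipped, so no extra space is added in their place.
    present = [v for v in values if v is not None]
    # If every component is null, the combined value is null.
    if not present:
        return None
    # Each value is followed by one space, so the result ends with a space.
    return "".join(v + " " for v in present)


full_name = concatenate_match_columns(["Anna", "Maria", "Gonzales", "MD"])
no_middle = concatenate_match_columns(["Anna", None, "Gonzales", None])
all_null = concatenate_match_columns([None, None, None, None])
```

A value that consists only of spaces is kept as data, so the combined field is not null in that case.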

Configuring Match Columns for Fuzzy-match Base Objects

Fuzzy-match base objects can have both fuzzy and exact-match columns. For exact-match base objects instead, see “Configuring Match Columns for Exact-match Base Objects” on page 527.

Navigating to the Match Columns Tab for a Fuzzy-match Base Object
To define match columns for a fuzzy-match base object:
1. In the Schema Manager, select the fuzzy-match base object that you want to configure.
2. Click the Match/Merge Setup node. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
3. Click the Match Columns tab. The Schema Manager displays the Match Columns tab for the fuzzy-match base object.

The Match Columns tab for a fuzzy-match base object has the following sections.
Property Description

The match key type describes important characteristics about a column to Siperian Hub. Siperian Hub has some intelligence about names and addresses, so this information helps Siperian Hub generate keys correctly and conduct better searches. This is the main criterion for the search that builds the initial list of potential match candidates. This key type should be based on the main type of data that is in physical column(s) that make up the fuzzy match key. For a fuzzy-match base object, you can select one of the following key types:
Key Type            Description
Person_Name         Used if your fuzzy match key contains data for individuals only.
Organization_Name   Used if your fuzzy match key contains data for organizations only, or if it contains data for both organizations and individuals.
Address_Part1       Used if your fuzzy match key contains address data to be consolidated.

Note: Key types are based on the population you select. The above list of key types applies to the default population (US). Other populations might have different key types. If you require another population, contact Siperian support.
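
The key type guidance for the default (US) population can be expressed as a small decision function. The function name and the labels "individual", "organization", and "address" are hypothetical and used only for this sketch; other populations might call for different key types.

```python
def choose_key_type(data_kinds: set) -> str:
    """Suggest a key type from the kinds of data in the fuzzy match key.

    data_kinds is a set drawn from {"individual", "organization", "address"}.
    """
    if data_kinds == {"address"}:
        return "Address_Part1"
    if data_kinds == {"individual"}:
        return "Person_Name"
    # Organizations only, or a mix of organizations and individuals.
    if data_kinds and data_kinds <= {"individual", "organization"}:
        return "Organization_Name"
    raise ValueError(f"no key type guidance for: {sorted(data_kinds)}")
```

For example, a key that holds both company and personal names maps to Organization_Name, as the table above describes.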


Key Widths

The match key width determines how fast the searches are, the number of possible match candidates returned, and how much disk space the keys consume. Key widths apply to fuzzy match objects only.
Key Width   Description
Standard    Appropriate for most fuzzy match keys, balancing reliability and space usage.
Extended    Might result in more match candidates, but at the cost of longer processing time to generate keys. This option provides some additional matching capability due to the concatenation of columns. This key width works best when:
            • your data set is not extremely large
            • your data set is not complete
            • you have sufficient resources to handle the processing time and disk space requirements
Limited     Trades some match reliability for disk space savings. This option might result in fewer match candidates, but searches can be faster. This option works well if you are willing to undermatch for faster searches that use less disk space for the keys. Limited keys match fewer records with word-order variations than standard keys. This choice provides a subset of the Standard key set, but might be the best option if disk space is restricted or the data volume is extremely large.
Preferred   Generates a single key per base object record. This option trades some match reliability for performance (reduces the number of matches that need to be performed) and disk space savings (reduces the size of the match key table). Depending on characteristics of the data, a preferred key width might result in fewer match candidates.

Steps to Configure Fuzzy Match Key Properties
To configure fuzzy match key properties for a fuzzy-match base object:
1. In the Schema Manager, navigate to the Match Columns tab according to the instructions in “Navigating to the Match Columns Tab for a Fuzzy-match Base Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Configure the following settings for this fuzzy-match base object.

Property         Description
Key Type         Type of field primarily used in the match. This is the main criterion for the search that builds the initial list of potential match candidates. This key type should be based on the main type of data stored in the base object. For more information, see “Key Types” on page 521.
Key Width        Size of the search range for which keys are generated. For more information, see “Key Widths” on page 522.
Path Component   Path component for this fuzzy match key. This is a table containing the column(s) to designate as the key type: Base Object, Child Base Object table, or Cross-reference table. For more information, see “Path Component” on page 516.

4. Click the Save button to save your changes.

Adding a Fuzzy-match Column for Fuzzy-match Base Objects
To define a fuzzy-match column for a fuzzy-match base object:
1. In the Schema Manager, navigate to the Match Columns tab. For more information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. To add a fuzzy-match column, click the add fuzzy-match column button. The Schema Manager displays the Add Fuzzy-match Column dialog.
4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this fuzzy-match column. For a fuzzy-match column, the source table can be the parent table, a parent cross-reference table, or any child base object table. For more information, see “Path Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console. For fuzzy match columns, this is a drop-down list where you can select the type of data in the match column being defined, as described in “Field Types” on page 517.

5. Specify the base object column(s) for the fuzzy match. To add a column to the Selected Columns list, select a column name and then click the right arrow button.
Note: If you add multiple columns, the values are concatenated, with a separator space between values. For more information, see “Selecting Multiple Columns for Matching” on page 518.
6. Click OK. The Schema Manager adds the match column to the Match Columns list.
7. Click the Save button to save your changes.

Adding Exact-match Columns for Fuzzy-match Base Objects
To define an exact-match column for a fuzzy-match base object:
1. In the Schema Manager, navigate to the Match Columns tab. For more information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. To add an exact-match column, click the add exact-match column button. The Schema Manager displays the Add Exact-match Column dialog.
4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this exact-match column. For an exact-match column, the source table can be the parent table and/or child physical columns. For more information, see “Path Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console.

5. Specify the base object column(s) for the exact match. To add a column to the Selected Columns list, select a column name and then click the right arrow.
Note: If you add multiple columns, the values are concatenated, with a separator space between values. For more information, see “Selecting Multiple Columns for Matching” on page 518.
Note: Concatenating columns is not recommended for exact match columns.
6. Click OK. The Schema Manager adds the match column to the Match Columns list.
7. Click the Save button to save your changes.

Editing Match Column Properties for Fuzzy-match Base Objects
Instead of editing match column properties, you must:
• delete the match column, as described in “Deleting Match Columns for Fuzzy-match Base Objects” on page 526
• add a new match column, specifying the settings that you want, as described in “Adding Exact-match Columns for Fuzzy-match Base Objects” on page 525

Deleting Match Columns for Fuzzy-match Base Objects
To delete a match column for a fuzzy-match base object:
1. In the Schema Manager, navigate to the Match Columns tab. For more information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Match Columns list, select the match column that you want to delete.
4. Click the delete button. The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.

Configuring Match Columns for Exact-match Base Objects

Before you define match column rules, you must define the match columns on which they will be based. Exact-match base objects can have only exact-match columns. For more information about configuring match columns for fuzzy-match base objects instead, see “Configuring Match Columns for Fuzzy-match Base Objects” on page 519.

Navigating to the Match Columns Tab for an Exact-match Base Object
To define match columns for an exact-match base object:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the exact-match base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Columns tab. The Schema Manager displays the Match Columns tab for the exact-match base object.

Adding Match Columns for Exact-match Base Objects
You can add only exact-match columns for exact-match base objects. Fuzzy-match columns are not allowed.
To add an exact-match column for an exact-match base object:
1. In the Schema Manager, navigate to the Match Columns tab. For more information, see “Navigating to the Match Columns Tab for an Exact-match Base Object” on page 527.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. To add an exact-match column, click the add button. The Schema Manager displays the Add Exact-match Column dialog.
4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this exact-match column. For an exact-match column, the source table can be the parent table and/or child physical columns. For more information, see “Path Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console.

5. Specify the base object column(s) for the exact match. To add a column to the Selected Columns list, select a column name and then click the right arrow.
Note: If you add multiple columns, the values are concatenated, with a separator space between values. For more information, see “Selecting Multiple Columns for Matching” on page 518.
Note: Concatenating columns is not recommended for exact match columns.
6. Click OK. The Schema Manager adds the selected match column(s) to the Match Columns list.

2. If you want to add a match column with the same name, click the Save button to save your changes first.
3. Add a new match column, specifying the settings that you want, as described in “Adding Match Columns for Exact-match Base Objects” on page 529.

Deleting Match Columns for Exact-match Base Objects
To delete a match column for an exact-match base object:
1. In the Schema Manager, navigate to the Match Columns tab. For more information, see “Navigating to the Match Columns Tab for an Exact-match Base Object” on page 527.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the Match Columns list, select the match column that you want to delete.
4. Click the delete button. The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.

Configuring Match Rule Sets

About Match Rule Sets
A match rule set is a logical collection of match column rules (see “Configuring Match Column Rules for Match Rule Sets” on page 542) that have some properties in common. Match rule sets are associated with match column rules only—not primary key match rules (which are described in “Configuring Primary Key Match Rules” on page 578). Match rule sets allow you to execute different sets of match column rules at different times. The match process uses only one match rule set per execution. To match using a different match rule set, the match rule set must be selected and the match process must be executed again.


Note: Only one match column rule in the match rule set needs to succeed in order to declare a match between records.
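
The note above describes a logical OR across the rules in a rule set. The following Python sketch illustrates that behavior only. The rule functions, the column names TAX_ID, NAME, and POSTAL_CODE, and the use of plain equality are all assumptions made for the example; real match column rules are configured in the Schema Manager and are not simple equality tests.

```python
from typing import Callable, Iterable, Mapping, Optional

Record = Mapping[str, Optional[str]]
Rule = Callable[[Record, Record], bool]


def _equal_non_null(a: Record, b: Record, column: str) -> bool:
    """True if both records carry the same non-null value in this column."""
    return a.get(column) is not None and a.get(column) == b.get(column)


def same_tax_id(a: Record, b: Record) -> bool:
    return _equal_non_null(a, b, "TAX_ID")


def same_name_and_postal_code(a: Record, b: Record) -> bool:
    return _equal_non_null(a, b, "NAME") and _equal_non_null(a, b, "POSTAL_CODE")


def is_match(a: Record, b: Record, rule_set: Iterable[Rule]) -> bool:
    """Declare a match as soon as any one rule in the rule set succeeds."""
    return any(rule(a, b) for rule in rule_set)


rule_set = [same_tax_id, same_name_and_postal_code]
r1 = {"NAME": "Acme", "POSTAL_CODE": "94105", "TAX_ID": "11"}
r2 = {"NAME": "Acme", "POSTAL_CODE": "94105", "TAX_ID": "22"}
r3 = {"NAME": "Other", "POSTAL_CODE": "10001", "TAX_ID": "99"}
```

Here r1 and r2 match because the second rule succeeds even though the first one fails, while r1 and r3 do not match because no rule succeeds.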

What Match Rule Sets Specify
Match rule sets include:
• a search level that dictates the search strategy
• any number of automatic and manual match column rules
• optionally, a filter that allows you to selectively include or exclude records from the match batch during the match process

Multiple Match Rule Sets and the Specified Default
You can configure any number of rule sets. When users want to run the Match batch job, they select one rule set from the list of rule sets that have been defined for the base object.

For more information about choosing match rule sets, see “Selecting a Match Rule Set” on page 737. In the Schema Manager, you designate one match rule set as the default.
Default (*)


When to Use Match Rule Sets
Match rule sets allow you to accommodate different match column rule requirements at different times. For example, you might use one match rule set for an initial data load and a different match rule set for subsequent incremental loads. Similarly, you might use one match rule set to process all records, and another match rule set with a filter to process just a subset of records (see “Filtering SQL” on page 536).

Rule Set Evaluation
Before saving any changes to a match rule set (including any changes to match rules in the match rule set), the Schema Manager analyzes the match rule set and prompts you with a warning message if the match rule set has any issues, as shown in the following example.

Note: This is only a warning message. You can choose to ignore the message and save changes anyway.
Example issues include a match rule set that:
• is identical to an already existing match rule set
• is empty (no match column rules have been added)
• contains no fuzzy-match column rules for a fuzzy-match base object
• contains one or more fuzzy-match columns but no exact-match column (can impact match performance)
• contains fuzzy and exact-match columns with the same source columns
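
The example issues listed above can be thought of as a set of checks that produce warnings without blocking the save. The Python sketch below models that idea under stated assumptions: the classes, the warning strings, and the evaluate_rule_set function are hypothetical, and the actual analysis performed by the Schema Manager may check more or differ in detail.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Sequence, Tuple

IDENTICAL = "identical to an existing match rule set"
EMPTY = "empty: no match column rules have been added"
NO_FUZZY = "no fuzzy-match column rules for a fuzzy-match base object"
NO_EXACT = "fuzzy-match columns but no exact-match column"
SAME_SOURCE = "fuzzy and exact-match columns with the same source columns"


@dataclass(frozen=True)
class MatchColumn:
    name: str
    kind: str  # "fuzzy" or "exact"
    source_columns: FrozenSet[str]


@dataclass(frozen=True)
class ColumnRule:
    columns: Tuple[MatchColumn, ...]


@dataclass
class RuleSet:
    name: str
    rules: List[ColumnRule] = field(default_factory=list)


def evaluate_rule_set(rule_set: RuleSet, existing: Sequence[RuleSet],
                      fuzzy_base_object: bool = True) -> List[str]:
    """Return warnings only; the caller may still choose to save."""
    warnings: List[str] = []
    if any(rule_set.rules == other.rules for other in existing):
        warnings.append(IDENTICAL)
    if not rule_set.rules:
        warnings.append(EMPTY)
        return warnings
    columns = [c for rule in rule_set.rules for c in rule.columns]
    fuzzy = [c for c in columns if c.kind == "fuzzy"]
    exact = [c for c in columns if c.kind == "exact"]
    if fuzzy_base_object and not fuzzy:
        warnings.append(NO_FUZZY)
    if fuzzy and not exact:
        warnings.append(NO_EXACT)
    if any(f.source_columns == e.source_columns for f in fuzzy for e in exact):
        warnings.append(SAME_SOURCE)
    return warnings


person = MatchColumn("Person_Name", "fuzzy", frozenset({"FIRST_NAME", "LAST_NAME"}))
postal = MatchColumn("Postal_Code", "exact", frozenset({"ZIP"}))
baseline = RuleSet("Baseline", [ColumnRule((person, postal))])
fuzzy_only = RuleSet("FuzzyOnly", [ColumnRule((person,))])
```

Calling evaluate_rule_set(fuzzy_only, [baseline]) reports the missing exact-match column, which corresponds to the issue that the guide flags as a possible impact on match performance.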


Match Rule Set Properties
This section describes the properties for match rule sets.

Name
The name of the rule set. Specify a unique, descriptive name.

Search Levels

Used with fuzzy-match base objects only. When you configure a match rule set, you define a search level that instructs Siperian Hub on how stringently and thoroughly to search for candidate matches. The goal of the match process is to find the optimal number of matches for your data:
• not too few (called undermatching), which misses relevant matches, or
• not too many (called overmatching), which generates too many matches, including matches that are not relevant

For any name or address in a fuzzy match key, Siperian Hub uses the defined search level to generate different key ranges for the purpose of determining which records are possible match candidates—and to which records the match column rules will be applied. You can choose one of the following search levels:
Search Level   Description
Narrow         Most stringent level in searching for possible match candidates. This search level is fast, but it can result in fewer matches than other search levels might generate and possibly result in undermatching. Narrow can be appropriate if your data set is relatively correct and complete, or for very large data sets with highly matchy data.
Typical        Appropriate for most rule sets.
Exhaustive     Generates a larger set of possible match candidates than the Typical level. This can result in more matches than other search levels might generate, possibly result in overmatching, and take more time. This level might be appropriate for smaller data sets that are less complete.
Extreme        Generates a still larger set of possible match candidates, which can result in overmatching and take much more time. This level might be appropriate for smaller data sets that are less complete, or to identify the highest possible number of matching records.

The search level you choose should be determined by the size of your data set, your time constraints, and how critical the matches are. Depending on your circumstances and requirements, it is sometimes more appropriate to undermatch, while at other times, it is more appropriate to overmatch. Implementations dealing with relatively reliable and complete data can use the Narrow level, while implementations dealing with less reliable data or with more critical problems should use Exhaustive or Extreme. The search level might also differ depending on the phase of a project. It might be necessary to have a looser level (Exhaustive or Extreme) for initial matching, and then tighten the level as the data is deduplicated.

Enable Search by Rules

This setting specifies whether searching by rules is enabled (checked) or not (unchecked, the default). Used with fuzzy-match base objects only and applies only to the SIF searchMatch request.
The searchMatch request searches for records in a package based on match column and rule definitions. The searchMatch request uses the columns in these records to generate match columns that are used by the match server to find match candidates. For more information about searchMatch, see the Siperian Services Integration Framework Guide and the Siperian Hub Javadoc.
By default, when an application calls the SIF searchMatch request, all possible match columns are generated from the package or mapping records specified in the request, and the match is performed by treating all columns with equal weight. You can enable this option, however, to allow applications to specify input match columns, in which case the searchMatch API ignores any columns that were not passed as part of the request. You might use this feature if, for example, you were using a custom population definition and wanted to call the searchMatch API with a particular set of rules.

Enable Filtering
Specifies whether filtering is enabled for this match rule set.
• If checked (selected), allows you to define a filter (see “Filtering SQL” on page 536) for this match rule set. When running a Match job, users can select the match rule set (see “Selecting a Match Rule Set” on page 737) with a filter defined so that the Match job processes only the subset of records that meet the filter criteria.
• If unchecked (not selected), then all records will be processed by the match rule set when the Match batch job runs.

For example, if you had an Organization base object that contained multiple types of organizations (customers, vendors, prospects, partners, and so on), you could define different match rule sets that selectively processed only the type of records you want to match: MatchAll (no filter), MatchCustomersOnly, MatchVendorsOnly, and so on.

Filtering SQL
By default, when the Match batch job is run (see “Match Jobs” on page 734), the match rule set processes all records. If the Enable Filtering check box (see “Enable Filtering” on page 536) is selected (checked), you can specify a filter condition to restrict processing to only those rules that meet the filter condition. A filter is analogous to a WHERE clause in a SQL statement. The filter expression can be any expression that is valid for the WHERE clause syntax used in your database platform. Note: The match rule set filter is applied to the base object records that are selected for the match batch only (the records to match from)—not the records in the match pool (the records to match to). For more information, see “Flagging the Match Batch” on page 329.

536 Siperian Hub Administrator Guide

Configuring Match Rule Sets

For example, suppose your implementation had an Organization base object that contained multiple types of organizations (customers, vendors, prospects, partners, and so on). Using filters, you could define a match rule set (MatchCustomersOnly) that processed customer data only.
org_type=’C’

All other, non-customer records would be ignored and not processed by the Match job. Note: It is the administrator’s responsibility to specify an appropriate SQL expression that correctly filters records during the Match job. The Schema Manager validates the SQL syntax according to your database platform, but it does not check the logic or suitability of your filter condition.
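The WHERE-clause analogy can be tried outside the Hub. The sketch below is only an illustration: it uses an in-memory SQLite database, and the table name c_organization and its columns are hypothetical, not names taken from a Hub Store. It does not show how Siperian Hub itself applies the filter.

```python
import sqlite3

# Hypothetical table and column names, used only to illustrate the analogy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_organization (rowid_object INTEGER, org_name TEXT, org_type TEXT)"
)
conn.executemany(
    "INSERT INTO c_organization VALUES (?, ?, ?)",
    [(1, "Acme", "C"), (2, "Bolt Supply", "V"), (3, "Crest", "C"), (4, "Delta", "P")],
)


def select_match_batch(connection, filter_sql=None):
    """Return the IDs of the records a rule set would process.

    filter_sql plays the role of the Filtering SQL property: it is appended
    verbatim as a WHERE clause, so it must be a trusted, valid expression.
    """
    query = "SELECT rowid_object FROM c_organization"
    if filter_sql:
        query += " WHERE " + filter_sql
    query += " ORDER BY rowid_object"
    return [row[0] for row in connection.execute(query)]


match_all = select_match_batch(conn)                          # no filter
customers_only = select_match_batch(conn, "org_type='C'")     # customers only
```

Running the filter expression as an ordinary query first, as here, is one way to check that it selects the intended subset before it is saved in a match rule set.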

Match Rules
This area of the window displays a list of match column rules that have been configured for the selected match rule set. For more information, see “Configuring Match Column Rules for Match Rule Sets” on page 542.

Navigating to the Match Rule Set Tab
To navigate to the Match Rule Set tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Rule Sets tab.

Adding Match Rule Sets
To add a new match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the button to add a match rule set. The Schema Manager displays the Add Match Rule Set dialog.
4. Enter a unique, descriptive name for this new match rule set.
5. Click OK. The Schema Manager adds the new match rule set to the list.
6. Configure the match rule set according to the instructions in the next section, “Editing Match Rule Set Properties” on page 539.

Editing Match Rule Set Properties
To edit the properties of a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the match rule set that you want to configure. The Schema Manager displays its properties in the properties panel. The properties shown differ for fuzzy-match base objects and exact-match base objects.
4. Configure properties for this match rule set. For more information, see “Match Rule Set Properties” on page 534.
5. Configure match columns for this match rule set according to the instructions in “Configuring Match Column Rules for Match Rule Sets” on page 542.
6. Click the Save button to save your changes. Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
7. If you are prompted to confirm saving changes, click the OK button to save your changes.


Renaming Match Rule Sets
To rename a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the match rule set that you want to rename.
4. Click the button to rename the match rule set. The Schema Manager displays the Edit Rule Set Name dialog.
5. Specify a unique, descriptive name for this match rule set.
6. Click OK. The Schema Manager updates the name of the match rule set in the list.


Deleting Match Rule Sets
To delete a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Select the name of the match rule set that you want to delete.
4. Click the button to delete the match rule set. The Schema Manager prompts you to confirm deletion.
5. Click Yes. The Schema Manager removes the deleted match rule set, along with all of the match column rules it contains, from the list.

Configuring Match Column Rules for Match Rule Sets
This section describes how to configure match column rules for a match rule set in your Siperian Hub implementation. For more information about match rule sets, see “Configuring Match Rule Sets” on page 531. For more information about the difference between match column rules and primary key rules, see “Configuring Primary Key Match Rules” on page 578.

About Match Column Rules
A match column rule determines what constitutes a match during the match process. Match column rules determine whether two records are similar enough to consolidate. Each match rule is defined as a set of one or more match columns that it needs to examine for points of similarity. Match rules are configured by setting the conditions for identifying matching records within and across source systems. For more information, see “About the Match Process” on page 317.


Prerequisites for Configuring Match Column Rules
You can configure match column rules only after you have:
• configured the columns that you intend to use in your match rules, as described in “Configuring Match Columns” on page 515
• created at least one match rule set, as described in “Configuring Match Rule Sets” on page 531

Specifying Consolidation Options for Matched Records
For each match column rule, decide whether matched records should be automatically or manually consolidated. For more information, see “Specifying Consolidation Options for Match Column Rules” on page 574 and “Consolidating Records Automatically or Manually” on page 336.


Match Rule Properties for Fuzzy-match Base Objects Only

This section describes match rule properties for fuzzy-match base objects. These properties do not apply to exact-match base objects.

Match / Search Strategy

For fuzzy-match base objects, the match / search strategy defines the strategy that Siperian Hub uses for searching and matching in the match rule. Select one of the following options:
Strategy Option   Description
Fuzzy             Probabilistic match that takes into account spelling variations, possible misspellings, and other differences that can make matching records non-identical.
Exact             Matches only records that are identical.

Certain configuration settings on the Match / Merge Setup tab apply to only one type of column. In this document, such features are indicated with a graphic that shows whether the feature applies to fuzzy-match columns only or to exact-match columns only. No graphic means that the feature applies to both.

The match / search strategy determines how to match candidate A with candidate B using fuzzy or exact methods. The match / search strategy can affect the quantity and quality of the match candidates. An exact match / search strategy requires clean and complete data; it might miss some matches if the data is less clean, incomplete, or full of duplicates. When defining match rule properties, you must find the optimal balance between finding all possible candidates and not encumbering the process with too many irrelevant candidates.

Note: This match / search strategy is configured at the match rule level. For more information about the match / search strategy configured at the base object level (which determines whether it is a fuzzy-match base object or exact-match base object), see “Match/Search Strategy” on page 493.

When specifying the match / search strategy for a fuzzy-match base object, consider the implications of configuring the following types of match rules:
Type of Match Rule                 Applies to
Fuzzy - Fuzzy Search Strategy      Fuzzy and exact-match columns.
Exact - Exact Search Strategy      Exact-match columns only. This option bypasses the fuzziness of the base object and executes a simple exact match rule on a fuzzy base object.
Filtered - Fuzzy Search Strategy   Exact-match columns only. This option uses the fuzzy match key as a filter, and then applies the exact match rule.

Match Purpose

For fuzzy-match base objects, the match purpose defines the primary goal behind a match rule. For example, if you are trying to identify matches for people where address is an important part of determining whether two records are for the same person, then you would choose the Match Purpose called Resident.

For every match rule you define, you must choose the purpose of the rule from a list of predefined match purposes provided by Siperian. Each match purpose contains knowledge about how best to compare two records to achieve the purpose of the match. Siperian Hub uses the selected match purpose as a basis for applying the match rules to determine matched records. The behavior of the rules is dependent on the selected purpose. The list of available match purposes depends on the population used, as described in “Fuzzy Population” on page 494.

What the Match Purpose Determines

The match purpose determines:
• how your match rules behave
• which columns are required
• how much attention Siperian Hub pays to each of the columns used in the match process

Two rules with all attributes identical (except for the purpose) will return different sets of matches because of the different purpose.

Mandatory and Optional Fields

Each match purpose supports a combination of mandatory and optional fields. Each field is weighted according to its influence in the match decision. Some fields in some purposes may be grouped. There are two types of groupings:
• Required: requires at least one of the field members to be non-null
• Best of: contributes only the best score from the fields in the group to the overall match score

For example, in the Individual match purpose:
• Person_Name is a mandatory field
• One of either ID Number or Date of Birth is required
• Other attributes are optional

The overall score returned by each purpose is calculated by adding the participating field scores multiplied by their respective weight and divided by the total of all field weights. If a field is optional and is not provided, it is not included in the weight calculation.
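The calculation described above is a weighted average over the fields that actually participate. The following is a minimal sketch of that arithmetic only; the weights and scores are invented for illustration, since the real per-purpose weights are defined by Siperian within each population and are not given in this guide.

```python
from typing import Dict, Optional


def overall_score(
    field_scores: Dict[str, Optional[float]], weights: Dict[str, float]
) -> float:
    """Weighted average of field scores, skipping optional fields with no value."""
    weighted_sum = 0.0
    total_weight = 0.0
    for field, weight in weights.items():
        score = field_scores.get(field)
        if score is None:
            # Optional field not provided: excluded from the weight calculation.
            continue
        weighted_sum += score * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no participating fields")
    return weighted_sum / total_weight


# Hypothetical weights for illustration only.
weights = {"Person_Name": 5.0, "ID": 3.0, "Date": 2.0}

# Date is not provided, so only Person_Name and ID participate:
# (90*5 + 100*3) / (5 + 3) = 93.75
partial = overall_score({"Person_Name": 90.0, "ID": 100.0, "Date": None}, weights)

# All three fields provided: (90*5 + 100*3 + 50*2) / 10 = 85.0
full = overall_score({"Person_Name": 90.0, "ID": 100.0, "Date": 50.0}, weights)
```

Note how leaving out a weakly matching optional field raises the overall score in this sketch, because its weight no longer appears in the denominator.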


Name Formats

Siperian Hub match has the concept of a default name format, which tells it where to expect the last name. The options are:
• Left: last name is at the start of the full name, for example, Smith Jim
• Right: last name is at the end of the full name, for example, Jim Smith

The name format used by Siperian Hub depends on the purpose that you're using. If you are using Organization, then the default is Last name, First name, Middle name. If using Person/Resident then the default is First Middle Last. Bear this in mind when formatting data for matching. It might not make a big difference, but there are edge cases where it helps, particularly for names that do not fall within the selected population.

Match Purpose: Person_Name

Description: This purpose is for matches intended to identify a person by name. This purpose is best suited to online searches when a name-only lookup is required and a human is available to make the choice. Matching in batch typically requires other attributes in addition to name to make match decisions. Use this purpose only when the rule does not contain address fields. This purpose will allow matches between people with an address and those without an address. If the rules contain address fields, use the Resident purpose instead.

This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, use Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Individual

Description: This purpose is intended to identify a specific individual by name and with either the same ID number or date of birth attributes. Since this purpose requires additional information, it is typically used after a search by Person_Name.

This purpose uses the following fields:
• Person_Name (Required)
• ID (either ID or Date is required; using both is acceptable)
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required.

Match Purpose: Resident

Description: Intended to identify a person at an address. This purpose is typically used after a search by either Person_Name or Address_Part1. Optional input fields help qualify or rank a match if more information is available. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required.

Match Purpose: Household

Description: Designed to identify matches where individuals with the same or similar family names share the same address. This purpose is typically used after a search by Address_Part1. (Note: It is not practical to search by Person_Name because ultimately only one word from the Person_Name must match, and a one-word search will not perform well in most situations.) Emphasis is placed on the last name, the major word of the Person_Name field, so this is one of the few cases where word order is important in the way the records are presented for matching. However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.

This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Family

Description: Designed to identify matches where individuals with the same or similar family names share the same address or the same telephone number. This purpose is typically used after a tiered search (multi-search) by Address_Part1 and Telephone_Number. (Note: It is not practical to search by Person_Name because ultimately only one word from the Person_Name needs to match, and a one-word search will not perform well in most situations.) Emphasis is placed on the last name, the major word of the Person_Name field, so this is one of the few cases where word order is important in the way the records are presented for matching. However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.

This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Telephone_Number (Required) (score will be based on best of Address_Part1 and Telephone_Number)
• Address_Part2
• Postal_Area
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Wide_Household

Description: Designed to identify matches where the same address is shared by individuals with the same family name or with the same telephone number. This purpose is typically used after a search by Address_Part1. (Note: It is not practical to search by Person_Name because ultimately only one word from the Person_Name needs to match, and a one-word search will not perform well in most situations.) Emphasis is placed on the last name, the major word of the Person_Name field, so this is one of the few cases where word order is important in the way the records are presented for matching. However, a reasonable score will be generated provided that a match occurs between the major word in one name and any other word in the other name.

This purpose uses the following fields:
• Address_Part1 (Required)
• Person_Name (Required)
• Telephone_Number (Required) (score will be based on best of Person_Name and Telephone_Number)
• Address_Part2
• Postal_Area
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Address

Description: Designed to identify an address match. The address might be postal, residential, delivery, descriptive, formal, or informal. The only required field is Address_Part1. The fields Address_Part2, Postal_Area, Telephone_Number, ID, Date, Attribute1, and Attribute2 are available as optional input fields to further differentiate an address. For example, if the name of a City and/or State is provided as Address_Part2, it will help differentiate between a common street address [100 Main Street] in different locations.

This purpose uses the following fields:
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field. In that case, the Address_Part2 score used will be the higher of the two scored fields.

Match Purpose: Organization

Description: Designed to match organizations primarily by name. It is targeted at online searches when a name-only lookup is required and a human is available to make the choice. Matching in batch typically requires other attributes in addition to name to make match decisions. Use this purpose only when the rule does not contain address fields. This purpose will allow matches between organizations with an address and those without an address. If the rules contain address fields, use the Division purpose.

This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. Any optional input fields you provide refine the ranking of matches. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Division

Description: Designed to identify an Organization at an Address. It is typically used after a search by Organization_Name or by Address_Part1, or both. It is in essence the same purpose as Organization, except that Address_Part1 is a required field. Thus, this purpose is designed to match company X at an address of Y (or Z, and so on, if multiple addresses are supplied).

This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Contact

Description: Designed to identify a contact within an organization at a specific location. This match purpose is typically used after a search by Person_Name. However, either Organization_Name or Address_Part1 may be used as the search criteria.

This purpose uses the following fields:
• Person_Name (Required)
• Organization_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Corporate_Entity

Description: Designed to identify an Organization by its legal corporate name, including the legal endings such as INC, LTD, and so on. It is designed for applications that need to honor the differences between such names as ABC TRADING INC and ABC TRADING LTD. This purpose is typically used after a search by Organization_Name. It is in essence the same purpose as Organization, except that tighter matching is performed and legal endings are not treated as noise.

This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required. To achieve a “best of” score between Address_Part2 and Postal_Area, pass Postal_Area as a repeat value in the Address_Part2 field.

Match Purpose: Wide_Contact

Description: Designed to loosely identify a contact within an organization, that is, without regard to actual location. It is typically used after a search by Person_Name. In addition to the required fields, ID, Attribute1, and Attribute2 may be optionally provided for matching to further qualify a contact.

This purpose uses the following fields:
• Person_Name (Required)
• Organization_Name (Required)
• ID
• Attribute1
• Attribute2

Unless otherwise indicated, fields are not required.

Match Purpose: Fields

Description: Provided for general, non-specific use. It is designed in such a way that there are no required fields. All field types are available as optional input fields.
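When planning match rules, it can help to check a proposed set of match columns against the required fields of the chosen purpose. The sketch below encodes the requirements listed above for a few of the purposes as plain data. The structure, the names PURPOSE_REQUIREMENTS and missing_requirements, and the "one_of" grouping are hypothetical conveniences for illustration; they are not part of Siperian Hub, and the Schema Manager performs its own validation.

```python
# Required fields for a subset of the match purposes described above.
# "one_of" holds groups where at least one member must be supplied.
PURPOSE_REQUIREMENTS = {
    "Individual": {"required": {"Person_Name"}, "one_of": [{"ID", "Date"}]},
    "Resident": {"required": {"Person_Name", "Address_Part1"}, "one_of": []},
    "Division": {"required": {"Organization_Name", "Address_Part1"}, "one_of": []},
    "Contact": {
        "required": {"Person_Name", "Organization_Name", "Address_Part1"},
        "one_of": [],
    },
    "Wide_Contact": {"required": {"Person_Name", "Organization_Name"}, "one_of": []},
    "Fields": {"required": set(), "one_of": []},
}


def missing_requirements(purpose, provided_fields):
    """List the requirements of a purpose that the provided fields do not satisfy."""
    spec = PURPOSE_REQUIREMENTS[purpose]
    provided = set(provided_fields)
    problems = [f"missing {name}" for name in sorted(spec["required"] - provided)]
    for group in spec["one_of"]:
        if not group & provided:
            problems.append("need one of: " + ", ".join(sorted(group)))
    return problems


# A rule using the Individual purpose with only a name lacks ID or Date.
individual_gaps = missing_requirements("Individual", ["Person_Name"])

# A Resident rule with name, address, and an optional postal area is complete.
resident_gaps = missing_requirements(
    "Resident", ["Person_Name", "Address_Part1", "Postal_Area"]
)
```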


Match Levels

For fuzzy-match base objects, the match level determines how precise the match is. You can specify one of the following match levels for a fuzzy-match base object:
Level          Description
Typical        Appropriate for most matches.
Conservative   Produces fewer matches than the Typical level. Some data that actually matches may pass through the match process without being flagged as a match. This situation is called undermatching.
Loose          Produces more matches than the Typical level. Loose matching may produce a significant number of match candidates that are not really matches. This situation is called overmatching. You might choose to use this in a match rule for manual merges, to make sure that other, tighter match rules have not missed any potential matches.

Select the level based on your knowledge of the data to be matched: Typical, Conservative (fewer matches), or Loose (more matches). When in doubt, use Typical.

Accept Limit Adjustment

For fuzzy-match base objects, the accept limit is a number that determines the acceptability of a match. This setting does the same thing as the match level (see “Match Levels” on page 558), but to a more granular degree. The accept limit is defined by Siperian within a population in accordance with its match purpose. The Accept Limit Adjustment allows a coarse adjustment to what is considered to be a match for this match rule.
• A positive adjustment results in more conservative matching.
• A negative adjustment results in looser matching.


For example, suppose that, for a given field and a given population, the accept limit for a typical match level is 80, for a loose match level is 70, and for a conservative match level is 90. If you specify a positive number (such as 3) for the adjustment, then the accept level becomes slightly more conservative. If you specify a negative number (such as -2), then the accept level becomes looser.

Configuring this setting provides an optional refinement to your match settings that might be helpful in certain circumstances. Adjusting the accept limit by even a few points can have a dramatic effect on your matches, resulting in overmatching or undermatching. Therefore, it is recommended that you test different settings iteratively, with small increments, to determine the best setting for your data.
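The example limits above can be turned into a small experiment. This sketch rests on two assumptions that the guide does not state explicitly: that the adjustment is simply added to the accept limit for the chosen match level, and that a score at or above the resulting limit is accepted. The function names are hypothetical.

```python
# Example accept limits from the text above (one field, one population).
BASE_ACCEPT_LIMITS = {"Loose": 70, "Typical": 80, "Conservative": 90}


def effective_accept_limit(match_level, adjustment=0):
    """Accept limit after adjustment (assumes the adjustment is additive)."""
    return BASE_ACCEPT_LIMITS[match_level] + adjustment


def is_accepted(score, match_level, adjustment=0):
    """Assumes a score at or above the effective limit counts as a match."""
    return score >= effective_accept_limit(match_level, adjustment)


# A score of 82 passes at the Typical level, but not after a +3 adjustment.
passes_unadjusted = is_accepted(82, "Typical")
passes_with_plus_3 = is_accepted(82, "Typical", adjustment=3)

# A score of 79 fails at the Typical level, but passes after a -2 adjustment.
passes_with_minus_2 = is_accepted(79, "Typical", adjustment=-2)
```

Under these assumptions a shift of two or three points is enough to flip borderline pairs in either direction, which is why small, iterative changes are recommended.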

Match Column Properties for Match Rules
This section describes the match column properties that you can configure for match rules.

Match Subtype

For base objects containing different types of data, the match subtype option allows you to apply match rules to specific types of data within the same base object. You have the option to enable or disable match subtyping for exact-match columns that have parent/child path components. Match subtype is available only for:
• exact-match column types that are based on a non-root Path Component, and
• match rules that have a fuzzy match / search strategy

To use match subtyping, for each match rule, specify one or more exact-match column(s) that will serve as the “subtyping” column(s) to use. The subtype indicator can be set for any of the exact-match columns regardless of whether they are used for segment match or not. During the match process, evaluation of the subtype column precedes evaluation of the other match columns. Use match subtyping judiciously, because it can have a performance impact on the match process.

Match Subtype behaves just like a standard parent/child matching scenario with the additional requirement that the match column marked as Match Subtype must be the same across all records being matched. In the following example, the Match Subtype column is Address Type and the match rule consists of Address Line1, City, and State.
Parent ID   Address Line 1    City       State   Address Type
3           123 Main          NYC        ON      Billing
3           50 John St        Toronto    NY      Shipping
5           123 Main          Toronto    BC      Billing
5           20 Adelaide St    Markham    AB      Shipping
5           50 John St        Ottawa     ON      Billing
7           50 John St        Barrie     BC      Billing
7           20 Adelaide St    Toronto    NB      Shipping
7           90 Yonge St       Toronto    ON      Billing

Without Match Subtype, Parent ID 3 would match with 5 and 7. With Match Subtype, however, Parent ID 3 will not match with 5 nor 7 because the matching rows are distributed between different Address Types. Parent ID 5 and 7 will match with each other, however, because the matching rows all fall within the 'Billing' Address Type.

Non-Equal Matching

Note: Non-Equal Matching and Segment Matching are mutually exclusive. If one is selected, then the other cannot be selected. Use non-equal matching in match rules to prevent equal values in a column from matching each other. Non-equal matching applies only to exact-match columns.


NULL Matching

Note: Null Matching and Segment Matching are mutually exclusive. If one is selected, then the other cannot be selected. Use NULL matching to specify how the match process should behave when null values match other null values. NULL matching applies only to exact-match columns. By default, null matching is disabled, meaning that Siperian Hub treats nulls as unequal values when it searches for matches (a null value will not match with anything). To enable null matching, you must explicitly select a null matching option for the match columns to allow null matching. A match column containing a NULL value is identified as matching based on the following settings:
Property                Description
Disabled                Regardless of the other value, nothing will match (nulls are unequal values). Default setting. A NULL is seen as a placeholder for an unknown value.
NULL Matches NULL       If both values are NULL, then it is considered a match.
NULL Matches Non-NULL   If one value is NULL and the other value is not NULL, then it is considered a match.

Once null matching is configured, Build Match Groups will allow only a single “Null to non NULL” match into any group, thereby reducing the possibility of unwanted transitive matching. For more information, see “Build Match Groups and Transitive Matches” on page 327.

Note: Null matching is exclusive of exact matching. For example, if you enable NULL Matches Non-Null, the match rule returns only those matches in which one of the cell values is NULL. It will not provide exact matches where both cells are equal in addition to also matching NULL against non-NULL. Therefore, if you need both behaviors, you must create two exact match rules: one with NULL matching enabled, and the other with NULL matching disabled.
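The three options and the note about exclusivity can be expressed as a small truth-table function. This is a sketch of the behavior as described in this section, not of the Hub's implementation. In particular, the guide spells out the exclusivity only for NULL Matches Non-NULL; treating NULL Matches NULL the same way (only NULL/NULL pairs match) is an assumption made here. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class NullMatching(Enum):
    DISABLED = "Disabled"
    NULL_MATCHES_NULL = "NULL Matches NULL"
    NULL_MATCHES_NON_NULL = "NULL Matches Non-NULL"


def exact_column_matches(a: Optional[str], b: Optional[str], mode: NullMatching) -> bool:
    """Decide whether two exact-match column values match under a NULL option."""
    a_null, b_null = a is None, b is None
    if mode is NullMatching.DISABLED:
        # Nulls are unequal values: a NULL never matches anything.
        return not a_null and not b_null and a == b
    if mode is NullMatching.NULL_MATCHES_NULL:
        # Assumption: only NULL/NULL pairs match in this mode.
        return a_null and b_null
    # NULL Matches Non-NULL: exactly one side is NULL. Equal non-NULL values
    # do not match here, per the note on exclusivity.
    return a_null != b_null


def matches_with_two_rules(a: Optional[str], b: Optional[str]) -> bool:
    """Combine two rules, one with NULL matching disabled and one enabled."""
    return exact_column_matches(a, b, NullMatching.DISABLED) or exact_column_matches(
        a, b, NullMatching.NULL_MATCHES_NON_NULL
    )
```

The second function mirrors the recommendation above: equal values and NULL-to-value pairs are both caught only when two separate rules are defined.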

Segment Matching

Note: Segment Matching and Non-Equal Matching are mutually exclusive. If one is selected, then the other cannot be selected. Segment Matching and NULL Matching are also mutually exclusive. If one is selected, then the other cannot be selected.

For exact-match columns only, you can use segment matching to limit match rules to specific subsets of data. For example, you could define different match rules for customers in different countries by using segment matching to limit certain rules to specific country codes. Segment matching applies to both exact-match and fuzzy-match base objects. For more information, see “Configuring Segment Matching for a Column” on page 576.

If the Segment Matching check box is checked (selected), you can configure two other options: Segment Matches All Data and Segment Match Values.

Segment Matches All Data

When unchecked (the default), Siperian Hub will only match records within the set of values defined in Segment Match Values. For example, suppose a base object contained Leads, Partners, Customers, and Suppliers. If Segment Match Values contained the values Leads and Partners, and Segment Matches All Data were unchecked, then Siperian Hub would only match within records that contain Leads or Partners. All Customers and Suppliers records will be ignored. With Segment Matches All Data checked (selected), Leads and Partners would match with Customers and Suppliers, but Customers and Suppliers would not match with each other.


Segment Match Values

For segment matching, specifies the list of segment values to use for segment matching. You must specify one or more values (for a match column) that define the segment matching. For example, for a given match rule, suppose you wanted to define segment matching by Gender. If you specified a segment match value of M (for male), then, for that match rule, Siperian Hub searches for matches (based on the other match columns) only on male records, and can only match to other male records, unless you also enabled Segment Matches All Data.

Note: Segment match values are case-sensitive. When using segment matching on fuzzy and exact base objects, the values that you set are case-sensitive when executing the Match batch job.

Concatenation of Values in Multiple Columns

For exact matches with segment matching enabled on concatenated columns, a space character must be added to each piece of data present in the concatenated fields.

Note: Concatenating columns is not recommended for exact match columns.
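The interaction between Segment Match Values and Segment Matches All Data can be summarized as a predicate on a pair of records. The following sketch reproduces the Leads/Partners example from this section. The function name and its parameters are hypothetical, and the sketch only decides whether a pair is eligible for matching; the other match columns would still have to agree.

```python
def segment_allows_pair(segment_a, segment_b, segment_values, matches_all_data=False):
    """Return True if two records may be compared under segment matching.

    segment_values is the configured Segment Match Values list. Comparison is
    case-sensitive, as noted above.
    """
    a_in_segment = segment_a in segment_values
    b_in_segment = segment_b in segment_values
    if matches_all_data:
        # At least one record must belong to the segment; records outside
        # the segment never match each other.
        return a_in_segment or b_in_segment
    # Default: both records must belong to the segment.
    return a_in_segment and b_in_segment


segment_values = {"Leads", "Partners"}

# Segment Matches All Data unchecked: only Leads and Partners participate.
leads_vs_partners = segment_allows_pair("Leads", "Partners", segment_values)
leads_vs_customers = segment_allows_pair("Leads", "Customers", segment_values)

# Segment Matches All Data checked: Leads can match Customers, but
# Customers and Suppliers still cannot match each other.
leads_vs_customers_all = segment_allows_pair("Leads", "Customers", segment_values, True)
customers_vs_suppliers_all = segment_allows_pair(
    "Customers", "Suppliers", segment_values, True
)
```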

Requirements for Exact-match Columns in Match Column Rules

Exact-match columns are subject to the following rules:
• The names of exact-match columns cannot be longer than 26 characters.
• Exact-match columns must be of type VARCHAR or CHAR.
• Match columns can be used to match on any text column or combination of text columns from a base object.
• If you want to use numerics or dates, you must convert them to VARCHAR using cleanse functions before they are loaded into your base object. For more information, see “Using Cleanse Functions” on page 414.
• Match columns can also be used to match on a column from a child base object, which in turn can be based on any text column or combination of text columns in the child base object. Matching on the match columns of a child base object is called intertable matching.
• When using intertable match and creating match rules for the child table (via a foreign key), you must include the foreign key from the parent table in each match rule on the child. If you do not, when the child is merged, the parent records would lose the child records that had previously belonged to them.

For more information, see “Match Columns Depend on the Search Strategy” on page 515.
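The first two requirements above (name length and data type) lend themselves to a quick pre-check before you define match columns. The following Python sketch is an assumption-laden illustration, not a Siperian Hub utility: the function name is hypothetical, and it assumes the data type is supplied as a string such as VARCHAR(50).

```python
MAX_NAME_LENGTH = 26               # limit stated for exact-match column names
ALLOWED_TYPES = {"VARCHAR", "CHAR"}


def check_exact_match_column(name, data_type):
    """Return a list of problems found for a candidate exact-match column."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name '{name}' is {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    # Strip any length qualifier, e.g. "varchar(50)" -> "VARCHAR".
    base_type = data_type.split("(")[0].strip().upper()
    if base_type not in ALLOWED_TYPES:
        problems.append(
            f"type '{data_type}' must be converted to VARCHAR by a cleanse function"
        )
    return problems


print(check_exact_match_column("LAST_NAME", "VARCHAR(50)"))
print(check_exact_match_column("BIRTH_DATE", "DATE"))
```

An empty list means the column passes both checks; numeric and date columns are reported so that they can be converted with cleanse functions before loading.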

Command Buttons for Configuring Column Match Rules
In the Match Rule Sets tab, if you select a match rule set in the list, the Schema Manager displays the following command buttons.
Button Description
Adds a match rule. For more information, see “Adding Match Column Rules” on page 565.
Edits properties for the selected match rule. For more information, see “Editing Match Column Rules” on page 570.
Deletes the selected match rule. For more information, see “Deleting Match Column Rules” on page 572.
Moves the selected match rule up in the sequence. For more information, see “Changing the Execution Sequence of Match Column Rules” on page 573.
Moves the selected match rule down in the sequence. For more information, see “Changing the Execution Sequence of Match Column Rules” on page 573.
Changes a manual consolidation rule to an automatic consolidation rule. Select a manual consolidation record and then click the button. For more information, see “Specifying Consolidation Options for Match Column Rules” on page 574.
Changes an automatic consolidation rule to a manual consolidation rule. Select an automatic consolidation record and then click the button. For more information, see “Specifying Consolidation Options for Match Column Rules” on page 574.

Important: If you change your match rules after matching, you are prompted to reset your matches. When you reset your matches, it deletes everything in the match table and, in records where the consolidation indicator is 2, resets the consolidation indicator to 4. For more information, see “About the Consolidate Process” on page 335 and “Reset Match Table Jobs” on page 744.
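The effect of resetting matches can be pictured with a small in-memory model. The sketch below is illustrative only and does not touch a Hub Store: the function name is hypothetical, the match table is modeled as a plain list, and base object records are modeled as dictionaries carrying a CONSOLIDATION_IND value.

```python
def reset_matches(match_table, base_object_records):
    """Model a match reset: empty the match table and requeue records.

    Returns the number of records whose consolidation indicator changed.
    """
    # Everything in the match table is deleted.
    match_table.clear()
    requeued = 0
    for record in base_object_records:
        # Records with a consolidation indicator of 2 are reset to 4.
        if record["CONSOLIDATION_IND"] == 2:
            record["CONSOLIDATION_IND"] = 4
            requeued += 1
    return requeued


matches = [("1001", "1002"), ("1003", "1004")]
records = [
    {"ROWID_OBJECT": "1001", "CONSOLIDATION_IND": 2},
    {"ROWID_OBJECT": "1002", "CONSOLIDATION_IND": 1},
    {"ROWID_OBJECT": "1003", "CONSOLIDATION_IND": 4},
]
print(reset_matches(matches, records), matches)
```

Records with any other consolidation indicator are left unchanged in this model.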

Adding Match Column Rules
To add a new match rule using match columns:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
4. Select a match rule set in the list.
   The Schema Manager displays the properties for the selected match rule set.
6. For fuzzy-match base objects, configure the match rule properties at the top of the dialog box. For more information, see “Match Rule Properties for Fuzzy-match Base Objects Only” on page 544.
7. Configure the match column(s) for this match rule. Only columns you have previously defined as match columns are shown.
   • For exact-match base objects or match rules with an exact match / search strategy, only exact column types are available.
   • For fuzzy-match base objects, you can choose fuzzy or exact column types.
   To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
   a. Click the Edit button next to the Match Columns list.
      The Schema Manager displays the Add/Remove Match Columns dialog.
   b. Check (select) the check box next to any column that you want to include.
   c. Uncheck (clear) the check box next to any column that you want to omit.
   d. Click OK.
      The Schema Manager displays the selected columns in the Match Columns list.
8. Configure the match properties for each match column in the Match Columns list. For more information, see:
   • “Match Column Properties for Match Rules” on page 559
   • “Configuring the Match Weight of a Column” on page 575
   • “Configuring Segment Matching for a Column” on page 576
   • “NULL Matching” on page 561
   • “Match Subtype” on page 559
9. Click OK.
10. If this is an exact match, specify the match properties for this match rule. For more information, see “Requirements for Exact-match Columns in Match Column Rules” on page 563. Click OK.
11. Click the Save button to save your changes.
    Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
12. If you are prompted to confirm saving changes, click the OK button to save your changes.

Editing Match Column Rules
To edit the properties for an existing match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the exact-match base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
4. Select a match rule set in the list.
   The Schema Manager displays the properties for the selected match rule set.
5. In the Match Rules section of the screen, click the Edit button.
6. For fuzzy-match base objects, change the match rule properties at the top of the dialog box, if you want. For more information, see “Match Rule Properties for Fuzzy-match Base Objects Only” on page 544.
7. Configure the match column(s) for this match rule, if you want. Only columns you have previously defined as match columns are shown.
   • For exact-match base objects or match rules with an exact match / search strategy, only exact column types are available.
   • For fuzzy-match base objects, you can choose fuzzy or exact column types.
   To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
   a. Click the Edit button next to the Match Columns list.
      The Schema Manager displays the Add/Remove Match Columns dialog.
   b. Check (select) the check box next to any column that you want to include.
   c. Uncheck (clear) the check box next to any column that you want to omit.
   d. Click OK.
      The Schema Manager displays the selected columns in the Match Columns list.
8. Change the match properties for any match column that you want to edit. For more information, see:
   • “Match Column Properties for Match Rules” on page 559
   • “Configuring the Match Weight of a Column” on page 575
   • “Configuring Segment Matching for a Column” on page 576
   • “NULL Matching” on page 561
   • “Match Subtype” on page 559
9. Click OK.
10. If this is an exact match, specify the match properties for this match rule. For more information, see “Requirements for Exact-match Columns in Match Column Rules” on page 563. Click OK.
11. Click the Save button to save your changes.
    Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
12. If you are prompted to confirm saving changes, click the OK button to save your changes.

Deleting Match Column Rules
To delete a match column rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the exact-match base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to delete.
6. Click the Remove button.
   The Schema Manager prompts you to confirm deletion.
7. Click Yes.

Changing the Execution Sequence of Match Column Rules
To change the execution sequence of match column rules:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the exact-match base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to move up or down.
6. Do one of the following:
   • Click the button to move the selected match rule up in the execution sequence.
   • Click the button to move the selected match rule down in the execution sequence.
7. Click the Save button to save your changes.
   Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
8. If you are prompted to confirm saving changes, click the OK button to save your changes.

Specifying Consolidation Options for Match Column Rules
During the match process, a match column rule must determine whether matched records should be queued for manual or automatic consolidation. For more information, see “About the Consolidate Process” on page 335.

Note: A base object cannot have more than 200 user-defined columns if it will have match rules that are configured for automatic consolidation.

To toggle between manual and automatic consolidation for a match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the exact-match base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to configure.
6. Do one of the following:
   • Click the button to change a manual consolidation rule to an automatic consolidation rule.
   • Click the button to change an automatic consolidation rule to a manual consolidation rule.
7. Click the Save button to save your changes.
   Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
8. If you are prompted to confirm saving changes, click the OK button to save your changes.

Configuring the Match Weight of a Column

For a fuzzy-match column, you can change its match weight in the Edit Match Rule dialog box. For each column, Siperian Hub assigns an internal match weight, which is a number that indicates the importance of this column (relative to other columns in the table) for matching. The match weight varies according to the selected match purpose and population. For example, if the match purpose is Person_Name, then Siperian Hub, when evaluating matches, views a data match in the name column with greater importance than a data match in a different column (such as the address).

By adjusting the match weight of a column, you give added weight to, and elevate the significance of, that column (relative to other columns) when Siperian Hub analyzes values for matches.

To configure the match weight of a column:
1. In the Edit Match Rule dialog box, select a column in the list.
2. Click the Match Weight Adjustment button.
   If adjusted, the name of the selected column shows in a bold font.
3. Click the Save button to save your changes.
   Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
4. If you are prompted to confirm saving changes, click the OK button to save your changes.

Configuring Segment Matching for a Column

As described in “Segment Matching” on page 562, segment matching is used with exact-match columns to limit match rules to specific subsets of data.

To configure segment matching for an exact-match column:
1. In the Edit Match Rule dialog box, select an exact-match column in the Match Columns list.
2. Check (select) the Segment Matching check box to enable this feature.
3. Check (select) the Segment Matches All Data check box, if you want. For more information, see “Segment Matches All Data” on page 562.
4. Specify the segment match values for segment matching. For more information, see “Segment Match Values” on page 563.
   a. Click the Edit button.
      The Schema Manager displays the Edit Values dialog.
   b. Do one of the following:
      • To add a value, click the add button, type the value you want to add, and click OK.
      • To delete a value, select it in the list, click the delete button, and choose Yes when prompted to confirm deletion.
5. Click OK.
6. Click the Save button to save your changes.
   Before saving changes, the Schema Manager analyzes the match rule set and prompts you with a message if the match rule set contains certain incongruences. For more information, see “Rule Set Evaluation” on page 533.
7. If you are prompted to confirm saving changes, click the OK button to save your changes.

About Primary Key Match Rules
Matching on primary keys can be used when two or more different source systems for a base object have identical primary key values. This situation occurs infrequently in source systems, but when it does occur, you can use the primary key matching option in Siperian Hub to rapidly match and automatically consolidate records from the source systems that have the matching primary keys.

For example, two systems might use the same set of customer IDs. If both systems provide information about customer XYZ123 using identical primary key values, the two systems are certainly referring to the same customer and the records should be automatically consolidated.

When you specify a primary key match, you simply specify which source systems have the same primary key values. You also check the Auto-merge matching records check box to have Siperian Hub automatically consolidate matching records when a Merge or Link batch job is run. To learn more, see “Automerge Jobs” on page 717 and “Autolink Jobs” on page 715.
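Conceptually, a primary key match rule pairs records from two source systems whenever their primary key values are identical. The Python sketch below illustrates that idea under stated assumptions: the function name and the dictionary keys (rowid, source_system, pkey) are hypothetical and are not Siperian Hub column names.

```python
def primary_key_matches(records, system_a, system_b):
    """Pair records from two source systems that share a primary key value."""
    # Index the records of the first system by primary key value.
    by_key = {}
    for rec in records:
        if rec["source_system"] == system_a:
            by_key.setdefault(rec["pkey"], []).append(rec["rowid"])
    # Look up each record of the second system in that index.
    pairs = []
    for rec in records:
        if rec["source_system"] == system_b:
            for rowid in by_key.get(rec["pkey"], []):
                pairs.append((rowid, rec["rowid"]))
    return pairs


records = [
    {"rowid": "1", "source_system": "CRM", "pkey": "XYZ123"},
    {"rowid": "2", "source_system": "Billing", "pkey": "XYZ123"},
    {"rowid": "3", "source_system": "Billing", "pkey": "ABC999"},
]
print(primary_key_matches(records, "CRM", "Billing"))
```

In this example only customer XYZ123 is paired, because it is the only primary key value present in both systems.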

Adding Primary Key Match Rules
To add a new primary key match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Primary Key Match Rules tab.

Configuring Primary Key Match Rules

The Schema Manager displays the Primary Key Match Rules tab.

The Primary Key Match Rules tab has the following columns.
Column            Description
Key Combination   Two source systems for which this primary key match rule will be used for matching. These source systems must already be defined in Siperian Hub (see “Configuring Source Systems” on page 348), and staging tables for this base object must be associated with these source systems (see “Configuring Staging Tables” on page 364).
Auto-Merge        Specifies whether this primary key match rule results in automatic or manual consolidation. For more information, see “About the Consolidate Process” on page 335.

4. Click the Plus button to add a primary key match rule.
   The Add Primary Key Match Rule dialog is displayed.
5. Check (select) the check box next to two source systems for which you want to match records based on the primary key.
6. Check (select) the Auto-merge matching records check box if you are certain that records with identical primary keys are matches. You can change your choice for Auto-merge matching records later, if you want.
7. Click OK.
   The Schema Manager displays the new rule in the Primary Key Match Rules tab.
8. Click the Save button to save your changes.
   The Schema Manager asks you whether you want to reset existing matches.
9. Choose Yes to delete all matches currently stored in the match table, if you want.

Editing Primary Key Match Rules
Once you have defined a primary key match rule, you can change the value of the Auto-merge matching records check box.

To edit an existing primary key match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Primary Key Match Rules tab.
   The Schema Manager displays the Primary Key Match Rules tab.
4. Scroll to the primary key match rule that you want to edit.
5. Check or uncheck the Auto-merge matching records check box to enable or disable auto-merging, respectively.
6. Click the Save button to save your changes.
   The Schema Manager asks you whether you want to reset existing matches.
7. Choose Yes to delete all matches currently stored in the match table, if you want.

Deleting Primary Key Match Rules
To delete an existing primary key match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Primary Key Match Rules tab.
   The Schema Manager displays the Primary Key Match Rules tab.
4. Select the primary key match rule that you want to delete.
5. Click the Delete button.
   The Schema Manager prompts you to confirm deletion.
6. Choose Yes.
   The Schema Manager removes the deleted rule from the Primary Key Match Rules tab.
7. Click the Save button to save your changes.
   The Schema Manager asks you whether you want to reset existing matches.
8. Choose Yes to delete all matches currently stored in your match table, if you want.

Investigating the Distribution of Match Keys
This section describes how to investigate the distribution of match keys in the match key table.

About Match Keys Distribution
As described in “Match Keys and the Tokenization Process” on page 322, match keys are strings that encode data in the fuzzy match key column used to identify candidates for matching. The tokenization process generates match keys for all the records in a base object and stores them in its match key table. Depending on the nature of the data in the base object record, the tokenization process generates at least one match key, and possibly multiple match keys, for each base object record. Match keys are used subsequently in the match process to help determine possible matches between base object records.

In the Match / Merge Setup Details pane of the Schema Manager, the Match Keys Distribution tab allows you to investigate the distribution of match keys in the match key table. This tool can help you identify potential hot spots in your data: high concentrations of match keys that could result in overmatching, where the match process generates too many matches, including matches that are not relevant. By knowing where hot spots occur in your data, you can refine data cleansing and match rules to reduce hot spots and generate an optimal distribution of match keys for use in the match process. Ideally, you want to have a relatively even distribution across all keys.
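The idea of a match key distribution can be illustrated outside the Hub Console with a simple count of match keys by starting character. The Python sketch below is a rough model only: the function names are hypothetical, the sample keys are invented for illustration, and the hot spot threshold (a count greater than twice the average) is an arbitrary assumption rather than a Siperian Hub rule.

```python
from collections import Counter


def key_distribution(match_keys):
    """Count match keys by their starting character."""
    return Counter(key[0] for key in match_keys if key)


def hot_spots(distribution, factor=2.0):
    """Return starting characters whose count exceeds factor x the average."""
    if not distribution:
        return []
    average = sum(distribution.values()) / len(distribution)
    return sorted(ch for ch, count in distribution.items() if count > factor * average)


# Invented sample: ten keys starting with M, one each for A, B, C, and D.
keys = [f"M{i:03d}" for i in range(10)] + ["A001", "B001", "C001", "D001"]
distribution = key_distribution(keys)
print(distribution.most_common())
print(hot_spots(distribution))
```

A disproportionately large count for one starting character corresponds to a tall spike in the histogram and is a candidate for closer investigation.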

Navigating to the Match Keys Distribution Tab
To navigate to the Match Keys Distribution tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. For more information, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Keys Distribution tab.

The histogram displays the statistical distribution of match keys in the match key table.
Axis             Description
Key (X-axis)     Starting character(s) of the match key. If no filter is applied (the default), this is the starting character of the match key. If a filter is applied, this is the starting sequence of characters in the match key, beginning with the left-most character. For more information, see “Filtering Match Keys” on page 587.
Count (Y-axis)   Number of match keys in the match key table that begin with the starting character(s). Hotspots in the match key table show up as disproportionately tall spikes (high number of match keys), relative to other characters in the histogram.

Match Keys List

The Match Keys List on the Match Keys Distribution tab displays records in the match key table. For each record, it displays cell data for the following columns:
Column Name   Description
ROWID         ROWID_OBJECT that uniquely identifies the record in the base object that is associated with this match key.
KEY           Generated match key. SSA_KEY column in the match key table.

Depending on the configured match rules and the nature of the data in a record, a single record in the base object table can have multiple generated match keys.
Multiple Match Keys for Base Object Record

Paging Through Records in the Match Key Table

Use the following command buttons to navigate the records in the match key table.
Button Description
Displays the first page of records in the match key table.
Displays the previous page of records in the match key table.
Displays the next page of records in the match key table.
Jumps to the page number you enter.

Match Columns
The Match Columns area on the Match Keys Distribution tab displays match column data for the selected record in the match keys list. This is the SSA_DATA column in the match key table. For each match column that is configured for this base object (see “Configuring Match Columns” on page 515), it displays the column name and cell data.

Filtering Match Keys
You can use a match key filter to focus your investigation on hotspots or other match key distribution patterns. A match key filter restricts the data in the Histogram and the Match Keys List to the subset of match keys that meets the filter condition. By default, no filter is defined and all records in the match key table are displayed.

The filter condition specifies the beginning string sequence for qualified match keys, evaluated from left to right. For example, to view only match keys beginning with the letter M, you would select M for the filter. To further restrict match keys and view data for only the match keys that start with the letters MD, you would add the letter D to the filter. The longer the filter expression, the more restrictive the display.
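A match key filter behaves like a prefix test on the match key string. The following Python sketch illustrates the M and MD example above; the function names and sample keys are hypothetical, and grouping the filtered keys by the filter plus the next character is an assumption about how the refreshed Histogram is organized.

```python
from collections import Counter


def filter_keys(match_keys, prefix):
    """Keep only the match keys that begin with the filter string."""
    return [key for key in match_keys if key.startswith(prefix)]


def filtered_distribution(match_keys, prefix):
    """Count filtered keys by the filter string plus the next character."""
    width = len(prefix) + 1
    return Counter(
        key[:width] for key in filter_keys(match_keys, prefix) if len(key) >= width
    )


keys = ["MDA1", "MDB2", "MXA3", "ABC4"]
print(filter_keys(keys, "M"))
print(filter_keys(keys, "MD"))
print(filtered_distribution(keys, "M"))
```

Each character added to the filter can only shrink the result, which is why a longer filter expression gives a more restrictive display.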

Setting a Filter
To set a filter:
• Click the vertical bar in the Histogram associated with the character you want to add to the filter.

For example, suppose you started with the following default view in the Histogram.

If you click the vertical bar above the M character, the Histogram refreshes and displays the distribution for all match keys beginning with the character M.

Note that the Match Keys List now displays only those match keys that meet the filter condition.

Navigating Filters
Use the following command buttons to navigate filters.
Button Description
Clears the filter. Displays the default view (no filter).
Displays the previously-selected filter (removes the right-most character from the filter).

Excluding Records from the Match Process

Siperian Hub provides a mechanism for selectively excluding records from the match process. You might want to do this if, for example, your data contained records that you wanted the match process to ignore. To configure this feature, in the Schema Manager, you add a column named EXCLUDE_FROM_MATCH to a base object. This column must be an integer type with a default value of zero (0), as described in “Adding Columns” on page 134.

Once the table is populated and before running the Match job, to exclude a record from matching, change its value in the EXCLUDE_FROM_MATCH column to a one (1) in the Data Manager. When the Match job runs, only those records with an EXCLUDE_FROM_MATCH value of zero (0) will be tokenized and processed—all other records will be ignored. When the cell value is changed, the DIRTY_IND for this record is set to 1 so that match keys will be regenerated when the tokenization process is executed, as described in “Match Keys and the Tokenization Process” on page 322.
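The exclusion rule can be summarized as a simple filter on the EXCLUDE_FROM_MATCH column. The Python sketch below models base object records as dictionaries purely for illustration; the function names are hypothetical, and in practice the value is changed in the Data Manager rather than in code.

```python
def exclude_from_match(record):
    """Flag a record so that the Match job ignores it."""
    record["EXCLUDE_FROM_MATCH"] = 1
    # Changing the cell value also marks the record dirty, so that match
    # keys are regenerated when the tokenization process next runs.
    record["DIRTY_IND"] = 1


def records_to_match(records):
    """Return only the records that the Match job would process."""
    return [rec for rec in records if rec["EXCLUDE_FROM_MATCH"] == 0]


records = [
    {"ROWID_OBJECT": "1001", "EXCLUDE_FROM_MATCH": 0, "DIRTY_IND": 0},
    {"ROWID_OBJECT": "1002", "EXCLUDE_FROM_MATCH": 0, "DIRTY_IND": 0},
]
exclude_from_match(records[1])
print([rec["ROWID_OBJECT"] for rec in records_to_match(records)])
```

Only records whose EXCLUDE_FROM_MATCH value is still zero remain candidates for tokenization and matching.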

Before You Begin
Before you begin, you must have installed Siperian Hub, created the Hub Store according to the instructions in Siperian Hub Installation Guide, and built the schema according to the instructions in Chapter 5, “Building the Schema.” To learn about the consolidate process, see “Consolidate Process” on page 335.

About Consolidation Settings
Consolidation settings affect the behavior of the consolidate process in Siperian Hub. This section describes the settings that you can configure on the Merge Settings tab in the Match/Merge Setup Details dialog. To learn more, see “About the Consolidate Process” on page 335.

Immutable Rowid Object
For a given base object, you can designate a source system as an immutable source, which means that records from that source system will be accepted as unique (CONSOLIDATION_IND = 1), even in the event of a merge. Once a record from that source has been fully consolidated, it will not be changed subsequently, nor will it be matched to any other record (although other records can be matched to it). Only one source system can be configured as an immutable source.

Note: If the Requeue on Parent Merge setting for a child base object is set to 2, in the event of a merging parent, the consolidation indicator will be set to 4 for the child record. For more information, see “Requeue On Parent Merge” on page 104.

Immutable sources are also distinct systems, as described in “Distinct Source Systems” on page 596. All records are stored in Siperian Hub as master records. For all source records from an immutable source system, the consolidation indicator for Load and PUT is always 1 (consolidated record). For more information, see “Consolidation Status for Base Object Records” on page 289.

To specify an immutable source for a base object, click the drop-down list next to Immutable Rowid Object and select a source system.

This list displays the source system(s) associated with this base object. Only one source system can be designated an immutable source system. To learn more, see “Configuring Source Systems” on page 348. Immutable source systems are applicable when, for example, Siperian Hub is the only persistent store for the source data. Designating an immutable source system streamlines the load, match, and merge processes by preventing intra-source matches and automatically accepting records from immutable sources as unique. If two immutable records must be merged, then a data steward needs to perform a manual verification in order to allow that change. At that point, Siperian Hub allows the data steward to choose the key that remains.

Distinct Systems
A distinct system provides data that gets inserted into the base object without being consolidated. Records from a distinct system will never match with other records from the same system, but they can be matched to and from other records in other systems (their CONSOLIDATION_IND is set to 4 on load). You can specify distinct source systems and configure whether, for each source system, records are consolidated automatically or manually.
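The rule for distinct systems reduces to a check on the source systems of the two candidate records. This Python sketch is a hypothetical illustration (the function name and system names are assumptions) and covers only the same-system restriction, not the rest of the match process.

```python
def pair_allowed(system_a, system_b, distinct_systems):
    """Return False when both records come from the same distinct system."""
    same_system = system_a == system_b
    return not (same_system and system_a in distinct_systems)


distinct = {"Billing"}
print(pair_allowed("Billing", "Billing", distinct))  # same distinct system
print(pair_allowed("Billing", "CRM", distinct))      # different systems
print(pair_allowed("CRM", "CRM", distinct))          # same non-distinct system
```

Records from a distinct system remain eligible to match records from any other system; only intra-system pairs are ruled out.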

Distinct Source Systems
You can designate a source system as a distinct source (also known as a golden source), which means that records from that source will not be merged. For example, if the ABC source has been designated as a distinct source, then the match rules will never match (or merge) two records that come from the same source. Records from a distinct source will not match through a transient match in an Auto Match and Merge process (see “Auto Match and Merge Jobs” on page 716). Such records can be merged only manually by flagging them as matches.

To designate a distinct source system:
1. From the list of source systems on the Merge Settings tab, select (check) any source system that should not allow intra-system merges.
2. For each distinct source system, designate whether you want it to use Auto Rules only (see “Auto Rules Only” on page 597).

The following example shows both options selected for the Billing system.

Auto Rules Only
For distinct systems only, you can enable this option to allow you to configure what types of rules are executed for the associated distinct source system. Check (select) this check box if you want Siperian Hub to apply only the automatic consolidation rules (not the manual consolidation rules) for this distinct system. By default, this option is disabled (unchecked).

Unmerge Child When Parent Unmerges (Cascade Unmerge)
Important: This feature applies only to child base objects with configured match rules and foreign keys. For child base objects, Siperian Hub provides a cascade unmerge feature that allows you to specify what happens if records in the parent base object are unmerged. By default, this feature is disabled, so that unmerging parent records does not unmerge associated child records. In the Unmerge Child When Parent Unmerges portion near the bottom of the Merge Settings tab, if you check (select) the Cascade Unmerge check box for a child base object, when records in the parent object are unmerged, Siperian Hub also unmerges affected records in the child base object.

Prerequisites for Cascade Unmerge
To enable cascade unmerge:
• The parent-child relationship must already be configured in the child base object.
• The foreign key column in the child base object must be a match-enabled column.

In the Unmerge Child When Parent Unmerges portion near the bottom of the Merge Settings tab, the Schema Manager displays only those match-enabled columns in the child base object that are configured with a foreign key. To learn more, see “Configuring Foreign-Key Relationships Between Base Objects” on page 140.


Parents with Multiple Children
In situations where a parent base object has multiple child base objects, you can explicitly enable cascade unmerge for each child base object. Once configured, when the parent base object is unmerged, then all affected records in all associated child base objects are unmerged as well.

Considerations for Using Cascade Unmerge
A full unmerge of affected records is not required in all implementations, and it can have a performance overhead on the unmerge because many child records can be affected. In addition, it does not always make sense to enable this property. One example is when Customer is a child of Customer Type. In this situation, you might not want to unmerge Customers if Customer Type is unmerged. However, in most cases, it is a good idea to unmerge addresses linked to customers if Customer unmerges. Note: When cascade unmerge is enabled, the child record may not be unmerged if a previous manual unmerge was done on the child base object. When you enable the unmerge feature, it applies to the child table and the child cross-reference table. Once enabled, if you then unmerge the parent cross-reference, the original child cross-reference should be unmerged as well. This feature has no impact on the parent—the feature operates on the child tables to provide additional flexibility.

Changing Consolidation Settings
To change consolidation settings on the Merge Settings tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base object that you want to configure. To learn more, see “Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Click the Merge Settings tab.
   The Schema Manager displays the Merge Settings tab for the selected base object.

Before You Begin
Before you begin, you must have completed the following tasks:
• Installed Siperian Hub, created the Hub Store, and successfully set up message queues according to the instructions in the Siperian Hub Installation Guide for your platform
• Completed the tasks in the Siperian Hub Installation Guide to configure Siperian Hub to handle asynchronous Services Integration Framework (SIF) requests, if applicable
  Note: SIF uses a message-driven bean (MDB) on the JMS message queue (named siperian.sif.jms.queue) to process incoming asynchronous SIF requests. This required queue is set up during the installation process, as described in the Siperian Hub Installation Guide for your platform. If your Siperian Hub implementation does not require any additional message queues, then you can skip this chapter.
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Read the introduction to the publish process in “Publish Process” on page 342

Configuration Steps for the Publish Process
After installing Siperian Hub, you use the Message Queues tool in the Hub Console to configure message queues for your Siperian Hub implementation. The following tasks are mandatory if you want to publish events in the outbound message queue:
1. Configure the message queues on your application server. The Siperian installer automatically sets up message queues and the connection factory configuration. For more information, see the Siperian Hub Installation Guide for your platform.
2. Configure global message queue settings. For more information, see “Configuring Global Message Queue Settings” on page 604.
3. Add at least one message queue server. For more information, see “Configuring Message Queue Servers” on page 605.
4. Add at least one message queue to the message queue server. For more information, see “Configuring Outbound Message Queues” on page 608.
5. Generate the JMS event message schema for each ORS that has data that you want to publish. For more information, see “Generating and Deploying ORS-specific Schemas” on page 827.
6. Configure message triggers for your message queues. For more information, see “Configuring Message Triggers” on page 612.

After you have configured message queues, you can review run-time activities using the Audit Manager according to the instructions in “Auditing Message Queues” on page 928.

Starting the Message Queues Tool
To start the Message Queues tool:
1. In the Hub Console, connect to the Master Database. Message queues are defined in the Master Database.
2. In the Hub Console, expand the Configuration workbench, and then click Message Queues.
   The Hub Console displays the Message Queues tool, as shown here:

[Figure: the Message Queues tool, showing the Navigation Pane and the Properties Pane]

The Message Queues tool is divided into two panes.

Pane               Description
Navigation pane    Shows (in a tree view) the message queues that are defined for this Siperian Hub implementation.
Properties pane    Shows the properties for the selected message queue.

Configuring Global Message Queue Settings

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Specify settings for Data Changes Monitoring, which monitors the queue for outgoing messages. To enable or disable Data Changes Monitoring, click the Toggle Data Changes Monitoring Status button.
4. Specify the following monitoring settings:

   Monitoring Setting                           Description
   Receive Timeout (milliseconds)               Default is 0. Amount of time allowed to receive the messages from the queue.
   Receive Batch Size                           Default is 100. Maximum number of events processed and placed in the message queue in a single pass.
   Message Check Interval (milliseconds)        Default is 300000. Amount of time to pause before polling for inbound messages or processing outbound messages. The same value applies to both inbound and outbound message queues.
   Out of sync check interval (milliseconds)    If configured, periodically polls for ORS metadata and regenerates the XML message schema if subsequent changes have been made to design objects in the ORS. For more information, see “Generating and Deploying ORS-specific Schemas” on page 827. By default, this feature is disabled (set to zero) and is available only if:
                                                • Data Changes Monitoring is enabled.
                                                • ORS-specific XML message schema has been generated using the JMS Event Schema Manager.
                                                Note: Make sure that this value is greater than or equal to the Message Check Interval.

   Click the button next to any property that you want to change.
5. Click the button to save your changes.
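As a quick sanity check on these settings, the defaults and the rule in the Note can be modeled in a few lines of Python. This is a sketch only: the MonitoringSettings class and its field names are hypothetical, and it assumes that the rule does not apply while the out of sync check is disabled (set to 0).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MonitoringSettings:
    # Defaults taken from the monitoring settings table; field names are illustrative.
    receive_timeout_ms: int = 0
    receive_batch_size: int = 100
    message_check_interval_ms: int = 300000
    out_of_sync_check_interval_ms: int = 0  # 0 means the feature is disabled

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the values are consistent."""
        problems = []
        enabled = self.out_of_sync_check_interval_ms != 0
        if enabled and self.out_of_sync_check_interval_ms < self.message_check_interval_ms:
            problems.append(
                "Out of sync check interval must be greater than or equal to "
                "the Message Check Interval"
            )
        return problems

print(MonitoringSettings().validate())
print(MonitoringSettings(out_of_sync_check_interval_ms=60000).validate())
```

With the defaults the check passes because the out of sync check is disabled. Setting it to 60000 while the Message Check Interval stays at 300000 is reported as a problem.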

About Message Queue Servers
Before you can define message queues in Siperian Hub, you must define the message queue server(s) that Siperian Hub will use for handling message queues. Before you can define a message queue server in Siperian Hub, it must already be defined on your application server according to the documented instructions for your application server. You will need the connection factory name.

Message Queue Server Properties
This section describes the settings that you can configure for message queue servers.


WebLogic and JBoss Properties
You can configure the following message queue server properties.

Property                   Description
Connection Factory Name    Name of the connection factory for this message queue server.
Display Name               Name of this message queue server as it will be displayed in the Hub Console.
Description                Descriptive information for this message queue server.

WebSphere Properties
IBM WebSphere implementations have the following properties.

Property       Description
Server Name    Name of the server where the message queue is defined.
Channel        Channel of the server where the message queue is defined.
Port           Port on the server where the message queue is defined.

Adding Message Queue Servers
To add a message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see “Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Right-click anywhere in the Navigation pane and choose Add Message Queue Server.
   The Message Queues tool displays the Add Message Queue Server dialog.
4. Specify the properties for this message queue server. For more information, see “Message Queue Server Properties” on page 605.

Editing Message Queue Server Properties
To edit the properties of an existing message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see “Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, select the name of the message queue server that you want to configure.
4. Change the editable properties for this message queue server. For more information, see “Message Queue Server Properties” on page 605.
   Click the button next to any property that you want to change.
5. Click the button to save your changes.

Deleting Message Queue Servers
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, right-click the name of the message queue server that you want to delete, and then choose Delete from the pop-up menu.
   The Message Queues tool prompts you to confirm deletion.
4. Click Yes.

About Message Queues
Before you can define outbound JMS message queues in Siperian Hub, you must define the message queue server(s) that will service the message queue. For more information, see “Configuring Message Queue Servers” on page 605. In JMS, a message queue is a staging area for XML messages. Siperian Hub publishes XML messages to the message queue. External applications retrieve these published XML messages from the message queue.

Message Queue Properties
You can configure the following message queue properties.

Property        Description
Queue Name      Name of this message queue. This must match the JNDI queue name as configured on your application server.
Display Name    Name of this message queue as it will be displayed in the Hub Console.
Description     Descriptive information for this message queue.

Adding Message Queues to a Message Queue Server
To add a message queue to a message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see “Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, right-click the name of the message queue server to which you want to add a message queue, and choose Add Message Queue.
   The Message Queues tool displays the Add Message Queue dialog.
6. Select one of the following options:

   Assignment                         Description
   Leave Unassigned                   Queue is currently unassigned and not in use. Select this option to use this queue as the outbound queue for Siperian Hub API responses, or to indicate that the queue is currently unassigned and is not in use.
   Use with Message Queue Triggers    Queue is currently assigned and is available for use by message triggers that are defined in the Schema Manager according to the instructions in “Configuring Message Triggers” on page 612.
   Use Legacy XML                     Select (check) this option only if your Siperian Hub implementation requires that you use the legacy XML message format (Siperian Hub XU version) instead of the current version of the XML message format. For more information, see “Legacy JMS Message XML Reference” on page 644.

7. Click the button to save your changes.


Editing Message Queue Properties
To edit the properties of an existing message queue:
1. In the Hub Console, start the Message Queues tool. For more information, see “Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, select the name of the message queue that you want to configure.
4. Change the editable properties for this message queue. For more information, see “Message Queue Properties” on page 608.
   Click the button next to any property that you want to change.
5. Change the queue assignment, if you want.
6. Click the button to save your changes.

Deleting Message Queues
To delete an existing message queue:
1. In the Hub Console, start the Message Queues tool. For more information, see “Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. In the navigation pane, right-click the name of the message queue that you want to delete, and then choose Delete from the pop-up menu.
   The Message Queues tool prompts you to confirm deletion.
4. Click Yes.

About Message Triggers
Use message triggers to identify which actions within Siperian Hub are communicated to external applications, and where to publish XML messages. When an action occurs for which a rule is defined, an XML message is placed in a JMS message queue. A message trigger specifies the JMS message queue in which messages are placed. For example:
1. A user inserts a record in a base object.
2. This insert action initiates a message trigger.
3. Siperian Hub evaluates the message trigger and sends a message to the appropriate message queue.
4. An outside application polls the message queue, picks up the message, and processes it.

You can use the same message queue for all triggers, or you can use a different message queue for each trigger. In order for an action to trigger a message trigger, the message queues must be configured, and a message trigger must be defined for that base object and action.
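The flow described above (an action occurs, a matching trigger is found, a message is queued, and an outside application picks it up) can be sketched in Python. This is a conceptual model only: the in-memory queue stands in for a JMS message queue, the dictionary stands in for the XML message, and the trigger table, queue name, and publish_if_triggered function are hypothetical.

```python
import queue

# Hypothetical trigger definitions: (base object, event) -> queue name.
TRIGGERS = {("CUSTOMER", "Insert"): "customer.events"}

# In-memory stand-ins for JMS message queues.
QUEUES = {"customer.events": queue.Queue()}

def publish_if_triggered(base_object, event, rowid):
    """Queue a message if a trigger exists for this base object and action."""
    queue_name = TRIGGERS.get((base_object, event))
    if queue_name is None:
        return False  # no trigger defined, so nothing is published
    QUEUES[queue_name].put(
        {"baseObject": base_object, "event": event, "rowid": rowid}
    )
    return True

# Steps 1-3: a record is inserted, the trigger fires, a message is queued.
sent = publish_if_triggered("CUSTOMER", "Insert", "42")

# Step 4: an outside application polls the queue and picks up the message.
message = QUEUES["customer.events"].get_nowait()
print(sent, message)
```

An action with no matching trigger, such as a Merge on CUSTOMER in this sketch, publishes nothing, which mirrors the requirement that a trigger be defined for both the base object and the action.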

Types of Events for Message Triggers
The following types of events can cause a message trigger to be fired and a message placed in the queue.
Events for Which Message Queue Rules Can Be Defined

Event                                Description
Add new data                         • Add the data through the load process
                                     • Add the data through the Data Manager
                                     • Add the data through the API verb using PUT or CLEANSE_PUT (either through HTTP, SOAP, MQ, and so on)
Add new pending data                 A new record with a PENDING state is created. Applies to state-enabled base objects only.
Update existing data                 • Update the data through the load process
                                     • Update the data through the Data Manager
                                     • Update the data through the API verb using PUT or CLEANSE_PUT (either through HTTP, SOAP, MQ, and so on)
                                     Note:
                                     • If trust rules prevent the base object columns from being updated, no message is generated.
                                     • If one or more of the specified columns are updated, a single message is generated. This single message includes data from all of the cross-references in all output systems.
Update existing pending data         An existing record with a PENDING state is updated. Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
Update, only XREF changed            • Updating data when only the XREF has changed through the load process
                                     • Updating data when only the XREF has changed through the API using PUT or CLEANSE_PUT (either through HTTP, SOAP, MQ, and so on)
Pending update, only XREF changed    An XREF record with a PENDING state is updated. This includes promotion of a record. Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
Merging data                         • Manual Merge via Merge Manager
                                     • Merge via the API verb (either through HTTP, SOAP, MQ, and so on)
                                     • Automatch and Merge
Merging data, Base object updated    Merging data when the base object has been updated
Unmerging data                       • Unmerge the data through the Data Manager
                                     • Unmerge the data through the API verb using UNMERGE (either through HTTP, SOAP, EJB, and so on)
Accepting data as unique             • Accepting a single record as unique via the Merge Manager
                                     • Accepting multiple records as unique via the Merge Manager
                                     • Having Accept as Unique turned on in the base object's match rules (this happens during the match/merge process)
                                     Note: When a record is accepted as unique (either automatically through a match rule or manually by a data steward), Siperian Hub generates a message with the record information, including the cross-reference information for all output systems. This message is placed in the queue.
Delete BO data                       A base object record is soft deleted (state changed to DELETED). Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
Delete XREF data                     An XREF record is soft deleted (state changed to DELETED). Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
Delete pending BO data               A base object record with a PENDING state is hard deleted. Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
Delete pending XREF data             An XREF record with a PENDING state is hard deleted. Applies to state-enabled base objects only. For more information, see Chapter 7, “State Management.”
No action                            Applies only to Activity Manager. Returned only by a cleanse_put operation and only if delta detection is enabled. If delta detection is not enabled, then an Update action type is returned.

Considerations for Message Triggers
Consider the following issues when setting up message triggers for your implementation:
• If a message queue is used in any message trigger definition under a base object in any Hub Store, the message queue displays the following message: “The message queue is currently in use by message triggers.” In this case, you cannot edit the properties of the message queue. Instead, you must create another message queue to make the necessary changes.
• Message triggers apply to one base object only, and they fire only when a specific action occurs directly on that base object. If you have two tables that are in a parent-child relationship, then you need to explicitly define message queues separately, for each table. Change detection is based on specific changes to each base object (such as a load INSERT, load UPDATE, MERGE, or PUT). Changes to a record of the parent table can fire a message trigger for the parent record only. If changes in the parent record affect one or more associated child records, then a message trigger for the child table must be explicitly configured to fire when such an action occurs in the child records.
• In addition to base objects, message triggers can be configured for dependent and relationship objects. However, only insert and update actions are available for dependent and relationship objects.

Adding Message Triggers
To add a message trigger for a base object:
1. Configure the message queue to be usable with message triggers. For more information, see “Editing Message Queue Properties” on page 611.
2. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
3. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
4. Expand the base object that will be monitored, and select the Message Trigger Setup node.
   If no message triggers have been set up, then the Schema Tool displays an empty screen.
5. Do one of the following:
   • If no message triggers have been defined, click Add Message Trigger.
   OR
   • If message triggers have been defined, then click the button.
   The Schema Manager displays the Add Message Trigger wizard.
6. Specify a name and description for the new message trigger.
7. Click Next.
   The Add Message Trigger wizard prompts you to specify the messaging package.
8. Select the package that will be used to build the message. For more information, see “Configuring Packages” on page 196.
   Select the message queue to which the message will be written.
   Click Next.
   The Add Message Trigger wizard prompts you to specify the rules for this message trigger.

12. Select the event type(s) for this message trigger. For more information, see “Types of Events for Message Triggers” on page 612.
13. Configure the system properties for this message trigger:

    Check Box     Description
    Triggering    System(s) that will trigger the action.
    In Message    For each message that is placed on a message queue due to the trigger, the message includes the pkey_src_object value for each cross-reference that it has in one of the 'In Message' systems.

    Note: You must select at least one Triggering system and one In Message system.
    For example, suppose your implementation had three source systems (A, B, and C) and a base object record had cross-reference records for A and B. Suppose the cross-reference in system A for this base object record were updated. The following table shows possible message trigger configurations and the resulting message:

    In Message Systems    Resulting Message
    A                     Message with cross-reference for system A
    B                     Message with cross-reference for system B
    C                     No message (no cross-references from In Message)
    A & B                 Message with cross-reference for systems A and B
    A & C                 Message with cross-reference for system A
    B & C                 Message with cross-reference for system B
    A & B & C             Message with cross-reference for systems A and B

14. Identify the system to which the event applies, columns to listen to for changes, and the package used to construct the message. All events send the base object record (and all corresponding cross-references that make up that record) to the message, based on the specified package.
15. Click Next if you have selected an Update option. Otherwise, click Finish.
16. If you have clicked the Update action, the Schema Manager prompts you to select the columns to monitor for update actions.
17. Do one of the following:
    • Select the column(s) to monitor for the events associated with this message trigger, or
    • Select the Trigger message if change on any column check box to monitor all columns for updates.
18. Click Finish.
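The In Message table in step 13 follows a simple rule: the message carries the cross-references that the record has in the selected In Message systems, and no message is produced when there are none. The following Python sketch reproduces that table under the same assumptions as the example (a record with cross-references in systems A and B, and a triggering update in system A). The xrefs_in_message function is a hypothetical helper, not part of Siperian Hub.

```python
def xrefs_in_message(record_xref_systems, in_message_systems):
    """Return the systems whose cross-references appear in the message,
    or None when the record has no cross-reference in any In Message system."""
    included = sorted(set(record_xref_systems) & set(in_message_systems))
    return included or None

# The record from the example has cross-references in systems A and B.
record_xrefs = {"A", "B"}

selections = (
    ["A"], ["B"], ["C"],
    ["A", "B"], ["A", "C"], ["B", "C"],
    ["A", "B", "C"],
)
for selection in selections:
    print(" & ".join(selection), "->", xrefs_in_message(record_xrefs, selection))
```

Selecting only system C yields None (no message), because the record has no cross-reference in C.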

Editing Message Triggers
To edit the properties of an existing message trigger:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the base object that will be monitored, and select the Message Trigger Setup node.
4. In the Message Triggers list, click the message trigger that you want to configure.
   The Schema Manager displays the settings for the selected message trigger.
5. Change the settings you want. For more information, see “Adding Message Triggers” on page 615 and “Types of Events for Message Triggers” on page 612.
   Click the button next to the editable property that you want to change.
6. Click the button to save your changes.

Deleting Message Triggers
To delete an existing message trigger:
1. Start the Schema Manager according to the instructions in “Starting the Schema Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on page 30.
3. Expand the base object that will be monitored, and select the Message Trigger Setup node.
4. In the Message Triggers list, click the message trigger that you want to delete.
5. Click the button.
   The Schema Manager prompts you to confirm deletion.
6. Click Yes.

JMS Message XML Reference
This section describes the structure of Siperian Hub XML messages and provides example messages. Note: If your Siperian Hub implementation requires that you use the legacy XML message format (Siperian Hub XU version) instead of the current version of the XML message format (described in this section), see “Legacy JMS Message XML Reference” on page 644 instead.

Generating ORS-specific XML Message Schemas
As described in “ORS-specific XML Message Schemas” on page 344, to create XML messages, the publish process relies on an ORS-specific schema file (<ors-name>-siperian-mrm-event.xsd) that you generate using the JMS Event Schema Manager tool in the Hub Console. For more information, see “Generating and Deploying ORS-specific Schemas” on page 827.

Elements in an XML Message
The following table describes the elements in an XML message.

Field                   Description
Root Node
<siperianEvent>
                        UID of the base object affected by this action.
                        UID of the package associated with this action.
                        Date/time when this message was generated.
                        ID of the Operational Record Store (ORS) associated with this event.
                        UID of the rule that triggered the event that generated this message.
                        Root node for event details.
                        Name of the source system associated with this event.
                        Value of the PKEY_SRC_OBJECT associated with this event.
                        Date/time when the event was generated.
                        RowID of the base object record that was affected by the event.
                        Root node of a cross-reference record affected by this event.
                        System name of the cross-reference record affected by this event.
                        PKEY_SRC_OBJECT of the cross-reference record affected by this event.
                        Name of the secure package associated with this event.
                        Each column in the package is represented by an element in the XML file. Examples: rowidObject and consolidationInd. Defined in the ORS-specific XSD that is generated using the JMS Event Schema Manager tool. For more information, see “Generating and Deploying ORS-specific Schemas” on page 827.
<mergedRowid>           List of ROWID_OBJECT values for the losing records in the merge. This field is included in messages for Merge events only.
<dependentSourceKey>    Applies only to an insert in or update of the relationship of dependent objects.

Filtering Messages
You can use the custom JMS header named MessageType to filter incoming messages based on the message type. The following message types are indicated in the message header.

Message Type           Description
siperianEvent          Event notification message.
<serviceNameReturn>    For Services Integration Framework (SIF) responses, the response begins with the name of the SIF request, as in the following fragment of a response to a get request:

                       <getReturn>
                         <message>The GET was executed successfully - retrieved 1 records</message>
                         <recordKey>
                           <ROWID>2</ROWID>
                         </recordKey>
                         ...

Your messages will not look exactly like this. The data will reflect your data, and the fields will reflect your packages.
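On the consuming side, routing on the MessageType header and reading a SIF response body might look like the following Python sketch. It is an illustration under stated assumptions, not a JMS client: message headers are modeled as a plain dictionary, the header value getReturn for a SIF response is assumed from the <serviceNameReturn> pattern, the classify and rowids helpers are hypothetical, and the closing </getReturn> tag is added only so that the truncated fragment parses.

```python
import xml.etree.ElementTree as ET

# Sample body based on the get-response fragment. The original fragment is
# truncated, so the closing tag here is an assumption made for parsing.
SAMPLE_BODY = (
    "<getReturn>"
    "<message>The GET was executed successfully - retrieved 1 records</message>"
    "<recordKey><ROWID>2</ROWID></recordKey>"
    "</getReturn>"
)

def classify(headers):
    """Route on the MessageType header: event notification vs. SIF response."""
    if headers.get("MessageType") == "siperianEvent":
        return "event"
    return "sif-response"

def rowids(body):
    """Collect the text of every ROWID element in a response body."""
    root = ET.fromstring(body)
    return [element.text for element in root.iter("ROWID")]

print(classify({"MessageType": "siperianEvent"}))
print(classify({"MessageType": "getReturn"}))
print(rowids(SAMPLE_BODY))
```

In a real JMS consumer the same filtering is usually expressed as a message selector on the MessageType property, so that unwanted messages are never delivered to the client.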


Legacy JMS Message XML Reference
This section describes the structure of legacy Siperian Hub XML messages and provides example messages. This section applies only if you have selected the Use Legacy XML check box in the Message Queues tool (see “Configuring Outbound Message Queues” on page 608). Use this option only when your Siperian Hub implementation requires that you use the legacy XML message format (Siperian Hub XU version) instead of the current version of the XML message format (described in “JMS Message XML Reference” on page 622).

Message Fields for Legacy XML
The contents of the data area of the message are determined by the package specified in the trigger. The data area can contain the following fields:

Message Fields

Field                          Description
                               Action type: Insert, Update, Update XREF, Accept as Unique, Merge, Unmerge, or Merge Update.
                               Time when the event was generated.
                               Name of the base object table or cross-reference object table affected by this action.
                               Name of the rule that triggered the event that generated this message.
                               ID of the rule that triggered the event that generated this message.
                               Unique key for the base object affected by this action.
                               List of ROWID_OBJECT values for the losing records in the merge. This field is included in messages for MERGE events only.
                               The SYSTEM and PKEY_SRC_OBJECT values for the cross-reference that triggered the UPDATE event. This field is included in messages for UPDATE events only.
XREFS                          List of SYSTEM and PKEY_SRC_OBJECT values for all of the cross-references in the output systems for this base object.
RELATED_PKEY_SRC_OBJECT        Applies only to an insert in or update of the relationship of dependent objects.
SRC_RELATED_PKEY_SRC_OBJECT    Applies only to an update of relationship of dependent objects.

Filtering Messages for Legacy XML
You can use the custom JMS header named MessageType to filter incoming messages based on the message type. The following message types are indicated in the message header.

Message Type           Description
SIP_EVENT              Event notification message.
<serviceNameReturn>    For Services Integration Framework (SIF) responses, the response begins with the name of the SIF request, as in the following fragment of a response to a get request:

                       <getReturn>
                         <message>The GET was executed successfully retrieved 1 records</message>
                         <recordKey>
                           <ROWID>2</ROWID>
                         </recordKey>
                         ...

Chapter 17: Using Batch Jobs
This chapter describes how to configure and execute Siperian Hub batch jobs using the Batch Viewer and Batch Group tools in the Hub Console. For more information about creating batch jobs using job execution scripts, see Chapter 18, “Writing Custom Scripts to Execute Batch Jobs.”

Before You Begin
Before you begin working with batch jobs, you must have performed the following prerequisites:
• Installed Siperian Hub and created the Hub Store according to the instructions in the Siperian Hub Installation Guide for your platform
• Built the schema; see “About the Schema” on page 82

About Siperian Hub Batch Jobs
In Siperian Hub, a batch job is a program that, when executed, completes a discrete unit of work (a process). For example, the Match job carries out the match process: it generates search keys for a base object, searches through the data for match candidates (records that are possible matches), applies the match rules to the match candidates, generates the matches, and then queues the matches for either automatic or manual consolidation. For merge-style base objects, automatic consolidation is handled by the Automerge job, and manual consolidation is handled by the Manual Merge job.

Ways to Execute Batch Jobs
You can execute batch jobs in the following ways:
• Hub Console tools:
  • Batch Viewer tool: Execute batch jobs individually. For more information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.
  • Batch Group tool: Execute batch jobs in a group. The Batch Group tool allows you to configure the execution sequence for batch jobs and to execute batch jobs in parallel. For more information, see “Running Batch Jobs Using the Batch Group Tool” on page 688.
• Stored procedures: Execute public Siperian Hub processes (batch jobs and batch groups) through stored procedures using any job scheduling software (such as Tivoli, CA Unicenter, and so on). For more information, see “About Executing Siperian Hub Batch Jobs” on page 750. You can also create and run stored procedures using the SIF API (using Java, SOAP, or HTTP/XML). For more information, see the Siperian Services Integration Framework Guide.

Support Tables Used By Batch Jobs
The following graphic shows the various support tables used by Siperian Hub batch jobs:

Running Batch Jobs in Sequence
Certain batch jobs require that other batch jobs be completed first. For example, the landing tables for a base object must be populated before running any batch jobs. Similarly, before you can run a Match job for a base object, you must run its corresponding Stage and Load jobs. Finally, when a base object has dependencies (for example, it is the child of a parent table, or it has foreign key relationships that point to other base objects), batch jobs must be run first for the tables on which the base object depends. You or your organization should consider the best practice of developing an administration or operations plan that specifies which batch processes and dependencies should be completed before running batch jobs.

Populating Landing Tables Before Running Batch Jobs
One of the tasks Siperian Hub batch jobs perform is to move data from landing tables to the appropriate target location in Siperian Hub. Therefore, before you run Siperian Hub batch jobs, you must first have your source systems or an ETL tool write data into the landing tables. The landing tables are Siperian Hub’s interface for batch loads. You deliver the data to the landing tables, and Siperian Hub batch procedures manipulate the data and copy it to the appropriate location(s). For more information, see the description of the Siperian Hub data management process in the Siperian Hub Overview.

Match Jobs and Subsequent Consolidation Jobs
Batch jobs need to be executed in a certain sequence. For example, a Match job must be run for a base object before running the consolidation process. For merge-style base objects, you can run the Auto Match and Merge job, which executes the Match job and then the Automerge job repeatedly, until either all records in the base object have been checked for matches or the limit on the number of records queued for manual consolidation is reached (see “Maximum Matches for Manual Consolidation” on page 490).
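
The stopping condition of the Auto Match and Merge job can be pictured with the following toy model in Python. It is only a sketch of the two exit conditions described above, not how Siperian Hub implements the job. Every name in it (the function, its callables, and max_manual_matches) is hypothetical.

```python
def auto_match_and_merge(run_match, run_automerge, unchecked_count,
                         manual_queue_size, max_manual_matches):
    """Repeat Match then Automerge until one of two exit conditions holds.

    All parameters are hypothetical stand-ins:
      run_match / run_automerge  - callables that perform one cycle of work
      unchecked_count            - callable returning records not yet checked
      manual_queue_size          - callable returning records queued for
                                   manual consolidation
      max_manual_matches         - the manual consolidation limit
    Returns the number of Match/Automerge cycles that were run.
    """
    cycles = 0
    while unchecked_count() > 0 and manual_queue_size() < max_manual_matches:
        run_match()
        run_automerge()
        cycles += 1
    return cycles


# Toy usage: each Match cycle checks up to 2 records and queues 1 for manual review.
state = {"unchecked": 5, "manual": 0}

def toy_match():
    state["unchecked"] -= min(2, state["unchecked"])
    state["manual"] += 1

cycles = auto_match_and_merge(toy_match, lambda: None,
                              lambda: state["unchecked"],
                              lambda: state["manual"], 10)
print(cycles)
```

In the toy run, the loop ends because no unchecked records remain; with a smaller limit it would end as soon as the manual queue reaches that limit.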

Loading Data from Parent Tables First
The general rule of thumb is that all parent tables (tables that other tables reference) must be loaded first.

Loading Data for Objects With Foreign Key Relationships
If two tables have a foreign key relationship between them, you must load the table that is being referenced first, and the table doing the referencing second. The following foreign key relationships can exist in Siperian Hub:
• from one base object (child with foreign key) to another base object (parent with primary key)
• from a dependent object to the base object that owns it

In most cases, you will schedule these jobs to run on a regular basis.
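
When planning the load sequence for tables with foreign key relationships, it can help to derive the order mechanically. The following Python sketch (Python 3.9 or later) uses a topological sort so that every referenced table is loaded before the tables that reference it. The table names and the load_order function are hypothetical; Siperian Hub does not provide this function.

```python
from graphlib import TopologicalSorter


def load_order(references):
    """Return a load order in which referenced (parent) tables come first.

    `references` maps each table name to the set of tables it references
    through foreign keys. Raises graphlib.CycleError on circular references.
    """
    return list(TopologicalSorter(references).static_order())


# Hypothetical schema: ADDRESS and ORDERS reference CUSTOMER,
# and ORDER_LINE references ORDERS.
order = load_order({
    "CUSTOMER": set(),
    "ADDRESS": {"CUSTOMER"},
    "ORDERS": {"CUSTOMER"},
    "ORDER_LINE": {"ORDERS"},
})
print(order)
```

The exact order of unrelated tables can vary, but each parent table always appears before its children.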

Best Practices for Working With Batch Jobs
While you design and plan your batch jobs, consider the following issues:
• Define your schema. The schema is fundamental to all your Siperian Hub tasks. Without a schema, your batch jobs have nothing to do. For more information about defining the schema, see “About the Schema” on page 82.
• Define mappings before executing Stage jobs. Mappings define the transformations performed in Stage jobs. If you have no mappings defined, then the Stage job will not perform any transformations in the staging process. For more information about mappings, see “Mapping Columns Between Landing and Staging Tables” on page 380.
• Define match rules before executing Match jobs. If you have no match rules, then the Match job will produce no matches. For more information, see “Configuring Primary Key Match Rules” on page 578.
• Before running production jobs:
  • Run tests with small data sets.
  • Run tests of your cleanse engine and other components to determine whether each component is working as expected.
  • After testing each of the components separately, test the integrated system in its entirety to determine whether the overall system is working as expected.

Batch Job Creation
Batch jobs are created in either of two ways:
• automatically when you configure Hub Store, or
• when certain changes occur in your Siperian Hub configuration, such as changes to trust settings for a base object

Batch Jobs That Are Created When Changes Occur
The following batch jobs are created when you make changes to the match and merge setup, set properties, or enable trust settings after initial loads:
• Accept Non-Matched Records As Unique
• Key Match Jobs
• Reset Links Jobs
• Reset Match Table Jobs
• Revalidate Jobs (that is, if you enable validation for a column)
• Synchronize Jobs

Running Batch Jobs Using the Batch Viewer Tool
This section describes how to use the Batch Viewer tool in the Hub Console to run batch jobs individually. To run batch jobs in a group, see “Running Batch Jobs Using the Batch Group Tool” on page 688.

Batch Viewer Tool
The Batch Viewer tool provides a way to execute batch jobs individually and to view the job execution logs. The Batch Viewer is useful for starting the run of a single job, or for running jobs that do not need to run often, such as the Synchronize job that is run after trust settings change. The job execution log shows job completion status with any associated messages, such as success, failure, or warning. The Batch Viewer tool also shows job statistics, if applicable.
Note: The Batch Viewer does not provide automated scheduling. For more information about how to create custom scripts to execute batch jobs and batch groups, see “About Executing Siperian Hub Batch Jobs” on page 750.

Starting the Batch Viewer Tool
To start the Batch Viewer tool:
• In the Hub Console, expand the Utilities workbench, and then click Batch Viewer.

The Hub Console displays the Batch Viewer tool, as shown in the following example.

[Figure: Batch Viewer tool, showing the Navigation Tree and the Properties Pane (Selected Item)]

Grouping by Table, Data, or Procedure Type
You can change the top-level view of the navigation tree by right-clicking the Group By control at the bottom of the tree. Note that the grayed-out item with the check mark represents the current selection.

Running Batch Jobs Manually
To run a batch job manually:
1. Select the batch job to run.
2. Execute the batch job.

Selecting a Batch Job
To select a batch job to run:
1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on page 674. In the following example, the tree displays a list of batch jobs (the list is grouped by procedure type).
2. Expand the tree to display the batch job that you want to run, and then click it to select it. The Batch Viewer displays a screen for the selected batch job with properties and command buttons.

Batch Job Properties
The following batch job properties are read-only.

Field          Description
Identity       Identification information for this batch job. Stored in the C_REPOS_TABLE_OBJECT_V table.
Name           Type code for this batch job. For example, Load jobs have the CMXLD.LOAD_MASTER type code. Stored in the OBJECT_NAME column of the C_REPOS_TABLE_OBJECT_V table.
Description    Description for this batch job in the format: JobName for | from BaseObjectName. Examples: Load from Consumer_Credit_Stg; Match for Address. This description is stored in the OBJECT_DESC column of the C_REPOS_TABLE_OBJECT_V table.
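
If you need to process batch job descriptions outside the Hub Console (for example, in a report over values read from the OBJECT_DESC column), the description format shown above can be split with a short regular expression. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that the object name at the end of the description contains no spaces, as in the two examples.

```python
import re

# JobName, then "for" or "from", then an object name without whitespace (assumption).
_DESCRIPTION = re.compile(r"^(?P<job>.+?) (?P<prep>for|from) (?P<target>\S+)$")


def parse_job_description(description):
    """Split 'JobName for|from ObjectName' into (job, preposition, target).

    Returns None if the text does not follow that format.
    """
    match = _DESCRIPTION.match(description.strip())
    if match is None:
        return None
    return match.group("job"), match.group("prep"), match.group("target")


print(parse_job_description("Load from Consumer_Credit_Stg"))
print(parse_job_description("Match for Address"))
```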

Options to Set Before Executing Batch Jobs
Certain types of batch jobs have additional fields that you can configure before running the batch job.

Field                           Only For                     Description
Re-generate All Match Tokens    Generate Match Token Jobs    Controls the scope of match tokens generation: tokenizes the entire base object (checked) or tokenizes only those records that are flagged in the BO as requiring re-tokenization (un-checked). For more information, see “Regenerating All Match Tokens” on page 726.
Force Update                    Load Jobs                    If selected, the Load job forces a refresh and loads records from the staging table to the base object (or dependent object) regardless of whether the records have already been loaded. For more information, see “Forcing Updates in Load Jobs” on page 730.
Match Set                       Match Jobs                   Enables you to choose which match rule set to use for this match job. To learn more, see “Selecting a Match Rule Set” on page 737.

Command Buttons for Batch Jobs
After you have selected a batch job, you can click the following command buttons.

Button                      Description
Execute Batch               Executes the selected batch job.
Clear History               Clears the job execution history in the Batch Viewer. To learn more, see “Clearing the Job Execution History” on page 687.
Set Status to Incomplete    Sets the status of the currently-executing batch job to Incomplete. For more information, see “Setting the Job Status to Incomplete” on page 681.
Refresh Status              Refreshes the status display of the currently-executing batch job. For more information, see “Refreshing the Status” on page 681.

Executing a Batch Job
Important: You must have the application server running for the duration of an executing batch job.
To execute a batch job in the Batch Viewer:
1. In the Batch Viewer, select the batch job that you want to run. For more information, see “Selecting a Batch Job” on page 677.
2. In the right panel, click Execute Batch (or right-click on the job in the left panel and select Execute from the pop-up menu). If the current status of the job is Executing, then the Execute Batch button is disabled. You must wait for the batch job to finish before you can run it again.

To execute batch jobs in other ways, see “Ways to Execute Batch Jobs” on page 668.

Refreshing the Status
While a batch job is running, you can click Refresh Status to check if the status has changed.

Setting the Job Status to Incomplete
In very rare circumstances, you might want to change the status of a running job by clicking Set Status to Incomplete and then execute the job again. Only do this if the batch job has stopped executing (due to an error, such as a server reboot or crash) but Siperian Hub has not detected that the job has stopped due to a job application lock in the metadata. You will know this is a problem if the current status is Executing but the database, application server, and logs show no activity. If this occurs, click this button to clear the job application lock so that you can run the batch job again; otherwise, you will not be able to execute the batch job. Setting the status to Incomplete only updates the status of the batch job; it does not abort the job.
Note: This option is available only if your user ID has Siperian Administrator rights.

Viewing Job Execution Logs
Siperian Hub creates a job execution log each time that it executes a batch job.

Job Execution Status
Each job execution log entry has one of the following status values:
• Batch job is currently running.
• Batch job completed successfully.
• Batch job completed successfully, but additional information is available. For example, for Stage and Load jobs, this can indicate that some records were rejected (see “Viewing Rejected Records” on page 685). For Match jobs, this can indicate that the base object is empty or that there are no more records to match.
• Batch job failed. For more information, see “Handling the Failed Execution of a Batch Job” on page 686.
• Batch job status was manually changed from “Executing” to “Incomplete.” For more information, see “Setting the Job Status to Incomplete” on page 681.

Viewing the Job Execution Log for a Batch Job
To view the job execution log for a batch job:
1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on page 674.
2. Expand the tree to display the job execution log that you want to view, and then click it. The Batch Viewer displays a screen for the selected job execution log.

Job Execution Log Entry Properties
For each job execution log entry, the Batch Viewer displays the following information:
Field            Description
Identity         Identification information for this batch job. Stored in the C_REPOS_TABLE_OBJECT_V table.
Name             Name of this job execution log. Date / time when the batch job started.
Description      Description for this batch job in the format: JobName for / from BaseObjectName. Examples: Load from Consumer_Credit_Stg; Match for Address.
Source system    One of the following: source system of the processed data; Admin.

Timestamp for this batch job
Date / time when this batch job started.
Date / time when this batch job ended.
Elapsed time for the execution of this batch job.

Viewing Rejected Records
For Stage jobs or Load jobs only, if the batch job resulted in records being written to the rejects table, then the job execution log displays a View Rejects button.

Note: Records are rejected if the HUB_STATE_IND value is not valid.
To view the rejected records and the reason why each was rejected:
1. Click the View Rejects button. The Batch Viewer displays a table of rejected records.
2. Click Close.

Handling the Failed Execution of a Batch Job
If executing a batch job failed, perform the following steps:
• Display the execution log entry for this batch job.
• Read the error text in the Current Status field for diagnostic information.
• Take corrective action as necessary.

Copying the Current Status to the Windows Clipboard
To copy the current status of a batch job to the Windows Clipboard (to paste into a document or e-mail, for example):
• Click the button.

Deleting Job Execution Log Entries
To delete the selected job execution log:
• Click the button in the top right hand corner of the job properties page.

Clearing the Job Execution History
After running batch jobs over time, the list of executed jobs can become very large. You should periodically remove the extraneous job execution logs from this list.
Note: The actual procedure steps to clear job history will be slightly different depending on the view (By Table, By Date, or By Procedure Type); the following procedure assumes you are using the By Table view.
To clear the job history:
1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on page 674.
2. In the Batch Viewer, expand the tree underneath your base object.
3. Expand the tree under the type of batch job.
4. Select the job for which you want to clear the history. The top of the properties screen looks like the following example.
5. Click Clear History.
6. Click Yes to confirm that you want to delete all the execution history for this batch job.

Running Batch Jobs Using the Batch Group Tool
This section describes how to use the Batch Group tool in the Hub Console to run batch jobs in groups. To run batch jobs individually, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674. The Batch Viewer does not provide automated scheduling. For more information about how to create custom scripts to execute batch jobs and batch groups, see Chapter 18, “Writing Custom Scripts to Execute Batch Jobs.”

About Batch Groups
A batch group is a collection of individual batch jobs (for example, Stage, Load, and Match jobs) that can be executed with a single command. Each batch job in a batch group can be executed sequentially or in parallel with other jobs. You use the Batch Group tool to configure and run batch groups. For more information about batch jobs, see “Batch Jobs Reference” on page 713. For more information about developing custom batch jobs and batch groups that can be made available in the Batch Group tool, see “Developing Custom Stored Procedures for Batch Jobs” on page 806. Note: If you delete an object from the Hub Console (for example, if you delete a mapping), the Batch Group tool highlights any batch jobs that depend on that object (for example, a stage job) in red. You must resolve this issue prior to re-executing the batch group.

Sequential and Parallel Execution
Batch jobs can be executed in the following ways:
Execution Approach    Description
sequentially          Only one batch job in the batch group is executed at one time.
parallel              Multiple batch jobs in the batch group are executed concurrently and in parallel.

Execution Paths
An execution path is the sequence in which batch jobs are executed when the entire batch group is executed. The execution path begins with the Start node and ends with the End node. The Batch Group tool does not validate the execution sequence for you—it is up to you to ensure that the execution sequence is correct. For example, the Batch Group tool would not notify you of an error if you incorrectly specified the Load job for a base object ahead of its Stage job, or if you specified the Load job for a dependent object ahead of the Load job for the base object on which it depends.
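
Because the Batch Group tool does not validate the execution sequence, you might want to check a planned sequence yourself before configuring it. The following Python sketch shows one way to do that outside Siperian Hub. The representation of a batch group as a list of levels, the job names, and the function name are all hypothetical.

```python
def find_sequence_errors(levels, must_precede):
    """Report ordering rules that a planned execution path violates.

    levels        - list of levels, each a list of job names, in execution order
    must_precede  - list of (earlier_job, later_job) pairs
    A rule is violated when the earlier job is in the same level as,
    or a later level than, the job that depends on it.
    """
    position = {job: index for index, level in enumerate(levels) for job in level}
    errors = []
    for earlier, later in must_precede:
        if earlier in position and later in position:
            if position[earlier] >= position[later]:
                errors.append((earlier, later))
    return errors


rules = [("Stage Customer", "Load Customer"), ("Load Customer", "Load Address")]
wrong = [["Load Customer"], ["Stage Customer"], ["Load Address"]]
right = [["Stage Customer"], ["Load Customer"], ["Load Address"]]
print(find_sequence_errors(wrong, rules))
print(find_sequence_errors(right, rules))
```

The check treats two dependent jobs in the same level as an error, because jobs in one level run in parallel.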

Levels
In a batch group, the execution path consists of a series of one or more levels that are executed in sequence (see “Running Batch Jobs in Sequence” on page 670).
[Figure: execution path showing the Start Node, the Batch Job Levels, and the End Node]

A level is a collection of one or more batch jobs.
• If a level contains multiple batch jobs, then these batch jobs are executed in parallel.
• If a level contains only a single batch job, then this batch job is executed singly.

All batch jobs in the level must complete before the batch group proceeds to the next task in the sequence.

Note: Because all of the batch jobs in a level are executed in parallel, none of the batch jobs in the same level should have any dependencies. For example, the Stage and Load jobs for a base object should be in separate levels that are executed in the proper sequence. For more information, see “Running Batch Jobs in Sequence” on page 670.
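
The level model described above (levels run one after another, and jobs within a level run in parallel) can be illustrated with a small Python sketch. It is a conceptual model only and does not call Siperian Hub. The run_job callable, the job names, and the function name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch_group(levels, run_job):
    """Run levels in sequence; run the jobs inside each level in parallel.

    levels  - list of levels, each a list of job names
    run_job - callable that executes one job and returns its result
    Returns one list of results per level, in the order the jobs were listed.
    """
    results = []
    for level in levels:
        with ThreadPoolExecutor(max_workers=max(1, len(level))) as pool:
            # list() waits for every job in this level to finish (and re-raises
            # any job error) before the loop moves on to the next level.
            results.append(list(pool.map(run_job, level)))
    return results


log = []

def toy_job(name):
    log.append(name)          # record that the job started
    return name + " done"

results = run_batch_group([["Stage A", "Stage B"], ["Load A"]], toy_job)
print(results)
```

In this model, "Load A" cannot start until both Stage jobs in the first level have completed, which is why dependent jobs belong in separate levels.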

Other Ways to Execute Batch Groups
In addition to using the Batch Group tool, you can execute batch groups in the following ways:
• Services Integration Framework (SIF) requests: Applications can invoke the SIF ExecuteBatchGroupRequest request to execute batch groups directly. For more information, see the Siperian Services Integration Framework Guide.
• Stored procedures: Execute batch groups through stored procedures using any job scheduling software (such as Tivoli, CA Unicenter, and so on). For more information, see “Executing Batch Groups Using Stored Procedures” on page 798.

Starting the Batch Group Tool
To start the Batch Group tool:
• In the Hub Console, expand the Utilities workbench, and then click Batch Group.

The Hub Console displays the Batch Group tool:

[Figure: Batch Group tool, showing the Navigation Tree and the Properties Pane (Selected Item)]

The Batch Group tool consists of the following areas:

Area               Description
Navigation Tree    Hierarchical list of batch groups and execution logs.
Properties Pane    Properties and command

Configuring Batch Groups
This section describes how to add, edit, and delete batch groups. For more information, see “About Batch Groups” on page 688.

Adding Batch Groups
To add a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. Right-click the Batch Groups node in the Batch Group tree and choose Add Batch Group from the pop-up menu. The Batch Group tool adds a “New Batch Group” to the Batch Group tree.
   [Figure: new batch group, showing the Batch Group Properties and the Execution Sequence (Start / Finish Nodes)]
   Note the empty execution sequence. You will configure this after adding the new batch group. For more information, see “Configuring Levels for Batch Groups” on page 694.
4. Specify the following information:
   Field          Description
   Name           Specify a unique, descriptive name for this batch group.
   Description    Enter a description for this batch group.
5. Click the button to save your changes. The Batch Group tool saves your changes and updates the navigation tree.

To add batch jobs to the new batch group, see “Assigning Batch Jobs to Batch Group Levels” on page 698.

Editing Batch Group Properties
To edit batch group properties:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to edit.
4. Specify a different batch group name, if you want.
5. Specify a different description, if you want.
6. Click the button to save your changes.

Deleting Batch Groups
To delete a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to delete.
4. Right-click the batch group that you want to delete, and then click Delete Batch Group. The Batch Group tool prompts you to confirm deletion.
5. Click Yes. The Batch Group tool removes the deleted batch group from the navigation tree.

Configuring Levels for Batch Groups
As described in “About Batch Groups” on page 688, a batch group contains one or more levels that are executed in sequence. This section describes how to specify the execution sequence by configuring the levels in a batch group.

Adding Levels to a Batch Group
To add a level to a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch groups tree, right click on any level, and choose one of the following options:
   Command              Description
   Add Level Above      Add a level to this batch group above the selected item.
   Add Level Below      Add a level to this batch group below the selected item.
   Move Level Up        Move this batch group level above the prior level.
   Move Level Down      Move this batch group level below the next level.
   Remove this Level    Remove this batch group level.
   The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.
5. Expand the base object(s) for the job(s) that you want to add.
6. Select the job(s) that you want to add. To select jobs that you want to execute in parallel, hold down the CTRL key and click each job that you want to select.
7. Click OK. The Batch Group tool adds the selected job(s) to the batch group.
8. Click the button to save your changes.

Removing Levels From a Batch Group
To remove a level from a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch group, right click on the level that you want to delete, and choose Remove this Level. Siperian Hub displays the delete confirmation dialog.
5. Click Yes. The Batch Group tool removes the deleted level from the batch group.

To Move a Level Up Within a Batch Group
To move a level up within a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch groups tree, right click on the level you want to move up, and choose Move Level Up. The Batch Group tool moves the level up within the batch group.

To Move a Level Down Within a Batch Group
To move a level down within a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch groups tree, right click on the level you want to move down, and choose Move Level Down. The Batch Group tool moves the level down within the batch group.

Assigning Batch Jobs to Batch Group Levels
In the Batch Group tool, a job is a Siperian Hub batch job. Each level contains one or more batch jobs. If a level contains multiple batch jobs, then all of those batch jobs are executed in parallel.

Adding a Batch Job to a Batch Group Level
To add a batch job to a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch groups tree, right click on the level to which you want to add jobs, and choose Add jobs to this level.... The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.
5. Expand the base object(s) for the job(s) that you want to add.
6. Select the job(s) that you want to add. To select multiple jobs at once (to execute them in parallel), hold down the CTRL key while clicking jobs.
7. Click OK.
8. Save your changes. The Batch Group tool adds the selected jobs to the target level box. Siperian Hub executes all batch jobs in a group level in parallel.

Configuring Options for Batch Jobs
When configuring a batch group, you can configure job options for certain kinds of batch jobs. For more information about these job options, see “Options to Set Before Executing Batch Jobs” on page 679.

Removing a Batch Job From a Level
To remove a batch job from a level:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch group, right click on the job that you want to delete, and choose Remove Job. The Batch Group tool displays the delete confirmation dialog.
5. Click Yes to delete the selected job. The Batch Group tool removes the deleted job from this level in the batch group.

To Move a Batch Job Up a Level
To move a batch job up a level:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch group, right click on the job that you want to move up, and choose Move job up. The Batch Group tool moves the selected job up one level in the batch group.

To Move a Batch Job Down a Level
To move a batch job down a level:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to configure.
4. In the batch group, right click on the job that you want to move down, and choose Move job down. The Batch Group tool moves the selected job down one level in the batch group.

Executing Batch Groups Using the Batch Group Tool
This section describes how to manage batch group execution in the Batch Group tool. For more information about executing batch jobs in other ways, such as using stored procedures or the Siperian Services Integration Framework, see “Ways to Execute Batch Jobs” on page 668.
Important: You must have the application server running for the duration of an executing batch group.
Note: If you delete an object from the Hub Console (for example, if you delete a mapping), the Batch Group tool highlights any batch jobs that depend on that object (for example, a stage job) in red. You must resolve this issue prior to re-executing the batch group.

Navigating to the Control & Logs Screen
The Control & Logs screen is where you can control the execution of a batch group and view its execution logs.
To navigate to the Control & Logs screen for a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Expand the Batch Group tree to display the batch group that you want to execute.
3. Expand the batch group and click the Control & Logs node. The Batch Group tool displays the Control & Logs screen for this batch group.

[Figure: Control & Logs screen, showing the Toolbar, the execution logs for this batch group, and the execution logs for individual batch jobs in this batch group]

Components of the Control & Logs Screen
This screen contains the following components:

Component                   Description
Toolbar                     Command buttons for managing batch group execution. To learn more, see “Command Buttons for Batch Groups” on page 703.
Logs for the Batch Group    Execution logs for this batch group.
Logs for Batch Jobs         Execution logs for individual batch jobs in this batch group.

Command Buttons for Batch Groups
Use the following command buttons to manage batch group execution.
Button | Description
Execute | Executes this batch group.
Set to Restart | Sets the execution status of a failed batch group to restart. To learn more, see “Restarting a Batch Group That Failed Execution” on page 707.
Set to Incomplete | Sets the execution status of a running batch group to incomplete. To learn more, see “Handling Incomplete Batch Group Execution” on page 708.
Clear Selected | Removes the selected group or job execution log.
Clear All | Removes all group and job execution logs.
Refresh | Refreshes the screen for this batch group.

Executing a Batch Group
To execute a batch group:
1. Navigate to the Control & Logs screen for the batch group. For more information, see “Navigating to the Control & Logs Screen” on page 702.
2. Click the node and then select Batch Group > Execute, or click the Execute button. The Batch Group tool executes the batch group and updates the logs panel with the status of the batch group execution.
3. Click the Refresh button to see the execution result. The Batch Group tool displays progress information.

When finished, the Batch Group tool adds entries to:
• the group execution log for this batch group
• the job execution log for individual batch jobs

Group Execution Status
Each execution log has one of the following status values:
• Processing. The batch group is currently running.
• Batch group execution completed successfully.
• Batch group execution completed with additional information. For example, for Stage and Load jobs, this can indicate that some records were rejected (see “Viewing Rejected Records” on page 710). For Match jobs, this can indicate that the base object is empty or that there are no more records to match.
• Batch group execution failed. For more information, see “Restarting a Batch Group That Failed Execution” on page 707.
• Batch group execution is incomplete. For more information, see “Handling Incomplete Batch Group Execution” on page 708.
• Batch group execution has been reset to start over. For more information, see “Restarting a Batch Group That Failed Execution” on page 707.

Viewing the Group Execution Log for a Batch Group
Each time that it executes a batch group, the Batch Group tool generates a group execution log entry. Each log entry has the following properties:
Field | Description
Status | Current status of this batch group. If batch group execution failed, displays a description of the problem. For more information, see “Group Execution Status” on page 705.
Start | Date / time when this batch group started.
End | Date / time when this batch group ended.
Message | Any messages regarding batch group execution.

Viewing the Job Execution Log for a Batch Job
Each time that it executes a batch job within a batch group, the Batch Group tool generates a job execution log entry.

Each log entry has the following properties:
Field | Description
Job Name | Name of this batch job.
Status | Current status of this batch job. For more information, see “Job Execution Status” on page 682.
Start | Date / time when this batch job started.
End | Date / time when this batch job ended.
Message | Any messages regarding batch group execution.

Note: If you want to view the metrics for a completed batch job, you can use the Batch Viewer. For more information, see “Viewing Job Execution Logs” on page 682.

Restarting a Batch Group That Failed Execution
If batch group execution fails, you can resolve any problems that may have caused the failure, and then restart the batch group from the beginning. To execute the batch group again:
1. In the Logs for My Batch Group list, select the execution log entry for the batch group that failed.
2. Click Set to Restart. The Batch Group tool changes the status of this batch job to Restart.
3. Resolve any problems that may have caused the failure and execute the batch group again. For more information, see “Executing a Batch Group” on page 704. The Batch Group tool executes the batch group and creates a new execution log entry.

Note: If a batch group fails and you do not click either the Set to Restart button (see “Restarting a Batch Group That Failed Execution” on page 707) or the Set to Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708) in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior failed level.

Handling Incomplete Batch Group Execution
In very rare circumstances, you might want to change the status of a running batch group.
• If the batch group status says it is still executing, you can click Set to Incomplete and execute the batch group again. Do this only if the batch group has stopped executing (due to an error, such as a server reboot or crash) but Siperian Hub has not detected that it has stopped, because a job application lock remains in the metadata. You will know this is a problem if the current status is Executing but the database, application server, and logs show no activity. If this occurs, click this button to clear the job application lock so that you can run the batch group again; otherwise, you will not be able to execute the batch group. Setting the status to Incomplete only updates the status of the batch group (as well as all batch jobs within the batch group); it does not terminate processing. Note that, if the job status is Incomplete, you cannot set the job status to Restart.
• If the job status is Failed, you can click Set to Restart. Note that, if the job status is Restart, you cannot set the job status to Incomplete.

Changing the status allows you to continue doing something else while the batch group completes. To set the status of a running batch group to incomplete:
1. In the Logs for My Batch Group list, select the execution log entry for the running batch group that you want to mark as incomplete.
2. Click Set to Incomplete. The Batch Group tool changes the status of this batch job to Incomplete.
3. Execute the batch group again. For more information, see “Executing a Batch Group” on page 704.
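The manual status changes described in this section can be summarized as a small lookup. The following Python sketch is illustrative only: the GroupStatus and can_set_status names are hypothetical and are not part of Siperian Hub, and the table models only the changes that this section states explicitly.

```python
from enum import Enum


class GroupStatus(Enum):
    PROCESSING = "Processing"
    FAILED = "Failed"
    INCOMPLETE = "Incomplete"
    RESTART = "Restart"


# Manual status changes that this section states explicitly (assumption:
# anything not listed here is treated as not allowed).
# Keys are the current status; values are the statuses you may set by hand.
MANUAL_CHANGES = {
    GroupStatus.PROCESSING: {GroupStatus.INCOMPLETE},  # Set to Incomplete
    GroupStatus.FAILED: {GroupStatus.RESTART},         # Set to Restart
    GroupStatus.INCOMPLETE: set(),                     # cannot be set to Restart
    GroupStatus.RESTART: set(),                        # cannot be set to Incomplete
}


def can_set_status(current: GroupStatus, target: GroupStatus) -> bool:
    """Return True if the manual change is one this section describes."""
    return target in MANUAL_CHANGES.get(current, set())
```

For example, can_set_status(GroupStatus.INCOMPLETE, GroupStatus.RESTART) returns False, which mirrors the rule that an Incomplete status cannot be set to Restart.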


Note: If a batch group fails and you do not click either the Set to Restart button (see “Restarting a Batch Group That Failed Execution” on page 707) or the Set to Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708) in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior failed level.

Viewing Rejected Records
If batch group execution resulted in records being written to the rejects table (during the execution of Stage jobs or Load jobs), then the job execution log enables the View Rejects button. To view rejected records:
1. Click the View Rejects button. The Batch Group tool displays the Rejects window.
2. Navigate and inspect the rejected records as needed.
3. Click Close.


Filtering Execution Logs By Status
You can view history logs across all batch groups, based on their execution status, by clicking the appropriate node under the Logs by Status node. To filter execution logs by status:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. In the Batch Group tree, expand the Logs by Status node. The Batch Group tool displays the log status list.
3. Click the particular batch group log entry you want to review in the upper half of the logs panel. Siperian Hub displays the detailed job execution logs for that batch group in the lower half of the panel.

For additional information, see:
• “Group Execution Status” on page 705
• “Viewing the Group Execution Log for a Batch Group” on page 706
• “Viewing the Job Execution Log for a Batch Job” on page 706

Note: Batch group logs can be deleted by selecting a batch group log and clicking the Clear Selected button. To delete all logs shown in the panel, click the Clear All button.


Deleting Batch Groups
To delete a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page 30.
3. In the navigation tree, expand the Batch Group node to show the batch group that you want to delete.
4. Right-click the batch group that you want to delete, and choose Delete Batch Group (or select Batch Group > Delete Batch Group).

Alphabetical List of Batch Jobs
Batch Job | Description
Accept Non-Matched Records As Unique | For records that have undergone the match process but had no matching data, sets the consolidation indicator to 1 (consolidated), meaning that the record was unique and did not require consolidation.
Autolink Jobs | Automatically links records that have qualified for autolinking during the match process and are flagged for autolinking (Automerge_ind=1).
Auto Match and Merge Jobs | Executes a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the number of matches ready for manual consolidation exceeds the configured threshold. Used with merge-style base objects only.
Automerge Jobs | Automatically merges records that have qualified for automerging during the match process and are flagged for automerging (Automerge_ind = 1). Used with merge-style base objects only.
BVT Snapshot Jobs | Generates a snapshot of the best version of the truth (BVT) for a base object. Used with link-style base objects only.
External Match Jobs | Matches “externally managed/prepared” records with an existing base object, yielding the results based on the current match settings, all without actually modifying the data in the base object.
Generate Match Tokens Jobs | Prepares data for matching by generating match tokens according to the current match settings. Match tokens are strings that encode the columns used to identify candidates for matching.
Hub Delete Jobs | Deletes data from the Hub based on BO / XREF level input.
Key Match Jobs | Matches records from two or more sources when these sources use the same primary key. Compares new records to each other and to existing records, and identifies potential matches based on the comparison of source record keys as defined by the match rules.
Load Jobs | Copies records from a staging table to the corresponding target table in the Hub Store (a base object or dependent object). During the load process, applies the current trust and validation rules to the records.
Manual Link Jobs | Shows logs for records that have been manually linked in the Merge Manager tool. Used with link-style base objects only.


Batch Jobs Reference

Batch Job | Description
Manual Merge Jobs | Shows logs for records that have been manually merged in the Merge Manager tool. Used with merge-style base objects only.
Manual Unlink Jobs | Shows logs for records that have been manually unlinked in the Merge Manager tool. Used with link-style base objects only.
Match Jobs | Finds duplicate records in the base object, based on the current match rules.
Match Analyze Jobs | Conducts a search to gather match statistics but does not actually perform the match process. If areas of data with the potential for huge match requirements are discovered, Siperian Hub moves the records to a hold status, which allows a data steward to review the data manually before proceeding with the match process.
Match for Duplicate Data Jobs | For data with a high percentage of duplicate records, compares new records to each other and to existing records, and identifies exact duplicates. The maximum number of exact duplicates is based on the Duplicate Match Threshold setting for this base object.
Migrate Link Style To Merge Style Jobs | Migrates link-style base objects to merge-style base objects. Used with link-style base objects only.
Multi Merge Jobs | Allows the merge of multiple records in one job.
Promote Jobs | Reads the PROMOTE_IND column from an XREF table and changes the state to ACTIVE on all rows where the column’s value is 1.
Recalculate BO Jobs | Recalculates all base objects identified by the ROWID_OBJECT column in the table/inline view if you include the ROWID_OBJECT_TABLE parameter. If you do not include the parameter, this batch job recalculates all records in the BO, in batches of MATCH_BATCH_SIZE or 1/4 the number of the records in the table, whichever is less.
Recalculate BVT Jobs | Recalculates the BVT for the specified ROWID_OBJECT.
Reset Links Jobs | Updates the records in the _LINK table to account for changes in the data. Used with link-style base objects only.
Reset Match Table Jobs | Shows logs of the operation where all matched records have been reset to be queued for match.
Revalidate Jobs | Executes the validation logic/rules for records that have been modified since the initial validation during the Load Process.
Stage Jobs | Copies records from a landing table into a staging table. During execution, cleanses the data according to the current cleanse settings.


Batch Job | Description
Synchronize Jobs | Updates metadata for base objects. Used after a base object has been loaded but not yet merged, and subsequent trust configuration changes (such as enabling trust) have been made to columns in that base object. This job must be run before merging data for this base object.

Accept Non-Matched Records As Unique
Accept Non-Matched Records As Unique jobs change the status of records that have undergone the match process but had no matching data. This job sets the consolidation indicator to 1, meaning that the record is consolidated or (in this case) did not require consolidation. The Automerge job adheres to this setting and treats these as unique records.

The Accept Non-Matched Records As Unique job is created:
• only if the base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.
• only after a merge job is run, as described in “Batch Jobs That Are Created When Changes Occur” on page 673.

Note: This job cannot be executed from the Batch Viewer.

Autolink Jobs
For link-style base objects only, after the Match job has been run, you can run the Autolink job to automatically link any records that qualified for autolinking during the match process.


Auto Match and Merge Jobs
Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the limit on the number of records queued for manual consolidation is reached (see “Maximum Matches for Manual Consolidation” on page 490). The match batch size parameter (see “Number of Rows per Match Job Batch Cycle” on page 491) controls the number of records that each match and merge cycle processes. To learn more, see “Match Jobs” on page 734 and “Automerge Jobs” on page 717.

Important: Do not run an Auto Match and Merge job on a base object that is used to define relationships between records in inter-table or intra-table match paths. Doing so will change the relationship data, resulting in the loss of the associations between records. For more information, see “Relationship Base Objects” on page 498.
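The cycle described above can be pictured as a simple loop. The following Python sketch is a simplified model under stated assumptions, not the Siperian Hub implementation: the function name, its parameters, and the match_batch callback (which stands in for one Match job followed by one Automerge job) are all hypothetical.

```python
def auto_match_and_merge(records_to_match, match_batch_size,
                         max_manual_matches, match_batch):
    """Model the Match / Automerge cycle.

    match_batch(batch) is a hypothetical callback that returns a tuple
    (automerged_count, queued_for_manual_count) for one batch cycle.
    """
    pending = list(records_to_match)
    automerged = 0
    queued_for_manual = 0
    cycles = 0
    # Stop when nothing is left to match, or when the manual
    # consolidation limit has been reached.
    while pending and queued_for_manual < max_manual_matches:
        batch = pending[:match_batch_size]
        pending = pending[match_batch_size:]
        auto, manual = match_batch(batch)  # one Match job, then one Automerge job
        automerged += auto
        queued_for_manual += manual
        cycles += 1
    return {
        "cycles": cycles,
        "automerged": automerged,
        "queued_for_manual": queued_for_manual,
    }
```

In this model, a smaller match batch size means more cycles for the same number of records, and a low manual consolidation limit can end the job while records are still waiting to be matched.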

Second Jobs Shown After Application Server Restart
If you execute an Auto Match and Merge job, it completes successfully with one job shown in the status. However, if you stop and restart the application server and return to the Batch Viewer, you see a second job (listed under Match jobs) with a warning a few seconds later. The second job is to ensure that either the base object is empty or there are no more records to match.

Auto Match and Merge Metrics
After running an Auto Match and Merge job, the Batch Viewer displays the following metrics (if applicable) in the job execution log:


The following table describes these metrics.
Metric | Description
Matched records | Number of records that were matched by the Auto Match and Merge job.
Records tokenized | Number of records that were tokenized prior to the Auto Match and Merge job.
Automerged records | Number of records that were merged by the Auto Match and Merge job.
Accepted as unique records | Number of records that were accepted as unique records by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.
Queued for automerge | Number of records that were queued for automerge by a Match job that was executed by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717.
Queued for manual merge | Number of records that were queued for manual merge. Use the Merge Manager in the Hub Console to process these records. For more information, see the Siperian Hub Data Steward Guide.

Automerge Jobs
For merge-style base objects only, after the Match job has been run, you can run the Automerge job to automatically merge any records that qualified for automerging during the match process. When an Automerge job is run, it processes all matches in the MATCH table that are flagged for automerging (Automerge_ind=1).

Note: For state-enabled objects only, records that are PENDING (source and target records) or DELETED are never automerged. When a record is deleted, it is removed from the match table and its consolidation_ind is reset to 4. For more information regarding how to manage the state of base object or XREF records, refer to “Configuring State Management for Base Objects” on page 211.
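The selection rule above (flagged for automerging, and neither record PENDING or DELETED) can be sketched as a filter. This Python example is illustrative only: the dictionary keys ROWID_OBJECT, ROWID_OBJECT_MATCHED, and AUTOMERGE_IND, and the state_of lookup, are assumptions made for the sketch and do not describe the actual layout of the match table.

```python
EXCLUDED_STATES = {"PENDING", "DELETED"}


def automerge_candidates(match_rows, state_of):
    """Return the match rows that an Automerge job would process.

    match_rows: list of dicts with the assumed keys ROWID_OBJECT,
                ROWID_OBJECT_MATCHED and AUTOMERGE_IND.
    state_of:   dict mapping a rowid to its state name; a rowid that is
                missing from the dict is treated as ACTIVE (assumption).
    """
    candidates = []
    for row in match_rows:
        if row["AUTOMERGE_IND"] != 1:
            continue  # not flagged for automerging
        source_state = state_of.get(row["ROWID_OBJECT"], "ACTIVE")
        target_state = state_of.get(row["ROWID_OBJECT_MATCHED"], "ACTIVE")
        if source_state in EXCLUDED_STATES or target_state in EXCLUDED_STATES:
            continue  # PENDING or DELETED records are never automerged
        candidates.append(row)
    return candidates
```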


Automerge Jobs and Auto Match and Merge
Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the limit on the number of records queued for manual consolidation is reached (see “Maximum Matches for Manual Consolidation” on page 490). For additional information, see “Auto Match and Merge Jobs” on page 716.

Automerge Jobs and Trust-Enabled Columns
An Automerge job will fail if there is a large number of trust-enabled columns. The exact number of columns that causes the job to fail varies, because it depends on both the number of trust-enabled columns and the length of their names. Long column names are those at, or close to, the maximum allowable length of 26 characters. To avoid this problem, keep the number of trust-enabled columns below 40, keep the column names short, or both.
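A pre-flight check along these lines can flag a base object that is at risk. The following Python sketch is a hypothetical helper, not a Siperian Hub utility: the function name and the near_max_margin parameter (how close to 26 characters counts as a long name) are assumptions chosen for illustration.

```python
MAX_COLUMN_NAME_LENGTH = 26          # maximum allowable length stated above
RECOMMENDED_MAX_TRUST_COLUMNS = 40   # keep the count below this value


def trust_column_warnings(trust_enabled_columns, near_max_margin=2):
    """Return a list of warning strings for the given trust-enabled columns.

    near_max_margin is an assumed tolerance: names within this many
    characters of the maximum length are reported as long.
    """
    warnings = []
    if len(trust_enabled_columns) >= RECOMMENDED_MAX_TRUST_COLUMNS:
        warnings.append(
            "%d trust-enabled columns; keep the number below %d"
            % (len(trust_enabled_columns), RECOMMENDED_MAX_TRUST_COLUMNS)
        )
    long_names = [
        name for name in trust_enabled_columns
        if len(name) >= MAX_COLUMN_NAME_LENGTH - near_max_margin
    ]
    if long_names:
        warnings.append("long column names: " + ", ".join(long_names))
    return warnings
```

An empty list means that neither condition was detected; it does not guarantee that the Automerge job will succeed, because the source gives no exact failure threshold.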

Automerge Metrics
After running an Automerge job, the Batch Viewer displays the following metrics (if applicable) in the job execution log:

The following table describes these metrics.
Metric | Description
Automerged records | Number of records that were automerged by the Automerge job.
Accepted as unique records | Number of records that were accepted as unique records by the Automerge job. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.


BVT Snapshot Jobs
For a base object table, the best version of the truth (BVT) is a record that has been consolidated with the best cells of data from the source records. For more information, see “Best Version of the Truth” on page 340.

Note: For state-enabled base objects only, the BVT logic uses the HUB_STATE_IND to ignore the non-contributing base objects where the HUB_STATE_IND is 0 or -1 (PENDING or DELETED state). For the online BUILD_BVT call, provide the INCLUDE_PENDING_IND parameter. Possible scenarios include:
1. If this parameter is 0, then include only ACTIVE base object records.
2. If this parameter is 1, then include ACTIVE and PENDING base object records.
3. If this parameter is 2, then calculate based on ACTIVE and PENDING XREF records to provide “what-if” functionality.
4. If this parameter is 3, then calculate based on ACTIVE XREF records to provide the current BVT based on XREFs, which may differ from scenario 1.

For more information regarding how to manage the state of base object or XREF records, refer to Chapter 7, “State Management.”
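The four INCLUDE_PENDING_IND scenarios listed in this section amount to a lookup from the parameter value to the record source and the states that contribute. The following Python sketch restates that mapping; the function name bvt_inputs and its return shape are hypothetical and are not part of the BUILD_BVT call.

```python
# Mapping of INCLUDE_PENDING_IND to (record source, contributing states),
# as read from the scenarios listed in this section.
SCENARIOS = {
    0: ("base object", {"ACTIVE"}),
    1: ("base object", {"ACTIVE", "PENDING"}),
    2: ("XREF", {"ACTIVE", "PENDING"}),   # "what-if" calculation
    3: ("XREF", {"ACTIVE"}),              # current BVT based on XREFs
}


def bvt_inputs(include_pending_ind):
    """Return (source, states) for a given INCLUDE_PENDING_IND value."""
    if include_pending_ind not in SCENARIOS:
        raise ValueError(
            "INCLUDE_PENDING_IND must be 0, 1, 2 or 3, got %r"
            % (include_pending_ind,)
        )
    return SCENARIOS[include_pending_ind]
```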

External Match Jobs
External match jobs match “externally managed/prepared” records with an existing base object, yielding the results based on the current match settings—all without actually loading the data from the input table into the base object, changing data in the base object in any way, or changing the match table associated with the base object. You can use external matching to pretest data, test match rules, and inspect the results before running the actual Match job. The base object for External Match jobs must be a fuzzy-match base object, as described in “Exact-match and Fuzzy-match Base Objects” on page 320.


The External Match job executes as a batch job only—there is no corresponding SIF request that external applications can invoke. For more information, see “Running External Match Jobs” on page 724.

Input and Output Tables Used for External Match Jobs
In addition to the base object and its associated match key table, the External Match job uses the following input and output tables.

External Match Input (EMI) Table
Each base object has an External Match Input (EMI) table for External Match jobs. This table uses the following naming pattern:
C_BaseObject_EMI
where BaseObject is the name of the base object associated with this External Match job.


When you create a base object, the Schema Manager automatically creates the associated EMI table, and automatically adds the following system columns:
Column Name | Data Type | Size | Not Null | Description
SOURCE_KEY | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.
SOURCE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.
FILE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.

When populating the EMI table (see “Populating the Input Table” on page 724), at least one of these columns must contain data. Note that the column names are non-restrictive—they can contain any identifying data, as long as the composite three-column primary key is unique. In addition, when you configure match rules for a particular column (for example, Person_Name, Address_Part1, or Exact_Cust_ID), the Schema Manager adds that column automatically to the C_BaseObject_EMI table.


You can view the columns of an external match table in the Schema Manager by expanding the External Match Table node, as shown in the following example.

The records in the EMI table are analogous to the match batch used in Match jobs. As described in “Flagging the Match Batch” on page 329, the match batch contains the set of records that are matched against the rest of the records in the base object. The difference is that, for Match jobs, the match batch records reside in the base object, while for External Match, these records reside in a separate input table.

External Match Output (EMO) Table
Each base object has an External Match Output (EMO) table that contains the output data for External Match jobs. This table uses the following naming pattern:
C_BaseObject_EMO
where BaseObject is the name of the base object associated with this External Match job. Before the External Match job is executed, Siperian Hub drops and re-creates this table.


An EMO table contains the following columns:
Column Name | Data Type | Size | Not Null | Description
SOURCE_KEY | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
SOURCE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
FILE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
ROWID_OBJECT_MATCHED | CHAR | 14 | X | ROWID_OBJECT of the record in the base object that matched the record in the EMI table.
ROWID_MATCH_RULE | CHAR | 14 | | Identifies the match rule that was used to determine whether the two rows matched.
AUTOMERGE_IND | NUMBER | 38 | X | Specifies whether a record qualifies for automatic consolidation during the match process. One of the following values: Zero (0): Record does not qualify for automatic consolidation. One (1): Record qualifies for automatic consolidation. The Automerge or Autolink job processes any records with an AUTOMERGE_IND of 1. For more information, see “Automerge Jobs” on page 717.
CREATOR | VARCHAR2 | 50 | | User or process responsible for creating the record.
CREATE_DATE | DATE | | | Date on which the record was created.

Instead of populating the match table for the base object, the External Match job populates this EMO table with match pairs. Each row in the EMO represents a pair of matched records, one from the EMI table and one from the base object:
• The primary key (SOURCE_KEY + SOURCE_NAME + FILE_NAME) uniquely identifies the record in the EMI table.
• ROWID_OBJECT_MATCHED uniquely identifies the record in the base object.
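When you inspect the output, each EMO row can be traced back to its input record through the three-column composite key. The following Python sketch shows one way to group matched base object rowids by input record; it works on rows already read into dictionaries, and the function names are hypothetical (only the column names come from the tables above).

```python
KEY_COLUMNS = ("SOURCE_KEY", "SOURCE_NAME", "FILE_NAME")


def composite_key(row):
    """Build the three-column composite key for an EMI or EMO row."""
    return tuple(row.get(column) for column in KEY_COLUMNS)


def matches_by_input_record(emi_rows, emo_rows):
    """Map each EMI record key to the list of matched base object rowids.

    EMI records with no match pairs map to an empty list.
    """
    matches = {composite_key(row): [] for row in emi_rows}
    for pair in emo_rows:
        matches.setdefault(composite_key(pair), []).append(
            pair["ROWID_OBJECT_MATCHED"]
        )
    return matches
```

An input record that maps to an empty list had no matches in the base object under the selected match rule set.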

Populating the Input Table
Before running an External Match job, the EMI table must be populated with records to match against the records in the base object. The process of loading data into an EMI table is external to Siperian Hub; you must use a data loading tool that works with your database platform (such as SQL*Loader).

Important: When you populate this table, you must supply data for at least one of the system columns (SOURCE_KEY, SOURCE_NAME, and FILE_NAME) to help link back from the _EMI table. In addition, the C_BaseObject_EMI table must contain flat records (like the output of a JOIN), with unique source keys and no foreign keys to other tables.
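Because loading the EMI table happens outside Siperian Hub, it can help to check the input rows before loading them. The following Python sketch is a hypothetical pre-load check, not a Siperian Hub feature: it verifies the two requirements stated above (at least one system column populated, and a unique composite key) for rows held as dictionaries.

```python
SYSTEM_COLUMNS = ("SOURCE_KEY", "SOURCE_NAME", "FILE_NAME")


def validate_emi_rows(rows):
    """Return a list of (row_index, problem) tuples; empty if all rows pass."""
    errors = []
    seen_keys = set()
    for index, row in enumerate(rows):
        key = tuple(row.get(column) for column in SYSTEM_COLUMNS)
        if not any(key):
            # None and empty strings both count as "no data supplied".
            errors.append((index, "no system column populated"))
        elif key in seen_keys:
            errors.append((index, "duplicate composite key"))
        seen_keys.add(key)
    return errors
```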

Running External Match Jobs
To run an external match job for a base object:
1. Populate the data in the C_BaseObject_EMI table using a data loading process that is external to Siperian Hub. For requirements, see “Populating the Input Table” on page 724.
2. In the Hub Console, start either of the following tools:
• Batch Viewer according to the instructions in “Starting the Batch Viewer Tool” on page 674
• Batch Group according to the instructions in “Starting the Batch Group Tool” on page 690
3. Select the External Match job for the base object.
4. Select the match rule set that you want to use for external match. The default match rule set is automatically selected. For more information, see “Configuring Match Rule Sets” on page 531.
5. Execute the External Match job according to the instructions in “Running Batch Jobs Manually” on page 677 or “Executing Batch Groups Using the Batch Group Tool” on page 701.
• The External Match job matches all records in the C_BaseObject_EMI table against the records in the base object.
• There is no concept of a consolidation indicator in the input or output tables. The Build Match Group is not run for the results.
6. Inspect the results in the C_BaseObject_EMO table using a data management tool (external to Siperian Hub).
7. If you want to save the results, make a backup copy of the data before running the External Match job again.
Note: The C_BaseObject_EMO table is dropped and recreated after every External Match Job execution.

Generate Match Tokens Jobs
Before you can run the Match job for a given base object, you must first generate the match tokens. The Generate Match Tokens job generates the match tokens for the base object according to the current match settings. If you change a match rule, Siperian Hub might need to regenerate the tokens for the new match criteria, so Siperian Hub automatically creates a Key Match job, as described in “Batch Jobs That Are Created When Changes Occur” on page 673. For more information about configuring match token generation, see “Match Keys and the Tokenization Process” on page 322.

Note: For state-enabled base objects only, the tokenize batch process skips records that are in the DELETED state. These records can be tokenized through the Tokenize API, but will be ignored in batch processing. PENDING records can be matched on a per base object basis by setting the MATCH_PENDING_IND (default off). For more information regarding how to manage the state of base object or XREF records, refer to “Configuring State Management for Base Objects” on page 211.


Regenerating All Match Tokens
Before you run a Generate Match Tokens job, you can use the Re-generate All Match Tokens check box to specify the scope of match token generation.

Do one of the following:
• Check (select) this check box to have the Generate Match Tokens job truncate the match key table and then tokenize the entire base object.
• Uncheck (clear) this check box to have the Generate Match Tokens job generate only tokens that are missing from the match key table based on the changed match criteria.
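The effect of the check box on the scope of tokenization can be modeled as follows. This Python sketch is a simplification under stated assumptions: it treats the match key table as a dictionary keyed by record id and treats "missing" as "has no entry", which is a stand-in for however Siperian Hub actually decides which tokens are missing. The function name and parameters are hypothetical.

```python
def records_to_tokenize(base_object_ids, match_key_table, regenerate_all):
    """Return the record ids that a Generate Match Tokens run would tokenize.

    base_object_ids: iterable of record ids in the base object.
    match_key_table: dict of record id -> existing tokens (modified in
                     place when regenerate_all is True).
    regenerate_all:  True models the selected Re-generate All Match
                     Tokens check box.
    """
    if regenerate_all:
        match_key_table.clear()          # truncate the match key table
        return list(base_object_ids)     # then tokenize the entire base object
    # Otherwise, tokenize only records with no tokens yet.
    return [rid for rid in base_object_ids if rid not in match_key_table]
```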

After Generating Match Tokens
After the match tokens are generated, you can run the Match job for a base object.

Hub Delete Jobs
Hub Delete jobs remove data from the Hub based on base object / XREF input to the cmxdm.hub_delete_batch stored procedure. You can use the Hub Delete job to remove an entire source system from the Hub.

Note: Hub Delete jobs execute only as a batch stored procedure. You cannot call a Hub Delete job from the Batch Viewer or Batch Group tools, and there is no corresponding SIF request that external applications can invoke. For more information, see “Hub Delete Jobs” on page 769.


Key Match Jobs
Key Match jobs match records from two or more sources when these sources use the same primary key. Key Match jobs compare new records to each other and to existing records, and then identify potential matches based on the comparison of source record keys as defined by the primary key match rules. A Key Match job is automatically created for a base object after a primary key match rule has been created or changed in the Match / Merge Setup configuration for this base object. For more information, see “Configuring Primary Key Match Rules” on page 578.

Load Jobs
Load jobs move data from a staging table to the corresponding target table (base object or dependent object) in the Hub Store. Load jobs also calculate trust values for base objects with defined trusted columns, and they apply validation rules (if defined) to determine the final trust values. For more information about loading data, including trust, validation, and delta detection, see “Configuration Tasks for Loading Data” on page 454.

For state-enabled base objects, the load batch process can load records in any state. The state is specified as an input column on the staging table. The input state can be specified in the mapping via a landing table column, or it can be derived. If an input state is not specified in the mapping, then the state is assumed to be ACTIVE. For more information regarding how to manage the state of base object or XREF records, refer to “Configuring State Management for Base Objects” on page 211.

The following table describes how input states affect the states of existing XREFs.
Existing XREF State: Incoming XREF State: ACTIVE Update Update + Promote Update + Restore Insert Insert No XREF (Load by No Base Object PENDING DELETED rowid)

Note: Records are rejected if the HUB_STATE_IND value is not valid.

The following table provides a matrix of how Siperian Hub processes records (for state-enabled base objects) during Load (and Put) for certain operations based on the record state:

Update the XREF record when:
• Incoming record state ACTIVE, existing record state ACTIVE
• Incoming record state DELETED, existing record state ACTIVE
• Incoming record state PENDING, existing record state PENDING
• Incoming record state ACTIVE, existing record state PENDING
• Incoming record state DELETED, existing record state DELETED
• Incoming record state PENDING, existing record state DELETED
• Incoming record state DELETED. Notes: When a base object rowid delete record comes in, Siperian Hub updates the base object and all XREF records (regardless of ROWID_SYSTEM) to DELETED state.

Insert the XREF record when:
• Incoming record state PENDING, existing record state ACTIVE. Notes: The second record for the pair is created.
• Incoming record state ACTIVE, existing record state No Record
• Incoming record state PENDING, existing record state No Record

Delete the XREF record when:
• Incoming record state ACTIVE, existing record state PENDING (for paired records). Notes: Delete the ACTIVE record in the pair, the PENDING record is then updated. Paired records are two records with the same PKEY_SRC_OBJECT and ROWID_SYSTEM.
• Incoming record state DELETED, existing record state PENDING

Siperian Hub displays an error when:
• Incoming record state PENDING, existing record state ACTIVE (for paired records). Notes: Paired records are two records with the same PKEY_SRC_OBJECT and ROWID_SYSTEM.

Additional notes:
• If the incoming state is not specified (for a Load update), then the incoming state is assumed to be the same as the current state. For example, if the incoming state is null and the existing state of the XREF or base object to update is PENDING, then the incoming state is assumed to be PENDING instead of null.
• Siperian Hub deletes XREF records using the Hub Delete batch job. The Hub Delete batch job removes specified data (up to and including an entire source system) from Siperian Hub based on your base object/XREF input to the cmxdm.hub_delete_batch stored procedure. For more information, see “Hub Delete Jobs” on page 769.

For more information regarding how to manage the state of base object or XREF records, refer to “Configuring State Management for Base Objects” on page 211.
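The defaulting rules for the incoming state can be summarized in a few lines. The following Python sketch is illustrative only: the function name and the use of state names instead of HUB_STATE_IND values are assumptions made for readability, not part of Siperian Hub.

```python
from typing import Optional

# State names as used in this section (assumption: names rather than
# the underlying HUB_STATE_IND values).
VALID_STATES = {"ACTIVE", "PENDING", "DELETED"}


def resolve_incoming_state(incoming: Optional[str], existing: Optional[str]) -> str:
    """Return the state assumed for an incoming record during a Load.

    `existing` is None when there is no record to update (an insert).
    """
    if incoming is not None:
        if incoming not in VALID_STATES:
            # Records with an invalid state are rejected.
            raise ValueError(f"record rejected: invalid state {incoming!r}")
        return incoming
    if existing is not None:
        # Load update without an incoming state: keep the current state.
        return existing
    # No state specified in the mapping: assume ACTIVE.
    return "ACTIVE"
```

For example, a null incoming state against an existing PENDING record resolves to PENDING, while a null incoming state with no existing record resolves to ACTIVE.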

Rules for Running Load Jobs
The following rules apply to Load jobs:
• Run a Load job only if the Stage job that loads the staging table used by the Load job has completed successfully.
• Run the Load job for a parent table before you run the Load job for a child table.
• Run the Load job for a parent base object before you run the Load job for a dependent object.
• If a lookup on the child object is not defined (the lookup table and column were not populated), in order to successfully load data, you must repeat the Stage job on the child object prior to running the Load job.
• Only one Load job at a time can be run for the same base object or dependent object. Multiple Load jobs for the same base object or dependent object cannot be run concurrently.
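The ordering rules above amount to a dependency graph between jobs. The following Python sketch, which assumes hypothetical job and table names, derives a valid run order with the standard-library topological sorter; it is a planning aid, not a Siperian Hub feature.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must complete successfully first.
# Job and table names are hypothetical examples.
dependencies = {
    "Stage C_STG_CUSTOMER": set(),
    "Stage C_STG_ADDRESS": set(),
    # A Load job runs only after its Stage job has completed.
    "Load C_CUSTOMER": {"Stage C_STG_CUSTOMER"},
    # The child Load job also waits for the parent Load job.
    "Load C_ADDRESS": {"Stage C_STG_ADDRESS", "Load C_CUSTOMER"},
}

# static_order() yields every job after all of its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())

for job in run_order:
    print(job)
```

Running the jobs one at a time in this order also satisfies the rule that Load jobs for the same object must not run concurrently.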

Forcing Updates in Load Jobs
Before you run a Load job, you can use the Force Update check box to configure how the Load job loads data from the staging table to the target base object or dependent object. By default, Siperian Hub checks the Last Update Date for each record in the staging table to ensure that it has not already loaded the record. To override this behavior, check (select) the Force Update check box, which ignores the Last Update Date, forces a refresh, and loads each record regardless of whether it might have already been loaded from the staging table. Use this approach prudently, however. Depending on the volume of data to load, forcing updates can carry a price in processing time.
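As a rough mental model of the Force Update check box, consider the following Python sketch. The function name, its parameters, and the exact date comparison are assumptions for illustration; the guide states only that the Last Update Date is checked unless Force Update is selected.

```python
from datetime import datetime
from typing import Optional


def should_load(record_last_update: datetime,
                last_loaded: Optional[datetime],
                force_update: bool = False) -> bool:
    """Decide whether a staging record is loaded (illustrative only)."""
    if force_update:
        # Force Update ignores the Last Update Date entirely.
        return True
    # Assumed comparison: load only records newer than the last load.
    return last_loaded is None or record_last_update > last_loaded
```

With Force Update selected, every staging record is processed again, which is why the processing time grows with the volume of data.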

Generating Match Tokens During Load Jobs
When configuring the advanced properties of a base object in the Schema tool, you can check (select) the Generate Match Tokens on Load check box to generate match tokens during Load jobs, after the records have been loaded into the base object. By default, this check box is unchecked (cleared), and match tokens are generated during the Match process instead. For more information, see “Editing Base Object Properties” on page 108 and “Run-time Execution Flow of the Load Process” on page 304.


Load Job Metrics
After running a Load job, the Batch Viewer displays the following metrics (if applicable) in the job execution log:

The following table describes these metrics.
Metric                     Description
Total records              Number of records processed by the Load job.
Inserted                   Number of records inserted by the Load job into the target object.
Updated                    Number of records updated by the Load job in the target object.
No action                  Number of records on which no action was taken (the records already existed in the base object).
Updated XREF               Number of records that updated the cross-reference table for this base object. If you are loading a record during an incremental load, that record has already been consolidated (exists only in the XREF and not in the base object).
Records tokenized          Number of records tokenized by the Load job. Applies only if the Generate Match Tokens on Load check box is selected in the Schema tool. For more information, see “Generating Match Tokens During Load Jobs” on page 730.
Unmerged source records    Number of source records that were not merged by the Load job.
Missing Lookup / Invalid rowid_object records    Number of source records that were missing lookup information or had invalid rowid_object records.

Manual Link Jobs
For link-style base objects only, after the Match job has been run, data stewards can use the Merge Manager to process records that have been queued by a Match job for manual linking.

Manual Merge Jobs
After the Match job has been run, data stewards can use the Merge Manager to process records that have been queued by a Match job for manual merge. Manual Merge jobs are run in the Merge Manager—not in the Batch Viewer. The Batch Viewer only allows you to inspect job execution logs for Manual Merge jobs that were run in the Merge Manager.

Maximum Matches for Manual Consolidation
In the Schema Manager, you can configure the maximum number of matches ready for manual consolidation to prevent data stewards from being overwhelmed with thousands of manual merges for processing. Once this limit is reached, the Match jobs and the Auto Match and Merge jobs will not run until the number of matches has been reduced. For more information, see “Maximum Matches for Manual Consolidation” on page 490.

Executing a Manual Merge Job in the Merge Manager
When you start a Manual Merge job, the Merge Manager displays a dialog with a progress indicator. A manual merge can take some time to complete. If problems occur during processing, an error message is displayed on completion. This error also shows up in the job execution log for the Manual Merge job in the Batch Viewer.


In the Merge Manager, the process dialog includes a button labeled Mark process as incomplete that updates the status of the Manual Merge job but does not abort the Manual Merge job. If you click this button, the merge process continues in the background. At this point, there will be an entry in the Batch Viewer for this process. When the process completes, the success or failure is reported. For more information about the Merge Manager, see the Siperian Hub Data Steward Guide.

Manual Unlink Jobs
For link-style base objects only, after a Manual Link job has been run, data stewards can use the Data Manager to manually unlink records that have been manually linked.

Manual Unmerge Jobs
For merge-style base objects only, after a Manual Merge job has been run, data stewards can use the Data Manager to manually unmerge records that have been manually merged. Manual Unmerge jobs are run in the Data Manager—not in the Batch Viewer. The Batch Viewer only allows you to inspect job execution logs for Manual Unmerge jobs that were run in the Data Manager. For more information about the Data Manager, see the Siperian Hub Data Steward Guide.

Executing a Manual Unmerge Job in the Data Manager
When you start a Manual Unmerge job, the Data Manager displays a dialog with a progress indicator. A manual unmerge can take some time to complete, especially when the record in question is the product of many constituent records. If problems occur during processing, an error message is displayed on completion. This error also shows up in the job execution log for the Manual Unmerge job in the Batch Viewer.

In the Data Manager, the process dialog includes a button labeled Mark process as incomplete that updates the status of the Manual Unmerge job but does not abort the Manual Unmerge job. If you click this button, the unmerge process continues in the background. At this point, there will be an entry in the Batch Viewer for this process. When the process completes, the success or failure is reported.


Match Jobs
A Match job generates search keys for a base object, searches through the data for match candidates (records that are possible matches), applies the match rules to the match candidates, generates the matches, and then queues the matches for either automatic or manual consolidation. For an introduction, see “Match Process” on page 317.

When you create a new base object in an ORS, Siperian Hub automatically creates its Match job. Each Match job compares new or updated records in a base object with all records in the base object. For a detailed description, see “Run-Time Execution Flow of the Match Process” on page 329.

After running a Match job, the matched rows are queued for automatic and manual consolidation. Siperian Hub creates jobs that automatically consolidate the appropriate records (automerge or autolink). If a record is flagged for manual consolidation (manual merge or manual link), data stewards must use the Merge Manager to perform the manual consolidation. For more information about manual consolidation, see the Siperian Hub Data Steward Guide. For more information about consolidation, see “About the Consolidate Process” on page 335.

You configure Match jobs in the Match / Merge Setup node in the Schema Manager. For more information, see “Configuration Tasks for the Match Process” on page 484.

Important: Do not run a Match job on a base object that is used to define relationships between records in inter-table or intra-table match paths. Doing so will change the relationship data, resulting in the loss of the associations between records. For more information, see “Relationship Base Objects” on page 498.

Match Tables
When a Siperian Hub Match job runs for a base object, it populates the match table for that base object. Match tables are usually named Base_Object_MTCH. For more information, see “Populating the Match Table with Match Pairs” on page 330.


Match Jobs and State-enabled Base Objects
The following table describes the details of the match batch process behavior given the incoming states for state-enabled base objects:
Source Base Object State: ACTIVE
Target Base Object State: ACTIVE
Operation Result: The records are analyzed for matching.

Source Base Object State: PENDING
Target Base Object State: ACTIVE
Operation Result: Whether PENDING records are ignored in Batch Match is a table-level parameter. If set, then batch match will include PENDING records for the specified base object. However, the PENDING records can only be the source record in a match.

Source Base Object State: DELETED
Target Base Object State: Any state
Operation Result: DELETED records are ignored in Batch Match.

Source Base Object State: ANY
Target Base Object State: PENDING
Operation Result: PENDING records cannot be the target of a match.

Note: For Build Match Group (BMG), do not build groups with PENDING records. PENDING records are left as individual matches. PENDING matches will have automerge_ind=2.

For more information regarding how to manage the state of base object or XREF records, refer to “Configuring State Management for Base Objects” on page 211.
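The state rules for the match batch process can be read as a simple eligibility check. The following Python sketch is an interpretation of the table above; the function name and the include_pending flag (standing in for the table-level parameter) are hypothetical.

```python
def is_match_pair_eligible(source_state: str, target_state: str,
                           include_pending: bool = False) -> bool:
    """Return True if a source/target pair is analyzed for matching."""
    if "DELETED" in (source_state, target_state):
        return False  # DELETED records are ignored in Batch Match
    if target_state == "PENDING":
        return False  # PENDING records cannot be the target of a match
    if source_state == "PENDING":
        # PENDING sources are included only when the table-level
        # parameter is set for the base object.
        return include_pending
    return True  # ACTIVE source matched against an ACTIVE target
```

Under this reading, a PENDING record can take part in a match only as the source, and only when the base object is configured to include PENDING records.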

Auto Match and Merge Jobs
For merge-style base objects only, you can run the Auto Match and Merge job for a base object. Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the maximum number of records for manual consolidation limit is reached (see “Maximum Matches for Manual Consolidation” on page 490). For more information, see “Auto Match and Merge Jobs” on page 716.
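The cycle described above can be pictured as a loop with two exit conditions. In the following Python sketch, the three callables are hypothetical stand-ins for the Match job, the Automerge job, and a count of matches queued for manual consolidation; the sketch shows the control flow only.

```python
from typing import Callable


def auto_match_and_merge(run_match: Callable[[], int],
                         run_automerge: Callable[[], None],
                         manual_queue_size: Callable[[], int],
                         max_manual_matches: int) -> int:
    """Run Match/Automerge cycles and return the number of cycles."""
    cycles = 0
    # Stop when the manual consolidation limit is reached.
    while manual_queue_size() < max_manual_matches:
        matched = run_match()  # one Match job batch
        if matched == 0:
            break  # no more records to match
        run_automerge()  # consolidate what was queued for automerge
        cycles += 1
    return cycles
```

For instance, with 250 unmatched records, a batch size of 100, and no manual matches queued, this loop runs three cycles and then stops when the fourth Match batch finds nothing to match.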


Match Stored Procedure
When executing the MATCH job stored procedure:
• CMXMA.MATCH just runs one batch.
• The Match job is dependent on the successful completion of all tokenization jobs for the base object and any child tables used in intertable match. For more information about the tokenization job, see “Generate Match Tokens Jobs” on page 725. For more information about tokens for match, see “About the Consolidate Process” on page 335.
• The Generate Match Tokens job need not be scheduled. Siperian Hub automatically runs it.

Setting Limits for Batch Jobs
The Match job for a base object does not attempt to match every record in the base object against every other record in the base object. Instead, you specify (in the Schema tool):
• how many records the job should match each time it runs. For more information, see “Number of Rows per Match Job Batch Cycle” on page 491.
• how many matches are allowed for manual consolidation. This feature helps to prevent data stewards from being overwhelmed with manual merges for processing. Once this limit is reached, the Match job will not run until the number of matches ready for manual consolidation has been reduced. For more information, see “Maximum Matches for Manual Consolidation” on page 490.

Selecting a Match Rule Set
For Match jobs, before executing the job, you can select the match rule set that you want to use for evaluating matches.

The default match rule set for this base object is automatically selected. To choose any other match rule set, click the drop-down list and select any other match rule set that has been defined for this base object. For more information, see “Configuring Match Rule Sets” on page 531.

Match Job Metrics
After running a Match job, the Batch Viewer displays the following metrics (if applicable) in the job execution log:


The following table describes these metrics.
Metric                     Description
Matched records            Number of records that were matched by the Match job.
Records tokenized          Number of records that were tokenized by the Match job.
Queued for automerge       Number of records that were queued for automerge by the Match job. Use the Automerge job to process these records. For more information, see “Automerge Jobs” on page 717.
Queued for manual merge    Number of records that were queued for manual merge by the Match job. Use the Merge Manager in the Hub Console to process these records. For more information, see the Siperian Hub Data Steward Guide.

Match Analyze Jobs
Match Analyze jobs perform a search to gather metrics but do not conduct any actual matching. If areas of data with the potential for huge match requirements (hot spots) are discovered, Siperian Hub moves these records to an on-hold status to prevent overmatching. Records that are on hold have a consolidation indicator of 9, which allows a data steward to review the data manually in the Data Manager tool before proceeding with the match and consolidation. Match Analyze jobs are typically used to tune match rules or simply to determine whether data for a base object is overly “matchy” or has large intersections of data (“hot spots”) that will result in overmatching.

Dependencies for Match Analyze Jobs
Each Match Analyze job is dependent on new / updated records in the base object that have been tokenized and are thus queued for matching. For base objects that have intertable match enabled, the Match Analyze job is also dependent on the successful completion of the data tokenization jobs for all child tables, which in turn is dependent on successful Load jobs for the child tables.


Limiting the Number of On-Hold Records
You can limit the number of records that the Match Analyze job moves to the on-hold status. By default, no limit is set. To configure a limit, edit the cmxcleanse.properties file and add the following setting:
cmx.server.match.threshold_to_move_range_to_hold = n

where n is the maximum number of records that the Match Analyze job can move to the on-hold status. For more information about the cmxcleanse.properties file, see the Siperian Hub Installation Guide for your platform.
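To verify what limit is currently configured, a script can read the setting back from the properties file. The following Python sketch parses properties text for the key shown above; the function name and the sample value of 5000 are illustrative assumptions, and a missing key is treated as "no limit", which matches the documented default.

```python
from typing import Optional

HOLD_KEY = "cmx.server.match.threshold_to_move_range_to_hold"


def read_hold_threshold(properties_text: str) -> Optional[int]:
    """Return the configured on-hold limit, or None if no limit is set."""
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, value = line.partition("=")
        if separator and name.strip() == HOLD_KEY:
            return int(value.strip())
    return None  # default: no limit


# Sample content; 5000 is an arbitrary example value.
sample = "# match settings\ncmx.server.match.threshold_to_move_range_to_hold = 5000\n"
print(read_hold_threshold(sample))
```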

Match Analyze Job Metrics
After running a Match Analyze job, the Batch Viewer displays the following metrics (if applicable) in the job execution log.

Metrics in Execution Log

Metric                              Description
Records moved to Hold Status        Number of records moved to Hold.
Records analyzed (to be matched)    Number of records analyzed for match.
Match comparisons required          Number of actual matches that would be required to process this base object.

Statistics

Statistic                        Description
Top 10 range count               Top ten number of records in a given search range.
Top 10 range comparison count    Top ten number of match comparisons that will need to be performed for a given search range.
Total records moved to hold      Count of the records moved to hold.
Total matches moved to hold      Total number of matches required by the records that were moved to hold.
Total ranges processed           Number of ranges required to process all the matches in the base object.
Total candidates                 Total number of match candidates required to process all matches for this base object.
Time for analyze                 Amount of time required to run the analysis.

Match for Duplicate Data Jobs
Match for Duplicate Data jobs search for exact duplicates to consider them matched. The maximum number of exact duplicates is based on the base object columns defined in the Duplicate Match Threshold property in the Schema Manager for each base object. For more information, see “Duplicate Match Threshold” on page 103. For more information, see also “Matching for Duplicate Data” on page 326.

Note: The Match for Duplicate Data job does not display in the Batch Viewer when the duplicate match threshold is set to 1 and non-equal matches are enabled on the base object.

To match for duplicate data:
1. Execute the Match for Duplicate Data job right after the Load job is finished.
2. Once the Match for Duplicate Data job is complete, run the Automerge job to process the duplicates found by the Match for Duplicate Data job. Once the Automerge job is complete, run the regular match and merge process (Match job and then Automerge job, or the Auto Match and Merge job).

Multi Merge Jobs
A Multi Merge job allows the merge of multiple records in a single job, essentially incorporating the entire set of records to be merged as one batch. This batch job is initiated only by external applications that invoke the SIF MultiMergeRequest request. For more information, see the Siperian Services Integration Framework Guide.

Promote Jobs
For state-enabled objects, the Promote job reads the PROMOTE_IND column from an XREF table and changes the system state to ACTIVE for all rows where the column’s value is 1. Siperian Hub resets PROMOTE_IND after the Promote job has run. Note: The PROMOTE_IND column on a record is not changed to 0 during the promote batch process if the record is not promoted. Here are the behavior details for the Promote batch job:
XREF State Before Promote: PENDING
Base Object State Before Promote: ACTIVE
Hub Action on XREF: Promote
Hub Action on BO: Update
Refresh BVT?: Yes
Resulting BO State: ACTIVE
Operation Result: Siperian Hub promotes the pending XREF and recalculates the BVT to include the promoted XREF.

XREF State Before Promote: PENDING
Base Object State Before Promote: PENDING
Hub Action on XREF: Promote
Hub Action on BO: Promote
Refresh BVT?: Yes
Resulting BO State: ACTIVE
Operation Result: Siperian Hub promotes the pending XREF and base object. The BVT is then calculated based on the promoted XREF.

XREF State Before Promote: DELETED
Base Object State Before Promote: This operation behaves the same way regardless of the state of the base object record.
Hub Action on XREF: None
Hub Action on BO: None
Refresh BVT?: No
Resulting BO State: The state of the resulting base object record is unchanged by this operation.
Operation Result: Siperian Hub ignores DELETED records in Batch Promote. This scenario can only happen if a record that had been flagged for promotion is deleted prior to running the Promote batch process.

XREF State Before Promote: ACTIVE
Base Object State Before Promote: This operation behaves the same way regardless of the state of the base object record.
Hub Action on XREF: None
Hub Action on BO: None
Refresh BVT?: No
Resulting BO State: The state of the resulting base object record is unchanged by this operation.
Operation Result: Siperian Hub ignores ACTIVE records in Batch Promote. This scenario can only happen if a record that had been flagged for promotion is made ACTIVE prior to running the Promote batch process.
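The Promote behavior described above can be modeled compactly. The following Python sketch is a simplified illustration, not the actual batch process: the Xref class, the function name, and the use of state names instead of stored indicator values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Xref:
    state: str        # "PENDING", "ACTIVE", or "DELETED"
    promote_ind: int  # 1 = flagged for promotion


def run_promote(xrefs: List[Xref], bo_state: str) -> Tuple[str, bool]:
    """Promote flagged PENDING XREFs; return (BO state, BVT refreshed?)."""
    refresh_bvt = False
    for xref in xrefs:
        if xref.promote_ind == 1 and xref.state == "PENDING":
            xref.state = "ACTIVE"
            xref.promote_ind = 0   # reset only for promoted records
            bo_state = "ACTIVE"    # BO is updated or promoted to ACTIVE
            refresh_bvt = True
        # ACTIVE and DELETED XREFs are ignored: PROMOTE_IND and the
        # base object state are left unchanged.
    return bo_state, refresh_bvt
```

Note that a flagged XREF that is already ACTIVE or DELETED keeps its PROMOTE_IND value, which mirrors the note that the column is not changed to 0 when the record is not promoted.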

You can run the Promote job using the following methods:
• Using the Hub Console; for more information, see “Running Promote Jobs Using the Hub Console”.
• Using the CMXSM.AUTO_PROMOTE stored procedure; for more information, see “Promote Jobs” on page 790.
• Using the Services Integration Framework (SIF) API (and the associated SiperianClient Javadoc); for more information, see the Siperian Services Integration Framework Guide.

Running Promote Jobs Using the Hub Console
To run a Promote job:
1. In the Hub Console, start either of the following tools:
   • Batch Viewer according to the instructions in “Starting the Batch Viewer Tool” on page 674
   • Batch Group according to the instructions in “Starting the Batch Group Tool” on page 690
2. Select the Promote job for the desired base object.
3. Execute the Promote job according to the instructions in “Running Batch Jobs Manually” on page 677 or “Executing Batch Groups Using the Batch Group Tool” on page 701.
4. Display the results of the Promote job according to the instructions in “Viewing Job Execution Logs” on page 682.

Siperian Hub displays the results of the Promote job:

Promote Job Metrics
After running a Promote job, the Batch Viewer displays the following metrics (if applicable) in the job execution log. Once the Promote job has run, you can view these statistics on the job summary page in the Batch Viewer.

Recalculate BO Jobs
There are two versions of Recalculate BO:
• Using the ROWID_OBJECT_TABLE Parameter: Recalculates all base objects identified by the ROWID_OBJECT column in the table/inline view (note that brackets are required around an inline view).
• Without the ROWID_OBJECT_TABLE Parameter: Recalculates all records in the base object, in batches of MATCH_BATCH_SIZE or 1/4 the number of the records in the table, whichever is less.
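The batch size rule for the version without the ROWID_OBJECT_TABLE parameter is simple arithmetic. The following Python sketch works it through; the function name is hypothetical, and rounding down with a floor of one record per batch is an assumption that the guide does not specify.

```python
import math
from typing import Tuple


def recalculate_bo_batches(total_records: int,
                           match_batch_size: int) -> Tuple[int, int]:
    """Return (batch size, number of batches) for a full recalculation."""
    # Whichever is less: MATCH_BATCH_SIZE or one quarter of the records.
    batch_size = max(1, min(match_batch_size, total_records // 4))
    return batch_size, math.ceil(total_records / batch_size)


print(recalculate_bo_batches(10000, 1000))
print(recalculate_bo_batches(2000, 1000))
```

With 10,000 records and a MATCH_BATCH_SIZE of 1,000, the batch size stays at 1,000. With only 2,000 records, one quarter of the table (500) is smaller, so that value is used instead.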

For more information, see “Recalculate BO Jobs” on page 791.


Recalculate BVT Jobs
Recalculate BVT jobs recalculate the BVT for the specified ROWID_OBJECT. For more information, see “Recalculate BVT Jobs” on page 792.

Reset Match Table Jobs
The Reset Match Table job is created automatically after you run a Match job when both of the following conditions exist: records have been updated to consolidation_ind = 2, and you then change your match rules, as described in “Configuring Match Column Rules for Match Rule Sets” on page 542. If you change your match rules after matching, you are prompted to reset your matches. When you reset matches, everything in the match table is deleted. In addition, the Reset Match Table job resets consolidation_ind to 4 for records where it is 2. To learn more, see “About the Consolidate Process” on page 335. When you save changes to the schema match columns, the following message box is displayed.

Click Yes to reset the existing matches and create a Reset Match Table job in the Batch Viewer.


Note: If you do not reset the existing matches, your next Match job will take longer to execute because Siperian Hub will need to regenerate the match tokens before running the Match job. Note: This job cannot be run from the Batch Viewer.
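The effect of resetting matches can be summarized as two steps: empty the match table, then move records from consolidation_ind 2 back to 4. The following Python sketch models both on in-memory data; the function name and the data structures are hypothetical simplifications of the underlying tables.

```python
from typing import Dict, List, Tuple


def reset_match_table(match_table: List[Tuple[str, str]],
                      consolidation_ind: Dict[str, int]) -> None:
    """Model the Reset Match Table job on in-memory data."""
    # Everything in the match table is deleted.
    match_table.clear()
    # Records at consolidation_ind 2 are reset to 4; others are untouched.
    for rowid_object, indicator in consolidation_ind.items():
        if indicator == 2:
            consolidation_ind[rowid_object] = 4


matches = [("1001", "1002"), ("1003", "1004")]
indicators = {"1001": 2, "1002": 1, "1003": 2, "1004": 4}
reset_match_table(matches, indicators)
```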

Revalidate Jobs
Revalidate jobs execute the validation rules for records that have been modified since the initial validation during the Load process. You can run a Revalidate job when records change after the validation step of the initial Load process. If no records have changed, no records are updated. If some records have changed and are caught by the existing validation rules, the metrics show the results.

Note: Revalidate jobs can only be run if validation is enabled on a column after an initial load and prior to merge on base objects that have validation rules set up. Revalidate is executed manually using the Batch Viewer for base objects. For more information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.

Stage Jobs
Stage jobs move data from a landing table to a staging table, performing any cleansing that has been configured in the Siperian Hub mapping between the tables (see “Mapping Columns Between Landing and Staging Tables” on page 380). Stage jobs have parallel cleanse jobs that you can run (see “About Data Cleansing in Siperian Hub” on page 406). The stage status indicates which Cleanse Match Server is hit during a stage. For more information about staging data, see “Configuration Tasks for the Stage Process” on page 364.

For state-enabled base objects, records are rejected if the HUB_STATE_IND value is not valid. For more information regarding how to manage the state of base object or XREF records, refer to “About State Management in Siperian Hub” on page 206.

Note: If the Stage job is grayed out, then the mapping has become invalid due to changes in the staging table, in a column mapping, or in a cleanse function. Open the specific mapping using the Mappings tool, verify it, and then save it. For more information, see “Mapping Columns Between Landing and Staging Tables” on page 380.

Stage Job Stored Procedure
When executing the Stage job stored procedure:
• Run the Stage job only if the ETL process responsible for loading the landing table used by the Stage job completes successfully.
• Make sure that there are no dependencies between Stage jobs.
• You can run multiple Stage jobs simultaneously if there are multiple Cleanse Match Servers set up to run the jobs.

For more information, see “Stage Jobs” on page 795.

Stage Job Metrics
After running a Stage job, the Batch Viewer displays the following metrics in the job execution log:

The following table describes these metrics.
Metric           Description
Total records    Number of records processed by the Stage job.
Inserted         Number of records inserted by the Stage job into the target object.
Rejected         Number of records rejected by the Stage job. For more information, see “Viewing Rejected Records” on page 685.

Synchronize Jobs
You must run the Synchronize job after any changes are made to the schema trust settings. The Synchronize job is created when any changes are made to the schema trust settings, as described in “Batch Jobs That Are Created When Changes Occur” on page 673. For more information, see “Configuring Trust for Source Systems” on page 455.

Reminder Prompt for Running Synchronize Jobs
When you save changes to schema column trust settings in the Systems and Trust tool, the following message box is displayed.

Clicking OK does not synchronize the column trust settings—this is just an information box that tells you to run the Synchronize job.

Running Synchronize Jobs
To run the Synchronize job, navigate to the Batch Viewer, find the correct Synchronize job for the base object, and run it. Siperian Hub updates the metadata for the base objects that have trust enabled after initial load has occurred.

Considerations for Running Synchronize Jobs
• If you do not run the Synchronize job, you will not be able to run a Load job.
• This job can be run from the Batch Viewer only when a trust update is required for the base object. For more information, see “Running Synchronize Batch Jobs After Changes to Trust Settings” on page 467.
• A Synchronize job fails if a large number of trust-enabled columns are defined. The exact number of columns that cause the job to fail is variable and is based on the length of the column names and the number of trust-enabled columns. Long column names are at, or close to, the maximum allowable length of 26 characters. To avoid this problem, keep the number of trust-enabled columns below 48 and/or the length of the column names short. A workaround is to enable all trust/validation columns before saving the base object to avoid running the Synchronize job.
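Before enabling trust on additional columns, it can help to check a base object against the figures given above. The following Python sketch is a rough pre-check under stated assumptions: the function name is hypothetical, and because the real failure point depends on the combination of column count and name length, the two independent checks here are only an approximation.

```python
from typing import List


def synchronize_warnings(trust_columns: List[str],
                         max_columns: int = 48,
                         max_name_length: int = 26) -> List[str]:
    """Return warnings for a list of trust-enabled column names."""
    warnings = []
    if len(trust_columns) >= max_columns:
        warnings.append(
            f"{len(trust_columns)} trust-enabled columns; "
            f"keep the count below {max_columns}"
        )
    longest = max((len(name) for name in trust_columns), default=0)
    if longest >= max_name_length:
        warnings.append(
            f"longest column name has {longest} characters; "
            f"the maximum allowable length is {max_name_length}"
        )
    return warnings
```

An empty result does not guarantee that the Synchronize job succeeds; it only indicates that neither documented figure has been reached.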


Chapter 18: Writing Custom Scripts to Execute Batch Jobs
This chapter explains how to create custom scripts to execute batch jobs and batch groups in a Siperian Hub implementation. The information in this chapter is intended for implementation teams and system administrators. For information about how to configure and execute Siperian Hub batch jobs using the Batch Viewer and Batch Group tools in the Hub Console, see “About Siperian Hub Batch Jobs” on page 668. Important: You must have the application server running for the duration of a batch job.

About Executing Siperian Hub Batch Jobs
A Siperian Hub batch job is a program that, when executed, completes a discrete unit of work (a process). All public batch jobs in Siperian Hub can be executed as database stored procedures. For more information about batch jobs, see “Using Batch Jobs” on page 667.

In the Hub Console, the Siperian Hub Batch Viewer and Batch Group tools provide simple mechanisms for executing Siperian Hub batch jobs. However, they do not provide a means for executing and managing jobs on a scheduled basis. To execute and manage jobs according to a schedule, you need to execute stored procedures that do the work of batch jobs or batch groups. Most organizations have job management tools that are used to control IT processes. Any such tool capable of executing Oracle PL/SQL or DB2 SQL commands can be used to schedule and manage Siperian Hub batch jobs.

Setting Up Job Execution Scripts
This section describes how to set up job execution scripts for running Siperian Hub stored procedures.

About Job Execution Scripts
Execution scripts enable you to run stored procedures on a scheduled basis to execute and manage jobs. Use job execution scripts to perform the following tasks:
• determine whether stored procedures can be run using job scheduling tools; for more information, see “Determining Available Execution Scripts” on page 754
• retrieve identifiers for scripts that execute stored procedures; for more information, see “Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time” on page 755
• determine which batch jobs are available to be executed using stored procedures; for more information, see “Determining Available Execution Scripts” on page 754
• schedule stored procedures to run synchronously or asynchronously; for more information, see “Running Scripts Asynchronously” on page 755

In the C_REPOS_TABLE_OBJECT_V view, Siperian Hub provides information about stored procedures, such as whether a stored procedure can be run using job scheduling tools and which identifiers to retrieve in the scripts that execute stored procedures.

About the C_REPOS_TABLE_OBJECT_V View
The C_REPOS_TABLE_OBJECT_V view contains metadata and identifiers for the Siperian Hub stored procedures.

Metadata in the C_REPOS_TABLE_OBJECT_V View
Siperian Hub populates the C_REPOS_TABLE_OBJECT_V view with metadata about its stored procedures. You use this metadata to:
• determine whether a stored procedure can be run using job scheduling tools, as described in “Determining Available Execution Scripts” on page 754
• retrieve identifiers in the job execution scripts that execute Siperian Hub stored procedures, as described in “Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time” on page 755

C_REPOS_TABLE_OBJECT_V has the following columns:
C_REPOS_TABLE_OBJECT_V Columns

Column Name: Description

ROWID_TABLE_OBJECT: Uniquely identifies a batch job.

ROWID_TABLE: Depending on the type of batch job, this is the table identifier for either the table affected by the job (target table) or the table providing the data for the job (source table).
• For Stage jobs, ROWID_TABLE refers to the target table (staging table).
• For Load jobs, ROWID_TABLE refers to the source table (staging table).
• For Match, Match Analyze, Autolink, Automerge, Auto Match and Merge, External Match, Generate Match Tokens, and Key Match jobs, ROWID_TABLE refers to the base object table, which is both source and target for the jobs.

OBJECT_DESC: Description of the batch job, including the type of batch job as well as the object affected by the batch job. Examples include:
• Stage for C_STG_CUSTOMER_CREDIT
• Load from C_STG_CUSTOMER_CREDIT
• Match and Merge for C_CUSTOMER

OBJECT_TYPE_CODE: Together with OBJECT_FUNCTION_TYPE_CODE, this is a foreign key to C_REPOS_OBJ_FUNCTION_TYPE. An OBJECT_TYPE_CODE of “P” indicates a procedure that can potentially be executed by a scheduling tool.

OBJECT_FUNCTION_TYPE_CODE: Indicates the actual procedure type (stage, load, match, and so on).

PUBLIC_IND: Indicates whether the procedure can be displayed in the Batch Viewer.

PARAMETER: Describes the parameter list for the procedure. Where specific ROWID_TABLE values are required for the procedure, these are shown in the parameter list. Otherwise, the name of the parameter is simply displayed in the parameter list. An exception to this is the parameter list for Stage jobs (where OBJECT_NAME = CMX_CLEANSE.EXE). In this case, the full parameter list is not shown. For a list of parameters, see “Stage Jobs” on page 795.

VALID_IND: If VALID_IND is not equal to 1, do not execute the procedure. A value other than 1 means that some repository settings that affect the procedure have changed. This usually applies to changes that affect the Stage jobs if the mappings have not been checked and saved again. For more information, see “Determining Available Execution Scripts” on page 754.

OBJECT_DESC
• Change the status of records that have undergone the match process but had no matching data.
• Link data in BaseObjectName
• Generate BVT snapshot for BaseObjectName
• External Match for BaseObjectName
• Generate Match Tokens for BaseObjectName
• Load from Link BaseObjectName
• Process records that have been queued by a Match job for manual merge.
• Match Analyze for BaseObjectName
• Match for BaseObjectName
• Match and Merge for BaseObjectName

CMXSM.AUTO_PROMOTE   Reads the PROMOTE_IND column from an XREF table and, for all rows where the column’s value is 1, changes the ACTIVE state to on.
CMXMM.MUNLINK
CMXMA.RESET_LINKS
CMXMA.RESET_MATCH
CMXUT.REVALIDATE_BO
CMXCL.START_CLEANSE
CMXUT.SYNC   Synchronize after changes are made to the schema trust settings.   P
CMXMM.UNMERGE   Unmerge for BaseObjectName   P   X   Manual unmerge

Determining Available Execution Scripts
To determine which batch jobs are available to be executed using stored procedures, run a query using the standard Siperian Hub view called C_REPOS_TABLE_OBJECT_V, as shown in the following example:
SELECT * FROM C_REPOS_TABLE_OBJECT_V WHERE PUBLIC_IND = 1;
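
As a rough illustration of how a scheduling script might apply this filter, the following Python sketch runs a similar query against an in-memory SQLite table that stands in for the view. The stand-in table, its sample rows, and the function name runnable_jobs are assumptions made for illustration only; a real script would query the ORS database. The additional VALID_IND = 1 condition reflects the rule that a procedure should not be executed when VALID_IND is not equal to 1.

```python
import sqlite3

# Stand-in for the C_REPOS_TABLE_OBJECT_V view. Assumption: only a few of
# its columns are modelled here, and every row is invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C_REPOS_TABLE_OBJECT_V ("
    "ROWID_TABLE_OBJECT TEXT, OBJECT_NAME TEXT, OBJECT_DESC TEXT, "
    "PUBLIC_IND INTEGER, VALID_IND INTEGER)"
)
conn.executemany(
    "INSERT INTO C_REPOS_TABLE_OBJECT_V VALUES (?, ?, ?, ?, ?)",
    [
        ("OBJ.1", "CMX_CLEANSE.EXE", "Stage for C_STG_CUSTOMER_CREDIT", 1, 1),
        ("OBJ.2", "CMX_CLEANSE.EXE", "Stage for C_STG_CUSTOMER_DEBIT", 1, 0),
        ("OBJ.3", "CMXUT.SYNC", "Synchronize schema trust settings", 0, 1),
    ],
)


def runnable_jobs(connection):
    """Return (ROWID_TABLE_OBJECT, OBJECT_DESC) for public, valid jobs."""
    cursor = connection.execute(
        "SELECT ROWID_TABLE_OBJECT, OBJECT_DESC "
        "FROM C_REPOS_TABLE_OBJECT_V "
        "WHERE PUBLIC_IND = 1 AND VALID_IND = 1 "
        "ORDER BY ROWID_TABLE_OBJECT"
    )
    return cursor.fetchall()


for rowid, description in runnable_jobs(conn):
    print(rowid, description)
```

In this sample data, the second row is public but not valid and the third is valid but not public, so neither is returned.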


Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time
Use SQL statements to retrieve values from C_REPOS_TABLE_OBJECT_V when executing scripts at run time. The following example code retrieves the STG_ROWID_TABLE and ROWID_TABLE_OBJECT for cleanse jobs.
SELECT A.ROWID_TABLE, A.ROWID_TABLE_OBJECT
INTO IN_STG_ROWID_TABLE, IN_ROWID_TABLE_OBJECT
FROM C_REPOS_TABLE_OBJECT_V A, C_REPOS_TABLE B
WHERE A.OBJECT_NAME = 'CMX_CLEANSE.EXE'
AND B.ROWID_TABLE = A.ROWID_TABLE
AND B.TABLE_NAME = 'C_HMO_ADDRESS'
AND A.VALID_IND = 1;
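
The same lookup can be sketched from a client-side script. The Python example below is an illustration under stated assumptions, not a Siperian-supplied utility: it uses in-memory SQLite tables as stand-ins for the repository objects (the repository table is assumed to be named C_REPOS_TABLE), the identifier values are invented, and stage_job_ids is a hypothetical helper name. It passes the staging table name as a bind parameter rather than embedding it in the SQL text.

```python
import sqlite3

# Stand-ins for the repository objects. Assumption: only the columns used
# by the lookup are modelled, and the identifier values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE C_REPOS_TABLE (ROWID_TABLE TEXT, TABLE_NAME TEXT);
    CREATE TABLE C_REPOS_TABLE_OBJECT_V (
        ROWID_TABLE_OBJECT TEXT, ROWID_TABLE TEXT,
        OBJECT_NAME TEXT, VALID_IND INTEGER);
    INSERT INTO C_REPOS_TABLE VALUES ('TBL.10', 'C_HMO_ADDRESS');
    INSERT INTO C_REPOS_TABLE_OBJECT_V
        VALUES ('OBJ.77', 'TBL.10', 'CMX_CLEANSE.EXE', 1);
    """
)


def stage_job_ids(connection, staging_table_name):
    """Return (ROWID_TABLE, ROWID_TABLE_OBJECT) for a valid Stage job."""
    row = connection.execute(
        "SELECT A.ROWID_TABLE, A.ROWID_TABLE_OBJECT "
        "FROM C_REPOS_TABLE_OBJECT_V A "
        "JOIN C_REPOS_TABLE B ON B.ROWID_TABLE = A.ROWID_TABLE "
        "WHERE A.OBJECT_NAME = 'CMX_CLEANSE.EXE' "
        "AND B.TABLE_NAME = ? AND A.VALID_IND = 1",
        (staging_table_name,),
    ).fetchone()
    if row is None:
        # No row means the job is missing or VALID_IND is not 1.
        raise LookupError(f"no valid Stage job for {staging_table_name}")
    return row


stg_rowid_table, rowid_table_object = stage_job_ids(conn, "C_HMO_ADDRESS")
print(stg_rowid_table, rowid_table_object)
```

Raising an error when no row is found keeps a scheduler from calling a procedure whose VALID_IND is not 1.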

Running Scripts Asynchronously
By default, the execution scripts run synchronously (IN_RUN_SYNCH = 'TRUE' or IN_RUN_SYNCH = NULL). To run the execution scripts asynchronously, specify IN_RUN_SYNCH = 'FALSE'. Note that these Boolean values are case-sensitive and must be specified in uppercase characters.
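
Because the values are case-sensitive, a wrapper script may want to check the setting before passing it to a stored procedure. The following Python sketch shows one way to do that; the function name runs_synchronously is hypothetical, and Python's None is used here to represent a NULL parameter.

```python
def runs_synchronously(in_run_synch):
    """Interpret an IN_RUN_SYNCH value; None stands in for SQL NULL."""
    if in_run_synch is None or in_run_synch == "TRUE":
        return True  # default behaviour: synchronous
    if in_run_synch == "FALSE":
        return False  # asynchronous
    # Anything else, including 'true' or 'False', is rejected because the
    # values are case-sensitive and must be uppercase.
    raise ValueError(
        f"IN_RUN_SYNCH must be 'TRUE', 'FALSE', or NULL, got {in_run_synch!r}"
    )


print(runs_synchronously(None))
print(runs_synchronously("FALSE"))
```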

Monitoring Job Results and Statistics
This section describes how to monitor the results and view the associated statistics of batch jobs run in job execution scripts.

Job Execution Status
Siperian Hub stored procedures log their job execution status and statistics in the Siperian Hub repository. The following figure illustrates the repository tables that can be used for monitoring job results and statistics:


The following table describes the various repository tables.
Repository Tables Used for Monitoring Job Results and Statistics

Table Name: Description

C_REPOS_JOB_CONTROL: As soon as a job starts to run, it registers itself in C_REPOS_JOB_CONTROL with a RUN_STATUS of 2 (Running/Processing). Once the job completes, its status is updated to one of the following values:
• 0 (Completed Successfully): Completed without any errors or warnings.
• 1 (Completed with Errors): Completed, but with some warnings or data rejections. See the RETURN_CODE for any error code and the STATUS_MESSAGE for a description of the error/warning.
• 2 (Running / Processing)
• 3 (Failed): Job did not complete. Corrective action must be taken and the job must be run again. See the RETURN_CODE for any error code and the STATUS_MESSAGE for the reason for failure.
• 4 (Incomplete): The job failed before updating its job status and has been manually marked as incomplete. Corrective action must be taken and the job must be run again. RETURN_CODE and STATUS_MESSAGE will not provide any useful information. Marked as incomplete by clicking the Set Status to Incomplete button in the Batch Viewer.

C_REPOS_JOB_METRIC: When a batch job has completed, it registers its statistics in C_REPOS_JOB_METRIC. There can be multiple statistics for each job. Join to C_REPOS_JOB_METRIC_TYPE to get a description for each statistic.

C_REPOS_JOB_METRIC_TYPE: Stores the descriptions of the types of metrics that can be registered in C_REPOS_JOB_METRIC.

C_REPOS_JOB_STATUS_TYPE: Stores the descriptions of the RUN_STATUS values that can be registered in C_REPOS_JOB_CONTROL.
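
A monitoring script typically needs to turn a RUN_STATUS value read from C_REPOS_JOB_CONTROL into a decision. The Python sketch below encodes the status values listed above; the names RUN_STATUS_LABELS, describe_status, and needs_rerun are hypothetical, and the sketch does not show how the value is read from the repository.

```python
# RUN_STATUS values and labels as listed for C_REPOS_JOB_CONTROL.
RUN_STATUS_LABELS = {
    0: "Completed Successfully",
    1: "Completed with Errors",
    2: "Running / Processing",
    3: "Failed",
    4: "Incomplete",
}

# Statuses for which corrective action and a rerun are required.
RERUN_STATUSES = {3, 4}


def describe_status(run_status):
    """Return the label for a RUN_STATUS value."""
    try:
        return RUN_STATUS_LABELS[run_status]
    except KeyError:
        raise ValueError(f"unknown RUN_STATUS: {run_status!r}") from None


def needs_rerun(run_status):
    """True when the job must be corrected and run again."""
    describe_status(run_status)  # reject unknown values
    return run_status in RERUN_STATUSES


for status in sorted(RUN_STATUS_LABELS):
    print(status, describe_status(status), needs_rerun(status))
```

For status 1 the job does not have to be rerun, but RETURN_CODE and STATUS_MESSAGE are still worth checking for warnings or data rejections.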


Stored Procedure Reference
This section provides a reference for the stored procedures that represent Siperian Hub batch jobs. Siperian Hub provides these stored procedures, in compiled form, for each Operational Record Store (ORS), for Oracle databases. You can use any job scheduling software (such as Tivoli, CA Unicenter, and so on) to execute these stored procedures.

Note: All the input parameters that need a delimited list require a trailing “~” character.
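
When a script assembles such a parameter, the trailing character is easy to forget. The Python sketch below builds a delimited list that always ends with “~”. It assumes that “~” is also the separator between items, which the note above does not state explicitly, and to_delimited_list is a hypothetical helper name.

```python
def to_delimited_list(values, delimiter="~"):
    """Join values so that every item, including the last, is followed by
    the delimiter. Assumption: the separator is the same "~" character
    that is required at the end of the list."""
    items = [str(value) for value in values]
    for item in items:
        if delimiter in item:
            raise ValueError(f"value {item!r} contains the delimiter")
    return "".join(item + delimiter for item in items)


print(to_delimited_list(["A", "B", "C"]))
```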

Alphabetical List of Batch Jobs
Batch Job: Description

Accept Non-matched Records As Unique: For records that have undergone the match process but had no matching data, sets the consolidation indicator to 1 (consolidated), meaning that the record was unique and did not require consolidation.

Autolink Jobs: Automatically links records that have qualified for autolinking during the match process and are flagged for autolinking (Autolink_ind=1). Used with link-style base objects only.

Auto Match and Merge Jobs: Executes a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the size of the manual merge queue exceeds the configured threshold. Used with merge-style base objects only.

Automerge Jobs: Automatically merges records that have qualified for automerging during the match process and are flagged for automerging (Automerge_ind=1). Used with merge-style base objects only.

BVT Snapshot Jobs: Generates a snapshot of the best version of the truth (BVT) for a base object. Used with link-style base objects only.

Execute Batch Group Jobs: Constructs an XML message and sends it to the MRM Server SIF API (ExecuteBatchGroupRequest), which performs the operation. For more information, see “Stored Procedures for Batch Groups” on page 799.

External Match Jobs: Matches “externally managed/prepared” records with an existing base object, yielding the results based on the current match settings, all without actually modifying the data in the base object.

Generate Match Token Jobs: Prepares data for matching by generating match tokens according to the current match settings. Match tokens are strings that encode the columns used to identify candidates for matching.

Get Batch Group Status Jobs: Returns the status of a batch group. For more information, see “Stored Procedures for Batch Groups” on page 799.


Hub Delete Jobs: Deletes data from the Hub based on base object / XREF level input.

Key Match Jobs: Matches records from two or more sources when these sources use the same primary key. Compares new records to each other and to existing records,