IBM InfoSphere Optim Data Growth: Setting up your first Archive

This article introduces IBM® InfoSphere®
Optim™ Solutions, which help you manage enterprise data in every phase
of the information lifecycle. Learn the fundamentals of InfoSphere Optim Data
Growth Solution. A step-by-step example shows how to set up your first Archive
request and quickly get up to speed. The essentials of an effective archiving
strategy are also covered.

Satvinder has close to 14 years of experience in database administration, consulting, and DB2 replication and implementation. Currently, she is an InfoSphere Optim architect (data growth/data masking and TDM solutions) for a major healthcare client. Satvinder is an InfoSphere Optim Specialist, Database Consultant (DB2), Replication Architect, and IBM Certified IT Specialist.

Introduction to
InfoSphere Optim Solutions

Organizations are increasingly challenged with successfully managing data
growth. They have large volumes of data, which is stored in various data
repositories, which are likely to grow exponentially in coming years. This
growth is leading to operational and storage issues that create havoc with
performance and maintainability of applications.

Optim Data Growth, a market leader in archiving space, provides excellent
enterprise data management solutions that help solve various business
problems that are related to data growth, test data management, and data
masking requirements. Optim is a single, scalable solution that supports
almost all applications, databases, and environments in any
enterprise.

With data that pours in from diverse sources, application-independent
access methods are critical in fast-changing scenarios. With Optim, users
can:

Optim Data Growth Solution

Optim Data Growth Solution resolves the adverse impact of rapid data growth
by archiving historical data safely to a secure archive. It enables
enterprises to achieve universal access to archives for retention
compliance, legal requirements, and long-term usage.
Archiving historical data out of production lightens the database volume,
streamlines processing workloads, improves application performance,
and lowers overall storage costs.
Archiving also enables faster application upgrades as it
reduces the amount of data to be migrated, which reduces downtime and
helps achieve timely project completion. To control the proliferation of
applications and their maintenance across an organization, Optim allows
companies to safely retire legacy or unsupported applications
(decommissioning) while still allowing access to underlying data
records.

Optim Test DM Solution

It's common practice to clone an entire production database for use in
application development and testing. When data is duplicated, storage and
maintenance costs increase proportionally and affect the company’s total
data burden. Optim Test DM helps by extracting a precise subset of
application data records, which are complete in business context. It
creates targeted right-sized test databases that are faster and can be
repeated more easily than cloning entire production copies. Companies can
speed up the deployment of new or upgraded applications with Optim Test
DM. With less data to process, iterative testing cycles are completed
faster. This data subset also facilitates identifying and fixing bugs in
the earliest stages of development.

Data masking

Optim data masking is a solution to protect confidential data by
de-identifying (masking) sensitive or personal identifiable data. To mask
the data, the solution replaces it with valid but fictitious data, which
renders the data unusable and without value even if it’s stolen. This
masking protects the client's business from financial loss of information,
and provides a simple-to-use solution to its IT department. Optim helps to
create a protected test database so IT staff generate accurate test
results while still adhering to privacy policies applicable to HIPAA, PHI,
and other forms of confidential data in an organization.

Optim Archive project
phases

By implementing good archiving strategy, organizations can effectively
control growth of their enterprise data. Standard phases include:

Planning

Investigate the project goals. What is the driving force, or expected
benefit, of the archiving project? Requirements are largely driven by
one or many factors among data growth management, compliance, cost
reduction, improving performance. Choose an Optim project team with
the appropriate skills based on the drivers, complexity, and size of
the project.

Analysis

Review what data needs to be archived, and the archival and retention
rules that apply to the data. This information is provided to the
Optim team in a "Technical Requirements" document from the
client.

Design

Design of the solution differs depending on different requirements or
goals. For example, a decommissioning archiving solution design that
runs one time differs from a design that requires continuous
archiving in the production system to regularly move older data out of
production database. (This article explores a design for a general
archive to illustrate how the archive process is designed with Optim
tools.)

Testing

After the archive design is built in a development environment, test
it in different scenarios to ensure that the desired outcome is
achieved.

Deployment

During deployment, Optim helps to promote archive design scripts from
development to the production environment. You can use different
utilities, such as export/import, based on the deployment models with
which the Optim project is designed.

Working with InfoSphere
Optim Data Growth

Instructions for installing InfoSphere Optim Data Growth and configuring
your first workstation are outside the scope of this article. See the
Installation chapter of InfoSphere Optim Data Growth Solution for
information.

You can design various types of archive architectures that depend on the
performance and scalability requirements. For this article, you use a
simple configuration where Optim operates on a single workstation and all
processing takes place on the same machine.

It is assumed that the following steps are completed:

At least one database is installed on a local Windows® machine.
For this example, the workstation was installed with the SAMPLE DB2
database where Optim directory tables and test tables that participate
in the Archive example are located.

Install Optim on the same machine (first workstation).

Configure the first workstation to create an Optim directory, which
is stored in the SAMPLE database, and any DB Alias to access your
databases (SAMPLE). Enable the Server feature if the first workstation
is to be used as an Optim Server (which it is in this case). Also,
review product configuration and personal product options; most of
them are set as the default.

Common
elements

Optim processes and tracks progress of its utilities and functions by
storing various kinds of objects in an Optim directory. For
example, a directory can store the DB alias, access definitions, table
maps, column maps, action requests, and more. New and modified primary
keys and relationships can also be created in Optim, in addition to what
is defined in the source database depending on the project archive
requirements.

Figure 1 shows an overview of the Archive process. The rest of this section
discusses a few elements that are used in building your first Archive.

Figure 1. Archive process overview

Optim directory

The directory can be considered the "brain" behind how Optim performs and
does data processing. Optim creates a set of tables in which it stores
objects that are needed for processing enterprise data management tasks
and tracking processing status. These tables get created while you
configure the first workstation before you use Optim.

Best practice

Create as many Optim directories as needed to satisfy site requirements
(for example, one for each kind of environment: Development, Test, and
Production).

Generally, in a small architecture setup, one site typically shares one
Optim directory regardless of the number of database instances to be
accessed or the number of workstations that use Optim. In the example, as
shown in Figure 2, I created the directory with the default name OPTIMDIR
when I configured the first workstation. Some important objects that are
stored in an Optim directory include DB Alias, access definitions, table
maps, column maps, primary keys, relationships, and action requests.

Figure 2. Optim Directory that is configured in
Optim

DB Alias

DB Alias, as shown in Figure 3, is one of the mandatory Optim directory
objects. It provides parameters that are needed to access specific target
or source databases. Each DB Alias name must be unique, and each database
can have only one DB Alias that is defined in Optim. DB Alias provides a
naming convention that is used to specify high-level qualifiers for object
and table names to access specific databases and to perform requested
functions. To denote this convention, it serves as a prefix in the fully
qualified names of primary keys, database tables, and relationships. The
format is: Dbalias.creatorid.tablename

Figure 3. DB Alias that is configured in
Optim

Action requests

For data growth management, Optim has four significant processes:

Archive

Processes user-defined data from databases according to data
management policies set in the organization, yet retains access to the
archived data for future access, compliance, or restoration purposes.
Figure 4 shows an example.

Delete

Deletes archived data from selected or all tables. The Delete process
can be defined as part of the Archive process or in a separate step
after it archives data. The latter option is used when you want to
validate the archive data or have a data retention policy that is
defined for data deletion in an enterprise.

Restore

Selectively restores archive data back to a production database or
different database. This process brings back data from archive files
to a relational database and generally is used when an application
needs to access or modify the archived data. Moreover, methods
available in Optim can restore archived data into a database even if
the original data model changes.

Browse

Enables user access to view or browse archived data that is extracted
in archive/extract files without having to restore data.

Figure 4. Actions

Access Definition

With the Access Definition, one of the most important elements, Optim
designers can define objects, or a set of related data, for Optim to
archive or restore.
It identifies database tables that contain a desired
set of data to be archived and how relationships are to be traversed
between different pairs of tables involved.
Use the Access Definition to
define selection criteria to limit a subset of data to be accessed.
While you can create local access definition, it's also possible
in Optim to create a named Access Definition with the intent
of reusing it by a different user or an Optim workstation.

Archive File

Optim lets you store your data in either extract or Archive Files. An
Archive File contains data, object definitions, and many other database
components that together can re-create a database if required. You can
store an Archive File on a local, remote, or shared file server. View the
contents of an Archive File by using the Browse utility that is provided
by Optim. Similar to how indexes help with search and retrieval of data
from database tables, you can create an Archive index so you can quickly
Browse or Restore data.

Designing your
first Archive

In the following example scenario, the IT department of company X received
a directive to archive all records for a customer in a country where the
company stopped its sales over two years ago due to a legal issue. The
data needs to be archived because the company:

Wants to store the data for the future due to a compliance
requirement.

Wants to delete data from the production database to control the
current data growth.

This exercise will refer to sample tables created when you configure the
first workstation, which is stored in SAMPLE database.

The solution involves:

Identifying the data to archive.

Designing an Archive by using the defined selection criteria.

Creating the Archive with the Delete option.

Defining the data to delete.

Running the Archive.

Source information

For this scenario:

Optim Directory: OptimDir

DB alias: SAMPLE

Table Description and relationships, which are shown in Figure 5, are:

Figure 5. Table relationships

Designing the Archive

Use the following steps to design the example Archive.

Open the Optim tool.

Click File -> New ->
Archive, which opens the Archive Request
Editor.

Enter a description for your Archive Request that reflects the purpose
of this archive, as in Figure 6. Your Description is Archive
AU country data and ArchiveAU.AF is the
name of the output Archive File.

Tip

You can add Optim variables to dynamically generate a regular
archive file name from the same archive request. For example,
adding the suffix ArchiveAU<$YYYY — $MM — $DD
— $SEQ4> generates an archive file name with today's
date and sequence supplied: ArchiveAU2014-05-21-0001.AF.

Click or tab to the Archive Index File field. The Optim tool
automatically inserts the Archive Index File name as the Archive file
given earlier along with the .AFX extension.

Clear the check box for Defer Delete after Archive. Within this same
archive request, you will run a Delete request after you archive the
data.

(You can optionally add this archive to Group, which acts as
a tag so it can be associated with related Archive files for
future search criteria purposes. Also, experiment with various
other Optim options that you can specify in this dialog.)

Figure 6. Archive Request Editor

Click the Edit access definition icon on the toolbar,
as shown in Figure 7, to open the Access Definition Editor dialog and
define the data for archiving.

Enter a description for your Access Definition: AD for Archiving AU
country data. In the Default Qualifier field, click
Select, to the right of the Default Qualifier box
to open the Select Default Qualifier dialog.

In the Select Default
Qualifier dialog, double-click the DB Alias,
SAMPLE, and the Creator ID
SKAUR (the sample table was created with
SKAUR user) to select the default qualifier as
SAMPLE.SKAUR.

Data fetch begins from OPTIM_CUSTOMERS tables and then find related
records to archive. In the Start Table field, click
Select, on the right of the Start Table box, to
open the Select Access Definition Start Table.

In the Select Access
Definition Start Table dialog for the DB Alias, SAMPLE,
double-click OPTIM_CUSTOMERS to select it as the
start table. This pattern forms the table as three parts:
SAMPLE.SKAUR.OPTIM_CUSTOMERS. Once selected, the Start Table field
now contains this table name.

This action also populates the
table name as one of the first entries in the Table/View
column.

Right click OPTIM_CUSTOMERS and select Add
Tables to add the remaining related tables.

Another
window opens with the current DB Alias SAMPLE selected.

Select the box to Find Tables Related to Table, as in Figure 8.
Verify that you have Both (the default) selected to
find both parent and child tables that are related to the specified
table. Click Display to see what tables would be
selected. There are four tables (OPTIM_SALES, OPTIM_ORDERS,
OPTIM_DETAILS, and OPTIM_ITEMS) in the list.

Figure 8. Selecting related tables for
Archive

Tip

It's often a requirement to archive data based on data age
(data older than three years, for example). Optim supplies a
date function for selection criteria: BEFORE (nD, nW, nM, nY),
where D, W, M, Y indicate number of days, weeks, months, or
years before current day at run time. If you want to archive
data older than two years, use BEFORE (2Y).

Click Select All Matching, which returns to the
previous window (Access Definition Editor). Now you see all five
tables: OPTIM_CUSTOMERS, OPTIM_SALES, OPTIM_ORDERS, OPTIM_DETAILS, and
OPTIM_ITEMS.

You need to define the selection criteria, which is COUNTRY_CODE=’AU’
for your example. In the Access Definition Editor dialog, right click
OPTIM_CUSTOMERS and select Table
Specifications -> Selection Criteria.

In the Table Specifications, enter =‘AU’ in the
COUNTRY_CODE column, as in Figure 9.

Figure 9. Selection Criteria

To close the Table Specifications dialog, click the X
in the upper right corner.

The selection criteria icon is added in
the Table Specification column of the Access Definitions Editor,
as in Figure 10.

Figure 10. Access Definitions main screen
that shows selection icon

Click the Relationships tab to review the traversal
path of parent to child (default option 1) or child to parent (option
2). For this example, leave it option 1.

Note: Option (1)
determines whether Archive traverses a relationship from child to
parent and archives a parent row for each selected child row. By
default, Option (1) is selected. As a result, this setting
maintains the relational integrity of the data. Option (2)
determines whether more child rows are archived when a parent row
is archived because of Option (1). In other words, if Archive
traversed a relationship from child to parent and archived a
parent row, all child rows of that parent are archived when Option
(2) is selected. By default, Option (2) is cleared.

Click Show Steps from the toolbar to show the
traversal path that Optim uses internally for this particular Archive.
Listing 1 shows an example.

Listing 1. Show steps
screen

Step 1: Extract Rows from Start Table SAMPLE.SKAUR.OPTIM_CUSTOMERS.
Selection Criteria and/or Statistical Controls are used, these determine rows selected.
Step 2: Extract Rows from SAMPLE.SKAUR.OPTIM_SALES which are Parents of Rows
Previously Extracted from SAMPLE.SKAUR.OPTIM_CUSTOMERS in Step 1 to satisfy an RI rule
using Relationship RSC.
Step 3: Extract Rows from SAMPLE.SKAUR.OPTIM_ORDERS which are Children of Rows
Previously Extracted from SAMPLE.SKAUR.OPTIM_CUSTOMERS in Step 1 using Relationship RCO.
Step 4: Extract Rows from SAMPLE.SKAUR.OPTIM_DETAILS which are Children of Rows
Previously Extracted from SAMPLE.SKAUR.OPTIM_ORDERS in Step 3 using Relationship ROD.
Step 5: Extract Rows from SAMPLE.SKAUR.OPTIM_ITEMS which are Parents of Rows
Previously Extracted from SAMPLE.SKAUR.OPTIM_DETAILS in Step 4 to satisfy an RI rule
using Relationship RID.

You want to delete the transactional data present for selected country
customers in the OPTIM_ORDERS, OPTIM_DETAILS, and OPTIM_CUSTOMERS tables
from the database. But, you like to keep the master data OPTIM_SALES and
OPTIM_ITEMS tables in production because this data will be relevant to
future operations. Follow these steps.

In the Access Definition window, select the boxes in the
Delete Rows After Archive column for tables that
are selected for the delete operation, as in Figure 11. With Delete as
part of the same archive request, Optim will delete data from selected
tables after a successful archive.

Figure 11. Access Definitions, delete rows
from table option

Tip

You can DEFER the DELETE to a later date if you want to involve
the testing team to validate data that is archived.

Close the Access Definition Editor and select Yes
when prompted with saving Access Definition. The Archive Request
Editor window opens.

Close the Archive Request Editor window and save as a two-part name
when prompted (for example, ARCH.COUNTRYAU).

You're now ready to
run your Archive request.

You have two options for running the ARCHIVE request: from the GUI or
from the command line.

From the GUI, click File ->
Open, which opens Object Explorer. Expand
the archive tree and select the Archive request that you
created.

From the command line, the Optim command-line
processing executable is installed in the ${OPTIMHOME}\RT\BIN\
directory. The name of the executable is PR0CMND:

PR0CMND /R TYPE=type REQUEST=identifier.name

type can be any of the { ARCHIVE | DELETE |
REPORT | RESTORE } values.

identifier.name is the name of the
action request that is pre-created by using the GUI on the
Optim workstation and saved in the Optim Directory.

You can specify several other parameters to alter
or override the behavior of the stored Optim request. A
few frequently used parameters are: STOP, QUIET, OUTPUT,
MONITOR, SERVER, and PST. You can enter parameters with
the command on the command line or enter them into a text
file (called a parameter file).

Run an Archive
request from the command line (for the Archive that is
designed in this
article):

Tip

Optim has a facility to join tables in the Browse window that shows
related records at various levels with a simple click of the
Join button.

Now that you've finished this exercise, it would be beneficial to view
Archived data. Optim provides a feature for browsing data: Click the
InfoSphere Optim tool menu, select Utility ->
Browse, and then select the Archive
file. You will see a list of tables along with row count archived for each
table. You can double-click any table to view its contents and also join
related contents with the Join button.

Conclusion

Today, organizations face tremendous challenges as they deal with data
explosion and information governance for their enterprise data. You cannot
ignore the need for a strong solution that provides comprehensive
capabilities for managing enterprise application data across applications,
databases, operating systems, and hardware platforms. IBM InfoSphere Optim
Data Growth Solution is widely used and a market leader in this space.

This article provided an overview of InfoSphere Optim Data Growth for
systems development professionals, data architects, data administrators,
or anyone who needs to understand archiving of products. You learned the
fundamentals of the product, and common elements such as Optim directory,
DB alias, and action requests. This introductory article should help you
get started with your first Archive process.

The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.