Oracle Blog

Getting the most out of OWB and the Oracle Database

Tuesday Jun 25, 2013

Exciting day today as Oracle Database 12c is released. You can find lots of information on the release on OTN here. With this release comes another milestone on Oracle's Data Integration roadmap - OWB is no longer shipped with the database. You will notice that the OWB documentation is no longer included with the Oracle Database documentation, you can compare and contrast the 11.2 and 12.1 documentation below.

OWB 11gR2 is still supported with Oracle Database 12c, you will need 11.2.0.3 plus at least CP2 which has been certified with Oracle Database 12c. The 11.2.0.4 release will wrapper this into one install.

The JAR used in the example below can be downloaded here, from the process flow OWB invokes the main method within the ZipFile class for example - passing the parameters to the function for the input, output directories and also the number of threads. The parameters are passed as a string in OWB, each parameter is wrapped in ?, so we have a string like ?param1?param2?param3? and so on. In the example I pass the input directory d:\inputlogs as the first parameter and d:\outputzips as the second, the number of processes used is 4 - I have escaped my backslash in order to get this to work on Windows.

The classpath has the JAR file with the class compiled in it and the classpath value can be specified specified on the activity, carefully escaping the path if on windows.

Then you can define the actual class to use;

That's it, pretty easy. The return value from the method will use the exit code from your java method - normally 0 is failure and other values are error (so if you exit the java using a specific error code value you can return this code into a variable in OWB or perform a complex transition condition). Any standard output/error is also capture from within the OWB activity log in the UI, for example below you can see an exception that was thrown and also messages output to the standard output/error;

Thursday Mar 07, 2013

There are few documents on the Oracle Support site http://support.oracle.com to check if OWB is installed correctly on RAC and Exadata (Doc
ID 455999.1) and How to Install a Warehouse Builder Repository on a RAC (Doc ID
459961.1).

This blog will just show you how to install the OWB repository on RAC using OMBPlus.

The steps are:

Database preparation

Repository installation on the first Node

Copy of the rtrepos.properties File on all Nodes

Registration of all the other Nodes

Check the installation

Step 1: Database preparation

UNIQUE Service NameMake sure that
EACH Node in the RAC has a UNIQUE service_name. If this is not the
case, then add a unique service_name with the following command:

srvctl add service -d dbname -s instn -r instn

The
resulting service name is instn.clusterdomainname. For example, if the
instance name is racsrvc1,then the service name could be
racsrvc1.us.oracle.com.

"srvctl" can be used to manage the RAC services:srvctl [status|stop|start] service -d <db> -s <service-name>

Details are described in the OWB Installation Guide:Paragraph "Ensuring the Availability of Service Names for Oracle RAC Nodes"

LISTENER ConfigurationMake
sure that EACH Node has the LISTENER configured correctly. The listener
on each Node should be able to manage connections to the unique
database service of each Node.

Step 2: Repository installation on the first Node

We assume that RAC has 2 Nodes: NODE1 and NODE2, the database instance is setup and the OWB software has been installed on all Nodes of the RAC.

Start the OMBPlus shell on the primary node say Node 1 from <OWB_HOME>/owb/bin/unix/OMBPlus.sh

Step 3: Copy of the rtrepos.properties File on all Nodes

During the Repository seeding, a file rtrepos.properties is
created/updated on Node 1 at location
<OWB_HOME>\ owb\bin\admin directory. This file should be copied to
all RAC Nodes to the same location. In this case to Node 2 at <OWB_HOME>\ owb\bin\admin.

Step 4: Registration of all the other Nodes

After the Repository installation, all RAC Nodes should be registered.
This to enable the OWB Runtime Service to fail over to one of the other
Nodes when required (e.g. because of a node crash). This registration
process consists of an update in tables OWBRTPS and WB_RT_SERVICE_NODES.
These tables will be updated with Node specific details like the
Oracle_Home where the OWB software has been installed on the Node and
and host, port, service connection details for the instance running on
the Node.

Oracle provides a secure and convenient functionality for securing data in your datawarehouse, tables can be designed in OWB utilizing the Transparent Data Encryption capability. This is done by configuring specific columns in a table to use encryption.

When users insert data, the Oracle database transparently encrypts it and stores it in the column. Similarly, when users select the column, the database automatically decrypts it. Since all this is done transparently without any change the application code, the feature has an appropriate name: Transparent Data Encryption.

Encryption requires users to apply an encryption algorithm and an encryption key to the clear-text input data. And to successfully decrypt an encrypted value, users must know the value of the same algorithm and key. In Oracle database, users can specify an entire tablespace to be encrypted, or selected columns of a table. From OWB we support column encryption that can be applied to tables and external tables.

We secure the capture of the password for encryption in an OWB location, just like other credentials. This is then used later in the configuration of the table.

We can configure a table and for columns define any encryption, including the encryption algorithm, integrity algorithm and the password.

Then when the table is deployed from OWB, the TDE information is incorporated into the DDL for the table;

When data is written to this column it is encrypted on disk. Read more about this area in the Oracle Advanced Security white paper on Transparent Data Encryption Best Practices here.

Wednesday Dec 19, 2012

I was using the COLUMN TRANSFORMS syntax to load LOBs into Oracle using the Oracle external which is a handy way of doing several stuff - from loading LOBs from the filesystem to having constants as fields. In OWB you can use unbound external tables to define an external table using your own arbitrary access parameters - I blogged a while back on this for doing preprocessing before it was added into OWB 11gR2.

For loading LOBs using the COLUMN TRANSFORMS syntax have a read through this post on loading CLOB, BLOB or any LOB, the files to load can be specified as a field that is a filename field, the content of this file will be the LOB data.

So using the example from the linked post, you can define the columns;

Then define the access parameters - if you go the unbound external table route you can can put whatever you want in here (your external table get out of jail free card);

This will let you read the LOB files fromn the filesystem and use the external table in a mapping.

Pushing the envelope a little further I then thought about marrying together the preprocessor with the COLUMN TRANSFORMS, this would have let me have a shell script for example as the preprocessor which listed the contents of a directory and let me read the files as LOBs via an external table. Unfortunately that doesn't quote work - there is now a bug/enhancement logged, so one day maybe. So I'm afraid my blog title was a little bit of a teaser....

Monday Aug 13, 2012

The 11gR2 release of OWB introduced ways of overriding the default values for properties – users may wish to change the seeded default values for properties (for all objects of a type. You can do this using the enterprise feature supplied in Configuration Templates.

These configuration templates are defined on the global tree, once you create a configuration template it is used in a configuration – then any objects created will inherit these default values.

You can create a new template and provide a name and description;

This then brings up the editor for the configuration template, the properties are in the tree, and the columns such as PRODUCTION_DEFAULTS is where you can change the property value.

So for example if you wanted to change the property value for Generation Mode – so rather than generation All Operating Modes which is the default, you can just generate Set Based, you would find this property;

Then change the value to Set Based for your configuration template;

Lots of property defaults are here, see there is also one for Default Operating Mode, if you were to change the default code gen to just be Set Based, it makes sense to also change the default operating mode to Set Based.

Remember these are defaults so you are not setting specific values on an object – these are the defaults of o overriden value is specified. There are many other interesting properties from tablespace info for tables to all sorts of properties for mappings.

The final piece of the jigsaw is to use this configuration template in a configuration – otherwise it will never be used.

Tuesday Aug 07, 2012

There is a configuration property in OWB for switching between ANSI SQL code generation and Oracle SQL. It is under the ‘Code generation options’ in the mapping configuration. The join condition is expressed in Oracle SQL join syntax and OWB will reinterpret if generating ANSI SQL.

You can change the value to false, generate the code and inspect it inline within the mapping editor;

The 11gR2 release of OWB has changes in the join component to allow you to express the join type in a logical manner, so you can indicate outer join on a group for example.

Wednesday Jul 18, 2012

As well as all of the parallel query and DML capabilities exposed in OWB, the 11gR2 release of OWB includes out of the box support for the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle database, there are various articles on how to do the grunt work manually coding it on the web (see here for example). The use cases for this range from Oracle techie stuff like managing roll back segments for large table updates to parallelizing your PLSQL mappings…whatever they may be. Here we will see how with a couple of configuration settings in OWB you can self parallelize mappings – divide and conquer the olde fashioned way.

The use case that is generally mentioned for using this is for doing large table updates – basically its dividing a BIG problem into smaller pieces, the package also exposes a resume capability for reprocessing failed chunks which is also exposed from OWB (when you execute a chunked mapping you can resume or start a mapping). The approach can also be utilized for parallelizing arbitrary PLSQL (support for set based, row based also the table being chunked can be the source or the target or an arbitrary SQL statement). The 11.2.0.3 patch is best for this.

Chunk by SQL Statement

The example below updates employees salaries, driving the parallel worker threads by the departments table. As you see below we the start and end department id are the same, we are doing a distinct, so all employees in a department will be updated in each chunk, so we can increase the salary within the EMPLOYEES table for each department in a chunk. We are using the department id to drive the chunking – each department if processed in its own thread. We will see how we can control the thread pool of workers.

Note above there are other options such as chunk by ROWID and an arbitrary SQL statement).

The mapping now using olde fashioned divide and conquer has new runtime properties which relate back to the DBMS_PARALLEL_EXECUTE package and include the resume functionality for processing failed chunks and the parallel level, so you can change at runtime whether 1,2 or n threads are used to process the chunks.

We are saying that we want 2 child processes processing the chunks, this is a runtime parameter. Its wise to think about the number of chunks and the number of child processes to ensure optimal execution plan. The image below depicts the code that gets generated with chunk parallel level 2, and parallel level 4 – essentially the package ‘main’ procedure uses DBMS_PARALLEL_EXECUTE to process the heart of the mapping in the child processes.

There is much more to the package than meets the eye the resume capability for example provides a way of reprocessing failed chunks, rather than reprocessing everything again. This is also exposed as a runtime parameter for the chunked OWB mapping so you can resume and reprocess only the failed chunks.

Chunk by ROWID

This is the classic Large Table Update example that typical divide and conquer is used for. This example updates employees salaries, driving the parallel worker threads by rows in the target employees table itself. With this example when we configure the mapping we pick chunk method as ROWID, the chunk table EMPLOYEES, chunk type BY_ROWS and size is 1000 for example. The EMPLOYEES table is set to perform and update, I define the target filter for update for the ON clause in the merge/update – so its the ‘how do I identify the rows to update within this chunk’ clause. The other thing I had to do was define a chunking expression – now in this case its kind of redundant since the chunk is being done in the ON clause of the update – so we can trick OWB by just saying ‘start=start and end=end’ using the variables. If you don’t specify this, OWB will complain that the chunking expression is invalid.

So the MERGE statement within the chunking code will increase the SALARY for the chunk, you will see the expression to increase salary by, the dummy chunking expression used in selecting from the source (DUAL) and which rows to match – the rows in this chunk.

This let’s us perform large table updates in chunks and drive the parallelized mapping using mapping input parameters.

The parallelization for PLSQL (or row based mappings) is an interesting case - for example for the likes of match merge which has an inherent divide and conquer strategy (in binning), with match merge out of the box the processing of the bins is serialized by default. Combining the chunk parallelization with the match merge binning lets you boost the performance of such mappings. So if you pair the chunking criteria with the binning you can substantially increase the performance of such mappings.

Wednesday Jun 20, 2012

Here’s a quick post on how to create both master and work repositories in one simple dialog, its using the groovy capabilities in ODI 11g and the groovy swing builder components. So if you want more/less take the groovy script and change, its easy stuff. The groovy script odi_create_repos.groovy is here, just open it in ODI before connecting and you will be able to create both master and work repositories with ease – or check the groovy out and script your own automation – you can construct the master, work and runtime repositories, so if you are embedding ODI as your DI engine this may be very useful.

When you click ‘Create Repository’ you will see the following in the log as the master repository starts to be created;

This is another example of using the ODI 11g SDK showing how to automate the construction of your data integration environment. The main interfaces and classes used here are IMasterRepositorySetup / MasterRepositorySetupImpl and IWorkRepositorySetup / WorkRepositorySetupImpl.

Thursday Jun 07, 2012

Deep in the trenches of ODI development I raised my head above the parapet to read a few odds and ends and then think why don’t they know this? Such as this article here – in the past customers (see forum) were told to use a staging route which has a big overhead for large files. This KM is an example of the great extensibility capabilities of ODI, its quite simple, just a new KM that;

improves the out of the box experience – just build the mapping and the appropriate KM is used

improves out of the box performance for file to file data movement.

This improvement for out of the box handling for File to File data integration cases (from the 11.1.1.5.2 companion CD and on) dramatically speeds up the file integration handling. In the past I had seem some consultants write perl versions of the file to file integration case, now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure java to perform the integration, using java.io classes to read and write the file in a pipe – it uses java threading in order to super-charge the file processing, and can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop – the KM works with the lightweight agent and regular filesystems.

So in my design below transforming a bunch of files, by default the IKM File to File (Java) knowledge module was assigned. I pointed the KM at my JDK (since the KM generates and compiles java), and I also increased the thread count to 2, to take advantage of my 2 processors.

For my illustration I transformed (can also filter if desired) and moved about 1.3Gb with 2 threads in 140 seconds (with a single thread it took 220 seconds) - by no means was this on any super computer by the way. The great thing here is that it worked well out of the box from the design to the execution without any funky configuration, plus, and a big plus it was much faster than before,

Friday Apr 06, 2012

Following on from the posting some time back entitled ‘ODI 11g – Simple, Powerful, Flexible’ here we push the envelope even further. Rather than just having the SQL we override defined statically in the interface design we will have it configurable via a variable….at runtime.

Imagine you have a well defined interface shape that you want to be fulfilled and that shape can be satisfied from a number of different sources that is what this allows - or the ability for one interface to consume data from many different places using variables. The cool thing about ODI’s reference API and this is that it can be fantastically flexible and useful.

When I use the variable as the option value, and I execute the top level scenario that uses this temporary interface I get prompted (or can get prompted to be correct) for the value of the variable.

Note I am using the <@=odiRef.getObjectName("L","EMP", "SCOTT","D")@> notation for the table reference, since this is done at runtime, then the context will resolve to the correct table name etc.

Each time I execute, I could use a different source provider (obviously some dependencies on KMs/technologies here). For example, the following groovy snippet first executes and the query uses SCOTT model with EMP, the next time it is from BOB model and the datastore OTHERS.

Wednesday Apr 04, 2012

One of the most commonly asked questions led to one of the most commonly accessed support documents (strange that) for OWB is the document describing how to update the OWB repository details after cloning the Oracle database. The document in the Oracle support site has id 434272.1, and is titled 'How To Update Warehouse Builder After A Database Cloning (Doc ID 434272.1)'. This post is really for me to remember the document id;-)

Tuesday Mar 27, 2012

With the IKM Oracle Multi Table Insert you can generate Oracle specific DML for inserting into multiple target tables from a single query result – without reprocessing the query or staging its result.

When designing this to exploit the IKM you must split the problem into the reusable parts – the select part goes in one interface (I named SELECT_PART), then each target goes in a separate interface (INSERT_SPECIAL and INSERT_REGULAR).

Firstly I create a SELECT_PART temporary interface for the query to be reused and in the IKM assignment I state that it is defining the query, it is not a target and it should not be executed.

Then in my INSERT_SPECIAL interface loading a target with a filter, I set define query to false, then set true for the target table and execute to false. This interface uses the SELECT_PART query definition interface as a source.

Finally in my final interface loading another target I set define query to false again, set target table to true and execute to true – this is the go run it indicator!

To coordinate the statement construction you will need to create a package with the select and insert statements. With 11g you can now execute the package in simulation mode and preview the generated code including the SQL statements.

Hopefully this helps shed some light on how you can leverage the Oracle MTI statement. A similar IKM exists for Teradata. The ODI IKM Teradata Multi Statement supports this multi statement request in 11g, here is an extract from the paper at www.teradata.com/white-papers/born-to-be-parallel-eb3053/

Teradata Database offers an SQL extension called a Multi-Statement Request that allows several distinct SQL statements to be bundled together and sent to the optimizer as if they were one. Teradata Database will attempt to execute these SQL statements in parallel. When this feature is used, any sub-expressions that the different SQL statements have in common will be executed once, and the results shared among them.

It works in the same way as the ODI MTI IKM, multiple interfaces orchestrated in a package, each interface contributes some SQL, the last interface in the chain executes the multi statement.

Wednesday Mar 21, 2012

Here we will look at how to load large volumes of data efficiently into the Oracle database using a mixture of CTAS and partition exchange loading. The example we will leverage was posted by Mark Rittman a couple of years back on Interval Partitioning, you can find that posting here. The best thing about ODI is that you can encapsulate all those ‘how to’ blog posts and scripts into templates that can be reused – the templates are of course Knowledge Modules.

The interface design to mimic Mark's posting is shown below;

The IKM I have constructed performs a simple series of steps to perform a CTAS to create the stage table to use in the exchange, then lock the partition (to ensure it exists, it will be created if it doesn’t) then exchange the partition in the target table. You can find the IKM Oracle PEL.xml file here.

The IKM performs the follows steps and is meant to illustrate what can be done;

So when you use the IKM in an interface you configure the options for hints (for parallelism levels etc), initial extent size, next extent size and the partition variable;

The KM has an option where the name of the partition can be passed in, so if you know the name of the partition then set the variable to the name, if you have interval partitioning you probably don’t know the name, so you can use the FOR clause. In my example I set the variable to use the date value of the source data

FOR (TO_DATE(''01-FEB-2010'',''dd-MON-yyyy''))

Using a variable lets me invoke the scenario many times loading different partitions of the same target table.

Below you can see where this is defined within ODI, I had to double single-quote the strings since this is placed inside the execute immediate tasks in the KM;

Note also this example interface uses the LKM Oracle to Oracle (datapump), so this illustration uses a lot of the high performing Oracle database capabilities – it uses Data Pump to unload, then a CreateTableAsSelect (CTAS) is executed on the external table based on top of the Data Pump export. This table is then exchanged in the target. The IKM and illustrations above are using ODI 11.1.1.6 which was needed to get around some bugs in earlier releases with how the variable is handled...as far as I remember.

Friday Mar 02, 2012

This post is a useful index into the ODI SDK that cross references the type names from the user interface with the SDK class and also the finder for how to get a handle on the object or objects. The volume of content in the SDK might seem a little ominous, there is a lot there, but there is a general pattern to the SDK that I will describe here.

Also I will illustrate some basic CRUD operations so you can see how the SDK usage pattern works. The examples are written in groovy, you can simply run from the groovy console in ODI 11.1.1.6.

This isn't an all encompassing summary of the SDK, but covers a lot of the content to give you a good handle on the objects and how they work. For details of how specific complex objects are created via the SDK, its best to look at postings such as the interface builder posting here. Have fun, happy coding!