Getting the most out of OWB and the Oracle Database

Parallel Direct Path Inserts into Partitioned Tables

Combining the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle database with direct path inserts into partitioned tables is a useful pairing. Another flexible facet used here is interval partitioning - with this pattern, not only does the ETL load into the partitions in an efficient, flexible way, but the database also manages the partitions in the table. Here I'll show the raw first step of how it's done, then go about generalizing it into the tools.

The figure below shows chunks of source data being processed by a pool of jobs writing into a partitioned target table in direct path. The parallel execute package has some tables that define the tasks and chunks to be processed by the parallel job pool; you can query these via the data dictionary views (*_PARALLEL_EXECUTE_TASKS and *_PARALLEL_EXECUTE_CHUNKS). The package also supports resuming a task to reprocess failed chunks, which is useful. There is an introductory article on DBMS_PARALLEL_EXECUTE worth checking out in the May/June Oracle Magazine from Steven Feuerstein.

In the Oracle SQL grammar, the partition key value of the partition extension clause in INSERT DML provides the critical piece that lets us build a pattern for parallel direct path loads into partitioned tables.
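As a sketch of what that clause looks like (the SALES and SALES_SOURCE tables and their columns are illustrative names, not from the post's actual schema), a direct path insert targeting a single partition by key value is:

```sql
-- Hypothetical example: direct path insert (APPEND hint) into the
-- partition containing the key value December 2008. PARTITION FOR
-- locates the partition by key value rather than by partition name,
-- which is essential for interval partitions that the database names.
INSERT /*+ APPEND */ INTO sales PARTITION FOR (TO_DATE('200812', 'YYYYMM'))
SELECT *
FROM   sales_source
WHERE  sale_date >= TO_DATE('200812', 'YYYYMM')
AND    sale_date <  ADD_MONTHS(TO_DATE('200812', 'YYYYMM'), 1);
```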

So if we make the chunking column from DBMS_PARALLEL_EXECUTE identify the partition key value above, we have a winner. The parallel execute chunking identifier is a numeric value - in the example below the SALES table is partitioned by month, so the chunking identifier can use YYYYMM (i.e. 200812 for December 2008) to represent a month in numeric form, and this is converted to a date for the partition key value in the INSERT clause using something like TO_DATE(200812, 'YYYYMM').

The illustration here will load a partitioned SALES table that uses interval partitioning, so the database manages the addition of partitions for us.
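A minimal sketch of such a table (column names and the transition point are illustrative assumptions) might look like:

```sql
-- Illustrative DDL: SALES interval partitioned by month. Beyond the
-- initial partition's upper bound, the database creates monthly
-- partitions automatically as rows arrive.
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_initial VALUES LESS THAN (DATE '2008-01-01')
);
```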

The source table I'm using mirrors the target table, without the partitions. I've also added some basic data for the demo - each batch of rows I have added will be processed as a chunk (imagine it was a lot of data in that partition).
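Along the lines of (again, names and sample rows are made up for illustration):

```sql
-- Illustrative source table mirroring SALES, without partitioning.
CREATE TABLE sales_source (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
);

-- A little demo data - each distinct month will become one chunk.
INSERT INTO sales_source VALUES (1, DATE '2008-12-10', 100);
INSERT INTO sales_source VALUES (2, DATE '2008-12-20', 200);
INSERT INTO sales_source VALUES (3, DATE '2009-01-15', 150);
INSERT INTO sales_source VALUES (4, DATE '2009-02-03', 175);
COMMIT;
```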

To create the task and chunks for the execution we use the DBMS_PARALLEL_EXECUTE APIs. In the call below we define the task with a name and, in this case, a SQL statement to identify the chunks (a demo example - be careful about performance here; commonly an indexed numeric column is used):
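A sketch of those calls (the task name LOAD_SALES and the tables are assumptions for this demo): CREATE_CHUNKS_BY_SQL expects a query returning start_id/end_id pairs, and here each month maps to a single-value chunk, so both columns return the same YYYYMM number.

```sql
-- Sketch: create the task, then derive one chunk per distinct month.
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'LOAD_SALES');
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(
    task_name => 'LOAD_SALES',
    sql_stmt  => 'SELECT DISTINCT TO_NUMBER(TO_CHAR(sale_date, ''YYYYMM'')), '
              || '                TO_NUMBER(TO_CHAR(sale_date, ''YYYYMM'')) '
              || 'FROM sales_source',
    by_rowid  => FALSE);
END;
/
```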

Then we define the meat of the task and the number of jobs to process the chunks. Note I am using dynamic SQL, since the partition key value cannot be a bind variable - its value will change in each child task that the parallel execute engine runs (the engine itself passes start_id and end_id as bind variables to this block).
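A sketch of the RUN_TASK call, under the same illustrative names: the statement handed to RUN_TASK must reference :start_id and :end_id, and because PARTITION FOR cannot take a bind variable, the inner block concatenates the bound chunk value into the INSERT and runs it with EXECUTE IMMEDIATE.

```sql
-- Sketch: each chunk's job runs this anonymous block with
-- :start_id/:end_id bound to the chunk's YYYYMM value; the block
-- builds the direct path INSERT dynamically so the partition key
-- value is a literal, then commits the chunk.
DECLARE
  l_stmt VARCHAR2(32767);
BEGIN
  l_stmt := q'[
    DECLARE
      l_sql VARCHAR2(4000);
    BEGIN
      l_sql := 'INSERT /*+ APPEND */ INTO sales '
            || 'PARTITION FOR (TO_DATE(' || :start_id || ', ''YYYYMM'')) '
            || 'SELECT * FROM sales_source '
            || 'WHERE TO_NUMBER(TO_CHAR(sale_date, ''YYYYMM'')) '
            || 'BETWEEN ' || :start_id || ' AND ' || :end_id;
      EXECUTE IMMEDIATE l_sql;
      COMMIT;
    END;]';
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
    task_name      => 'LOAD_SALES',
    sql_stmt       => l_stmt,
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 2);
END;
/
```

The commit inside the block matters: a direct path (APPEND) insert must be committed before the same session can touch the table again.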

Whilst the above is running, the parallel execute package spawns 2 jobs (since I specified a parallel level of 2). If I quickly look at the USER_PARALLEL_EXECUTE_CHUNKS view I see 6 chunks, since I had 6 distinct months of data. Below, the first two chunks are in ASSIGNED status and are being processed.
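The progress query is just a select against the dictionary view (task name again assumed):

```sql
-- Watch chunk progress: status moves
-- UNASSIGNED -> ASSIGNED -> PROCESSED (or PROCESSED_WITH_ERROR).
SELECT chunk_id, status, start_id, end_id
FROM   user_parallel_execute_chunks
WHERE  task_name = 'LOAD_SALES'
ORDER  BY chunk_id;
```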

Checking the view again, I see 2 are now in PROCESSED status and 2 are ASSIGNED. Note the start_id and end_id columns here - these are the bind variables passed to my PLSQL block in the RUN_TASK routine above, and it is these I use for the PARTITION FOR key value.

Finally all chunks are processed and I have loaded the data in parallel, writing direct path into the partitioned target table.

I mentioned earlier the resume capability; this is also very useful. There is another state, PROCESSED_WITH_ERROR, which is flagged when a chunk fails for whatever reason. The RESUME_TASK procedures allow you to retry the task and reprocess just those chunks that failed.
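The retry itself is a one-liner (same assumed task name); only chunks not yet in PROCESSED status are picked up again:

```sql
-- Sketch: reprocess just the chunks flagged PROCESSED_WITH_ERROR
-- (and any left unassigned) for the task.
BEGIN
  DBMS_PARALLEL_EXECUTE.RESUME_TASK(task_name => 'LOAD_SALES');
END;
/
```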

Interesting stuff - combining some capabilities in 11gR2 of the database to boost processing and leverage those CPUs! Hopefully this sparks some other ideas out there. Next up I'll take this into the data integration tools and illustrate how it can be commoditized, avoiding the programming.

Hi Stewart
Yep of course.
The DBMS_PARALLEL_EXECUTE package documentation and posts are primarily aimed at large table updates, but I'm sure there are other cases where old-fashioned divide and conquer is useful. The approach outlined could be useful IF you can efficiently deliver each data chunk to the parallel slaves without the database having to compute which slave needs to process each row - it also lets you do other operations in parallel in other processes (but that's more just about naming the partition). Not the everyday case, I guess, but these are ones I have heard over the past couple of years.
Cheers
David

hi --- thanks for sharing this info.
I am trying to implement the parallel data load by following the technique you mentioned. The thing is, it is taking more time than the stored procedure/cursor approach (I provided that code as well), and when I ran the parallel process it is not inserting any data... I really wasn't sure if I am missing anything. Can you please let me know if the approach is correct or not? Based on the output of the select statement, which outputs the name of the table, I need to run the SQL with dynamic SQL and insert data into a single table. The proc is provided below. Can you please point me in a direction if I am doing the parallel execute the right way?

Check the view USER_PARALLEL_EXECUTE_CHUNKS for errors - the ERROR_MESSAGE column will help you, and the STATUS column will indicate the state of the job, more than likely PROCESSED_WITH_ERROR.
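Something along these lines will show the failing chunks and the reason:

```sql
-- Sketch: list chunks that failed, with the captured error text.
SELECT chunk_id, status, start_id, end_id, error_message
FROM   user_parallel_execute_chunks
WHERE  status = 'PROCESSED_WITH_ERROR';
```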

Your SQL query selects from group_table but your table definition is called TABLE_GROUP... You also need to bind table_name when executing the SQL in the execute immediate, i.e. execute immediate s using table_name, vstart_id;

As for the performance, you will need to test on a realistic workload and compare execution plans to find the suitable implementation configuration to use.
Cheers
David