oracle-developer.net

partition outer join in oracle 10g

Partition outer join is a new mechanism in 10g to "invent" data to fill the gaps in non-contiguous results. In 10g there are many methods to deal with such a problem (including the awe-inspiring, but equally terrifying, MODEL clause). In older versions of Oracle, "data-densification" was not as simple and certainly less efficient than it has now become.

the problem

This article has been motivated by a response I gave to a problem raised on an Oracle developer forum. Our requirement is to produce a report that details customer spending for each month of the year. Our database only records actual spend, so for any given month, data for dormant or idle customers will have to be generated.

adding the missing months

We can see from the data that we are missing most months of the year for our two customers. Remember that our requirement is to show a report for every month in 2004 for every customer. First we will build a "time dimension" set (using subquery factoring) and outer join it to our orders table.

We can see that this hasn't quite worked. We have the zero sums and the year-months, but we are missing customer names. This is because we outer joined to CUSTOMER_ORDERS on the year-months, so any customer columns would show as NULL for deficient rows. Until PARTITION OUTER JOIN appeared in Oracle 10g, we couldn't "invent" data easily, though the next section shows that it is possible in prior versions.

data-densification without partition outer join

We can now take the previous example further and generate the missing customer names for our report. We need to create a set of distinct customer names in addition to the year-months that we generated by subquery. We can combine the two generated "dimensions" (name and year-month) to drive the report as follows.

We can see that this has now become quite complicated and unintuitive. Using the generated dimensions, we built a cartesian product of our range of months and customer names to derive the full range of dates per customer. We then outer joined this cartesian product to CUSTOMER_ORDERS to fill in the gaps for our report. We can imagine the potential cost of such a method if our CUSTOMER_ORDERS table had millions of rows and we had to generate a distinct set of names and then join this back to the table.

In the following section, we'll see just how much simpler the PARTITION OUTER JOIN solution is.

partition outer join

In 10g, the PARTITION OUTER JOIN enables us to avoid the second access of CUSTOMER_ORDERS, and makes the report much easier to build. We can see this with a re-write of the previous example.

This is so much easier! The PARTITION BY clause on the outer join has enabled us to split our deficient data into groups and outer join to each group. In the context of our example, this means that Oracle knows the customer name for each group that we are outer joining to. Hence we did not need to generate a separate set of names and this in turn keeps our resource usage to a minimum.

If we fetch the execution plan for the previous query using DBMS_XPLAN.DISPLAY_CURSOR, we can see the presence of the PARTITION OUTER JOIN as follows.

The effects of the PARTITION OUTER JOIN can be seen in steps 3 and 10. Step 10 is where Oracle sorts the customer data to partition by name and step 3 is where the partitions are outer joined to the year-month data. Note that the CONNECT BY steps are a result of the method we chose to generate the time dimension data.

further reading

For more information on PARTITION OUTER JOIN and how it can be applied, see the online documentation. Jonathan Gennick has written about this feature in a wider article on the MODEL clause in Oracle Magazine and this thread on Ask Tom shows the original forum problem.

source code

The source code for the examples in this article can be downloaded from here.