Oracle Database 10g brings a new feature called the "partition outer-join". You use the feature to "densify" data, to fill in missing rows in a result set. My first reaction was to dismiss the feature as trivial, but then I ran into a case where I needed it, and I found that, for being such a simple feature from a syntactic standpoint, it's actually quite useful, and elegant too.

NOTE! You'll understand this article better if you create the example tables and run the queries I talk about on your own systems where you can readily view all the output. Look towards the end of this email, under the heading "Example Data", for a script you can use to create and populate the example tables in your database.

For the September/October 2002 issue of Oracle Magazine, I wrote about a case in which I needed to report out rows that were missing from a table. If you can, read that article before proceeding with this one. (Unfortunately, my article is no longer online at Oracle's website). In my September 2002 article, I had the following two tables:

The fmr_facility_monthly_report table was to have one row per facility per month that the facility was open. My job was to discover any missing rows for the year 2000. I began by writing the pivot table query shown in LISTING 1 to return one row for the first of each month:

I then joined, using an inner join, the results of LISTING 1's query to the agr_agreement table. This was not an equi-join, but rather a range-based join based on the facility open and close dates contained within the agreement record. The result was a set of primary keys that should have existed in the fmr_facility_monthly_report table.

LISTING 2: GENERATE EXPECTED ROWS FOR EACH FACILITY
---------------------------------------------------
SELECT agr.fac_id, ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
FROM pivot p,
agr_agreement agr
WHERE (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
AND TO_DATE('01-Dec-2000','dd-mon-yyyy'))
AND (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
BETWEEN agr.agr_begin_date
AND NVL(agr.agr_end_date,ADD_MONTHS(SYSDATE,-1)));

My last step was to extend LISTING 2's query into a union query, using MINUS to subtract primary keys for existing rows, leaving only those primary keys that represented missing rows. LISTING 3 shows my final solution.

LISTING 3: REPORT THE *MISSING* ROWS
------------------------------------
SELECT agr.fac_id, ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
FROM pivot p,
agr_agreement agr
WHERE (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
AND TO_DATE('01-Dec-2000','dd-mon-yyyy'))
AND (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
BETWEEN agr.agr_begin_date
AND NVL(agr.agr_end_date,ADD_MONTHS(SYSDATE,-1)))
MINUS
SELECT fmr.fac_id, fmr_period
FROM fmr_facility_monthly_report fmr
WHERE fmr_period BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
AND TO_DATE('01-Dec-2000','dd-mon-yyyy');

The solution in LISTING 3 works, and actually works rather well, largely because of the agr_agreement table. Because agr_agreement contained open and close dates for each facility, it was possible to generate the list of candidate monthly report records via an *inner* join to the pivot table.

Let's make this problem more difficult. Let's pretend that the agr_agreement table simply doesn't exist. The fmr_facility_monthly_report table should have one row for each facility, for each month during 2000. We need to write a query that lists any missing monthly reports from that year. Let's start easy. Let's just list the reports that we do have, using the query in LISTING 4.

The results from LISTING 4 aren't terribly interesting. We know we need to generate rows somehow, else how can we report on the rows that do not exist. The solution to the current problem still involves using a pivot table to generate those rows. Listing 5 shows a subquery that uses a pivot table to generate one row for the first of each month of the year 2000.

The join in LISTING 5 is an inner join. No new rows are created. The query simply returns the existing rows from fmr_facility_monthly_report, with an additional column from the join.

We can make things more interesting. Our subquery returns one row per month. Let's convert our inner join to an outer join, so that our output data also has one row per month. Listing 6 shows this new version of the query, which is interesting enough that I've also included the resulting output.

Look closely at the output from this query. We do indeed get one row for each month in the year 2000. Sometimes we even get more than one row per month. There are two rows each for July through November. These results are no help at all! Facility 101, for example, is missing reports for March and December, but these query results simply don't tell us that. Likewise, facility 202 is missing reports for January through June, and that information isn't even hinted at in these results.

The fundamental problem with the non-solution in LISTING 6 is that the fmr_facility_monthly_report table has a two-part primary key: fac_id and fmr_period, while the rows generated by our subquery return only one part of that key. In order to find the missing rows for the year 2000, our subquery needs to return every combination of fac_id and month-begin date for that year. I had to think about it a bit, but there is a solution to this problem that works before Oracle Database 10g. That solution, and it's an ugly one, is shown in LISTING 7.

If you stare at the subquery in LISTING 7 long enough, you'll see that it generates one row for each facility for each month during the year 2000. Those results are then used as the non-optional table in an outer join to fmr_facility_monthly_report. The resulting output makes very clear which monthly reports are missing from the fmr_facility_monthly_report table.

Oh, but the query in LISTING 7 is inscrutable. It took me awhile to come up with it. And it's bothersome that it depends on a sub-sub-query that performs a potentially expensive SELECT DISTINCT operation against the ever-growing detail table fmr_facility_monthly_report.

The new, partition outer-join feature in Oracle Database 10g provides a much simpler solution. Begin with the query from LISTING 6, and add the single line shown in LISTING 8, which is marked by the comment /*** NEW ***/.

Preceding the RIGHT OUTER JOIN with the clause PARTITION BY (fmr.fac_id) causes the outer-join to be performed separately for each distinct set of fac_id values. Thus, using my example data for this article, the join is performed once for facility 101, and again for facility 202. As with the LISTING 7 solution, the final results shown in LISTING 8 clearly indicate which monthly reports are missing from the fmr_facility_monthly_report detail table.

I made one change between LISTING 6 and LISTING 8 that I haven't discussed yet. I changed the second ORDER BY column from fmr.fac_id to range.report_date. I did this only to sort the output in the same order as that from LISTING 7, so you could more easily see that both queries return the same data.

At first, I thought partition outer-joins were mere syntactic sugar; that the new syntax did nothing more than provide us with an easier and clearer way to write a query such as I've described in this article. But there's a bit more here than just a bit of elegant syntax. In the example I've just worked through, the new syntax not only provides clarity in the way the query is written, but it also provides a significant boost in performance. LISTING 9 and LISTING 10 show the execution plans for the queries from LISTING 7 and LISTING 8 respectively.

Look at how much simple the partition outer-join's execution plan is. Look too at the optimizer's estimated cost. It's 10 for the LISTING 7 solution, and only 7 for the partition outer-join solution from LISTING 8. Indeed, when executing these two queries on my lab system, I consistently see the LISTING 7 query executing 21 consistent gets, whereas the partition outer-join query requires only 14 consistent gets. That's a 1/3 reduction in I/O from a feature I originally thought to be eye-candy. Not bad Oracle, not bad.