In a previous article I showed a simple method to convert values found in multiple table rows into a comma delimited list. The method works very well, until the situation in which the approach is to be used becomes a bit more complex.

Assume for a moment that the following table structure exists where the approach must be implemented:

In the above diagram, the PURCHASE_ORDER table is the parent table, the PURC_ORDER_LINE table is the child table (the line-level detail), and the PURC_LINE_DEL table is the grand-child table (the sub-line level detail). A fairly simple setup, and I suspect not a too terribly uncommon setup.

Now the complexity. Assume that an ERP package using the above tables generates a report using the tables’ data, and as the ERP package is flexible, it allows additional information to be added to the reports – essentially, it is possible to execute a query that retrieves a single row for every detail-level row in the report. Great, except that in this particular report, the detail-level row level is at the PURC_ORDER_LINE table. If there were 18 rows, for instance, in the PURC_LINE_DEL table (the schedule that indicates when the parts should be delivered) for the first order line in the PURC_ORDER_LINE table for a particular order ID that is found in the PURCHASE_ORDER table, then we have a bit of a problem. Fortunately, this ERP package’s report was constructed such that the DESIRED_RECV_DATE and USER_ORDER_QTY columns are represented in two report variables with CRLF (carriage return – line feed) delimited lists, making it very easy to list the delivery dates that the parts are due, and the quantities that are due for each of those dates. Fantastic, so where is the complexity? What if the purchase order must be revised after the supplier starts sending the parts on the agreed upon delivery dates, such that the first 9 of the 18 delivery schedules for line 1 of a purchase order have been satisfied, 1 of the 18 delivery schedule lines has been partially satisfied, and the remaining 8 delivery schedule lines have dates that are in the future? This might be helpful information to provide to the supplier when presenting the updated purchase order. Unfortunately, the ERP package’s report does not deliver the USER_RECEIVED_QTY column as a CRLF delimited list, as it does for the DESIRED_RECV_DATE and USER_ORDER_QTY columns.

A complex problem, or not? Well, the ERP package does permit a SQL statement returning a single row to be executed per detail row in the report (at the PURC_ORDER_LINE table level), so we have a possible solution. However, if the SQL statement is too complex (having inline views is one such complication), the complex portion of the SQL statement must be isolated in a statically defined view, and then that view should be referenced by the report. A complex problem, or not? Let’s try building a solution starting in SQL*Plus:

Simple so far, that line identified by RN 10 requests an order quantity of 3, but the supplier only delivered 2 of the 3 parts for that date. Now let’s generate a comma separated list using the SYS_CONNECT_BY_PATH function:

Looks like mission accomplished, all of the previously received quantities are returned in a single query row… except now we have a complicated query that must be isolated in a statically defined view (silly rules…).

Before we head down the path of isolating the complex portion of the query, I thought that it might be entertaining to examine a couple of execution plan to see what happens when we isolate the complicated portion of the query in a statically defined view. First, we will take a look at the last of the above SQL statements:

The above looks simple, and the query really does execute quickly. For the above complex portion of the query to work in a statically defined view, we need to relocate the WHERE clause outside of the inline view, so that when the report runs, it is able to pass in the appropriate values for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns. Here is the execution plan (generated on Oracle Database 11.2.0.2) when the WHERE clause is relocated:

The above looks to be reasonably sane, although the predicted amount of time jumped from about 1 second to about 5 seconds. The Predicate Information section of the execution plan indicates that the filter for the specified PURC_ORDER_ID and PURC_ORDER_LINE_NO was not pushed into the inline view as I had hoped. The actual execution time for this simple query is much longer than 5 seconds, spending 100% of its time burning CPU time (I killed the query after waiting several minutes). No problem, I will just use a hint to force the optimizer to push those predicates into the inline view:

Well, the above is disappointing. I could dig through a 10053 trace to try to determine why the hint was “ignored“, or I could try another query that produces an equivalent result. I choose the latter. Let’s try using the COUNT analytic function to replace the ROW_NUMBER analytic function:

As had happened before, the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were not pushed into the inline view, so the query executes extremely slow. Let’s try the PUSH_PRED hint again:

I remembered reading about a new analytic function introduced in Oracle Database 11.2.0.1, what was that new function… LISTAGG? I am not a fan of that function name, Oracle should have named the function __CREATE_A_DARN_COMMA_SEPARATED_LIST_QUICKLY (a double underscore function, making it a challenge to find the function). The LISTAGG function is described in the Oracle documentation library for 11.2. Let’s give the LISTAGG function a try (the SQL statement is also now formatting the USER_RECEIVED_QTY so that the column values consistently show 2 decimal places):

The calculated cost for the query dropped by 1 and the CPU component of that cost dropped to 0%, so that must be a good thing (the calculated cost can be misleading, just as we saw earlier that the estimated time can be misleading). Without AUTOTRACE TRACEONLY EXPLAIN enabled, the query executes very quickly and produces output that looks similar to this:

In the above, notice that the predicates for the PURC_ORDER_ID and PURC_ORDER_LINE_NO columns were pushed into the inline view, or more correctly, the inline view was completely eliminated by the query optimizer (notice that there is no line in the execution plan showing VIEW as we saw in the previous examples above).

OK, so Oracle’s query optimizer worked as I had hoped with the inline view, but what about a statically defined view – would it also execute the SQL statement efficiently? Let’s create the view:

CREATE VIEW TEST_LA AS
SELECT
PURC_ORDER_ID,
PURC_ORDER_LINE_NO,
LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO) LISTING
FROM
PURC_LINE_DEL
GROUP BY
PURC_ORDER_ID,
PURC_ORDER_LINE_NO;

The execution plan above shows the same cost (3) that we saw earlier, and once again the view was eliminated by Oracle’s query optimizer (as indicated by the absence of a VIEW line in the execution plan), allowing the query to execute quickly since the predicates would be applied early in the query execution.

—

So, Oracle Database 11.2.0.1 and above provide an efficient way to retrieve the information that we need in a comma (or CRLF) delimited list for the report. When the SQL statement is added to the report (defined as an “extended query” in the ERP package), the SQL statement looks like the following (the reporting package does not permit line breaks in the SQL statement):

SELECT (LISTAGG(TO_CHAR(USER_RECEIVED_QTY,'999,990.00'), CHR(13)||CHR(10)) WITHIN GROUP (ORDER BY DEL_SCHED_LINE_NO)) FROM PURC_LINE_DEL WHERE PURC_ORDER_ID= : PO_ID AND PURC_ORDER_LINE_NO= :LN_LINE_NO GROUP BY PURC_ORDER_ID , PURC_ORDER_LINE_NO

Oddly enough, by enclosing the LISTAGG analytic funtion in the SQL statement with ( ), the reporting package did not require me to use a statically defined view for the complex portion of the query. But even if the statically defined view was required, as shown above, the query would still have executed efficiently.

Let me first clarify that I am not trying to argue against the use of LISTAGG function. I love it and I would myself use it whenever possible. It just simplifies many things.
However, I guess there are a couple of things I would like to point out
1) The PUSH_PRED hint is applicable to only join predicates (why didn’t they call it PUSH_JOIN_PRED to avoid confusion?) as mentioned here. So it won’t make any difference to your query.
2) When you use SYS_CONNECT_BY_PATH to aggregate the string, you are effectively using hierarchical query feature of Oracle. One of the “features” of hierarchical query
is oracle applies any WHERE predicates AFTER it builds the entire hierarchy (using the CONNECT BY…START WITH clauses). This, combined with merging (or not) of view
in the main query, causes the WHERE predicate to be applied quite late in the query execution.
Following is my test-case which explains (??) the above point (unless, of course, if I have missed something obvious)

Oracle processes hierarchical queries as follows:
•A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.
•The CONNECT BY condition is evaluated.
•Any remaining WHERE clause predicates are evaluated.

Oracle then uses the information from these evaluations to form the hierarchy using the following steps:
1.Oracle selects the root row(s) of the hierarchy—those rows that satisfy the START WITH condition.
2.Oracle selects the child rows of each root row. Each child row must satisfy the condition of the CONNECT BY condition with respect to one of the root rows.
3.Oracle selects successive generations of child rows. Oracle first selects the children of the rows returned in step 2, and then the children of those children, and so on. Oracle always selects children by evaluating the CONNECT BY condition with respect to a current parent row.
4.If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.
5.Oracle returns the rows in the order shown in Figure 9-1. In the diagram, children appear below their parents. For an explanation of hierarchical trees, see Figure 2-1, “Hierarchical Tree”.

It appears that your two points are valid. Thank you for the correction.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: