oracle-developer.net

listagg function in 11g release 2

This article describes the new LISTAGG function in Oracle 11g Release 2. LISTAGG is a built-in function that enables us to perform string aggregation natively. String aggregation is a popular technique, and there are several methods available on the web, so we will compare their performance to the new LISTAGG function later in this article.

sample data

For reference, we will use the following sample data for our examples.

We can see that the employee names have simply been grouped and concatenated into a single column (values are delimited by comma) per group. As stated, there are several techniques available to perform this aggregation (references are provided at the end of this article), but the new LISTAGG function makes it much easier, as we will see below.

listagg syntax overview

The LISTAGG function has the following syntax structure:

LISTAGG( [,]) WITHIN GROUP (ORDER BY ) [OVER (PARTITION BY )]

LISTAGG is an aggregate function that can optionally be used as an analytic (i.e. the optional OVER() clause). The following elements are mandatory:

the column or expression to be aggregated;

the WITHIN GROUP keywords;

the ORDER BY clause within the grouping.

We will now see some examples of the function below.

listagg as an aggregate function

We will begin with a simple example that aggregates the employee names for each department in the EMP table, using a comma as delimiter.

SQL> SELECT deptno
2 , LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
3 FROM emp
4 GROUP BY
5 deptno;

Note that we chose to order the employees within each aggregation by the employee name. It should be noted that ordering the elements of a string aggregation is not a trivial task in some of the alternative techniques to LISTAGG.

In the following example, we will aggregate the employee names but order them by their respective hire dates.

SQL> SELECT deptno
2 , LISTAGG(ename, ',') WITHIN GROUP (ORDER BY hiredate) AS employees
3 FROM emp
4 GROUP BY
5 deptno;

We can see that the order of employee names within each group differs from the first example.

listagg as an analytic function

As with many aggregate functions, LISTAGG can be converted to an analytic function by adding the OVER() clause. The following example demonstrates the analytic equivalent of our previous example (for each department, aggregate employee names in hire date order).

Remember that when use an analytic function, we do not lose any rows from our resultset: rather we see the results of the aggregation on every source row (depending on the analytic function and the ordering/windowing clauses). We can clearly see this above. The string aggregation for each department is available on every corresponding row.

more on ordering

As stated earlier, the ORDER BY clause is mandatory, as following example demonstrates.

SQL> SELECT deptno
2 , LISTAGG(ename, ',') WITHIN GROUP () AS employees
3 FROM emp
4 GROUP BY
5 deptno;

, LISTAGG(ename, ',') WITHIN GROUP () AS employees
*
ERROR at line 2:
ORA-30491: missing ORDER BY clause

If the order of the aggregated elements is irrelevant to us, we can use a constant expression such as NULL, as follows.

SQL> SELECT deptno
2 , LISTAGG(ename, ',') WITHIN GROUP (ORDER BY NULL) AS employees
3 FROM emp
4 GROUP BY
5 deptno;

In this example, the elements have been aggregated alphabetically, despite the NULL ordering clause. This appears to be the default behaviour when using a constant ORDER BY expression such as the above.

delimiters

We can use a range of constants or expressions as a delimiter for our aggregated strings. In fact, the delimiter is optional and can be excluded altogether, as the following example demonstrates.

SQL> SELECT deptno
2 , LISTAGG(ename) WITHIN GROUP (ORDER BY ename) AS employees
3 FROM emp
4 GROUP BY
5 deptno;

One restriction is that delimiters have to be either constants (such as a literal) or based on a deterministic expression that includes a column or expression used in the grouping. For example, we cannot use ROWNUM as a delimiter, as we see below.

, LISTAGG(ename, '(' || ROWNUM || ')')
*
ERROR at line 2:
ORA-30497: Argument should be a constant or a function of expressions in GROUP BY.

The error message is clear: ROWNUM is neither a constant nor an expression that involves our grouping column (i.e. DEPTNO). If we try to use our grouping column, we are limited to the type of expression we can use, as we can see below.

This time, Oracle recognises that we tried to use our grouping column, but we did not use a valid expression (hence we receive another error message; albeit different to the previous example). The following is an example of a deterministic expression on our grouping column that Oracle will accept.

This is a contrived example, but we've simply converted the DEPTNO into an ASCII character to use as a delimiter. This is a deterministic expression on the value of the grouping column and Oracle allows it.

other restrictions

The results of LISTAGG are constrained to the maximum size of VARCHAR2 in SQL (i.e. 4000) as we can see below.

SQL> SELECT LISTAGG(object_name) WITHIN GROUP (ORDER BY NULL)
2 FROM all_objects;

FROM all_objects
*
ERROR at line 2:
ORA-01489: result of string concatenation is too long

There is no CLOB or larger VARCHAR2 equivalent, so for larger strings we would need to use an alternative means of gathering the elements (such as a collection or a user-defined PL/SQL function).

performance considerations

We will compare the performance of LISTAGG to some of the alternatives that pre-date it. The methods we will compare are as follows:

LISTAGG (11g Release 2);

COLLECT + PL/SQL function(10g);

Oracle Data Cartridge - user-defined aggregate function (9i)

MODEL SQL (10g).

The main difference with LISTAGG is that it is a built-in function, so we should expect its performance to be at the very least comparable to its alternatives.

setup

For the performance comparisons, we will use a larger source dataset of 1 million rows spread evenly between 2000 groups, as follows.

We will compare LISTAGG to the alternatives using both Autotrace and the wall-clock. Note that the sample data is cached prior to the comparisons. We will begin by preparing our environment as follows.

SQL> set autotrace traceonly statistics
SQL> set timing on
SQL> set arrays 500

listagg

Our first test will be with LISTAGG. We will aggregate and order the value string within each of our 2000 groups, as follows.

SQL> SELECT grp
2 , LISTAGG(val, ',') WITHIN GROUP (ORDER BY val) AS vals
3 FROM t
4 GROUP BY
5 grp;

This executes in just under 6 seconds on the test database, with no physical I/O and all sorting in memory.

stragg/wm_concat

Next we will use the best known technique for string aggregation: Tom Kyte's user-defined aggregate function known as STRAGG. In 10g, Oracle implemented a similar function in the WMSYS schema, so we will assume this to be a direct equivalent of STRAGG and use it in our test below. Note that one of the details of the STRAGG method is that it doesn't enable the strings to be ordered.

We can see that this is over 3 times slower than LISTAGG (even without element ordering). The user-defined aggregate function carries with it the general overhead of PL/SQL functions (i.e. context-switching) and we can see the impact of this in the results above.

collect (without ordering)

When 10g was released, I devised a quicker alternative to STRAGG using the COLLECT function combined with a "collection-to-string" PL/SQL function (described in this oracle-developer.net article). The 10g version of COLLECT had no ordering facility so we'll start with this variant, as follows. Note that the TO_STRING source code is available in the referenced article.

Without sorting the elements, the original COLLECT/TO_STRING method is twice as fast as LISTAGG. However, this isn't a fair comparison because LISTAGG always incurs the cost of ordering the elements. If the order of the elements is irrelevant, however, the original COLLECT technique will be the quickest.

collect (with ordering)

For a fair comparison between COLLECT and LISTAGG, we must include the sorting of the elements in the COLLECT call (an 11g Release 1 new feature described in this oracle-developer.net article). The following is equivalent to our LISTAGG example.

This time, when we include the ordering of the elements, the COLLECT method is actually slower than the new LISTAGG function.

model

Our final performance comparison is with an example of string aggregation that uses the MODEL clause. The following example was sourced from Rob van Wijk's About Oracle blog and has been modified to fit our sample data.

This ran for over 3 minutes! We can see from the statistics that it has incurred a significant number of physical reads, recursive calls and in-memory sorting. In fact, this poor timing is largely due to an enormous number of direct path reads/writes to/from the temporary tablespace during query execution (although the disk sorts are not reported by the Autotrace statistics).

According to a SQL monitor report (using DBMS_SQLTUNE.REPORT_SQL_MONITOR), the ordering of data within the SQL MODEL ORDERED operation at step 3 accounted for almost 4Gb of temporary space! An alternative example from Gary Myers' Sydney Oracle Lab blog displays similar performance characteristics.

performance summary

To summarise, therefore, the new LISTAGG function is the fastest technique for string aggregation and has the additional benefit of being a simple built-in function. If the ordering of elements is irrelevant, a technique that uses COLLECT without sorting is quicker still, but for ordered string aggregation, LISTAGG is unbeatable. Note that there is an issue with the MODEL examples used in the performance comparisons, but this might be isolated to either the version of Oracle used for the examples (11.2.0.1 Windows 64-bit) or the test database itself.

further reading

For more information on LISTAGG, see the online SQL Language Reference. For a good summary of common string-aggregation techniques, see this article by Tim Hall and this article by William Robertson. As referenced earlier in this article, oracle-developer.net articles on the COLLECT function can be found here and here. For more information on using MODEL for string aggregation, see blog entries by Gary Myers and Rob van Wijk.

source code

The source code for the examples in this article can be downloaded from here.