oracle-developer.net

collect enhancements in 11g release 2

The COLLECT aggregate function was introduced in Oracle 10g and enables us to aggregate data into a collection in SQL. Oracle 11g Release 2 includes two enhancements to this function: the ability to order and de-duplicate data as it is aggregated. In this article, we will examine these new features and demonstrate a way to replicate their functionality in 10g and 11g Release 1.

a brief reminder

Before we demonstrate the COLLECT function's 11.2 new features, we will remind ourselves of its original 10g functionality. We will begin by querying the EMP table and aggregating the employee names into a collection for each department, as follows.

Using COLLECT, we have aggregated a single collection per department as required but the nested table type is system-generated and not very usable. By using our own collection type, we can make this much more usable and flexible. First, we will create a general collection type, as follows.

SQL> CREATE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
2 /

Type created.

We will now repeat our query but this time CAST the system-generated collection type to our own user-defined type, as follows.

Having cast the collection in this way, it is now available to use throughout our application (i.e. wherever a parameter, variable or column of VARCHAR2_NTT is required). This summarises, in general, how we use the COLLECT function.

This is a short reminder of the COLLECT function's 10g usage. Readers who are not familiar with the function are advised to read this oracle-developer.net article for further discussion and detail on the concepts described above.

new feature: ordering collection elements

Sometimes we might wish to populate our collection with elements in a specific order. From 11g Release 2, COLLECT officially supports the ordering of elements during aggregation. Note that we use the term "officially supports" because this has been a hidden (and therefore unsupported) feature of COLLECT since its introduction in 10g Release 1. It is only with the release of 11g Release 2 that this option has been documented.

With this in mind, in the following example we will collect the employee names in alphabetical order by using the new extended syntax.

We can see that the collection elements are now in ENAME order. Of course, we don't need to order the collection according to the element we are collecting. We can order our set using other columns or expressions. In the following example, we will collect and order the employees according to their hiredates.

new feature: distinct collection elements

In addition to ordered elements, we can also request distinct collection elements in 11.2 (note that this option is also available unsupported in earlier releases). Before we demonstrate this extension, we will populate non-unique collections of jobs from our EMP table for reference, as follows.

Collecting all elements is, of course, the default behaviour and is implicit in the same way that SELECT ALL is implied in SQL.

combining both new features

The syntax diagram in the documentation suggests that both extensions can be combined to build ordered, distinct collections for each group. However, in version 11.2.0.1, this doesn't appear to work. In the following query, we will attempt to collect the distinct jobs in each department and then order them.

We can see duplicate elements in our ordered collections so something is clearly not working as of 11.2.0.1.

using complex types

Our examples so far have used a simple scalar collection type. We can also use COLLECT to aggregate collections of more complex object types and perform the new operations on them. We will see an example of this below. First we will create an object type based on a couple of the attributes of the EMP table, as follows.

DISTINCT empsal_ot(ename, sal)
*
ERROR at line 4:
ORA-22950: cannot ORDER objects without MAP or ORDER method

This makes sense as we cannot expect Oracle to be able to order or compare multi-attribute instances of EMPSAL_OT unless we define the sort criteria. A MAP method (or alternatively an ORDER method) is required to tell Oracle how to sort or compare data of this type. We will therefore remove our example types and start again, as follows.

Note the MAP method definition in our new EMPSAL_OT type specification. It returns a simple scalar value to represent a comparison key for the data held in the type. We will define the sort key in the type body below.

We can see that our collection has been de-duplicated and an object instance for 'ALLEN' appears just once in our collection.

emulating collect new features before 11g release 2

As noted earlier, the ordering or de-duplication of elements during a COLLECT function call has been available (but unsupported) since the function was first released. Both extensions can be emulated quite simply in all versions since 10g Release 1. We will see some examples below that use supported methods only.

ordering collection elements before 11.2

To order collection elements in a deterministic way, we can write a simple function to sort the collection once it has been populated, as follows.

Our function is very simple. It uses the TABLE() operator to cast our collection to a rowsource, which we then sort using a SQL ORDER BY clause and finally BULK COLLECT into a new collection. Using the function is also very simple. In the following example, we will populate and order a collection of employee names using COLLECT and our SORT_COLLECTION function.

Of course, this method doesn't completely emulate the COLLECT ordering feature. Remember that the COLLECT function can order a collection using columns/expressions that are not being collected. To achieve the same prior to 11.2, we can use the MULTISET function. In the following example, we will populate collections of enames ordered by their hire dates. First, we will see the correct order using the new COLLECT feature below.

The ORDER BY clause in the MULTISET subquery enables us to order the inputs to each collection and we can see that they are correctly ordered according to hire dates. However, with this method we have two EMP accesses which will naturally lead to a greater workload. The Cost Based Optimiser is smart enough, however, to avoid executing the MULTISET subquery for every EMP record in the outer query block (to verify this, repeat the query with statistics level set to ALL, query DBMS_XPLAN.DISPLAY_CURSOR with the format option "IOSTATS" and look at the number of Starts for the subquery, which will be equivalent to the number of distinct departments).

To measure the additional workload that this technique generates, we will compare the COLLECT and MULTISET methods with a larger dataset. Using the ALL_OBJECTS view, we will populate collections of OBJECT_NAME by OWNER, with the collection elements ordered by OBJECT_ID. We will compare their performance using Autotrace statistics and a wall-clock (having queried ALL_OBJECTS several times beforehand to eliminate physical I/O interference).

As expected, the MULTISET method uses significantly more resources (particularly LIOs and sorts) to answer the same query, which naturally leads to a longer elapsed time (almost 40% longer in the example above).

de-duplicating collections before 11.2

To emulate the new distinct extension to COLLECT, we can use the MULTISET functions and operators introduced in 10g. Note that these will only work with collections of nested table types (i.e. not VARRAYs), but they are simple to use. Furthermore, we need a MAP or ORDER method to use these with complex collections, as demonstrated by our earlier examples. For simplicity, however, we will use our simple scalar collection type for the following examples.

First, we can use the SET function to de-duplicate a collection, as below.

Note the placement of the SET function. It must be outside the CAST call, because SET will not work with Oracle's system-generated types that are created by our use of the COLLECT function. We will demonstrate this below.

We have performed a distinct union on our collection by simply appending an empty collection of the same type with the MULTISET UNION DISTINCT operator. This is the semantic equivalent of a SQL UNION.

Using the ALL_OBJECTS dataset and a wall-clock, we will compare the performance of the two MULTISET techniques and the 11.2 COLLECT function below. We will begin by enabling Autotrace to suppress the resultset and use the new COLLECT function, as follows.

After just 7 seconds and the same resource profile as the previous queries, our MULTISET UNION DISTINCT method gives up with an ORA-22813 exception due to collection size (note that this was tested on an 11.1.0.7 database). We can therefore conclude that for collection de-duplication, the new COLLECT function is more efficient than the pre-11g Release 2 alternatives (and that the alternatives will not necessarily be appropriate when working with larger collections).

acknowledgements

Thanks to Jonathan Heller for suggesting the use of the SET function to emulate the new distinct feature in versions prior to 11g Release 2.