oracle-developer.net

setting cardinality for pipelined and table functions

Table and pipelined functions are often used as simple rowsources (for example, SELECT * FROM TABLE(dbms_xplan.display)). In these cases, cardinality is not particularly useful as there is only one possible execution plan. However, there might be times when we need to join a pipelined or table function to a set of other rowsources (tables, views etc). At this point having cardinality statistics for the function can be critical to enable the CBO to derive an efficient execution plan. This is what we will focus on in this article by demonstrating several methods for setting cardinality statistics for pipelined and table functions.

methods for setting cardinality

We will cover the following methods for setting table/pipelined function cardinality in this article:

CARDINALITY hint (9i+, undocumented);

OPT_ESTIMATE hint (10g+, undocumented);

DYNAMIC_SAMPLING hint (11.1.0.7+);

Extensible Optimiser (10g+).

Note that the first two methods use undocumented hints. These hints are used internally by Oracle but they haven't been included in the public documentation. For this reason, they should not be used in production code. They are included in this article for full coverage of this subject and not as recommended solutions.

The examples in this article were run on an 11.1.0.7 database, but the version information above indicates the Oracle versions that support each method.

sample pipelined function

The majority of the examples in this article will be based on a join between the DEPARTMENTS and EMPLOYEES data in the supplied HR schema. The EMPLOYEES data will be generated by a pipelined function as an alternative rowsource to use in the queries. We will begin by creating the object and collection types to define our pipelined function's record structure and return type, as follows.
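A sketch of these supporting objects is shown below. The attribute list of the object type and all identifier names other than EMPLOYEES_PIPED and its P_NUM_ROWS parameter are assumptions for illustration.

```sql
CREATE TYPE employee_ot AS OBJECT
( employee_id   NUMBER(6)
, first_name    VARCHAR2(20)
, last_name     VARCHAR2(25)
, salary        NUMBER(8,2)
, department_id NUMBER(4)
);
/

CREATE TYPE employee_ntt AS TABLE OF employee_ot;
/

CREATE FUNCTION employees_piped (
                p_num_rows IN NUMBER DEFAULT NULL
                ) RETURN employee_ntt PIPELINED AS
BEGIN
   -- Inflate the EMPLOYEES dataset by a factor of 200. Note that
   -- P_NUM_ROWS is deliberately unused here; it is picked up by the
   -- Extensible Optimiser later in this article.
   FOR i IN 1 .. 200 LOOP
      FOR r IN ( SELECT employee_id, first_name, last_name,
                        salary, department_id
                 FROM   employees )
      LOOP
         PIPE ROW ( employee_ot(r.employee_id, r.first_name, r.last_name,
                                r.salary, r.department_id) );
      END LOOP;
   END LOOP;
   RETURN;
END employees_piped;
/
```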

Note that this function pipes over 20,000 records by inflating the EMPLOYEES dataset by a factor of 200 (this volume of data will show the impact of using correct cardinality statistics more clearly). The duplicate data and the fact that there are no complex transformations (which is usually the reason for using a pipelined function in the first place) is irrelevant in this case as we simply need a representative rowsource for our examples.

default cardinality

We will begin by investigating how the CBO treats pipelined functions when optimising SQL. First we will query our function to see how many rows it actually generates, as follows.

SQL> SELECT COUNT(*)
  2  FROM   TABLE(employees_piped);

  COUNT(*)
----------
     21400

1 row selected.

Our function pipes 21,400 EMPLOYEES records. Using Autotrace to generate an Explain Plan, we will re-query the function to see how many rows the CBO thinks it returns.
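The re-query might look like the following (the plan output is illustrative rather than a verbatim capture):

```sql
SQL> set autotrace traceonly explain
SQL> SELECT *
  2  FROM   TABLE(employees_piped);

-------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |  8168 |
|   1 |  COLLECTION ITERATOR PICKLER FETCH | EMPLOYEES_PIPED |  8168 |
-------------------------------------------------------------------------
```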

We can see that the CBO uses a cardinality of 8,168 rows for our pipelined function. This is a heuristic value (rather than an estimate) that the optimiser uses for all table/pipelined functions. The default value appears to be dependent on the default database block size (this example is from a database with an 8KB block size). On databases with larger default block sizes, the CBO will use a larger heuristic value for the cardinality.

Given that the optimiser uses a default (and usually incorrect) value for cardinality, it follows that this could lead to sub-optimal execution plans. We will test this assertion by joining our EMPLOYEES_PIPED function to the DEPARTMENTS table, as follows.
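The join might be written as follows (the projected columns are an assumption):

```sql
SQL> set autotrace traceonly explain
SQL> SELECT d.department_name, e.last_name, e.salary
  2  FROM   departments d,
  3         TABLE(employees_piped) e
  4  WHERE  d.department_id = e.department_id;
```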

With a cardinality of 8,168 for the pipelined function, the optimiser has chosen a sort-merge join. It is unlikely that this is the most efficient method for joining these two rowsources, so we will now demonstrate the effect of setting correct cardinality statistics for our pipelined function, using each of the four available methods in turn.

method 1: cardinality hint (undocumented)

The CARDINALITY hint is simple to use (although quite limited) yet for some reason remains undocumented. As its name suggests, this hint enables us to supply the cardinality for a rowsource in a query. Despite the fact that we shouldn't use this hint in production code, we will see the effect it has on our query below. We will supply the exact rowcount of our pipelined function in this case, though in practice we would probably only ever use an estimated value.
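The hint takes the rowsource alias and a cardinality value, so for our join it might look like this:

```sql
SQL> SELECT /*+ CARDINALITY(e, 21400) */
  2         d.department_name, e.last_name, e.salary
  3  FROM   departments d,
  4         TABLE(employees_piped) e
  5  WHERE  d.department_id = e.department_id;
```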

Using the hint, we have supplied the exact cardinality of our pipelined function to the CBO. With a more suitable cardinality, the optimiser has chosen an alternative access path (a hash join in this case). This clearly shows the importance of supplying representative statistics, rather than relying on defaults.

method 2: opt_estimate hint (undocumented)

The second method uses the OPT_ESTIMATE hint. This hint, although undocumented, is used by Oracle's SQL Profile feature to correct the cardinalities used by the optimiser. The OPT_ESTIMATE hint is used in SQL Profiles to scale the CBO's estimates up or down according to the correction factors determined during a SQL Tuning Task (during a tuning task, Oracle compares the actual rowsource cardinalities for a query with the estimates it used to derive the original plan; a SQL Profile stores the resulting corrections).

The OPT_ESTIMATE hint can be used to re-factor the cardinalities for tables (this includes views and table functions), indexes and joins. We will use it to correct our pipelined function cardinality to its exact rowcount, as follows.
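Applied to our standard join, the hint might take the following form. The rowsource is described as a "table" and referenced by its query alias:

```sql
SQL> SELECT /*+ OPT_ESTIMATE(table, e, scale_rows=2.62) */
  2         d.department_name, e.last_name, e.salary
  3  FROM   departments d,
  4         TABLE(employees_piped) e
  5  WHERE  d.department_id = e.department_id;
```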

With this hint, we have told the CBO to scale its cardinality estimates for our pipelined function by a factor of 2.62 (this is 21400/8168 to 2 decimal places but is probably more precise than we needed to be). Note that we defined this rowsource as a "table" and referenced it by its alias in the query. As we can see, the result is that Oracle uses a hash join, as it did when we used the CARDINALITY hint above. This is to be expected of course, because we have supplied the same information to the optimiser, albeit by a different method.

method 3: dynamic_sampling hint

The dynamic sampling feature has been available since 9i Release 2 but only supports pipelined and table functions from 11.1.0.7 onwards (at sampling levels 2 to 10). Note that the CBO will only sample a pipelined or table function if dynamic sampling is requested by using the DYNAMIC_SAMPLING hint and not with the OPTIMIZER_DYNAMIC_SAMPLING parameter setting. With this in mind, we will use the hint in our standard query, as follows.
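For our standard join, the hint references the function's rowsource alias and a sampling level:

```sql
SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 5) */
  2         d.department_name, e.last_name, e.salary
  3  FROM   departments d,
  4         TABLE(employees_piped) e
  5  WHERE  d.department_id = e.department_id;
```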

The notes section of the DBMS_XPLAN output tells us that dynamic sampling was used for our query. We used a sampling level of 5 and interestingly the CBO has the exact cardinality for our pipelined function (and hence it switches to a hash join again). If we repeat the query with a dynamic sampling level of just 2 and trace the CBO using the 10053 event, we can see quite clearly how the CBO can be so accurate at lower sampling levels.

As highlighted above, dynamic sampling for pipelined functions at level 2 or above means that a 100% sample of the rowsource is taken, regardless of the sampling level specified in the hint. For pipelined functions with small resultsets, this 100% sample might be trivial compared with the effect that the wrong cardinality could have on the overall execution plan.

For pipelined functions that process large volumes of data, however, dynamic sampling could be more costly because it increases the workload during parsing. Despite this, it doesn't mean that the workload (and therefore time to completion) is doubled just because the function is queried twice. We can use SQL Trace to demonstrate this, by measuring the additional workload that dynamic sampling generates. In the following example, we will query our pipelined function with SQL Trace set and a dynamic sampling level of 2.

We can see that this query completed extremely quickly (this was due to having Autotrace set to traceonly so that the resultset wasn't spooled to screen and a SQL*Plus arraysize of 500). The tracefile also contains a second query of our pipelined function, as follows.

This recursive query was executed by Oracle because we used the DYNAMIC_SAMPLING hint. We can see that it took less time and fewer resources than our actual query, yet it accessed the same number of rows. The workload was reduced because Oracle was simply trying to get the pipelined function's rowcount. This means that it was able to use an aggregate query without projecting any of the function's return attributes. Instead, it used bind variable expressions. The raw trace file tells us that the values of the bind variables were 0 for the default NVL arguments in the outer query block and 1 for the C1 and C2 expressions in the in-line view. By using an in-line view, SUM and NVL, Oracle executed a more complicated version of a COUNT(*) from the function to determine its rowcount.

method 4: extensible optimiser

So far in this article, we have seen three hints that can be used to ensure the CBO has accurate cardinalities for our table or pipelined functions. However, two of these hints are unsupported and the third is potentially costly. Since 10g Release 1, however, there has been a fully supported and lightweight method for supplying cardinality to the optimiser: the Extensible Optimiser.

The Extensible Optimiser is implemented by an Oracle Data Cartridge (which is essentially an object type, known as an interface type, that contains one or more well-defined and structured methods). This feature enables us to design our own cardinality calculations (as a prescribed method in the interface type) and then associate them with our table or pipelined functions. The type's cardinality method is invoked by the CBO during query optimisation to determine the rowcount for the pipelined or table function. We will see examples of this below.

enabling the extensible optimiser

The Extensible Optimiser has a range of well-defined methods for calculating various statistics for functions, but the one we are interested in is the ODCIStatsTableFunction method. To use the Extensible Optimiser with a table or pipelined function, we require three components:
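The first component is the interface type specification, sketched below. The listing is numbered to match the notes that follow; the ODCI* method names are prescribed by Oracle, but the type name EMPLOYEES_PIPED_OT and the parameter names (other than the replicated P_NUM_ROWS) are assumptions.

```sql
 1  CREATE TYPE employees_piped_ot AS OBJECT (
 2     -- at least one attribute, even if unused
 3     dummy_attribute NUMBER,
 4     -- mandatory interface method
 5     STATIC FUNCTION ODCIGetInterfaces (
 6        p_interfaces OUT SYS.ODCIObjectList
 7        ) RETURN NUMBER,
 8     -- cardinality method with prescribed parameters
 9     STATIC FUNCTION ODCIStatsTableFunction (
10        p_function IN  SYS.ODCIFuncInfo,
11        p_stats    OUT SYS.ODCITabFuncStats,
12        p_args     IN  SYS.ODCIArgDescList,
13        p_num_rows IN  NUMBER
14        ) RETURN NUMBER
15  );
16  /
```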

Line 3: object types must have at least one attribute, even if it is not needed in the implementation;

Lines 5, 9: these method names are prescribed by Oracle and we must use them. There are a number of different methods that we can use for the Extensible Optimiser. In our case, we are only using the method for table function cardinality (ODCIStatsTableFunction). The ODCIGetInterfaces method is mandatory for all interface types;

Lines 10-13: the parameter positions and types for ODCIStatsTableFunction are also prescribed by Oracle. There is one exception to this. Note the highlighted line after the P_ARGS parameter on line 12. Here, we must include all of the parameters of our associated table or pipelined function(s). In the case of EMPLOYEES_PIPED, the only parameter is P_NUM_ROWS, which we have included in our method as required, but interface type methods can cater for more than one user-parameter if required.

The interface type body is where we code our cardinality calculation, as follows.
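A sketch of the body is below, numbered to match the line reference in the discussion that follows (identifiers other than the prescribed ODCI names and P_NUM_ROWS are assumptions):

```sql
 1  CREATE TYPE BODY employees_piped_ot AS
 2
 3     STATIC FUNCTION ODCIGetInterfaces (
 4        p_interfaces OUT SYS.ODCIObjectList
 5        ) RETURN NUMBER IS
 6     BEGIN
 7        p_interfaces := SYS.ODCIObjectList(
 8                           SYS.ODCIObject('SYS','ODCISTATS2'));
 9        RETURN ODCIConst.success;
10     END ODCIGetInterfaces;
11
12     STATIC FUNCTION ODCIStatsTableFunction (
13        p_function IN  SYS.ODCIFuncInfo,
14        p_stats    OUT SYS.ODCITabFuncStats,
15        p_args     IN  SYS.ODCIArgDescList,
16        p_num_rows IN  NUMBER
17        ) RETURN NUMBER IS
18     BEGIN
19        -- pass the function's parameter straight through to the CBO
20        p_stats := SYS.ODCITabFuncStats(p_num_rows);
21        RETURN ODCIConst.success;
22     END ODCIStatsTableFunction;
23
24  END;
25  /
```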

As stated earlier, the ODCIGetInterfaces method is mandatory and so is its implementation, as shown. The ODCIStatsTableFunction is the method where we can be creative, although in fact our implementation is very simple. Remember that we included a P_NUM_ROWS parameter in our pipelined function. We didn't use it in the function itself. Instead, we have simply taken this parameter and passed it straight through to the CBO via the interface type, as highlighted above (on line 20).

(3) association

The interface type is the bridge between the table or pipelined function and the CBO. The ODCIStatsTableFunction method simply picks up the parameter we pass to our pipelined function, optionally uses it to calculate a cardinality value and then passes it on to the CBO. For Oracle to be able to do this, however, we require a third and final component; that is, the association between the pipelined function and the interface type. We do this as follows.
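The association uses the documented ASSOCIATE STATISTICS command (the interface type name is the assumed EMPLOYEES_PIPED_OT from the sketches above):

```sql
SQL> ASSOCIATE STATISTICS WITH FUNCTIONS employees_piped USING employees_piped_ot;
```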

With this command, our pipelined function and interface type are now directly linked. Incidentally, our type could also be associated with other functions (assuming they also had a single parameter named P_NUM_ROWS).

testing the extensible optimiser

We have now completed our setup for the Extensible Optimiser. To test it, we will repeat our sample query but without any hints, as follows.
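This time we pass the known rowcount as the P_NUM_ROWS parameter value:

```sql
SQL> set autotrace traceonly explain
SQL> SELECT d.department_name, e.last_name, e.salary
  2  FROM   departments d,
  3         TABLE(employees_piped(21400)) e
  4  WHERE  d.department_id = e.department_id;
```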

As we can see, the CBO has picked up the cardinality value that we passed to our pipelined function and used it to optimise our SQL. If we trace the optimisation of our query with a 10053 event, we can see further evidence that our interface type is being used by the CBO, as follows.

This trace file is from an 11.1.0.7 instance. We can see that the optimiser executes the ODCIStatsTableFunction method in our interface type and receives the correct cardinality in return. Note that in 10g (and possibly 11.1.0.6), the 10053 trace file includes the full PL/SQL block that the optimiser uses to execute our ODCIStatsTableFunction method.

benefits of the extensible optimiser method

Overall, we can see that the Extensible Optimiser is a useful, accurate and, importantly, a supported method for supplying cardinality to the CBO. Its main benefit over the DYNAMIC_SAMPLING hint (the only other supported method at the time of writing) is that it doesn't execute the pipelined function itself, just the ODCIStatsTableFunction method.

In our example, we have simply exploited the Extensible Optimiser feature to create our own alternative to the CARDINALITY hint. Apart from being a supported method, another benefit over the CARDINALITY hint is that the value of P_NUM_ROWS can be passed as a variable rather than hard-coded as above (a value used in the CARDINALITY hint must be known, and hard-coded, in advance).

an alternative implementation

For an alternative implementation of our Extensible Optimiser method, we could remove the P_NUM_ROWS parameter and instead use a lookup table to store representative cardinalities for all of our table or pipelined functions. A single interface type could be associated with all functions, with the ODCIStatsTableFunction method looking up the cardinality based on the executing function name (which is also known to the interface type). With this technique, we could avoid hard-coding cardinalities and modify them over time in the lookup table as needed.

the extensible optimiser, table functions and variable in-lists

So far, our examples have all been based on the EMPLOYEES_PIPED pipelined function. We will complete this article with an example of a table function. Table functions are commonly used as a mechanism to bind variable in-lists passed as collections into SQL queries (an example is covered in a separate article on this site). Using the Extensible Optimiser, we can devise a generic way to determine the cardinalities of all table functions used in variable in-list queries.

First, we will create a simple collection type to support any variable in-list of string values.
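The VARCHAR2_NTT collection type might be declared as follows (the maximum string length is an assumption):

```sql
CREATE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
/
```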

Second, we will create a small function that will receive and return a collection of our generic VARCHAR2_NTT type. This function does nothing with the collection itself; it is merely a wrapper over it.
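A sketch of the COLLECTION_WRAPPER function and its interface type specification follows; note how the P_COLLECTION parameter is replicated in the ODCIStatsTableFunction signature (the type name COLLECTION_WRAPPER_OT is as used later in this article; other details are assumptions):

```sql
-- a simple pass-through function over the generic collection type
CREATE FUNCTION collection_wrapper (
                p_collection IN varchar2_ntt
                ) RETURN varchar2_ntt IS
BEGIN
   RETURN p_collection;
END collection_wrapper;
/

-- the matching interface type specification
CREATE TYPE collection_wrapper_ot AS OBJECT (
   dummy_attribute NUMBER,
   STATIC FUNCTION ODCIGetInterfaces (
      p_interfaces OUT SYS.ODCIObjectList
      ) RETURN NUMBER,
   STATIC FUNCTION ODCIStatsTableFunction (
      p_function   IN  SYS.ODCIFuncInfo,
      p_stats      OUT SYS.ODCITabFuncStats,
      p_args       IN  SYS.ODCIArgDescList,
      p_collection IN  varchar2_ntt
      ) RETURN NUMBER
);
/
```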

This is very similar to our previous example so doesn't need to be explained in any great detail. Note, however, that our function has a P_COLLECTION parameter, which needs to be replicated in our ODCIStatsTableFunction method signature. We can now add our interface type body, as follows.
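A sketch of the body is below, numbered to match the line reference in the discussion that follows:

```sql
 1  CREATE TYPE BODY collection_wrapper_ot AS
 2
 3     STATIC FUNCTION ODCIGetInterfaces (
 4        p_interfaces OUT SYS.ODCIObjectList
 5        ) RETURN NUMBER IS
 6     BEGIN
 7        p_interfaces := SYS.ODCIObjectList(
 8                           SYS.ODCIObject('SYS','ODCISTATS2'));
 9        RETURN ODCIConst.success;
10     END ODCIGetInterfaces;
11
12     STATIC FUNCTION ODCIStatsTableFunction (
13        p_function   IN  SYS.ODCIFuncInfo,
14        p_stats      OUT SYS.ODCITabFuncStats,
15        p_args       IN  SYS.ODCIArgDescList,
16        p_collection IN  varchar2_ntt
17        ) RETURN NUMBER IS
18     BEGIN
19        -- the collection's element count is its cardinality
20        p_stats := SYS.ODCITabFuncStats(p_collection.COUNT);
21        RETURN ODCIConst.success;
22     END ODCIStatsTableFunction;
23
24  END;
25  /
```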

Our implementation is very similar to our previous example. This time, however, we have a collection parameter rather than a scalar number, so to supply the CBO with the correct cardinality, we simply count the collection's elements (line 20).

Fourth and finally, we must associate the function with the interface type, as follows.
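Again this uses the documented ASSOCIATE STATISTICS command:

```sql
SQL> ASSOCIATE STATISTICS WITH FUNCTIONS collection_wrapper USING collection_wrapper_ot;
```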

Before we test the Extensible Optimiser with a variable in-list query, we'll see how it works with a simple table function select with Autotrace. First, we will query a hard-coded collection of three elements without our COLLECTION_WRAPPER function, as follows.
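The unwrapped query might look like this (the element values are arbitrary):

```sql
SQL> set autotrace traceonly explain
SQL> SELECT *
  2  FROM   TABLE(varchar2_ntt('A','B','C'));
```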

Unsurprisingly, the CBO has to resort to the default cardinality of 8,168 rows. To counter this, we can wrap our collection in a call to the wrapper function and enable the CBO to get the correct cardinality, as follows.
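The wrapped version of the query simply passes the same collection through COLLECTION_WRAPPER:

```sql
SQL> SELECT *
  2  FROM   TABLE(collection_wrapper(varchar2_ntt('A','B','C')));
```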

This time the optimiser has the correct cardinality due to our interface type. The generic COLLECTION_WRAPPER function and COLLECTION_WRAPPER_OT interface type combine to provide statistics to the CBO when using small collections such as this.

As stated earlier, table functions are typically used to support variable in-lists, so we will see an example of how this wrapper method can assist in this scenario. We will filter the EMPLOYEES table by a variable in-list of names. The in-list is represented by a collection and although it is hard-coded for simplicity below, we would usually expect it to be passed as a parameter/bind variable.

First, we will execute the query without the COLLECTION_WRAPPER function, as follows.
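The unwrapped in-list query might take the following form (the projected columns and the two surnames are illustrative):

```sql
SQL> set autotrace traceonly explain
SQL> SELECT e.first_name, e.last_name, e.salary
  2  FROM   employees e
  3  WHERE  e.last_name IN ( SELECT column_value
  4                          FROM   TABLE(varchar2_ntt('King','Kochhar')) );
```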

Despite the fact that we only want to query two names from the EMPLOYEES table, Oracle has chosen a hash semi-join. This is because the optimiser has used the 8,168 heuristic cardinality for our collection. To see the effect of the Extensible Optimiser in this case, we will repeat the query but with the COLLECTION_WRAPPER function, as follows.
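The only change is to pass the collection through the wrapper function:

```sql
SQL> SELECT e.first_name, e.last_name, e.salary
  2  FROM   employees e
  3  WHERE  e.last_name IN ( SELECT column_value
  4                          FROM   TABLE(collection_wrapper(
  5                                    varchar2_ntt('King','Kochhar'))) );
```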

This time, the optimiser is aware that there are only two elements in the variable in-list collection and has opted for a nested loops join accordingly. This also demonstrates that the COLLECTION_WRAPPER and COLLECTION_WRAPPER_OT objects can be re-used for all queries that include small collections (as in-lists or otherwise).

summary

In this article, we have seen four methods for supplying table and pipelined function cardinalities to the optimiser. Two of these methods are unsupported (as of 11g Release 1) and for this reason, their use in production code is discouraged. Of the two supported methods, the DYNAMIC_SAMPLING hint is a new feature of 11.1.0.7 and has some limitations and performance implications. The Extensible Optimiser feature is the most flexible method to use at this stage and is usable in all versions of 10g. Using this, we have devised a good alternative to the CARDINALITY hint for pipelined functions and also created a generic wrapper for small collections that are typically used in variable in-list queries.