oracle-developer.net

binding in-lists in 10g

This article demonstrates a new method for binding IN-lists in 10g, using the new collection condition MEMBER OF. Also included is a common pre-10g method and a performance comparison (with dramatic results which leads me to conclude that MEMBER OF is much simpler to use but should only be used with very small sets or lists).

Incidentally, how to bind IN-lists (i.e. how to pass a list of values in one parameter and have Oracle treat the single parameter as a list) is a common problem cited on many web forums. Unfortunately, what many developers do is construct a dynamic SQL statement and concatenate a delimited string to the end to represent the list of values. In addition to shifting from static to dynamic SQL, this approach also makes it impossible to use bind variables and, for form-based applications, also leaves the application wide open to SQL injection. An alternative to this is to turn the delimited string into a collection first and reference the collection in the SQL.

In this article, we will use examples that pass collections of data as IN-lists, rather than lists of values composed as delimited strings.

setup

First we will create some simple collection types to use in our examples, as follows.

in-list binding in previous versions

The correct way to bind an in-list, regardless of Oracle version from 8.0 onwards, is to have the list-values passed as a collection. We can then cast the collection to a pseudo-table (using the TABLE operator). We will see an example of this below. First we will create a simple procedure to find all objects in a list of categories. The list of categories will be passed to our procedure as a collection.

a simple performance test

We will compare the new and old methods for IN-list binding under a couple of simple scenarios, as follows:

constraining a SQL statement as an IN-list; and

testing a literal value against an IN-list in PL/SQL.

test one: filtering a table based on an in-list

For our first performance test, we will compare methods for binding an IN-list to a SQL statement, using a version of Tom Kyte's RUNSTATS utility. We will create a table with 1 million rows and query it with an IN-list collection of approximately 100 keys. First, we create the sample table as follows.

We will now create a small procedure to compare the new and old methods. This procedure will load a collection with a parameterised sample of IDs to test the TABLE operator and MEMBER OF condition at different volumes.

We can see a dramatic difference in performance between the two methods, with the new MEMBER OF condition taking 16 times longer than when we queried the collection as a pseudo-table. The key to this lack of performance is evident in the statistics, particularly the "table scan rows gotten". This indicates that Oracle is using a full table scan with the MEMBER OF method. We can verify this below with EXPLAIN PLAN (although we will use a literal collection rather than a bind variable for convenience).

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'MEMBER' FOR
2 SELECT *
3 FROM million_rows
4 WHERE id MEMBER OF number_ntt(1,2,3);

As indicated by the RUNSTATS results, Oracle has chosen a full table scan for the query, with an estimated cardinality of 50,000 (a 20% selectivity) for the filter. We can compare this with the CBO's treatment of a normal IN-list, as follows.

Oracle is able to recognise the list of literal values in this case and make the correct cardinality estimates, leading to an index range scan. Of course, this article is about parameterised IN-lists, so if we compare the new MEMBER method with the original TABLE operator method, we can see that Oracle handles the collection in a very different way.

With the TABLE operator method, Oracle chooses a nested loops join using the unique index on MILLION_ROWS, hence the major performance improvement. This is due to the semantic difference between the two methods. With the TABLE operator, Oracle unnests the values from the collection and then uses these to probe the MILLION_ROWS_PK index. With the MEMBER condition, however, we only have a filter condition, not a join. We are asking for records from MILLION_ROWS where the ID is in the collection, rather than the other way round. This means that Oracle must probe the collection with values from the MILLION_ROWS.ID column. This has a dramatic effect on performance and there seems to be no way to reverse this behaviour.

There are a number of ways we can try to correct the cardinality or force Oracle to use an index, for example:

With any of these methods, Oracle will not rewrite our query in any way. There simply is no way around the fact that we are filtering MILLION_ROWS against the collection. The best we can achieve is an index full scan, forced by an appropriate hint, as follows.

It appears that an index range scan is simply not available as an access path when using this new condition, because Oracle will not reverse the direction of the filter.

The above performance tests were conducted using many probes of a small collection. It follows, therefore, that the larger the collection, the greater the disparity in performance between TABLE and MEMBER, with the latter becoming almost unusable. Some example timings are as follows.

Sample Size

IN-List Size

IN TABLE() (s)

MEMBER OF (s)

0.001

5

0.1

5.1

0.01

107

1.1

16.1

0.1

984

13.7

107.2

1.0

10,000

21.4

>1,200

Note that this behaviour was last tested on an 11g Release 1 database (11.1.0.6) and it is unchanged.

test two: probing an in-list with a single value

The previous tests measured the performance of a parameterised IN-list in SQL, which is the most common application for this technique. However, we can also use the MEMBER condition in PL/SQL tests. For our second performance test, we will repeatedly probe a parameterised list with a single value, again comparing the original TABLE operator with the new MEMBER condition. For completeness, we will also measure the performance of a loop through the parameterised list to search for the single value. We will use an anonymous block for these tests, as follows.

As there is no table access involved, we have used a simple TIMER package to measure relative performance. We can see that for this type of IN-list probing, the new MEMBER condition is more efficient and effective than any other method, so it might be worth using this technique under certain circumstances.

further reading

For a good discussion of techniques for binding IN-lists prior to 10g, particularly when they are supplied as delimited strings instead of collections, see this article by William Robertson. For more information on new collection conditions and operators in 10g, see this oracle-developer.net article. The TIMER and RUNSTATS packages used in the performance tests for this article can be found on the Utilities page of this site.

source code

The source code for the examples in this article can be downloaded from here.