oracle-developer.net

row-generation without i/o

This article demonstrates various methods for generating dummy records for use in SQL queries. Data generation has many uses in SQL (for example, to fill sparse data, to pivot data or to generate test data). There are several techniques for generating dummy data, but the particular focus of this article is on I/O-free (or I/O-reduced) generation.

We shall be looking at the following methods:

DUAL..CONNECT BY;

pipelined functions; and

MODEL.

As we are only interested in "low-cost" methods, we will not be looking at queries against ALL_OBJECTS (or similar) to generate data. Note also that this article summarises popular methods and, where applicable, the sources are credited.

a note on method

To compare the various methods, we will generate 10,000 dummy rows from each SQL statement. All examples are executed in sqlplus with autotrace (traceonly statistics) on a 10.1 database. They are executed twice to compare initial resource costs with the subsequent I/O-free running costs. The "wall-clock" timings are derived from the TIMER package available on this site.

dual..connect by

Several methods are available that use DUAL as the source of data generation. The most efficient of these is to run a constrained CONNECT BY ROWNUM query, as follows.

Interestingly, this method is "get-free" from the very first execution. We can see from the autotrace output that there are no physical or logical I/Os with this method. There is a caveat to this, however. This example has been run on a 10g database, which means it is subject to FAST DUAL optimisation. The second time we run this example, we see the recursive calls drop as there is no hard parse and the time reduces slightly as a result.

To emulate 9i behaviour, we can re-run the example but ensure that Oracle accesses DUAL itself (we can do this by selecting the DUMMY column, rather than an expression. This time, a small I/O cost is incurred.

Regardless of how many times we run this SQL, or even the number of records we generate, it costs us three logical I/Os (which has little to no impact on overall runtimes). On 9i (test database version 9.2.0.6) the cost is 7 logical I/Os and 5 memory sorts.

pipelined functions

Ever since Oracle 8.0, it has been possible to select from a function that returns a collection. This is known as a "table function". As a means to manipulate and return large datasets, however, table functions are non-scalable and PGA memory-intensive. In 9.0.1, Oracle improved on this dramatically with pipelined functions. Like table functions, pipelined functions also return collections of data, but in streams. Small "packets" of data are handed back to the consumer as they are produced, meaning that the PGA footprint is small and generally non-invasive.

Given this, we can create a simple pipelined function to generate n rows of dummy data. First, we require a collection type as follows.

The first execution of this SQL has higher resource costs than the DUAL method, but is significantly quicker. If we run it a second time, we can see the I/O and recursive SQL costs reduce to nothing. This in turn has a slight benefit for our runtime.

model

The final technique to look at is the MODEL clause. This is a new feature of 10g and gives the SQL language spreadsheet-like capabilities. It is a difficult technique to learn and most developers (me included) look at it with some trepidation! Generating dummy data is not exactly stretching MODEL's capabilities, but is relatively simple as follows.

We can see that MODEL generates no I/O, even on the first execution. In the example SQL, we are not accessing any segment data, hence this is a CPU-only exercise. It is, however, slower than the other techniques discussed above. On the second execution we see the following results.

further reading

acknowledgements

Credit for a variation of the MODEL example goes to an Ask Tom reader named Ant, as detailed in the Nov/Dec 2004 Oracle Magazine. I can't remember where I first saw the DUAL..CONNECT BY technique, but it is commonly used in online articles and forum postings. It no doubt has some lineage back to Tom Kyte!

source code

The source code for the examples in this article can be downloaded from here.