You've already got the answer to the question about the ORA-30009 error - the recursive descent of a large connect by simply takes too much memory.

The Gaz in Oz approach is the most common - and very efficient - workaround, but here are a couple of extra comments on your code:

a) always start with a dbms_random.seed(0) - or some other seed point - so that you can reproduce the same random values on repeat tests.
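For instance (a sketch - the seed value 0 is arbitrary, any fixed value will do):

```sql
-- Fix the seed before generating data so that repeat runs
-- produce the same "random" values.
begin
        dbms_random.seed(0);
end;
/

-- The same five strings come back every time after the same seed call.
select dbms_random.string('U', 6) from dual connect by level <= 5;
```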

b) I tend to use exponent notation to avoid miscounting on big numbers, as in: 1e8 for 100,000,000

c) Try to be a bit clever about use of dbms_random - it's CPU intensive:

e.g.

create table t1
nologging
as
with generator as (
        select
                rownum id
        from    dual
        connect by
                level <= 1e4
)
select
--      dbms_random.string('U',20)              padding
        rpad(dbms_random.string('U',6),20)      padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

Note the rpad() of a 6 character random string - this gets me 20 character strings which are guaranteed nearly unique, but does a lot less work than generating all characters randomly. On my 1M row table the padded version took 41 seconds to run, the pure dbms_random version took 115 seconds. That matters a lot when you're trying to build 1e8 rows. (When you've decided your data patterns, test the time to generate a small data set before you create the big one.)

I tend to stick with 1e4 rows for my WITH subquery - that's good for 1e8 rows in the result set - and I don't often worry about adding in a 3rd generator unless I want to use v1.id and v2.id (and potentially v3.id) to produce particular patterns. I happen to have an example of that in the following posting: https://www.red-gate.com/simple-talk/sql/oracle/massive-deletes-part-2/

> Try to be a bit clever about use of dbms_random - it's CPU intensive:

Agreed and dbms_random.string appears increasingly inefficient for longer strings (which is hardly surprising given that it is essentially a PL/SQL loop).

I personally find that if you are not too precious about actual string content then ora_hash/standard_hash/sys_guid can be combined to very efficiently generate quite a range of things without needing PL/SQL function calls.
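A minimal sketch of that idea (the column aliases are mine; standard_hash() defaults to SHA-1, producing 40 hex characters, and rpad() here truncates that down to 20):

```sql
-- Pseudo-random-looking values with no PL/SQL function calls at all.
select
        ora_hash(rownum, 999999)                                n1,      -- integer in 0..999999
        sys_guid()                                              guid,    -- 16-byte raw, effectively unique
        rpad(rawtohex(standard_hash(to_char(rownum))), 20)      padding  -- 20 hex chars of SHA-1
from
        dual
connect by
        level <= 10
;
```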

Also, whilst it doesn't seem to affect the plan in this case, I always find it useful to use the cardinality hint on a row generator so the optimizer has a bit more information on the number of rows it's going to process. This is particularly important with dual, because the connect by cardinality is calculated based on dual, which is, of course, one row.

This is more applicable though when you're then joining that with other tables. For example, you might generate a row for every day of the year over the next 10 years, and join that to another table - without cardinality hinting the CBO may opt for a nested loops approach because it calculates only one row will be returned from the "generator", when in reality it's more like 3.6k. I've seen problems compound like this to generate loops that get executed several billion times!
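The day-generator scenario above might be sketched like this (table and column names here are hypothetical, and 3650 is the ten-year day count mentioned above):

```sql
-- Without the hint the CBO assumes the generator returns 1 row
-- (dual's cardinality) and may pick nested loops; the hint tells it
-- to expect ~3650 rows instead.
with calendar as (
        select /*+ cardinality(d 3650) */
                trunc(sysdate) + rownum - 1     the_day
        from    dual d
        connect by
                level <= 3650
)
select
        c.the_day, t.some_value         -- some_table/some_value are hypothetical
from
        calendar        c,
        some_table      t
where
        t.some_date = c.the_day
;
```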

How long does it take to create the 1e8 rows of data in table t1 on your system?

On the other hand, I re-read the reply Jonathan gave me last time. He pointed out that it's best to use the rpad() function to pad out the generated random strings, avoiding extra CPU consumption, so I've modified my code accordingly.

It depends on what you want for your data, I guess. For example, at the moment you're padding a random string out to 20 characters - is there any reason why you can't pad that out to, say, 4000 characters? Then you don't need as many rows to reach your desired end table size. i.e.
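Something along these lines (a sketch - t2 and the row counts are illustrative, not the original code from this thread):

```sql
-- Fewer rows, fatter rows: pad each random string to 4000 bytes and
-- add a dummy column of spaces to bulk the table up further.
create table t2
nologging
as
select
        rpad(dbms_random.string('U',6), 4000)   padding,
        rpad(' ', 2000)                         filler
from
        dual
connect by
        level <= 1e5    -- far fewer rows for a similar final table size
;
```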

As you can see, the table is significantly larger simply because I've padded it with a load of space. So rather than generating 100M rows, you might be able to get away with less than 1M given that your only worry seems to be the final size of the end table? Similarly, you can also add dummy columns in there that are just padded space if you're not too bothered about the content.

There's not a lot you can do if you want to generate 100M rows with random strings.

Most of the time will be spent in the calls that generate random bytes - which is clear from the time difference between 6 character and 20 character strings. Some CPU time will be spent, of course, doing pure string manipulation, and the date to string formatting that you've shown in the earlier postings.

If you've got the resources you can parallelise the code, but not while the main (1e8) select is using ROWNUM in any way; and you may have to add a padding column to the generating "with subquery" to make sure that the parallel query slaves that scan it don't end up with just one or two of them getting data and the rest doing nothing.
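A sketch of what that parallel version might look like (degree 8 and the generator sizes are illustrative - here the row count comes from the 1e4 x 1e4 cross join rather than a ROWNUM predicate, which would serialise the query, and each generator row carries a filler column so its blocks spread across the scan slaves):

```sql
create table t1
nologging
parallel 8
as
select /*+ parallel(8) */
        rpad(dbms_random.string('U',6),20)      padding
from
        (select rownum id, rpad('x',100) filler from dual connect by level <= 1e4) v1,
        (select rownum id, rpad('x',100) filler from dual connect by level <= 1e4) v2
;
```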

I am not sure why people generate "random" <whatever> for testing to begin with. The values are not random anyway, they are 100% deterministic. There are trivial ways to generate many values - even many distinct values - with much simpler algorithms than those used by random number generators.

For example - being lazy here and using ROWNUM, which is not really needed - why not generate the 20-character strings like so? Assume ROWNUM up to 1e8 - 1 to avoid TO_CHAR overflow.

RPAD(TO_CHAR(ROWNUM, 'FM00000000'), 20, 'A')

These values are guaranteed to be distinct. They can also be easily obfuscated - select a very large prime number P (for example a nine-digit prime number, find it somewhere online), and instead of ROWNUM use MOD(P * ROWNUM + OFFSET, 1e8) for some arbitrary nine-digit OFFSET. If strings of upper-case letters are required for whatever reason, and if the distribution of the first six (or even of ALL) letters must be uniform and independent, all of these can be arranged with trivial arithmetic manipulations, much simpler and less expensive than DBMS_RANDOM functions. Isn't that sufficient?
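A sketch of the obfuscated version (999999937 is a nine-digit prime; the OFFSET value 123456789 is arbitrary):

```sql
-- Distinct, deterministic 20-character strings with the sequence scrambled:
-- n -> mod(P*n + OFFSET, 1e8) is a bijection on 0..1e8-1 because P is
-- coprime to 1e8, so no two rows can collide.
select
        rpad(to_char(mod(999999937 * rownum + 123456789, 1e8), 'FM00000000'), 20, 'A') str
from
        dual
connect by
        level <= 10
;
```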