Generating test data that matches existing data

I’ve had to create test data a number of times, including data for tables that had mandatory foreign keys to existing tables. It was not feasible to just create new master rows for my test data; I wanted to refer to a random sample of existing data; but the code that generates the test data had to perform reasonably well, even though it had to pick out some random values from a very large table.

Solution? A combination of the new 10g SAMPLE operator, and DBMS_RANDOM. To illustrate:

My test code would open the cursor, fetch as many values as it needed, and then close it. If the cursor ran out of values (e.g. the sample was too small for the desired amount of test data, which varied), my code just re-opened the cursor to fetch another set of random values from the large table.

The reason I sort the sample by dbms_random.value is so that if I only want one value, it is not weighted towards rows found nearer the start of the table.

Note: If I didn’t really care about the sample being picked at random from throughout the table, I could have just selected from the table “where rownum < n".