Saturday, March 24, 2007

One of the more subtle changes between 9iR2 and 10g is that you can now actually gather information about the current SQL of a CTAS operation being executed by a session.

In 9iR2 the actual SQL was (often?) not available as information in V$SESSION (SQL_ADDRESS, PREV_SQL_ADDR etc. columns), you saw instead some recursive SQL executed by Oracle to maintain the data dictionary (e.g. adding a row to sys.seg$ or similar) while the statement was being executed which was not really helpful in case you wanted to monitor your sessions.

If you created the table in parallel then at least you could see the internally by Oracle generated SQL for each of the parallel slaves to create its allocated segment of the final table. This has changed with 10g, where all the participating processes (coordinator and parallel slaves) show the same original CTAS SQL while the statement is being executed.

What really is special about the CTAS is the fact that it is often used as part of a regular batch process which for instance transforms data. So you might find yourself in the situation where you would like to analyse why a particular run of your batch process took that long.

If you attempt to get more detailed information about a CTAS that recently took place by querying the shared pool dynamic views you'll notice that you won't find any information about the CTAS at all. I came across this issue while implementing and testing the interface to DBMS_XPLAN.DISPLAY_CURSOR in my open source database frontend tool SQLTools++. It seems that a CTAS is treated like regular DDL and therefore is discarded from the shared pool right after the execution completes.

While this makes perfectly sense for regular DDL which you probably don't want to waste memory in the shared pool, in case of a CTAS this is not really helpful as you don't have a chance to perform any further analysis once the statement has completed.

Of course, one potential workaround could be to split your operation into two separate steps, one CTAS operation to create an empty table (e.g. by adding the predicate "1 = 2" to the original SQL) and a second INSERT /*+ APPEND */ SELECT DML operation which does the actual work and should be equal in terms of performance and resource consumption, but this adds further complexity to your process that is basically unnecessary and poses the potential risk that the two operations don't behave the same as one single CTAS, e.g. the CTAS to create the empty table might take longer depending on how and when the predicate "1 = 2" is evaluated or the INSERT might use a different plan than the single CTAS.

Consider this sample script which illustrates the issue by using the new 10g feature DBMS_XPLAN.DISPLAY_CURSOR:

Another approach in that sense might be that you change your processes in such a way that the tables are all pre-created at the beginning and the actual work is then done as INSERTs, this is especially true if the table layout is already known and does not change frequently.

Either way you decide to go, the big advantage is that you now have a chance to gather information about the executed INSERTs afterwards by querying the corresponding shared pool dynamic views as long as the corresponding information has not been aged out of the shared pool.