Interesting case where a full table scan is chosen instead of an index

While I was presenting at the Hotsos Symposium 2013, I showed a real-life case where the CBO was choosing a full table scan instead of an index range scan. The root cause was a suspicious "NVL(column, value) = value" predicate in the execution plan, while the actual SQL contained only "column = value". So the question became: where was the NVL coming from? The choice between index and full scan turned out to be a simple side effect of the "function(column) = value" predicate.

On 11g there is a new optimization that improves the performance of DDL operations adding columns to an existing table with data, where the new column is declared with a constant default value and made NOT NULL in the same statement. In other words, all existing rows are logically expected to take the default value for this new column. If the table is large, this ALTER TABLE ADD COLUMN operation would take a long time. The optimization instead records just the metadata for the new column (default value and NOT NULL constraint) and returns almost instantly, regardless of the size of the table.
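As a sketch (the table and column names here are hypothetical, not from the original case), a DDL that qualifies for this 11g optimization looks like:

```sql
-- Hypothetical example: on a populated table, adding a NOT NULL column
-- with a constant default in a single DDL qualifies for the 11g
-- optimization: only metadata (the default value and the NOT NULL
-- constraint) is recorded, and no existing row is physically updated.
ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);
```

Existing rows keep an internal NULL in c2; the stored default is substituted whenever the column is read.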

What was confusing in this real-life case was that the filter predicate looked like NVL(USER2.SOME_FLAG,'N')='Y', while according to DBA_TAB_COLS the default value of SOME_FLAG was 'Y', not 'N'. How was that possible? At some point we suspected the column default had been changed from 'N' to 'Y', but that was pure speculation. So how could we prove or disprove that the unexpected 'N' in the filter predicate was the product of changing the default value of the affected column?

I created a small test case to reproduce the observations from the real-life case. I created a table with data on my 11g test database. I then added a new column that would use the new optimization (declaring a constant default and making it NOT NULL in the same DDL). Last, I modified the default value of the same column with another DDL. Keep in mind that when the column was added, only the metadata was recorded and the actual rows were not updated (this is the optimization).
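A minimal sketch of that test case might look like this (the object names are mine and assumed; they may differ from the actual test):

```sql
-- 1. A table with pre-existing data
CREATE TABLE t1 AS
SELECT rownum AS c1 FROM dual CONNECT BY level <= 1000;

-- 2. Add a column with a constant default and NOT NULL in the same DDL;
--    this uses the optimization, so only metadata is recorded and the
--    existing 1000 rows are not physically updated
ALTER TABLE t1 ADD (c2 VARCHAR2(3) DEFAULT 'NO' NOT NULL);

-- 3. Change the default value with a second DDL; this does not alter
--    the metadata recorded in step 2 for the pre-existing rows
ALTER TABLE t1 MODIFY (c2 DEFAULT 'YES');
```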

I was expecting a different result and thought this was a bug. I presented the issue to Mauro Pagano for a peer review and he explained that the results were expected (and correct). Without the DDL optimization, the first ALTER TABLE would have updated all existing rows with the value "NO". The second DDL would then have updated no rows with the value "YES", but future rows would get "YES" if c2 was not explicitly assigned a value. It made sense, and the optimization mimics the same behavior. So, rows that existed before the first DDL have no actual value in c2, but the stored metadata supplies "NO" when they are accessed. Rows created between DDL 1 and DDL 2 acquire the default value "NO" if needed (and are stored with that "NO" value). And rows created after the second DDL get the value "YES" if needed (and are stored with that "YES" value). So only rows from before DDL 1 still have an internal NULL but an external "NO", thanks to the metadata stored by the DDL optimization.
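Continuing the hypothetical t1/c2 sketch above, this is how the behavior would surface at query time (the exact plan output depends on statistics and version, so take the filter line as illustrative):

```sql
-- Rows from before DDL 1 store an internal NULL in c2, so the stored
-- original default 'NO' must be substituted when the column is read.
-- That is why a full-scan plan over this table can show a filter like:
--   filter(NVL("C2",'NO')='YES')
-- even though the SQL itself only says c2 = 'YES':
SELECT * FROM t1 WHERE c2 = 'YES';
```

Note that the value inside the NVL is the default from the first DDL, not the current default — which matches the real-life case where the predicate showed 'N' while DBA_TAB_COLS reported 'Y'.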

Conclusion

The use of the DDL optimization improved the performance of the ALTER TABLE ADD COLUMN command, but it introduced a filter predicate that disabled the possible use of a normal index on the same column, affecting the performance of queries with such filter predicates. Possible solutions include:

I see your point. Yes, it looks like validation of data types on the expression used to create a virtual column happens when its data is materialized into the index. I also get "ORA-01722: invalid number" when I create such a virtual column and then try a simple SELECT * FROM t1.

The real case shows a frequency histogram on this column. The histogram has only one bucket, for the value 'N'. Column stats show two distinct values, with low/high of 'N' and 'Y'. Due to the fix for bug 10174050, the selectivity for the value 'Y' (which is missing from the histogram) is computed as 0.5 of the smallest bucket (the one for 'N', with selectivity 1.0). With such a large selectivity the CBO goes with a FTS, and we see the NVL predicate. If we had had a more accurate selectivity (close to zero), we would have used the index on this column, and the predicate from the SQL would not have had the NVL on it. The original case had an index on this column. The original table had 3M rows and the cardinality was computed as 1.5M, while there were only 2 rows with the value 'Y'. The DBMS_STATS auto sample size missed the value 'Y' in the histogram (bug 10174050).
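The cardinality math above can be written out explicitly, and a full-sample histogram is one possible way to capture the rare value (SOME_FLAG is the column from the case above; the exact DBMS_STATS settings are my assumption, not the fix applied in the real case):

```sql
-- Missing-value selectivity per the bug 10174050 behavior described
-- above: 0.5 * selectivity of the smallest bucket ('N', selectivity 1.0)
--   0.5 * 1.0 = 0.5  ->  0.5 * 3,000,000 rows = 1,500,000 estimated,
-- against an actual 2 rows with value 'Y'.
SELECT 0.5 * 1.0 * 3000000 AS estimated_cardinality FROM dual;

-- One possible remedy (an assumption, not from the original case):
-- gather the histogram with a 100% sample so the rare value 'Y'
-- is not missed by the auto sample size.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => user,
    tabname          => 'T1',
    method_opt       => 'FOR COLUMNS SOME_FLAG SIZE 254',
    estimate_percent => 100);
END;
/
```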

So, the suboptimal plan was due to a corner case where a value missing from the histogram made the CBO overestimate the cardinality and decide on a FTS. The presence of the NVL in the predicate was then a side effect of the CBO selecting a FTS, and not the other way around.

I think that Joaquim's suggestion is correct. That is, this DDL optimization does not affect the use of an index created on column C2. When the CBO sees that the cost of using the index on C2 is better than full scanning table t1, it will use that index even though an NVL filter is applied on the indexed C2 column, and vice versa when the cost of full scanning the table is less than the cost of an index range scan.