Thought I might mention a common trap associated with Function-Based Indexes following a recent question on the OTN Database Forum.

First of all, considered this “simple” little scenario. You have a column in a table that’s a standard VARCHAR2 field and it has 100 distinct values. Therefore, if you select just one value, assuming even distribution of data, you’ll select approximately 1% of the data.

However, this table can potentially allow mixed case fields, with values such as ‘Bowie’, ‘bowie’, ‘BowiE’, etc.

Therefore, you decide to write a query that first converts all the fields to UPPER case such that you can now compare all possible values of a specific name to say ‘BOWIE’.

Remembering you previously had a selectivity of 1%, what’s the selectively of the column after you’ve converted everything to upper case ?

Ummmm, actually, there’s no easy answer to that. It all depends on how many actual different names you have in your data. In theory, you may now have just 1 distinct value or you may have 100 distinct values, you can’t really tell.

However, what if the function did something more complex and converted the name to some numerical value based on the sum of all the characters via some weird formula. In theory, you could have no distinct values in that everything could get converted to a NULL or you could possibly have as many distinct values as there are rows in the table.

Again, Oracle can’t really tell how many rows may get selected based on the determination of such a function.

When a predicate is used with a function call, Oracle has great difficulty in determining the correct selectivity. Therefore when you create a Function-Based Index, which Oracle assumes you may wish to access, Oracle behind the scenes creates a hidden virtual column on the parent table in order to capture the data characteristics of the function so that the CBO can make an accurate determination of the selectivity associated with using the function.

If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine the associated selectivity and cardinality when the function is used on the column (assuming even distribution of data) and hence calculate an accurate cost and determine whether the use of the function-based index is appropriate.

However, and here comes the trap, when a function-based index is created, Oracle will now (since 10g) automatically calculate the statistics associated with the index (such as the blevel, number of leaf blocks, clustering factor, etc.) but it will NOT calculate the statistics associated with the hidden virtual column as these statistics are associated with the parent table, not directly with the index itself.

Therefore, even after you’ve created the function-based index, Oracle still has no idea on the selectivity associated the function because the necessary virtual column statistics are still missing. Oracle simply takes a “guess” and as with most guesses, it’s quite likely to be wrong which means the selectivity is likely to be wrong which means the costings is likely to be wrong which means the execution plan could very well be wrong as well.

What should you do after you create a function-based index ? You should collect the statistics on the hidden virtual column so that the CBO can accurately determine the cardinality associated with using the function and so make correct decisions regarding the most appropriate execution plan.

The ‘FOR ALL HIDDEN COLUMNS’ method_opt option with DBMS_STATS is one way of collecting just the necessary table statistics on these hidden virtual columns:

My early Christmas wish to Oracle would be for the CBO to at least consider the DISTINCT_KEYS index statistic when determining the correctly selectivity when hidden virtual column statistics are missing on function-based indexes, rather than take it’s 0.4% / 1% guess …

“If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine….”
Well MORE accurately determine. There might be skew in there. Maybe 11gR2 will have an option to determine skew based on the Scrabble values of the letters involved.

Funny that you just blog over this topic when I was investigating statistics for FBI’s. Just in time. :-)

As a caveat for others: For FBI’s I found statistics like NUM_DISTINCT only via USER_TAB_COLS and not via USER_TAB_COL_STATISTICS. I believed there were no stats for FBI columns because I queried USER_TAB_COL_STATISTICS. Then used Richard’s statement to gather them and was left wondering why they did not appear. Only after I queried the other table they showed up.

Well, I could have known: documentation of USER_TAB_COLS has this in there “This view differs from “USER_TAB_COLUMNS” in that hidden columns are not filtered out.” – but no remark about the filtering is to be found in doc of USER_TAB_COLUMNS…

[…] read a very interesting blog post the other day by Richard Foote entitled “Function Based Indexes and Missing Statistics – No Suprises.“ The post explains how in Oracle 10g that normal statistics are generated for […]