“For a BTree index, NULL values are not included in the BTree structure and, thus, not even accessible through the index.”

“Expert Indexing in Oracle Database 11g”, page 159 (any problems here? I seem to recall a test case somewhere suggesting that the second column should be a number rather than a string – I thought about buying this book):

“If all index columns are NULL, Oracle Database doesn’t include rows into an index. However, you can actually index NULL values by simply adding another column to the index, like so:

SQL> create index with_null on employees(nullable_column, '1');

—

So, what is my review comment for the Cookbook?

The book states, “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.” The book’s description is incomplete: NULL values are not stored in single-column b*tree indexes. There are at least four methods to work around this issue and allow an index to be used to identify rows with a NULL value in the indexed column:

1. Define a composite index with at least one other column that has a NOT NULL constraint – ideally, the column in which the NULL values might appear would be the leading column in the composite index.

2. Define a composite index with a numeric constant (such as 1) as the second column in the composite index.

3. Create a bitmap index on the column – unlike single-column b*tree indexes, bitmap indexes include entries for rows with NULL values (bitmap indexes require the Enterprise Edition).

4. If the number of NULL values in a column will be relatively small (compared to the number of rows in the table), and the original SQL statement may be modified, create a function-based index that converts NULL values to 1 and non-NULL values to NULL, using one of the following expressions:

DECODE(C3,NULL,1)

(CASE WHEN C3 IS NULL THEN 1 ELSE NULL END)

(NVL2(C3,NULL,1))

Let’s build a little test case to demonstrate. First, a table is created with 1,000,000 rows, two indexes are created, and then statistics are gathered with histograms generated for all indexed columns:
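The setup script is not reproduced in this excerpt; a minimal sketch of what it might have looked like follows (the table name T2 and column names C2 and C3 are taken from later plan output in this article – the index names, column padding, and NULL distribution are assumptions):

```sql
-- Assumed setup: 1,000,000 rows, C3 NULL in roughly 1% of the rows,
-- C2 padded so that the rows are reasonably wide
CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  RPAD('A',255,'A') C2,
  DECODE(MOD(ROWNUM,100),0,NULL,ROWNUM) C3
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

-- Composite index placing the nullable column first (method #1)
CREATE INDEX IND_T2_C3_C2 ON T2(C3,C2);

-- A second index on another column
CREATE INDEX IND_T2_C1 ON T2(C1);

-- Gather statistics with histograms for all indexed columns
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE 254')
```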

Well, it appears that the composite index on columns C3 and C2 might have helped quickly locate the rows with NULL values in column C3 (we just tested point #1 above). Let’s drop that index and try again:
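Assuming the bitmap index method (point #3) is what was tested next, the steps would look roughly like this (the index names are invented for illustration; bitmap indexes require the Enterprise Edition):

```sql
DROP INDEX IND_T2_C3_C2;

-- Unlike single-column b*tree indexes, bitmap indexes include
-- entries for rows in which the indexed column is NULL
CREATE BITMAP INDEX IND_T2_C3_BM ON T2(C3);

SELECT
  C1,
  C3
FROM
  T2
WHERE
  C3 IS NULL;
```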

Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”. The Predicate Information section of the execution plan shows access(“C3” IS NULL) (we just tested point #3 above).

Let’s drop the index and try just indexing the NULLs (sub-point 1 of point #4 above):
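A sketch of this test (sub-point 1 of point #4 – the DECODE approach; the index name is invented). Note that the WHERE clause must repeat exactly the expression used in the index definition:

```sql
-- Only the rows with a NULL value in C3 produce a non-NULL expression
-- result, so only those rows appear in this function-based index
CREATE INDEX IND_T2_DECODE ON T2(DECODE(C3,NULL,1));

SELECT
  C1,
  C3
FROM
  T2
WHERE
  DECODE(C3,NULL,1)=1;
```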

The index that we created was used, although notice that the Predicate Information section of the plan is a bit different from before, and this time we have an INDEX RANGE SCAN operation rather than an INDEX FULL SCAN operation. Once again, we found that we cannot simply state that “NULL Values are Not Stored in Indexes”.

Let’s drop the index and try just indexing the NULLs (sub-point 3 of point #4 above):
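A sketch of this test (sub-point 3 of point #4 – the NVL2 approach; the index name is invented, and as described later in the comments, whether the outer parentheses appear consistently in both the index definition and the predicate turns out to matter):

```sql
-- NVL2 returns the second argument (NULL) for non-NULL C3 values and
-- the third argument (1) for NULL C3 values, so again only the rows
-- with NULL C3 values are indexed
CREATE INDEX IND_T2_NVL2 ON T2((NVL2(C3,NULL,1)));

SELECT
  C1,
  C3
FROM
  T2
WHERE
  (NVL2(C3,NULL,1))=1;
```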

The index that we created was… not used? But is it not the case that the NVL2(C3,NULL,1) function result is the equivalent of the DECODE and the CASE function results? Let’s give this one another try with a hint:
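The hinted retry might have looked something like this (the exact hint used is an assumption – specifying only the table alias asks the optimizer to cost an index access path using any available index on T2):

```sql
SELECT /*+ INDEX(T2) */
  C1,
  C3
FROM
  T2
WHERE
  (NVL2(C3,NULL,1))=1;
```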

That’s better, even if we did receive an INDEX FULL SCAN operation rather than the intended INDEX RANGE SCAN operation.

Something fun to think about – why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?

18 responses

I think that Yasser may have already answered your question, but then I may be misunderstanding your question.

In short, indexes may be used to locate rows that contain NULL values in a table column, and NULLs may be stored in indexes. Whether a specific index should be created or adapted specifically to locate NULL values will depend on the importance of query performance, weighed against the additional disk space for the index and the index maintenance overhead during inserts, updates, and deletes of the column data, as additional redo and undo is created and the index structure grows.

Method 1 uses a standard composite b*tree index. Method 2 creates a function-based composite index with the second column containing a numeric constant (I think that the Enterprise Edition of Oracle Database 8i was the first version that supported function-based indexes, and that feature was extended to the Standard Edition in 9i). Method 3 creates a bitmap index – that requires the Enterprise Edition. Method 4 creates a function-based index that indexes only the cases where the column value is NULL.

I think the up-front statement “NULL values are not stored in indexes” in a renowned Oracle book – with the author perhaps intending to avoid complexity – can sometimes leave the reader feeling uncomfortable, but I think we should be thankful to the author for publishing the scenarios, making our brains think from different views, and helping us avoid “Compulsive Tuning Disorders”.

To me, the statement “NULL values are not stored in indexes, so when we query for records with a NULL value in field X, even if the X column is indexed, the index will not be used.” seems to be WRONG, or at least an incompletely framed sentence.

Once again, nice demonstration and good question: why did the technique using the CASE syntax that resulted in an INDEX RANGE SCAN operation have a plan with a calculated cost of 10,024, when the plans with the INDEX FULL SCAN operations have a calculated cost of about 3,750?
I must admit I don’t know the answer, but my best guess is that the costing has something to do with the way the index is being used (access vs. filter). In the case of the INDEX RANGE SCAN, the index step has an ACCESS predicate (and a low cost of 20), whereas in the case of the INDEX FULL SCAN, the index step has a FILTER predicate (and a significantly higher cost of 3,234). To this date, I am not quite clear about the difference between ACCESS and FILTER predicates.

I am not sure that I have a nice, short description of access and filter predicates. You might think of it this way, if you think about the index in the front of a book:

Access Predicate: You know that the topic of interest is found in chapter 8, under a heading of “Examining Execution Plans and Plan Statistics” which is in a section of the book from pages 219 through 226 (or 227). You then flip to pages 219 through 226 and start reading.

Filter Predicate: As you are reading through the section, you are discarding sentences that do not contain the exact information that is of interest, and remembering the information that is of interest. (If the index were more detailed, but the specific item of interest was not detailed – you did not know to look specifically under the heading “Examining Execution Plans and Plan Statistics” – then part of this filtering might have taken place while examining the index.)

This is the description that is provided on page 222 of the book “Expert Oracle Practices”:

The Predicate Information section of the execution plan shows the conditions applied to indexes and joins during access (the access entries), as well as the restrictions applied to the table data to filter out unwanted data as specified by the conditions in the WHERE clause (the filter entries) and automatically generated predicates added by the optimizer. Filter predicates may appear for plan lines showing index accesses when the leading column(s) of a composite index are not specified in the WHERE clause (resulting in an index skip scan operation in the plan); when intermediate columns of an index are omitted from the WHERE clause, for instance, specifying access criteria for columns 1, 2, and 4 of a composite index, but not column 3, which prevents column 4 from being used as restriction criteria during the access operation; and when a range comparison (less than, greater than, between, and so forth) is used on a composite index’s column, those restrictions placed on columns in the index definition after the column with the range operation may not be examined during the access operation. The filter predicate entries are also able to reveal potentially time-consuming implicit datatype conversions in joining and filtering predicates.

ACCESS_PREDICATES: Predicates used to locate rows in an access structure. For example, start or stop predicates for an index range scan.

FILTER_PREDICATES: Predicates used to filter rows before producing them.
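A small sketch may help make the distinction concrete (the table T3 and its index are hypothetical). With a composite index on (C1, C2, C3), specifying C1 and C3 but not C2 in the WHERE clause typically yields an access predicate on C1 (it positions the range scan) and a filter predicate on C3 (it is checked against each index entry the scan visits):

```sql
CREATE INDEX IND_T3_C1_C2_C3 ON T3(C1,C2,C3);

-- C2 is omitted from the WHERE clause, so C3 cannot help position
-- the scan and is applied as a filter instead
SELECT
  *
FROM
  T3
WHERE
  C1=1
  AND C3=5;

-- Typical Predicate Information section for the index line:
--   access("C1"=1)
--   filter("C3"=5)
```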

With all due respect, I don’t think your analogy explains my issue, and neither does the description in the links that you have provided. Your analogy is in line with what happens in the INDEX RANGE SCAN example above. Essentially, I understand that, typically, an ACCESS PREDICATE is applied to the index access step in the plan (as you have described with the book index analogy) and the FILTER PREDICATE is applied to the table access step in the plan (as you described in “As you are reading through the section…”).
However, I had pointed out the difference in the way predicates are reported for the index access step alone. In your examples, with the DECODE index the plan reports the predicate for the index access step as a filter predicate (with a much higher cost), as compared to the access predicate (with a much lower cost) reported when the CASE based index is used.
So far, the only resource that comes close to this is http://jonathanlewis.wordpress.com/2010/08/31/filter-bug/
Would love to know if there are any better or more detailed explanations available.
BTW, apologies for diverting the discussions away from your main question (which, as I said, I don’t know the answer for)

I might not be answering your question yet. However, did you notice that the CASE example that showed an INDEX RANGE SCAN had the following in the Predicate Information section access(“T2”.”SYS_NC00005$”=1) – SYS_NC00005$ is a hidden/virtual column that was automatically generated when the function based index was built:

So, why did we not see the same virtual columns with the NVL2(C3,NULL,1) function? I could say that this test result was intentional, or I could say that I accidentally omitted the () around the NVL2 function when I created the index. Take a look at the output of the following script, which first incorrectly formats the index specification, and then correctly formats the index specification:
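The script is not reproduced in this excerpt; the idea, sketched here with invented index names, is to compare the COLUMN_EXPRESSION that Oracle records for each version of the index in USER_IND_EXPRESSIONS:

```sql
-- Version 1: outer parentheses omitted from the index expression
CREATE INDEX IND_T2_NVL2_V1 ON T2(NVL2(C3,NULL,1));

SELECT
  INDEX_NAME,
  COLUMN_EXPRESSION
FROM
  USER_IND_EXPRESSIONS
WHERE
  TABLE_NAME='T2';

DROP INDEX IND_T2_NVL2_V1;

-- Version 2: outer parentheses included, matching the predicate
-- exactly as it is written in the SQL statement
CREATE INDEX IND_T2_NVL2_V2 ON T2((NVL2(C3,NULL,1)));

SELECT
  INDEX_NAME,
  COLUMN_EXPRESSION
FROM
  USER_IND_EXPRESSIONS
WHERE
  TABLE_NAME='T2';
```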

I guess the answer lies in USER_IND_EXPRESSIONS (or ALL_IND_EXPRESSIONS). I remember quite some time back reading on the AskTom site about why a function-based index was not being used, and Tom pointed out that a function-based index can be used provided the predicate matches the expression stored in those views for the index. I am not sure what version that was, and I cannot find the link to it. In that thread, Tom explained how ALL_IND_EXPRESSIONS stores the expression in a different way than the one specified while creating the index (in some particular case). Not sure whether that is still valid.

Thank you for posting a link to your article – always good to see links to articles that demonstrate helpful information. You certainly did capture all of the options.

The first time I read the article, my eyes became a bit blurry… why would Oracle’s optimizer choose to use an INDEX FULL SCAN operation, which reads one block at a time (from disk), when it can just about as quickly read 8, 16, or 128 of the 8KB blocks at a time (from disk) during a FULL TABLE SCAN operation? Then it hit me on the second read through… your test table only had 10 very narrow rows – the indexes probably contained only an index root block.

I extended your test case to 1,000,000 rows, with each row being much wider (column C_2 is padded to 255 characters); tested on 11.2.0.2. (I am explaining the steps for other people who may read this comment, not necessarily for David.)
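A sketch of the extended setup (the column names C_1 and C_2 follow the comment; the table name, index name, and row-generation technique are assumptions):

```sql
CREATE TABLE T1 AS
SELECT
  ROWNUM C_1,
  RPAD('A',255,'A') C_2
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

CREATE INDEX IND_T1_C_1 ON T1(C_1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
```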

No full table scan, but we did receive an INDEX FAST FULL SCAN operation, which performs multi-block reads (from disk) of the index blocks, much like what would happen with the table blocks during a FULL TABLE SCAN operation.

Let’s see what happens if we try to force an index access path using a hint for the same query:
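The hinted attempt might have looked something like this (the query and index name are assumptions – naming the index in the hint asks the optimizer to cost an index access path for that specific index):

```sql
SELECT /*+ INDEX(T1 IND_T1_C_1) */
  C_1
FROM
  T1
WHERE
  C_1 IS NOT NULL;
```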

The calculated cost is unchanged after the change of the MBRC system statistic.

—

David,

Would you be able to re-run your test with Oracle Database 11.2.0.1, 11.2.0.2, or 11.2.0.3 with a larger number of rows? I am curious to know if you will see INDEX FAST FULL SCAN operations, INDEX FULL SCAN operations, or TABLE ACCESS FULL operations in the execution plans.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
  SYSDATE TODAY
FROM
  DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

********************
Greater than and less than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: