Indexing NULLs: (Empty Spaces) January 23, 2008

There have always been issues with NULLs and indexes. The main issue, of course, is that if the indexed columns are all null, the associated row is not indexed at all.

Generally, this is a good thing. If a table has lots of null values in the indexed columns, the associated rows are not indexed, resulting in a smaller index structure. Also, very often we're simply not interested in result sets where the indexed values are null, so this default behaviour is not an issue.

However, what if the number of rows where the values are null is relatively small, and what if we want to find all rows where the indexed column or columns are indeed null? If nulls aren't indexed, then a potentially expensive Full Table Scan (FTS) is the CBO's only option.

The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.

If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.

So with concatenated indexes with at least one NOT NULL column, Oracle can guarantee that every null in all the other columns is contained within the index, and so could potentially use the index for corresponding IS NULL predicates.
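To sketch this (table, column and index names are made up for illustration), assume column B has a NOT NULL constraint:

SQL> create table bowie (a varchar2(10), b varchar2(10) not null);

Table created.

SQL> create index bowie_ab_i on bowie (a, b);

Index created.

SQL> select * from bowie where a is null;

Because B can never be null, every null of A must have an index entry, so the CBO is free to consider an index range scan for the A IS NULL predicate, assuming the costs stack up.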

But what if the index has just a single column, or what if none of the indexed columns have a NOT NULL constraint? We're done for then; the CBO won't be able to use the associated index to retrieve just the nulls, right?

Well not quite.

Let's assume we have an index consisting of just column A, and that A is a nullable column. Let's also assume there aren't too many rows with a null for A, and that we have an important query that would dearly love to use an index to retrieve rows based on these null values of column A.

Well, one alternative, as I've seen a number of times, is to simply include a NOT NULL column in the index as well, say (A,B). We don't particularly want column B in the index, but at least by including it we ensure all null values of column A are indexed, making A IS NULL predicates viable via an index.

However, a simpler and less expensive alternative is to just append a single constant character to the index, for example a space: (A, ' '). The space character takes up one byte and the column length in the index takes an additional byte, for a total of 2 bytes of overhead per index entry. Yes, this reduces the number of index entries a leaf block can hold and so may somewhat increase the overall size of the index. However, it also guarantees that no index entry can be entirely null, thereby ensuring all the other columns have all their null values indexed.
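As a sketch of the technique (assuming a table BOWIE with a nullable column A; names made up), the constant is simply appended as an additional index expression:

SQL> create index bowie_a_i on bowie (a, ' ');

Index created.

SQL> select * from bowie where a is null;

As the appended space is always not null, no index entry can be entirely null, and the CBO can therefore consider this index for the A IS NULL predicate.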

In case anyone is unsure what Christian is suggesting, don't use null values for (say) BINARY_FLOAT numerics but use a Not A Number (NaN) value instead. It's not a null, is hence indexable, and can be used to retrieve a "number" when it's not really a number.

SQL> create table nan (id binary_float);

Table created.

SQL> insert into nan values (binary_float_nan);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from nan where id is nan;

ID
———-
Nan

Create an index and all NaNs can potentially be retrieved via the index.
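The missing steps might look something like this (index name made up):

SQL> create index nan_i on nan (id);

Index created.

SQL> select * from nan where id is nan;

As the NaN is a genuine not null value, it's indexed like any other, and the CBO can consider the index for the IS NAN predicate if the costs make sense.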

So a function-based index is no longer my only option. The beauty of your solution is that user queries don't need amending. The flip side in favour of a function-based index (on, say, DECODE(processed,null,'N',null)) could potentially be the reduced number of rows in the index.

Neil and Robert, why would you use DECODE or CASE instead of NVL, which works in exactly this situation?

Also, Robert “you would expect Oracle to be smart enough to know that a constant value does not make sense in an index and silently remove it. :-)” perhaps that is because it treats it as a COLUMN and not just as a character. Therefore, to ignore the value is to ignore the pseudo-COLUMN, and that cannot be done.

The beauty of Richard's solution is that the queries do not need to have any knowledge of the INDEX (as is the case with an FBI). Currently, where I am employed, we use NVL([column], '~'), which looks particularly ugly and is not intuitive. I am planning to suggest this change in a few minutes. :)
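For comparison, the NVL approach mentioned here would look something like this (table and column names made up):

SQL> create index bowie_proc_i on bowie (nvl(processed, '~'));

Index created.

SQL> select * from bowie where nvl(processed, '~') = '~';

Every query has to use the matching NVL expression for the FBI to be considered, and the index holds an entry for every row, null or not.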

But how much space is wasted anyway? It’s two bytes per record. It doesn’t seem like much space, unless every single INDEX will start using this trick. Is that really an issue when there is such a benefit?

Brian, the disadvantage of NVL is that you get a larger index because it will contain entries for every row. If you are really only interested in rows where col_X is NULL, your table is large and the proportion of rows with a NULL value is small, then you want to create an FBI based on a function that returns a non-NULL value for rows with a NULL column value and NULL for all others. That cannot be done with NVL as far as I can see. (Personally I prefer DECODE because IIRC that is standard SQL.)
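A sketch of the FBI Robert describes (table and column names made up): the DECODE returns 'N' only when the column is null and returns NULL otherwise, so only the null rows end up in the index:

SQL> create index bowie_unproc_i on bowie (decode(processed, null, 'N', null));

Index created.

SQL> select * from bowie where decode(processed, null, 'N', null) = 'N';

The index contains entries only for the rows of interest, keeping it tiny, at the cost of queries needing to use the exact indexed expression.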

Brian
Quote – “The beauty of Richard’s solution, is that the queries do not need to have any knowledge of the INDEX”
I agree completely, but also agree with Robert that the FBI solution has its place. If you have no control over the code then the subject of this blog entry is a godsend. Anyway, Robert has done a good job above explaining the FBI reasoning. Two good solutions to keep in the locker for those rare yet troublesome queries.

This is an excellent suggestion Richard and some good stuff in the comments too.
Bitmap indexes index nulls of course, though bitmaps should only be used where suitable.

For many years I’ve used two techniques.

If I want to index the normal contents as well as nulls, I use a false value to replace null (-1 for numbers, 'Z' or '#' for varchar2s), which always makes me cringe internally as it can lead to confusion, can go wrong if you pick a value that later becomes valid, and needs documenting so that people realise -1 means null.
If I want to identify only the nulls, then I add a new column as a null indicator and set it only for nulls. It could be argued that the indicator column is the "correct" thing to do from a logical modelling perspective.
Both of the above suffer terribly from being usable only if you have access to the code base so you can change it, or if you plan them in from the start (especially the null indicator column).

I think I will use both of the above techniques a lot less now and replace them with the suggestions in this thread.

Hi all, thanks for the comments, they're great and really add to the whole piece.

Regarding the use of a FBI, I totally agree that it can be very useful and was in fact the “solution” to #6 in my things you may not know about indexes “It’s possible and potentially very useful to just index some column values and not all column values within a table”.

It's not only useful in this scenario but in any similar one where you have a whole bunch of common values you would not ordinarily search on (and for which the CBO would ignore the index anyway), plus a few less common values of interest which, through the use of say a DECODE function, are the only values the function returns and hence the only values indexed, making them retrievable via an index access.

But but but, it also requires the application to have knowledge of the function-based index (virtual column) and be coded accordingly. The approach in the blog requires no rewrite of the application and has a relatively small overhead, assuming you may also want to query some of the not null values as well.

Yes bitmap indexes index null values but but but, they’re just not suitable for OLTP environments with any real concurrency requirements, even after all the improvements in later releases. I’ll cover bitmap indexes for sure in the future.

I’ve been aware of this behaviour for some time. I had a discussion with a colleague where I pointed out that NULLs can indeed be indexed. On production of a test case, his comment was that the behaviour must have changed, as he remembers that NULLs could not be indexed in v6.

I’m curious, has this behaviour changed, or was my colleague being a little disingenuous?

I’m not sure which but I would say your colleague has a good memory if correct ;)

A lot lot lot has changed since Oracle 6 days, I hope he doesn’t do too many things based on how Oracle 6 may have worked !!

Oracle 6 was before my time with Oracle so I can't say with certainty; however, I would be surprised if it has changed, because it would mean that in Oracle 6 you could not index columns in a concatenated index if it contained a nullable column.

Because let’s pretend for a minute you couldn’t index null values. Let’s say you have an index on (A,B) and say A was not null but B could have nulls.

If you had a value of A = 'Bowie' and B = null, then this row wouldn't be indexed in Oracle 6. Either it would give you an error (in which case B would in effect be a not null column as well if indexed), or it would simply not index the row as B has a null.

But if you don’t index the row, then the index wouldn’t contain all the possible values for A. This “Bowie” row for example wouldn’t be in the index.

That being the case, how could we use the index to retrieve all rows where A = 'Bowie'? We couldn't, as Oracle can't guarantee all the associated values are in the index.

So the index would be useless.

So unless Oracle 6 allowed you to create useless indexes, or it didn't permit you to create a concatenated index on nullable columns, I don't see how it could have worked.

I get the point and agree with your reasoning. My view is that nothing "should" have changed in the way an index works when processing nulls (it seems inherent to the way indexes work that they need to behave this way when processing nulls).

I always argue that things can change significantly even between dot releases, so some behaviours cannot be relied upon even within releases of the same Oracle base version (a good example being the behaviour of STATISTICS_LEVEL for the various settings through the 9i and 10g releases).

@Jonathan – posted the previous question without thinking. Your output of dbms_xplan, and the note at the end of the output, implies 10g.

I was answering some questions about a problematic RBO -> CBO migration recently (RBO queries running faster than CBO), and hadn't considered jumping between optimizers on the 10gR2 platform.

Incidentally, the solution the team came up with involved system stats and setting *both* the optimizer_index_caching and optimizer_index_cost_adj parameters. Needing to set both of these was a surprise to me. I haven't managed to get my hands on the system to form a view as to whether there is another approach that would provide the same result. It raised some interesting questions for me though.

Mathew,
I wouldn’t normally consider setting 10g to RBO – but in this case I didn’t have a 6.0 around, so I just did it to demonstrate the point.

I’d look closely at what’s going on with system statistics (CPU costing) and the value of db_file_multiblock_read_count before setting the optimizer_index_xxx parameters. (My default position is to eliminate all three parameters on upgrade).

I understood the reasoning for running 10g in RBO. I just haven't seen it done before, though I understand the RBO code still executes despite being unsupported.

My expectation when moving RBO (9i) -> CBO (10gR2) was to be able to use system statistics, the default 10g DBFMBRC (i.e. unset) and the default optimizer_index_* parameters (i.e. unset). I think there is something screwy going on with the stats generation in the area of histograms (they use SKEWONLY), and possibly with their workload stats. I haven't been able to get hands-on to get to the bottom of things, and as it stands I'm told that, subject to some final testing, they now have satisfactory performance. So it seems unlikely that I'll be able to take this further.

It amazes me sometimes how other people are working on the exact same set of issues as I am.

We had been suffering under the misunderstanding that nulls "don't work" in indexes until this weekend. We had an index being used to resolve a query where a nullable column way down at the end of the index was in the criteria (as IS NULL). We created a new index from several columns being used to select data, but not the nullable column, and forced Oracle (9.2) to use the new index.

The query slowed down badly, though the non-nullable columns in the new index should have allowed many fewer index records to be scanned. It was then that my brain turned on and I realized that the nullable column was only null for about 0.2% of the population. So, though we were reading more index rows, we were reading fewer data rows per index row. And that made a huge difference.

So, when scanning indexes, Oracle obviously uses null values. The question we’re now trying to test to gain the answer for is: can Oracle use NULL to locate a position in the index like “any other value” or does it have to read all the leaf nodes and pick out the nulls?

As a general point, whenever one does something a little "differently" or unusual, or uses a feature in an unusual manner, the risk is that you force Oracle down an execution path less trodden and less tested, and hence more likely to hit these nasty bugs or issues.

When playing with the constant-in-index option I noticed that under the covers it obviously (i.e. according to the *_INDEXES view) creates an FBI as well. That's probably the reason why the RBO, though index-crazy, refuses to use that index.

Hi Richard,
there is a thing that I can't understand. I've made a test creating an index on (columnx,'a'). I see in my 10.2.0.3 db in USER_INDEXES that this index is FUNCTION-BASED NORMAL. How does Oracle know that there's an entry in the index for every record in the table? With an FBI, does every record in the table have an entry in the index?

In the fourth paragraph of your article, you assert that if an index is based on columns A and B, then only one of these columns may have null values (let's call it the null column). But at the end of your article only examples where the null column is the leading column are shown.

Are these two cases not equivalent? Why isn't the case where the null column is not the leading column considered?

[…] null columns too if index is a concatenated index and the other column has non-null value. Refer to this article from Richard Foote). CBO assumed that only 1 row will be returned and it choose index […]

Q1: I was expecting it would not let me create TEST_NULLS_I5 and would say that such a set of columns is already indexed, as I was expecting it would treat a null string the same as NULL. What could be the reason behind this?

Q2: After running the query, it uses the index TEST_NULLS_I4 (for whatever reason) and returns ZERO rows. If I force it to use an FTS, it returns the correct number of rows.

Note that it makes no real sense to concatenate a null to an index, as an index entry that contains nothing but null values is not actually indexed. The whole point of this post was to add a not null constant to the index, thereby ensuring all nulls are indeed indexed.

However, although the index is indeed not indexing the empty string as expected, meaning the null index entries are not actually stored in the index, the CBO is incorrectly thinking such values are in the index and is hence using the index in an invalid manner.

For the first part of VKoul's issue, if you do a
select table_name, column_name, data_type from user_tab_cols where table_name = 'TEST_NULLS';
you'll see the '' column was defined as a CHAR, but the NULL was defined as a VARCHAR (both zero length).

The incorrect result is, I guess, due to Oracle getting confused over whether a zero-length fixed-length string is null or not. I can also reproduce the bug in 11gR2.

[…] several techniques for utilizing b*tree indexes to locate rows with NULL values in a column. Richard Foote’s blog also has at least one article that describes how to use b*tree indexes to locate NULL values in a […]