Reading 100% of Data Via An Index (We Are Hungry Men) May 20, 2008

I’ve already previously discussed how a Full Table Scan can sometimes be the most effective execution path for reading a relatively small proportion of all rows (< 1%). Thought I might quickly discuss how an index can be the most effective execution path for reading a relatively high proportion of rows.

Such as 100% of all rows.

There is no magic number or proportion from which Oracle will somehow magically switch to using a FTS. If an index access path has a lower cost, it will be selected over the more expensive FTS and the associated percentage of rows retrieved can potentially be anywhere between 0% and 100% of all rows.

Typically, reading a large percentage of rows is considered the exclusive domain of the Full Table Scan, however there are various scenarios in which the costs of reading all rows via an index is actually the cheaper alternative. Here are just a few examples.

Perhaps the most obvious example is when we’re only interested in columns that can all be found within an index. In this case, if the index segment is smaller than the parent table segment (which in most cases it is), then Oracle can treat the index as being some kinda “skinny” version of the table and perform a multi-block Index Fast Full Scan without having to subsequently visit the table segment at all.

This simple Index Fast Full Scan Demo shows how a query that only references columns in an index can use the index to retrieve 100% of all rows in the table.

Another example is where the table segment is very poorly fragmented with lots of deleted space. A FTS will need to read all table blocks below the High Water Mark (HWM), including potentially many mostly or totally empty table blocks. If these costs are excessive and the remaining rows can be more efficiently accessed via an index, in extreme cases it might be more efficient to read all rows via the index than via a FTS. Yes, the table segment should probably be reorganised via say a Move or Shrink command if the deleted space is not going to be reused any time soon, however until this has been performed, it could very well be more efficient to access 100% of data via an index.

This extreme Poorly Fragmented Table Demo highlights how an index can be most efficient in retrieving 100% of all rows in a table, if the table is badly fragmented with lots of deleted space.

Yet another example is when the index could be used to avoid a possible sort. Index entries are always stored in the order of the indexed columns (except for Reverse Key Indexes). Therefore by reading the data via an index, all data will be retrieved in the order of the index. If this order matches the specific required order due to a ORDER BY clause, then Oracle does not need to perform the sort operation. In some cases, especially when the index has an excellent Clustering Factor, it might be more efficient to retrieve 100% of all data via an index and avoid the sort than use a FTS followed by a sort.

This tiny extract from my index internals seminar shows a simple Index and Sort Demo whereby the CBO decides to use the index to retrieve 100% of all data as it prevents Oracle from having to perform an expensive sort.

You begin to get the idea …

In summary, an index can of course can be most effective when retrieving just the 1 row (0 rows even) but it can also be most effective when retrieving up to 100% of all rows in a table as these simple examples illustrate.

The next time someone asks at what point or percentage will Oracle no longer consider using an index, we now all know there is no magic number and that it all entirely depends on many many factors which ultimately determine the relative costs of all possible access paths.

@Robert – Microsoft’s version of the Index Organized Table… :-) but with the disadvantage of storing both the index and the table, but at least you know it is an index (which is a whinge about finding the partitions in an Oracle partitioned index organized table…)

@Peter, there is a much better approximation to an IOT in SQL Server: if you define a clustered index on a table (there can be at most one) the table is converted into a BTree. In this case there is only the index and leaf entries contain the data – there is no additional storage. Downside is that additional indexes will have to store the index keys instead of the row id.

“The INCLUDE clause is really a unique and IMHO useful feature”
I can see the benefit where you can create an index that is unique on, say, one column but includes a second column to save on table lookups. But in Oracle this can be done with a unique constraint enforced by a nonunique index.
For a nonunique index, I don’t see much benefit. If Oracle did implement it, I guess it would mean the index entries being stored in order of column1, rowid instead of column1, column2, rowid. If column2 were regularly updated, it may mean that the index entry doesn’t move as much, so may offer a benefit in some situations.

Gary of course you can achieve similar results in Oracle. But, please do not forget documentation: with the INCLUDE clause it is crystal clear which columns are key columns and which columns are there for optimization purposes only. It is just more expressive.

using 10.2.0.3 on linux when I run the fast full scan demo as is I get the same result as Richard, changing the primary key to a unique index results in a full tablescan, any ideas why? the output I get is.

Knew it had to be something simple, must of been having a slow brain day, I’ve even just spent time in the last day or two explaining how to index nulls using a function based index to a developer. Forgot that just because a column doesn’t contain any null values doesn’t mean it can’t unless the constraint is there, thanks for that.