Why is a Full Table Scan Selected – Will the Execution Plan Provide a Clue?

20062012

June 20, 2012

I have not had much time to respond in OTN threads recently, although I do still occasionally read threads on the forum. I was a little surprised by one of the late responses in one of the recent threads, where one of the responders suggested actually testing the problem with the assistance of execution plans. Additionally, that responder suggested that the earlier responses missed the target. Ouch!

The OP did not provide DDL to create the table or indexes, or DML to populate the table. The SQL statement provided by the OP was apparently looking for rows where the VARCHAR2 column A was blank (as in containing a zero length string). If an index exists on column A, why would the optimizer select to perform a full table scan when attempting to return all rows with a zero length string in column A? (Raise your hand if you know the answer.)

Let’s create a test table with four indexes so that we are able to test some of the theories (or possible causes) that were proposed in the thread:

The above script creates a table with 100,000 rows, with the first 3 columns declared as NOT NULL. Column C1 contains the numbers 1 through 100,000. Column C2 contains the numbers 0 through 9 in a repeating pattern. Column C3 contains the numbers 0 through 10 with most of the rows containing the same value likely closely bunched together (a single row contains the number 10). Column C4 is a simple VARCHAR2 column that is the number 1 through 100,000 padded to six characters using the letter A. The statistics for the table and indexes were collected with a 100% sampling percentage.

Using the test table created by the above script, we might simulate the OP’s SQL statement using the following. One of the last responders in the OTN thread recommended looking at the execution plan, so we will retrieve that also:

SELECT
*
FROM
T1
WHERE
C4='';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

Unfortunately, the two single quotes (”) in the SQL statement are not interpretted as a test for a zero length string as happens on other DB platforms, and as the OP likely intended (see this AskTom thread). The Oracle Database 11.2.0.2 output of the above follows:

So, the Predicate Information section shows NULL IS NOT NULL, which never has a result of TRUE. Notice that the full table scan has a cost of 252, yet plan row 0, which is a grandparent of row 2, has a cost of 1 (costs for parent operations are supposed to include the calculated cost of child operations plus the calculated cost of the work performed at the parent operation). (Raise your hand if you know the answer why plan row 0 has a cost of 1 when plan row 2 has a cost of 252.) We will have to come back to this execution plan oddity later.

For a SQL statement like the following, which should return one row, you would expect the index on column C1 to be used:

So, an index range scan was performed to retrieve the one row from table T1. (Raise your hand if you know the answer why an index range scan operation was selected when every value in column C1 is unique and an equality predicate was used in the WHERE clause.)

For the following SQL statement, where an index on column C2 would likely have a large clustering factor value, should an index be used to retrieve 10% of the rows from the table?

So, once again the optimizer selected to use a full table scan to facilitate the retrieval of 10% of the rows from the table. But why, the clustering factor should be reasonably low. (Raise your hand if you know the answer why a full table scan was selected.)

But wait… the optimizer from another 11.2.0.2 database instance decided differently:

Why did the optimizer in one of the 11.2.0.2 database instances select to use a full table scan operation, while the optimizer in the other database instance selected to use an index range scan operation? (Raise your hand if you know the answer why there is a difference between the two execution plan results.)

—

If we take another look at the execution plan for the SQL statement that simulates the problem experienced by the OP, I wonder if we are able to determine why the parent operation has a lower calculated cost than its child (or grandchild’s) calculated cost?

Is the above execution plan even legal because the SQL statement is essentially looking for NULL values? (Raise your hand if you know the answer.)

—-

OK, put down your hand – the other people in the office are probably laughing at your hand-waving by now. 😉

I wonder if the OP now understands the problem with his SQL statement?

—–

Late Addition June 20, 2012:

Recall that the optimizer selected that a full table scan should be used for the initial SELECT statement. Part 2 of a previous blog article on this blog pointed to an article on another blog, asking whether or not a couple of statements made on the other blog were true regarding whether or not NULL values were ever stored in a B*tree index structure. One of my previous articles seemed to offer a counter-point, that in fact it is possible for a B*tree index to contain NULL values in certain situations.

What might happen if we swap in a bind variable in place of the literal in the initial SELECT statement? If we set the bind variable to have a value of ” (the same value as the literal), will the optimizer select to use a full table scan operation or an index range scan operation? Will the STARTS column in the execution plan contain 0 for one or more of the rows in the execution plan? Let’s test:

Notice that none of the rows in the execution plan have a STARTS value of 0, and that filter(NULL IS NOT NULL) does not appear in the Predicate Information section of the execution plan as had happened when we used a literal with the same value as the bind variable. So, if the OP would like to see an index access path in the execution plan, perhaps he should use bind variables rather than literals? Is the above execution plan more efficient or less efficient than the execution plan with the full table scan operation that was seen when a literal was passed in the SQL statement?

No rows selected, even though a row was just inserted into the table with the same value as what is in bind variable V1. Let’s DUMP the values of a couple of columns from the row that was just inserted:

The above output shows that column C4 of this row contains a NULL value. I suggest that it might not matter whether or not the execution plan shows an index access path or a full table scan for this SQL statement – if the SQL statement is answering a different question than what the OP expects, then it does not matter whether or not the SQL statement executes efficiently.

However, as a bonus we were able to see the optimizer using an index access path to check for a NULL value.

Actions

Information

3 responses

20062012

Tony Sleight(12:37:14) :

Charles,
This is a really great breakdown of a problem.

I have never seen the instance of the filter predicate (NULL IS NOT NULL) before. But, given this information it is hardly surprising the cost of a FTS (252) was ignored as the filter would always be FALSE. That was shown later in your analysis with the starts=0, the FTS was not executed, hence cost being a token value of 1 which I guess is the CPU cost.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: