First Table is 550MB, Second Table is 26GB – Nested Loops or Full Table Scan?

A couple of days ago I wrote a blog article about creating test case scripts to test ideas. In a recent OTN thread the original poster asked why his value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter was not being honored. The OP stated that his DB_FILE_MULTIBLOCK_READ_COUNT parameter was set to 128, yet when reviewing a 10046 trace at level 8 he saw that a maximum of 16 blocks were read from disk at a time while performing full table scans. So, what might be limiting the multi-block reads to 16 blocks (I believe that one of my previous blog articles mentioned the limiting factors)? This would be a great use for a test case script, although with 26GB and 550MB tables you might want to consider reducing the size of the tables when building the test case script.
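As a reminder of the sort of trace the OP described, a 10046 trace at level 8 can be enabled at the session level along these lines (a generic sketch, not the OP's actual commands; the table name and FULL hint are placeholders):

```sql
SHOW PARAMETER DB_FILE_MULTIBLOCK_READ_COUNT

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'MBRC_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

SELECT /*+ FULL(T1) */
  COUNT(*)
FROM
  T1;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
```

In the resulting trace file, the blocks value on each db file scattered read wait event line shows how many blocks were read per multi-block read request.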

Interesting; however, the unexpectedly low multi-block read size is not the subject of this blog article. Someone mentioned in the OTN thread that when joining the 550MB table to the 26GB table, a nested loops join should be used because hashing could take a lot of time. I was not expecting to see a comment like that, so I started to wonder: could it be true? I wonder if one of my earlier blog articles touched on this idea? Maybe I overlooked something in the documentation:

“The database uses hash joins to join large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

11.3.4.1 When the Optimizer Uses Hash Joins
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

A large amount of data must be joined.

A large fraction of a small table must be joined.”

Does the situation change if there is no index on the join columns in either table? Does the situation change if there is no index on any of the predicates in the WHERE clause? Should Oracle Database repeatedly full table scan the 550MB or the 26GB table while performing a nested loops join? What is a large fraction of a small table: 75% of the rows, 50% of the rows, 10% of the rows, 1% of the rows? Maybe the use of a nested loops join in such a case needs a bit more consideration. Of course, we could create a test case script to determine whether this statement in the OTN thread is correct, although we may need to hint the join method in the SELECT statement to test the performance of both join methods.
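Such a hinted comparison could be as simple as forcing each join method in turn (the table and column names below are placeholders, not the OP's actual tables):

```sql
-- Force a nested loops join, driving from T2
SELECT /*+ LEADING(T2) USE_NL(T1) */
  T1.COL1,
  T2.COL1
FROM
  T1,
  T2
WHERE
  T1.COL1 = T2.COL1;

-- Force a hash join for the same query
SELECT /*+ LEADING(T2) USE_HASH(T1) */
  T1.COL1,
  T2.COL1
FROM
  T1,
  T2
WHERE
  T1.COL1 = T2.COL1;
```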

The same person later suggested that if the first table is small (I guess that the 550MB table is the small one) and the driven table (I suspect that the 26GB table matches this description) contains an index on the join column, Oracle’s optimizer will favor nested loops joins over hash joins. It might be interesting to construct a test case to see if this is true (Oracle Database 11.1.0.7 used for the test case results displayed below). To save time I will create tables which are a bit smaller than the original 550MB and 26GB tables, which might throw off the test case a little in favor of nested loops joins, but I will keep an eye on the output for problems. In my test case script, table T1 will be my “large” table and table T2 will be my “small” table.
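The table creation script is not reproduced here; a sketch of the sort of script I have in mind follows. The column names COL1, COL2, and COL3 match the references later in this article, but the row counts, value ranges, and padding are assumptions (T2's 100,000 rows are implied by the 1% calculation later in the article):

```sql
-- Sketch only: sizes and values are assumptions, not the original script
CREATE TABLE T1 (
  COL1 NUMBER,          -- ascending number sequence
  COL2 NUMBER,          -- descending number sequence
  COL3 VARCHAR2(200));  -- 200 byte padding column

INSERT INTO T1
SELECT
  ROWNUM,
  1000000 - ROWNUM + 1,
  RPAD('A', 200, 'A')
FROM
  DUAL
CONNECT BY LEVEL <= 1000000;

-- The "small" table, built from the first 100,000 rows of T1
CREATE TABLE T2 AS
SELECT
  COL1,
  COL2,
  COL3
FROM
  T1
WHERE
  COL1 <= 100000;

CREATE INDEX IND_T1_COL1 ON T1 (COL1);
CREATE INDEX IND_T2_COL1 ON T2 (COL1);

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T1', CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T2', CASCADE=>TRUE)
```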

We now have two simple tables, each with a column having an ascending number sequence, a column having a descending number sequence, and a 200 byte padding column to discourage full table scans. Now let’s perform a quick test:
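The first test query was along these lines (a reconstruction, not the original SQL; AUTOTRACE is one way to display the execution plan without fetching all of the rows to the client):

```sql
SET AUTOTRACE TRACEONLY EXPLAIN

-- All columns, all rows, joined on the ascending-sequence column
SELECT
  T1.*,
  T2.*
FROM
  T1,
  T2
WHERE
  T1.COL1 = T2.COL1;
```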

Two full table scans and a hash join. If we retrieve all of the columns and all of the rows, the optimizer apparently will not favor nested loops joins. Let’s try again with a slightly different query, this time using the column with the descending number sequence as the join column:
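A sketch of that second query, joining on the descending-sequence column with restrictions that permit the indexes on both tables to reduce the row sources entering the join (the exact predicates are assumptions):

```sql
-- Join on the descending-sequence column, restrict both tables to 1,000 rows
SELECT
  T1.*,
  T2.*
FROM
  T1,
  T2
WHERE
  T1.COL2 = T2.COL2
  AND T1.COL1 BETWEEN 1 AND 1000
  AND T2.COL1 BETWEEN 1 AND 1000;
```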

Still a hash join, although the indexes on both tables were used to reduce the number of rows entering the hash join. So, in this case, when selecting 1% (1,000 / 100,000) of the rows from the tables, the optimizer still elected to perform a hash join. Let’s try another SQL statement, this time putting the restriction on the column with the descending values (note that this query, if executed, will not return any rows because the values in column T2.COL2 do not fall into the range of 1 to 1,000):
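A sketch of that query follows; as stated above, T2.COL2 in my script contains no values between 1 and 1,000, so the restriction eliminates every row (the predicate range is taken from the text, the rest is a reconstruction):

```sql
-- Restriction moved to the descending-sequence column of the small table
SELECT
  T1.*,
  T2.*
FROM
  T1,
  T2
WHERE
  T1.COL2 = T2.COL2
  AND T2.COL2 BETWEEN 1 AND 1000;
```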

As you can probably tell from the above plan, the optimizer believes that 1 or fewer rows will be returned, so the optimizer changed from performing a single hash join to performing two nested loops joins. Let’s try another query, this time we will not retrieve column T2.COL3 (the large padding column) and change the WHERE clause to pick up all COL1 values less than 10,000:
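A sketch of that query, listing the select list columns explicitly so that T2.COL3 is excluded (which table's column carries the restriction, and whether the range still misses T2.COL2's values, are assumptions on my part):

```sql
-- T2.COL3 (the large padding column) is no longer retrieved
SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL2 = T2.COL2
  AND T2.COL2 < 10000;   -- placeholder range that still matches no T2.COL2 values
```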

The optimizer predicted that a single row would be returned, so it decided to use two nested loops joins rather than a hash join. Let’s try again, but this time fix the SQL statement so that it will return rows if executed:
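The corrected query might look like this sketch, with the restriction range shifted so that it overlaps the values actually stored in T2.COL2 (the specific range is a placeholder intended to return roughly 9,999 rows):

```sql
SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL2 = T2.COL2
  AND T2.COL2 < 910000;  -- placeholder range overlapping T2.COL2's stored values
```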

Back to a hash join again, so there is nothing magical about a column with a descending sequence of numbers that causes a nested loops join. Let’s try again, this time retrieving only 100 rows, rather than 1,000 or 9,999:
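A sketch of that final query, with the restriction range tightened to return roughly 100 rows (again, the specific range is a placeholder):

```sql
SELECT
  T1.COL1,
  T1.COL2,
  T1.COL3,
  T2.COL1,
  T2.COL2
FROM
  T1,
  T2
WHERE
  T1.COL2 = T2.COL2
  AND T2.COL2 < 900101;  -- placeholder range intended to match roughly 100 rows
```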

Apparently, dropping the number of rows to 10 (0.01% of the rows in table T2, if my calculations are correct) was sufficient to make the optimizer switch from a hash join to two nested loops joins.

I will now ask the question, “If the first table is small and the driven table contains an index on the join column, will Oracle’s optimizer favor nested loops joins over hash joins?” Could someone generate a test case that shows just the opposite of the results found in my test case? Does the answer change if I were to repeat the test case script on Oracle Database 8.1.7.4 or 11.2.0.2? What if instead of one 200 byte padding column there were fifty 10 byte padding columns, and the query only selected one of those padding columns from each table? What if instead of one 200 byte padding column there was only a single 10 byte padding column? Is it safe to make a generalization at this point?

6 responses

Thank you for suggesting another item that could affect the test case results, and another item to consider if we were trying to help the original poster in the OTN thread. For the Oracle instance used in this blog article:
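The original parameter listing is not reproduced here; the parameters that control the memory available to a hash join can be displayed with commands along these lines (which parameters were relevant to the commenter's point is my assumption):

```sql
SHOW PARAMETER PGA_AGGREGATE_TARGET
SHOW PARAMETER WORKAREA_SIZE_POLICY
SHOW PARAMETER HASH_AREA_SIZE
```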

I mentioned the size of the table, or more specifically the column descriptions for a couple of reasons. One of those reasons is how the AVG_ROW_LEN for the table affects the costing of an access path (page 11 seems to show its use: http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf – but there are probably more recent articles) – should we full table scan or index range scan to access the table data. Another reason I mentioned the size of the tables, or more accurately the size of the selected columns, is how much data will be supplied as input to the next operation in the plan – that likely is not an issue demonstrated in my simple test case script.

Getting back to your point, what if the join column were a 100 byte VARCHAR2 column rather than a numeric column, or multiple columns totaling an average of 200 bytes were the join condition? As you state, the amount of memory available for storing the hash table is an important consideration.

Another item worth mentioning that could impact the results of the test case, and another one that did not immediately come to mind when I wrote the blog article. For the Oracle instance used in this blog article:

Another great suggestion for what could cause a change from hash joins to nested loops joins (or nested loops joins to hash joins). I might have to see what happens when experimenting with different system (CPU) statistics. Thank you for the suggestion.
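Experimenting with system statistics might start with something like this (the DBMS_STATS calls are standard, but the specific values shown are placeholders, not recommendations):

```sql
-- Display the current system (CPU) statistics
SELECT
  PNAME,
  PVAL1
FROM
  SYS.AUX_STATS$
WHERE
  SNAME = 'SYSSTATS_MAIN';

-- Manually set CPU speed and I/O timings (placeholder values)
EXEC DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED', 2000)
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 8)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10)
```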

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and less than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: &lt; for the less than sign and &gt; for the greater than sign.