Nested Loop, Hash and Merge Joins

When I was explaining SQL Server execution plans to some of my developers, we came across the different types of joins SQL Server performs to optimize data retrieval: the Nested Loop Join, the Hash Join, and the Merge Join (also called a Sort Merge Join). Interesting questions came up: what is the difference between them, and how does SQL Server determine which algorithm to use? So here is my take on answering those questions.

Nested Loop Join (or NL Join)

A Nested Loops Join compares each row from the outer table to each row from the inner table, looking for the rows that satisfy the join predicate.

For Each Row CR in Customer
    For Each Row SR in Sales
        If CR joins with SR Then
            Return (CR, SR)

The total number of rows compared, and therefore the cost of this algorithm, is proportional to the size of the outer table multiplied by the size of the inner table. Since this cost grows quickly as the input tables grow, in practice we try to minimize it by reducing the number of inner rows that must be considered for each outer row.

In this example the query joins Sales.Store and Sales.SalesPerson. As you can see, there is a clustered index on the BusinessEntityID column of the Sales.SalesPerson table and a clustered index on BusinessEntityID on the Sales.Store table. Here the SQL Server optimizer determines that the optimal execution is to take SalesPerson as the outer table and Store as the inner table. Thus it performs a Clustered Index Scan on the Sales.SalesPerson table and a Clustered Index Seek on the Store table.

Notice that the index seek depends on A.BusinessEntityID, which comes from the outer table of the join, which in this plan is SalesPerson. Each time we execute the index seek, A.BusinessEntityID has a different value. We refer to A.BusinessEntityID as a “correlated parameter.” If a nested loops join has correlated parameters, they are shown in the plan as “OUTER REFERENCES.” This type of nested loops join, where an index seek depends on a correlated parameter, is referred to as an “index join.”

There are three variants of the nested loops join. In the simplest case, the search scans an entire table or index; this is called a naive nested loops join. If the search exploits an index, it is called an index nested loops join. If the index is built as part of the query plan (and destroyed upon completion of the query), it is called a temporary index nested loops join.

All these variants are considered by the query optimizer. A nested loops join is particularly effective if the outer input is quite small and the inner input is pre-indexed and quite large. In many small transactions, such as those affecting only a small set of rows, index nested loops joins are far superior to both merge joins and hash joins. In large queries, however, nested loops joins are often not the optimal choice.
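As a rough illustration of the index variant, the sketch below (with made-up rows) builds a dictionary on the inner table's join column, so each outer row performs a lookup instead of a full scan of the inner table:

```python
from collections import defaultdict

# Hypothetical rows: (BusinessEntityID, payload).
sales_person = [(274, "Stephen"), (275, "Michael"), (276, "Linda")]
store = [(274, "Store A"), (276, "Store B"), (276, "Store C"), (280, "Store D")]

# Build a lookup on the inner table's join column, playing the role a
# clustered or temporary index would play in the real plan.
store_index = defaultdict(list)
for business_entity_id, store_name in store:
    store_index[business_entity_id].append(store_name)

# For each outer row, seek into the index instead of scanning every inner row.
joined = [
    (person, store_name)
    for business_entity_id, person in sales_person
    for store_name in store_index.get(business_entity_id, [])
]
```

The per-outer-row cost drops from a scan of the whole inner table to a single lookup, which is why the index variant wins for small transactions.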

Merge Join

The merge join requires that both inputs be sorted on the merge columns, which are defined by the equality (WHERE) clauses of the join predicate. The query optimizer typically scans an index, if one exists on the proper set of columns, or places a sort operator below the merge join. In rare cases, there may be multiple equality clauses, but the merge columns are taken from only some of the available equality clauses.

Because each input is sorted, the Merge Join operator gets a row from each input and compares them. For example, for inner join operations, the rows are returned if they are equal. If they are not equal, whichever row has the lower value is discarded and another row is obtained from that input. This process repeats until all rows have been processed.

The full operation is done as shown below.

Get First Row T1R from Table1
Get First Row T2R from Table2
While Not End of Table1 and Not End of Table2
    If T1R joins with T2R Then
        Return (T1R, T2R)
        Get Next Row T2R from Table2
    Else If T1R < T2R Then
        Get Next Row T1R from Table1
    Else
        Get Next Row T2R from Table2
End Loop
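A minimal Python sketch of this merge logic, assuming both inputs are already sorted on the key and that duplicate keys appear only in the second input (the many-to-many case, which needs a rewind, is not handled here):

```python
def merge_join(t1, t2, key=lambda row: row[0]):
    """Merge join (inner join); both inputs must be sorted on the key.

    Duplicates are allowed only in t2, matching the regular (not
    many-to-many) merge join described above.
    """
    results = []
    i = j = 0
    while i < len(t1) and j < len(t2):
        k1, k2 = key(t1[i]), key(t2[j])
        if k1 == k2:
            results.append((t1[i], t2[j]))
            j += 1          # look for more matches in t2
        elif k1 < k2:
            i += 1          # discard the lower row from t1
        else:
            j += 1          # discard the lower row from t2
    return results

left = [(1, "a"), (2, "b"), (4, "d")]               # sorted on key
right = [(1, "x"), (2, "y"), (2, "z"), (3, "w")]    # sorted on key
pairs = merge_join(left, right)
```

Each input is read once, front to back, which is the source of the merge join's efficiency on presorted data.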

You can run such a join in SQL Server Management Studio and look at the execution plan to see the Merge Join operator.

The merge join operation may be either a regular or a many-to-many operation. A many-to-many merge join uses a temporary table to store rows. If there are duplicate values in each input, one of the inputs must rewind to the start of the duplicates as each duplicate from the other input is processed. If a residual predicate is present, it is evaluated for all rows that satisfy the merge predicate, and only the rows that satisfy it are returned.

Merge join itself is very fast, but it can be an expensive choice if sort operations are required. However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm.

Hash Join

The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

Hash joins are used for many types of set-matching operations: inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference. Moreover, a variant of the hash join can do duplicate removal and grouping (such as SUM(salary) GROUP BY department). These modifications use only one input for both the build and probe roles.

Similar to a merge join, a hash join can be used only if there is at least one equality (WHERE) clause in the join predicate. However, because joins are typically used to reassemble relationships, expressed with an equality predicate between a primary key and a foreign key, most joins have at least one equality clause. The set of columns in the equality predicate is called the hash key, because these are the columns that contribute to the hash function. Additional predicates are possible and are evaluated as residual predicates separate from the comparison of hash values. The hash key can be an expression, as long as it can be computed exclusively from columns in a single row. In grouping operations, the columns of the group by list are the hash key. In set operations such as intersection, as well as in the removal of duplicates, the hash key consists of all columns.

In-Memory Hash Join

The hash join first scans or computes the entire build input and then builds a hash table in memory. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. This build phase is followed by the probe phase. The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key’s value is computed, the corresponding hash bucket is scanned, and the matches are produced.
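The build and probe phases can be sketched in Python as follows; the table contents are illustrative:

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """In-memory hash join: hash the smaller (build) input,
    then probe the hash table with each row of the larger input."""
    # Build phase: bucket every build row by its hash key.
    buckets = defaultdict(list)
    for row in build:
        buckets[build_key(row)].append(row)
    # Probe phase: for each probe row, scan only the matching bucket.
    results = []
    for row in probe:
        for match in buckets.get(probe_key(row), []):
            results.append((match, row))
    return results

departments = [(10, "Sales"), (20, "Engineering")]                # build (smaller)
employees = [("Ann", 10), ("Ben", 20), ("Cat", 10), ("Dee", 30)]  # probe
joined = hash_join(departments, employees,
                   build_key=lambda d: d[0],
                   probe_key=lambda e: e[1])
```

Because each probe row only visits one bucket, the work is roughly linear in the combined input sizes, rather than their product.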

Grace Hash Join

If the build input does not fit in memory, a hash join proceeds in several steps. Each step has a build phase and probe phase. Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. The number of such files is called the partitioning fan-out. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. The hash join is then applied to each pair of partitioned files.
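A simplified sketch of the partitioning idea, using in-memory lists in place of the partition files SQL Server would spill to disk (the fan-out and row contents here are made up):

```python
# Partitioning step of a grace hash join, with a fan-out of 4.
FANOUT = 4

def partition(rows, key):
    """Split rows into FANOUT partitions using a hash of the join key."""
    files = [[] for _ in range(FANOUT)]
    for row in rows:
        files[hash(key(row)) % FANOUT].append(row)
    return files

build_rows = [(i, f"b{i}") for i in range(20)]
probe_rows = [(i, f"p{i}") for i in range(15)]

build_parts = partition(build_rows, key=lambda r: r[0])
probe_parts = partition(probe_rows, key=lambda r: r[0])

# Any two joining rows share a hash key, so they land in the same pair of
# partitions; each pair can now be hash-joined independently in memory.
pairs = []
for build_part, probe_part in zip(build_parts, probe_parts):
    build_table = {}
    for row in build_part:
        build_table.setdefault(row[0], []).append(row)
    for row in probe_part:
        for match in build_table.get(row[0], []):
            pairs.append((match, row))
```

The key point is that partitioning never separates matching rows, so one large join becomes several smaller joins that each fit in memory.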

Recursive Hash Join

If the build input is so large that a standard external merge sort would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. If only some of the partitions are large, additional partitioning steps are used only for those specific partitions. In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

Note: If the build input is larger but not a lot larger than the available memory, elements of in-memory hash join and grace hash join are combined in a single step, producing a hybrid hash join.

It is not always possible during optimization to determine which kind of hash join will be used. Therefore, SQL Server starts with an in-memory hash join and gradually transitions to a grace hash join, and then to a recursive hash join, depending on the size of the build input.

If the optimizer wrongly anticipates which of the two inputs is smaller and, therefore, should have been the build input, the build and probe roles are reversed dynamically. The hash join makes sure that it uses the smaller overflow file as the build input. This technique is called role reversal.

Conclusion

The SQL Server optimizer determines the best, most efficient plan for executing a query. It evaluates the indexes on the table columns, the number of rows in the tables, the type of join, and so on, when choosing the best plan.

A nested loops join is generally chosen for smaller tables, and when an index seek is possible on the inner table, to ensure better performance.

A merge join is considered more efficient for large tables whose key columns are sorted. It only needs to read the keys once and does not require many CPU cycles.

A hash join is also well suited for large tables. It requires less I/O, but needs more CPU and a lot of memory.

That being said, it is possible to override the optimizer’s choice of join by using the OPTION hint. However, the optimizer is very smart and rarely makes bad choices, so use caution when you use the OPTION hint in your queries. Keeping statistics and indexes up to date helps the optimizer choose the right join.