Optimizing Join Orders

The SQL Server Query Optimizer needs to take two important decisions regarding joins: the selection of a join order and the choice of a join algorithm. The selection of a join algorithm is the choice between a nested loops join, merge join or hash join operator, but I will leave that topic for some other post. In this post I will talk about join orders.

A join combines records from two tables based on some common information. But since a join works with only two tables at a time, a query requesting data from n tables must be executed as a sequence of n – 1 joins. Although the results of a query are the same regardless of the join order, the order in which the tables are joined greatly influences the cost and performance of a query. The Query Optimizer must find the optimal sequence of joins between the tables used in the query. Finding the optimal join order is one of the most difficult problems in query optimization and one that has been subject of extensive research since the seventies.

This basically means that the Query Optimizer defines the join order; it is not defined by your query. To see an example run the following query against the AdventureWorks database and choose to display the execution plan by clicking Include Actual Execution Plan.

SELECT FirstName, LastName

FROMPerson.ContactAS C

JOINSales.IndividualAS I

ON C.ContactID = I.ContactID

JOINSales.CustomerAS Cu

ON I.CustomerID = Cu.CustomerID

WHERE Cu.CustomerType =‘I’

It will show the following execution plan

Note that the Query Optimizer is not using the same join order specified in the query; it found a more efficient one. The query asks to join the Contact table with the Individual table, and then join the result with the Customer table, but if you inspect the three clustered index scan operators, the plan is first joining the Customer and Individual tables and then joining the result with the Contact table.

The problem of finding an efficient join order for a query is difficult because of the number of possible permutations that the Query Optimizer needs to analyze. Because of the commutative and associative properties of joins there could be many different possible join orders in a query and this number increases exponentially with the number of tables. In fact, with just a handful of tables the number of possible join orders could be in the thousands or even millions. Obviously, it is impossible for any query optimizer to look at all those combinations: it would take too long.

Queries are represented as trees in the query processor and some names like left-deep, right-deep and bushy trees are commonly used to identify the shapes of the order of joins in these trees. Left-deep and bushy trees for a join of four tables are shown in the next figure.

For example, the left-deep tree shown could be JOIN( JOIN( JOIN(A, B), C), D) and the bushy tree could be JOIN(JOIN(A, B), JOIN(C, D)). Left-deep trees are also called linear trees or linear processing trees. The set of bushy trees includes the sets of both left-deep and right-deep trees. The following table lists the number of possible join orders considering left-deep and bushy trees.

Tables

Left-Deep Trees

Bushy Trees

1

1

1

2

2

2

3

6

12

4

24

120

5

120

1,680

6

720

30,240

7

5,040

665,280

8

40,320

17,297,280

9

362,880

518,918,400

10

3,628,800

17,643,225,600

11

39,916,800

670,442,572,800

12

479,001,600

28,158,588,057,600

The number of left-deep trees is calculated as n! or n factorial, where n is the number of tables in the relation. The factorial is the product of all the positive integers less than or equal to n. For example, 5! or 5 factorial is 5 x 4 x 3 x 2 x 1 or 120. Calculating the number of bushy tress is more complicated and can be calculated as (2n – 2)!/(n – 1)!.

So how does the Query Optimizer analyze all these possible join orders? The answer is: it does not. Analyzing all the possible join orders for a complex query could take a long time so the Query Optimizer must find a balance between the optimization time and the quality of the resulting plan. The Query Optimizer may not perform an exhaustive search and instead uses some heuristics to guide the search process. For example, a common heuristic is to avoid considering bushy tress during optimization, but I will leave some of these details for some other post.

About the author

Benjamin Nevarez is a database professional based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of three books, “High Performance SQL Server”, “SQL Server 2014 Query Tuning & Optimization” and “Inside the SQL Server Query Optimizer” and has also coauthored other books including “SQL Server 2012 Internals”. Benjamin has also been a speaker at many SQL Server conferences and events around the world including the PASS Summit, SQL Server Connections and SQLBits. His blog can be found at http://www.benjaminnevarez.com and he can also be reached on twitter at @BenjaminNevarez.