Introduction to Joins

Introduction to Joins

Joins are one of the most important operations performed by a relational database system. An RDBMS uses joins to match rows from one table with rows from another table. For example, we can use joins to match sales with customers or books with authors. Without joins, we might have a list of sales and customers or books and authors, but we would have no way to determine which customers bought which items or which authors wrote which books.

We can join two tables explicitly by writing a query that lists both tables in the FROM clause. We can also join two tables by using a variety of different sub-queries. Finally, SQL Server may introduce joins for a variety of purposes into a query plan during optimization.

This is the first of several posts that I am planning for joins. In this post, I’m going to start by introducing the logical join operators that SQL Server supports. These are:

Inner join

Outer join

Cross join

Cross apply

Semi-join

Anti-semi-join

Here is a simple schema and data set that I will use to illustrate each join type:

createtable Customers (Cust_Id int, Cust_Name varchar(10))

insert Customers values(1,'Craig')

insert Customers values(2,'John Doe')

insert Customers values(3,'Jane Doe')

createtable Sales (Cust_Id int, Item varchar(10))

insert Sales values(2,'Camera')

insert Sales values(3,'Computer')

insert Sales values(3,'Monitor')

insert Sales values(4,'Printer')

Inner joins

Inner joins are the most common join type. An inner join simply looks for two rows that put together satisfy a join predicate. For example, this query uses the join predicate “S.Cust_Id = C.Cust_Id” to find all Sales and Customer rows with the same Cust_Id:

select*

from Sales S innerjoin Customers C

on S.Cust_Id = C.Cust_Id

Cust_IdItemCust_IdCust_Name

----------- ---------- ----------- ----------

2Camera2John Doe

3Computer3Jane Doe

3Monitor3Jane Doe

Notes:

Cust_Id 3 bought two items so this customer row appears twice in the result.

Cust_Id 1 did not purchase anything and so does not appear in the result.

We sold a ‘Printer’ to Cust_Id 4. There is no such customer so this sale does not appear in the result.

Suppose that we would like to see a list of all sales; even those that do not have a matching customer. We can write this query using an outer join. An outer join preserves all rows in one or both of the input tables even if we cannot find a matching row per the join predicate. For example:

select*

from Sales S leftouterjoin Customers C

on S.Cust_Id = C.Cust_Id

Cust_IdItemCust_IdCust_Name

----------- ---------- ----------- ----------

2Camera2John Doe

3Computer3Jane Doe

3Monitor3Jane Doe

4PrinterNULLNULL

Note that the server returns NULLs for the customer data associated with the ‘Printer’ sale since there is no matching customer. We refer to this row as “NULL extended.”

Using a full outer join, we can find all customers regardless of whether they purchased anything and all sales regardless of whether they have a valid customer:

select*

from Sales S fullouterjoin Customers C

on S.Cust_Id = C.Cust_Id

Cust_IdItemCust_IdCust_Name

----------- ---------- ----------- ----------

2Camera2John Doe

3Computer3Jane Doe

3Monitor3Jane Doe

4PrinterNULLNULL

NULLNULL1Craig

The following table shows which rows will be preserved or NULL extended for each outer join variation:

Join

Preserve …

A left outer join B

all A rows

A right outer join B

all B rows

A full outer join B

all A and B rows

Full outer joins are commutative. In addition, “A left outer join B” and “B right outer join A” are equivalent.

Cross joins

A cross join performs a full Cartesian product of two tables. That is, it matches every row of one table with every row of another table. You cannot specify a join predicate for a cross join using the ON clause though you can use a WHERE clause to achieve essentially the same result as an inner join.

Cross joins are fairly uncommon. Two large tables should never be cross joined as this will result in a very expensive operation and a very large result set.

select*

from Sales S crossjoin Customers C

Cust_IdItemCust_IdCust_Name

----------- ---------- ----------- ----------

2Camera1Craig

3Computer1Craig

3Monitor1Craig

4Printer1Craig

2Camera2John Doe

3Computer2John Doe

3Monitor2John Doe

4Printer2John Doe

2Camera3Jane Doe

3Computer3Jane Doe

3Monitor3Jane Doe

4Printer3Jane Doe

Cross apply

We introduced cross apply in SQL Server 2005 to enable joins with a table valued function (TVF) where the TVF has a parameter that changes for each execution. For example, the following query returns the same result as the above inner join using a TVF and cross apply:

createfunction dbo.fn_Sales(@Cust_Id int)

returns @Sales table(Item varchar(10))

as

begin

insert @Sales select Item from Sales where Cust_Id = @Cust_Id

return

end

select*

from Customers crossapply dbo.fn_Sales(Cust_Id)

Cust_IdCust_NameItem

----------- ---------- ----------

2John DoeCamera

3Jane DoeComputer

3Jane DoeMonitor

We can also use outer apply to find all Customers regardless of whether they purchased anything. This is similar to an outer join.

select*

from Customers outerapply dbo.fn_Sales(Cust_Id)

Cust_IdCust_NameItem

----------- ---------- ----------

1CraigNULL

2John DoeCamera

3Jane DoeComputer

3Jane DoeMonitor

Semi-join and Anti-semi-join

A semi-join returns rows from one table that would join with another table without performing a complete join. An anti-semi-join returns rows from one table that would not join with another table; these are the rows that would be NULL extended if we performed an outer join.

Unlike the other join operators, there is no explicit syntax to write “semi-join,” but SQL Server uses semi-joins in a variety of circumstances. For example, we may use a semi-join to evaluate an EXISTS sub-query:

select*

from Customers C

whereexists(

select*

from Sales S

where S.Cust_Id = C.Cust_Id

)

Cust_IdCust_Name

----------- ----------

2John Doe

3Jane Doe

Unlike the previous examples, the semi-join only returns each customer one time.

There are left and right semi-joins. A left semi-join returns rows from the left (first) input that match rows from the right (second) input while a right semi-join returns rows from the right input that match rows from the left input.

We might similarly use an anti-semi-join to evaluate a NOT EXISTS sub-query.

Miscellaneous notes

In all of the above examples, I used a join predicate that compares whether two columns, one from each table, are equal. This type of join predicate is called an “equijoin.” Other join predicates (such as inequalities) are possible, but equijoins are especially common and SQL Server has many more alternatives when optimizing equijoins than when optimizing joins with more complex predicates.

SQL Server has more flexibility over join order and algorithms when optimizing inner joins than when optimizing outer joins and cross applies. Thus, given two queries that differ only in that one strictly uses inner joins while the other uses outer joins and/or cross applies, SQL Server may be able to find a better plan for the query that uses inner joins only.

Next time …

In my next post, I’ll continue with joins by introducing the physical join operators (or algorithms) that SQL Server uses to implement these logical join operators.