The Outer Limits of SQL JOINs

06/27/2001

First off this week, thanks for your patience during the recent breaks in this column -- I'm now the proud father of new baby (our first), and as you may imagine, I needed a little time to get into the swing of balancing new responsibilities with the rest of my life. The good news is that this is just the first of a large stack of new aboutSQL columns which should keep us on track for the next several months.

To refresh everyone's memory, we've started talking about SQL JOIN statements over the past few columns -- one of the most fundamentally important areas of SQL development. In the last column, we talked about the various types of joins, and I promised to cover "outer joins" this time around to complement the discussions of inner and cross joins from the past two columns. So with no further ado....

The SQL outer join

Every type of SQL join effectively multiplies the row in one table by the rows in the other table that is participating in the join (and which can be extended to include as many tables are participating in the JOIN operation).

When we discussed the Cartesian product (the "cross join"), we saw the raw results of a JOIN operation. The INNER JOIN statement provided a useful filter to that raw result by picking out only rows where the key fields have the same value. The result of the INNER JOIN operation is one row for each key value that exists in both joined tables. But what about situations where there are rows in one table that do not have a corresponding match in the other table? An inner join ignores these rows.

An outer join is used to include the rows that are "missing" in an inner join. Using our ongoing CD database as an example may make things clearer. So far, we've created joins that pull our artist data and CD data together to create a catalog of the CD collection. But let's say I've heard a new artist on the radio and immediately decide to put them in the database -- David Gray, for example.

Artists

ArtistID

ArtistName

1

Peter Gabriel

2

Bruce Hornsby

3

Lyle Lovett

4

Beach Boys

5

David Gray

If I create an inner join between the Artists and CD tables, I get the following result.

SELECT * FROM Artists, CDs WHERE Artists.ArtistID=CDs.ArtistID

CDID

CDs.ArtistID

Artists.ArtistID

ArtistName

Title

Year

1

1

1

Peter Gabriel

So

1984

2

1

1

Peter Gabriel

Us

1992

3

2

2

Bruce Hornsby

The Way It Is

1986

4

2

2

Bruce Hornsby

Scenes from the Southside

1990

5

1

1

Peter Gabriel

Security

1990

6

3

3

Lyle Lovett

Joshua Judges Ruth

1992

7

4

4

Beach Boys

Pet Sounds

1966

These results accurately describe my catalog. But what about David Gray and all the others I'm planning on adding to the catalog? They completely disappear from the result because there is no matching row where CDs.ArtistID=5. Big deal you may say, but what if the two tables were Customers and Orders instead? Or maybe Orders and Items?

In the first case, only customers with orders would show up in the report; in the second case, only items that had actually been ordered would show up. This sort of "missing" information could completely ruin some types of calculations! An "outer join" operation will fix this problem.

Using the outer join

Outer joins come in three distinct flavors:

LEFT OUTER JOIN (*=)

RIGHT OUTER JOIN (=*)

FULL OUTER JOIN

Knowing that the purpose of an outer join is to include the "missing" or unmatched rows, you can probably figure out what each of these flavors means. The LEFT, RIGHT, and FULL syntax all describe which of the table's unmatched columns to include. If we wanted to fix the CD collection example, we could change the inner join to the following outer join:

SELECT * FROM Artists, CDs WHERE Artists.ArtistID *= CDs.ArtistID

The LEFT OUTER JOIN operator ensures that all rows on the "left" side of the join, in this case the Artists table, will be included. It is important to note that "left" and "right" are completely dependent on the order of the tables in the SQL statement. The following SQL statements are all identical:

This would count all of the orders for each inventory row, even those with no matching orders (this example assumes that the records in the Orders table contains orders for single inventory items). You could also use this technique to count the orders per customer, inventory per vendor, or any other type of relationship where it is useful to know that some rows in one table have no corresponding values.

Next steps

This week we took a look at the OUTER JOIN operator and some real-world examples. Next week, we jump into final basic type of join -- the self join. Until then, feel free to contact me with your comments and questions. Believe it or not, I read all of the email you send me and answer the vast majority of it! Some of the questions coming in will make it into future columns after we've covered the basics of SQL.