Inner and Outer Joins (and Cartesian joins)

Joins are operations performed on two sets of data. Using a join we can pull data from one set, associate it to data pulled from another set and then present the resulting joined set to the user and a single data set. For example: suppose we have basic customer information in one table (name, marital status, etc.) and customer addresses in another table. We can pull the basic customer information from one table, pull the address information from another table join them together and present them to the user as a single result set that contains both basic customer info and addresses. At this point you might be asking yourself why we would store the basic customer information in a different table than the customer addresses. That is a good question but won’t be discussed here.

The example I’m using here is for an order tracking system. These tables are extremely scaled down for the purposes of this example but they should suffice in this case.

We have two tables: Orders and ShipMethods. The Orders table contains, well, orders.

The OrderID is the primary key, Shipped is bit column, 0 means the order hasn’t shipped, 1 means it has. ShipMethodID is a foreign key to the ShipMethods table. OrderDesc is self-explanatory.

The ShipMethods table contains the methods by which orders can have been shipped.

The ShipMethodID is the primary key. The ShipMethodID in the Orders table corresponds with a ShipMethodID in this table. We know Order1 was shipped view UPS and Order3 was shipped via Fedex. We also know that USPS and Bob’s Elephant Freight Shipping company, shortened here to Bob’s Freight (what you’ve never needed to ship and elephant??) have not been used for shipping an order because ShipMethodIDs 3 and 4 are not in the Orders table ShipMethodID field.

Using these two tiny tables we can explain the difference between an inner, outer and Cartesian joins.

Remember – on a basic level all a join does it pull data from two different data sets, associate them to one another and present them to user as a single combined data set. How the data is associated is controlled by the join type.

A Cartesian join pulls each row of data from the first table and associates it with each row from of data from the second table. In other words, every row from table one gets associated to every row from table two. Using the Orders and ShipMethods table from above this is the result of a Cartisian join:

As you can see by looking down the list Order1 is associated with each and every ship method. As you can also see UPS is associated with each and every order. The same is true from all Orders and all ShipMethods. This is also known as a cross join in SQL Server. In this case a Cartesian join doesn’t make much business sense. While the join itself is accurate (each row in table one is associated to each row in table two) the business use of this, in this case, is questionable. I put the Cartesian join example first because I think it helps one understand what a join is actually doing and to better understand inner and outer joins.

An inner join is different because it adds a constraint to how the data is joined together. Rather than have every row in one table joined to every row in another table we can use it to constrain what rows in table one are joined to what rows in table two.

The Orders table contains a ShipMethodID column that contains a number which (not coincidentally) matches to a number in the ShipMethodID column from the ShipMethods table.

Using these two columns and an inner join we can pull data from Orders, associate it to data from ShipMethods but only do so where the ShipMethodID values from each row in each table match. You can use the Cartesian join from above two predict what the outcome of the inner join will be by looking at the rows where the ShipMethodIDs are the same. There should be four rows in the result set.

And there they are.

An outer join still constrains how the two data sets are joined together but it is less restrictive. Looking at the results of the Cartesian join you will notice that Order5 and Order6 will never show up in an inner join of Orders with the ShipMethods table. Similarly ship methods USPS and Bob’s Elephant Freight Shipping would not show up either. This is because the ShipOrderIDs for Order5 and Order6 has not been specified and because no Orders specified USPS and Bob’s Elephant Freight Shipping as ship methods. An outer join lessens the constraining nature of the join and will allow those to show up under certain conditions.

There are three basic types of outer joins: left outer join, right outer join and full outer join.

A left outer join will take all of the rows from the left table but only the matching rows from the right table.

You’ll notice that for the rows in the left table for which there are no corresponding records from the right table NULL is returned.

So what is the left table? The query I used to get this result set was:

The left table is Orders. Unlike the inner join which only showed Orders that had a ShipMethodID with a corresponding ShipMethodID in the ShipMethods table this one shows all Orders even ones without a ShipMethodID with a corresponding ShipMethodID in the ShipMethods table.

The left table is now the ShipMethods table. The order the columns are displayed in is different but the data is the same.

Finally there is the full outer join. The full outer join is like having an inner join, left outer join and right outer joined combined. You get all of the records from the two datasets with corresponding ShipMethodID values. You get all of the records from the left table even if there are no corresponding rows in the right table. And you get all of the rows from the right table even if there are no corresponding rows in the left table.