A Deeper Look at Advanced Select Queries

To the SQL novice, joins can be intimidating and somewhat frustrating. Even when you think you have the query nailed, along comes an unexpected result which make you throw up your arms in surrender to the SQL gods. In this tutorial, we are going to build upon the excellent quick tip of working with MySQL and INNER JOIN recently posted and crack open the real power of a SELECT statement.

Introduction

Think back to your earliest days of school where you were learning to group items, or learning basic math skills, or even how to spell. If your days of school were anything like mine, there were tons of exercises where there were pictures of things and you grouped those things together. It might have been ducks and cows, where you had to count each. It might have been word searches where you were given a block of seemingly random letters from which you had to find a list of words. You might remember "Which one of these are not like the other" game.

These basic games that everyone learns in their earliest days of school is exactly the same skills that you use when you are working with SELECTs. We just have a different syntax that we have to use. Every time we use a SELECT command we are simply grouping a data set. Using JOINs we are simply grouping multiple data sets. We are finding ducks and cows that are either similar or different and pushing that off into an array that we can work with in our logic layer. It's the syntax that is confounding, but what we are doing is the same thing that we did in our very earliest days of school.

"Every time we use a SELECT command we are simply grouping a data set. Using JOINs we are simply grouping multiple data sets. We are finding ducks and cows that are either similar or different and pushing that off into an array that we can work with in our logic layer. It's the syntax that is confounding..."

I am going to cover a few things in this tutorial, but none are extremely difficult to put into practice. I am using MySQL as my database, but for the most part I will be using simple SELECT syntax that should at least be familiar regardless of your flavor of SQL. For my query tool, I am using Sequel Pro for Mac which is relatively fast and descriptive for my purposes today. I am using a very simple table structure of animal types, farms, and wild. For those playing along at home, the table structure looks something like this:

Back to Basics: Breaking the Shortcut Habit

Since it is the syntax, and not necessarily the concepts that are difficult, we should take a couple of steps back and remember some of the basics. There are many different shortcuts that you can take in writing a SELECT, but many times those shortcuts will cause trouble down the line. I tend to write my queries out, so that I don't have ambiguous column names, and I can see exactly what I am expecting to get.

Let's first start with my animal_types table which has a few animal types entered in. If I wanted to select everything in that table, I could write my select like so:

SELECT *
FROM animal_types

Which would give me a data set of:

The problem with using the " * " selector is that a year from now I probably will not remember what I am expecting from this SELECT. Furthermore, when I am joining tables, I will probably have ambiguous column names, meaning there are multiple columns that have the same name, thus SQL has no way of setting the results apart. Instead, let's be specific, which takes just a little more typing, but corrects some of these problems with taking shortcuts:

SELECT animal_type_id,
animal_type_description
FROM animal_types

Which gives the exact data set back, but I see exactly what I should be getting back:

Even though we are explicitly naming our column names, we can go just a little bit further, and that is explicitly saying these columns belong to this table. This is another shortcut that we take all the time, which adds just a little more time to writing queries, but ends up saving the debugging time later when you are joining other tables. Instead of our last query, let's instead explicitly prefix our columns with a table prefix:

We have added the prefix on the table animal_types of att, which tells SQL, when you see the prefix of u, realize that it belongs to the table named animal_types. Our data set looks exactly the same again:

Let's also tell SQL what to name our columns when it gives us our results. Many times when you are working with JOINs or other advanced queries, two columns will have the same name, which makes it difficult to distinguish in your logic layer. I like to explicitly tell SQL to deliver my column names in the manner that I want, and not necessarily as they are named. To do this, we are going to use the AS keyword:

SELECT att.animal_type_id AS animal_type_id,
att.animal_type_description AS animal_type_description
FROM animal_types att

Which again gives us the exact same data set but we have again been even more explicit in our query:

There is one more way to be sure that I am telling SQL to SELECT exactly what I want, and from where I want, and that is to add the database prefix onto my table. Programming languages and different tools make it very easy for the database name to be implied, but there are times where you might call a different database in a query. When you add the database prefix you take the implication of working with a single database out of the mix.

SELECT att.animal_type_id AS animal_type_id,
att.animal_type_description AS animal_type_description
FROM tutorial.animal_types att

Again, exact same data set:

Except now we are telling SQL to explicitly look in the database "tutorial" for the table "animal_types". Again, this comes in handy when we are pulling information from multiple databases into a single select.

Sub-Selects: Multiple Selects Bundled Together

MySQL 4.1 added the ability to use Sub-Selects. A word of caution on Sub-Selects before we get too deep into them, use them wisely. The execution plan for Sub-Selects may well be much different than you would imagine, and thus when you believe that you are using an INDEX, you are really working with a full table scan. In my experience, you can optimize your Sub-Selects to be just as performance friendly as a natural join, but when you are dealing with optimization you will want to pay close attention to the EXPLAIN results on your query. For more information,
take a look at a recent MySQL optimization tutorial which gives you insight on the query execution plans.

What a Sub-Select is basically executing a SELECT statement within a SELECT statement. Let's take a look at a very common use:

SELECT f.animal_id AS animal_id,
f.animal_type AS animal_type,
f.animal_description AS animal_description,
f.animal_name AS animal_name
FROM tutorial.farm f
WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_id = 2)

Which gives us a data set of:

In this case we are saying to return all farm animals that have are of an animal type 2. Let's build on this just a bit, and instead look for animal type 1 or 2.

SELECT f.animal_id AS animal_id,
f.animal_type AS animal_type,
f.animal_description AS animal_description,
f.animal_name AS animal_name
FROM tutorial.farm f
WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_id = 2 OR att.animal_type_id = 1)

Which now gives us a result of:

We are using the IN keyword to tell SQL to look at the results from our Sub-Select. In this case, we could easily rewrite this to not use a Sub-Select and get the same results:

SELECT f.animal_id AS animal_id,
f.animal_type AS animal_type,
f.animal_description AS animal_description,
f.animal_name AS animal_name
FROM tutorial.farm f
WHERE f.animal_type IN (1, 2)

Exact same results as before:

However, when we change our Sub-Select to be something like:

SELECT f.animal_id AS animal_id,
f.animal_type AS animal_type,
f.animal_description AS animal_description,
f.animal_name AS animal_name
FROM tutorial.farm f
WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_description LIKE "%duck%" OR att.animal_type_description LIKE "%cow%")

We begin to see the power:

We can use a Sub-Select to search and give us a data set to limit our farm results. We can take this just a bit further by also using a Sub-Select in our query SELECT. Say for instance that we want to get the animal_type_description as a Sub-Select in our main query:

SELECT f.animal_id AS animal_id,
f.animal_type AS animal_type,
f.animal_description AS animal_description,
f.animal_name AS animal_name,
(SELECT att.animal_type_description FROM tutorial.animal_types att where att.animal_type_id = f.animal_type) AS description
FROM tutorial.farm f
WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_description LIKE "%duck%" OR att.animal_type_description LIKE "%cow%")

Which returns:

In this query we are placing a condition on our Sub-Select to return the animal_type_description when it matches the f.animal_id. This condition will last for this pass, and will be reset on the next data set. Basically, we have joined our tables together by simply using a Sub-Select. Again, re-read the caution from the beginning of this section on the performance, but as you can see you can use a Sub-Select to quickly join a table based on a condition performed in the Sub-Select.

INNER JOINS: The Exclusive Club

With Sub-Selects out of the way, let's get to an INNER JOIN. I have always seen joins taught with nice looking circles which show where the JOIN is selecting. I would rather think of joins in a more natural learning way which I learned in grade school of grouping pictures. I always visualize what I am wanting to get from my query before I write it. When I visualize an exclusive relationship, that's when I use an INNER JOIN.

For instance, let's say that I want to see the animal type description for my farm animals, but only when an animal type id exists. I could write an INNER JOIN that would return this result:

Awesome. That said, this example does not show the exclusivity that we get from our INNER JOIN. Let's do something that happens every day with our data structure and delete an entry from our definition look up table, animal types. I am going to delete the "cat" description:

Again, always remember an INNER JOIN is going to return data from two tables where there is a match in both of the tables. When you compare two pictures and make a single picture from the two, you will only be drawing the matches.

OUTER JOINS: The Descriptive Club

INNER JOINs are easy to use, but can be limiting. There are two types of joins with MySQL which do not have the exclusivity rules, and they are LEFT OUTER JOIN and RIGHT OUTER JOIN. Other SQL databases also have a FULL OUTER JOIN, which you can simulate in MySQL with a UNION. I tend to exclusively use LEFT OUTER JOINs, as I think of my OUTER JOINS as adding description to my SELECT statements.

When you are using a LEFT OUTER JOIN you are basically saying, get me everything from the LEFT table and join anything that matches the condition from the RIGHT table non-exclusively:

Our farm table is our LEFT table as that is where my natural selection comes from, and it will always return a result from the LEFT table, even when there is nothing that matches from our RIGHT table. Let's take a look at an example of a RIGHT OUTER JOIN using the same basic SELECT:

Which in my mind is not nearly as useful as a data set, because I end up will NULL entries in my LEFT data set. When I write a query, I generally want to know everything in my LEFT without those nasty NULL entries to deal with. Where OUTER JOINS really come in handy is when we are joining multiple tables together to build upon the information. Lets add a new look up table to the mix for our descriptions of whether the animal is striped, spotted or solid:

We are saying give me all farm animal names, and while you are at it, give me the type of animal it is, as well as the description of the animal, even when there might not be a match for the farm animal in the RIGHT tables:

Bam! We have what we are after. The only limit to a join is your imagination and the performance of your database server. The more that you normalize, the more that you will probably have the need to add non-exclusive description to you results.

Combining JOINS: Mix and Match

There are times where you need exclusivity on a join, but want description from another join. There is no law (except in Iceland, parts of Nova Scotia, and Eastern Kentucky) that states that you can not use both INNER joins and OUTER joins in the same SELECT. Let's take a look at our wild animals for a moment:

We have one animal that is striped. Let's remove striped from our new animal description table, and then just use LEFT OUTER JOINs to simulate our last example from the wild side:

Which gives us a NULL value where our Striped animal description used to be:

That's all well and good, until you realize that your data structure is no longer intact because of that NULL value. Let's combine our OUTER join description with the exclusivity of an INNER JOIN on our description:

Which gets rid of that NULL value, which might be the difference of a bug in your application because of a data integrity issue, and peace and quiet on the weekend:

We can also self join a table which comes in handy from time to time. What we are doing when joining tables together is getting information from table.a and combining it table.b. SQL doesn't really care if table.b is the same table as table.a. Here's an example:

In this example, we are combining the same wild table to itself on animal id to get restate it's name:

While the result might not be that interesting in this example, think about how many times in your applications that you have created multiple tables to express a parent to child relationship. You could easily cut out the multiple table structure by joining a table on itself to change the meaning of the relationship.

Unions: A Different Way to Join

Even with INNER, OUTER, Sub-Selects tools at our disposal, we find ourselves needing even further tools to get the right data set at times. Sometimes we need to get information about wild animals as well as our farm animals. We have basically three pictures of ducks and cows that need to be combined into a single picture. Enter a UNION.

A UNION is the combination of multiple queries which combine our data sets into one. Let's take the situation where we want a description of all animals, whether farm or wild. We can simply express that using a UNION:

In this UNION the one thing that I have not explained is the explicit column "domain". I am merely saying either give me "Farm" or "Wild" depending on when you are running. I am also using UNION ALL instead of UNION as I want all results and not just distinct values. Our complex query gives us the expected data set that we were hoping for:

Basically saying give me a union of the two queries, but only give me the results in the wild side that have an animal id less than or equal to 3. The produced data set is just as expected:

Unions are very easy to use, but a couple of notes. The first side of the UNION will determine the name of the column. In addition, both sides of the UNION must have the identical column numbers. You can't have one query with 4 columns of data with the other query with 3.

Simulating a FULL OUTER JOIN: Practical Use

We can tie up everything that we have learned by simulating a FULL OUTER JOIN which as I stated above MySQL does not support by using a simple UNION statement. Let's use the original query from the beginning of the tutorial:

In this case I am using a UNION statement instead of the UNION ALL statement as I just want distinct values. Here's my result set:

Conclusion

Even with my fingers getting callouses from the amount of typing to explain advanced queries, I really have really only scratched the surface. I have done my best to explain joins without the same illustrations of circles showing the affective areas. Those illustrations work, but I think if you lean on your grade school experiences and think "Which one of these is not like the other" you are putting a little more context on the idea of joining data sets and tables. If you visualize your SELECT statements it becomes so much easier to get that result that you have in mind. If I have missed anything, just let me know in the comment section.

John is fascinated with innovation of all types. He works by day as a technical business consultant identifying and solving challenging business problems with technology, and by night as a husband and father of two. He has been on the frontier of web development for the past fifteen years, but still learns something new everyday. You can catch up with him through Wyome.com.