Sub Query

A subquery is a query whose results are passed as the argument for another query. Subqueries enable you to bind several queries together. You will understand and be able to do the following:

• Build a subquery

• Use the keywords EXISTS, ANY, and ALL with your subqueries

• Build and use correlated subqueries

NOTE:The examples for today's lesson were created using Borland's ISQL, the same implementation used on Day 6, "Joining Tables." Remember, this implementation does not use the SQL> prompt or line numbers.

Building a Subquery

Simply put, a subquery lets you tie the result set of one query to another. The general syntax is as follows:

Here you are using the aliases O and P for tables ORDERS and PART to select the five columns you are interested in. In this case the aliases were not necessary because each of the columns you asked to return is unique. However, it is easier to make a readable query now than to have to figure it out later. The first WHERE clause you encounter

WHERE O.PARTNUM = P.PARTNUM

Is standard language for the join of tables PART and ORDERS specified in the FROM clause. If you didn't use this WHERE clause, you would have all the possible row combinations of the two tables. The next section includes the subquery. The statement

AND O.PARTNUM = (SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE "ROAD%")

adds the qualification that O.PARTNUM must be equal to the result of your simple subquery. The subquery is straightforward, finding all the part numbers that are LIKE "ROAD%". The use of LIKE was somewhat lazy, saving you the keystrokes required to type ROAD BIKE. However, it turns out you were lucky this time. What if someone in the Parts department had added a new part called ROADKILL?

The SQL engine complains

multiple rows in singleton select

and you don't get any results. The response from your SQL engine may vary, but it still complains and returns nothing. To find out why you get this undesirable result, assume the role of the SQL engine. You will probably evaluate the subquery first. You would return this:

INPUT:

SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE "ROAD%"

You would take this result and apply it to O.PARTNUM =, which is the step that causes the problem.

ANALYSIS:

How can PARTNUM be equal to both 76 and 77? This must be what the engine meant when it accused you of being a simpleton. When you used the LIKE clause, you opened yourself up for this error. When you combine the results of a relational operator with another relational operator, such as =, < , or >, you need to make sure the result is singular. In the case of the example we have been using, the solution would be to rewrite the query using an = instead of the LIKE, like this:

Using Aggregate Functions with Subqueries

The aggregate functions SUM, COUNT, MIN, MAX, and AVG all return a single value. To find the average amount of an order, type this:

These lines represent the common way of joining these two tables. This join is necessary because the price is in PART and the quantity is in ORDERS. The WHERE ensures that you examine only the join-formed rows that are related. You then add the subquery:

The preceding condition compares the total of each order with the average you computed in the subquery. Note that the join in the subquery is required for the same reasons as in the main SELECT statement. This join is also constructed exactly the same way. There are no secret handshakes in subqueries; they have exactly the same syntax as a standalone query. In fact, most subqueries start out as standalone queries and are incorporated as subqueries after their results are tested.

Nested Subqueries

Nesting is the act of embedding a subquery within another subquery. For example:

SELECT * FROM SOMETHING WHERE ( SUBQUERY(SUBQUERY(SUBQUERY)));

Subqueries can be nested as deeply as your implementation of SQL allows. For example, to send out special notices to customers who spend more than the average amount of money, you would combine the information in the table CUSTOMER

Note the SELECT clause has been modified to return a single column, NAME, which, not so coincidentally, is common with the table CUSTOMER.

Correlated Subqueries

The subqueries you have written so far are self-contained. None of them have used a reference from outside the subquery. Correlated subqueries enable you to use an outside reference with some strange and wonderful results. Look at the following query:

In fact, except for the order, the results are identical. The correlated subquery acts very much like a join. The correlation is established by using an element from the query in the subquery. In this example the correlation was established by the statement

WHERE P.PARTNUM = O.PARTNUM

in which you compare P.PARTNUM, from the table inside your subquery, to O.PARTNUM, from the table outside your query. Because O.PARTNUM can have a different value for every row, the correlated subquery is executed for each row in the query.

This operation returns the DESCRIPTION of every row in PART where P.PARTNUM = O.PARTNUM. These descriptions are then compared in the WHERE clause:

WHERE 'ROAD BIKE' =

Because each row is examined, the subquery in a correlated subquery can have more than one value. However, don't try to return multiple columns or columns that don't make sense in the context of the WHERE clause. The values returned still must match up against the operation specified in the WHERE clause.

This SELECT caused a General Protection Fault on my Windows operating system. The SQL engine simply can't correlate all the columns in PART with the operator =.

Correlated subqueries can also be used with the GROUP BY and HAVING clauses. The following query uses a correlated subquery to find the average total order for a particular part and then applies that average value to filter the total order grouped by PARTNUM:

TIP:When using correlated subqueries with GROUP BY and HAVING, the columns in the HAVING clause must exist in either the SELECT clause or the GROUP BY clause. Otherwise, you get an error message along the lines of invalid column reference because the subquery is evoked for each group, not each row. You cannot make a valid comparison to something that is not used in forming the group.

Using EXISTS, ANY, and ALL

The usage of the keywords EXISTS, ANY, and ALL is not intuitively obvious to the casual observer. EXISTS takes a subquery as an argument and returns TRUE if the subquery returns anything and FALSE if the result set is empty:

INPUT:

SELECT NAME, ORDEREDON FROM ORDERS WHERE EXISTS (SELECT * FROM ORDERS WHERE NAME ='TRUE WHEEL'

ANALYSIS:

Not what you might expect. The subquery inside EXISTS is evaluated only once in this uncorrelated example. Because the return from the subquery has at least one row, EXISTS evaluates to TRUE and all the rows in the query are printed. If you change the subquery as shown next, you don't get back any results.

SELECT NAME, ORDEREDON FROM ORDERS WHERE EXISTS (SELECT * FROM ORDERS WHERE NAME ='MOSTLY HARMLESS')

ANALYSIS:

EXISTS evaluates to FALSE. The subquery does not generate a result set because MOSTLY HARMLESS is not one of your names.

NOTE:Notice the use of SELECT * in the subquery inside the EXISTS. EXISTS does not care how many columns are returned.

You could use EXISTS in this way to check on the existence of certain rows and control the output of your query based on whether they exist. If you use EXISTS in a correlated subquery, it is evaluated for every case implied by the correlation you set up.

Closely related to EXISTS are the keywords ANY, ALL, and SOME. ANY and SOME are identical in function. An optimist would say this feature provides the user with a choice. A pessimist would see this condition as one more complication. Look at this query:

INPUT:

SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME = ANY (SELECT NAME FROM ORDERS WHERE NAME ='TRUE WHEEL')

ANALYSIS:

ANY compared the output of the following subquery to each row in the query, returning TRUE for each row of the query that has a result from the subquery.

INPUT:

SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME = SOME (SELECT NAME FROM ORDERS WHERE NAME ='TRUE WHEEL')

ANALYSIS:

You may have already noticed the similarity to IN. The same query using IN is as follows:

INPUT:

SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME IN (SELECT NAME FROM ORDERS WHERE NAME ='TRUE WHEEL')

ANALYSIS:

As you can see, IN returns the same result as ANY and SOME. Has the world gone mad? Not yet. Can IN do this? ALL returns TRUE only if all the results of a subquery meet the condition. Oddly enough, ALL is used most commonly as a double negative, as in this query:

INPUT:

SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME <> ALL (SELECT NAME FROM ORDERS WHERE NAME ='JACKS BIKE')

ANALYSIS:

This statement returns everybody except JACKS BIKE. <>ALL evaluates to TRUE only if the result set does not contain what is on the left of the <> .

Summary

You learned how to use one of the most important parts of SQL. You also tackled one of the most difficult parts of SQL: a correlated subquery. The correlated subquery creates a relationship between the query and the subquery that is evaluated for every instance of that relationship. Don't be intimidated by the length of the queries. You can easily examine them one subquery at a time.