While a table join combines multiple
tables into a new table, a subquery (enclosed in parentheses) selects rows
from one table based on values in another table. A subquery, or inner query,
is a query-expression that is nested as part of another query-expression.
Depending on the clause that contains it, a subquery can return a single value
or multiple values. Subqueries are most often used in the WHERE and the HAVING
expressions.

A single-value
subquery returns a single row and column. It can be used in a WHERE or HAVING
clause with a comparison operator. The subquery must return only one value,
or else the query fails and an error message is printed to the log.

This query uses a subquery in its WHERE clause to select U.S. states
that have a population greater than Belgium. The subquery is evaluated first,
and then it returns the population of Belgium to the outer query.

proc sql;
title 'U.S. States with Population Greater than Belgium';
select Name 'State' , population format=comma10.
from sql.unitedstates
where population gt
(select population from sql.countries
where name = "Belgium");

Internally, this is what the query looks like after the subquery has
executed:

proc sql;
title 'U.S. States with Population Greater than Belgium';
select Name 'State', population format=comma10.
from sql.unitedstates
where population gt 10162614;

The outer query lists the states whose populations are greater than
the population of Belgium.

U.S. States with Population Greater than Belgium
State Population
-----------------------------------------------
California 31,518,948
Florida 13,814,408
Illinois 11,813,091
New York 18,377,334
Ohio 11,200,790
Pennsylvania 12,167,566
Texas 18,209,994

A
multiple-value subquery can return more than one value from one column. It
is used in a WHERE or HAVING expression that contains IN or a comparison operator
that is modified by ANY or ALL. This example displays the populations of oil-producing
countries. The subquery first returns all countries that are found in the
OILPROD table. The outer query then matches countries in the COUNTRIES table
to the results of the subquery.

proc sql outobs=5;
title 'Populations of Major Oil Producing Countries';
select name 'Country', Population format=comma15.
from sql.countries
where Name in
(select Country from sql.oilprod);

The
previous subqueries have been simple subqueries that are self-contained and
that execute independently of the outer query. A correlated subquery
requires a value or values to be passed to it by the outer query. After the
subquery runs, it passes the results back to the outer query. Correlated subqueries
can return single or multiple values.

This example selects all major oil reserves of countries on the continent
of Africa.

The outer query selects the first row from the OILRSRVS table and then
passes the value of the Country column, Algeria
, to the subquery. At this point, the subquery internally looks
like this:

(select Continent from sql.countries c
where c.Name = 'Algeria');

The subquery selects that country
from the COUNTRIES table. The subquery then passes the country's continent
back to the WHERE clause in the outer query. If the continent is Africa, then
the country is selected and displayed. The outer query then selects each subsequent
row from the OILRSRVS table and passes the individual values of Country to
the subquery. The subquery returns the appropriate values of Continent to
the outer query for comparison in its WHERE clause.

Note that the WHERE clause uses an =
(equal) operator. You can use an =
if the subquery returns only a single value. However, if the
subquery
returns multiple values, then you must use IN or a comparison operator with
ANY or ALL. For detailed information about the operators that are available
for use with subqueries, see the section about the SQL procedure in the
Base SAS Procedures Guide.

The EXISTS condition tests for the existence of a set of
values.
An EXISTS condition is true if any rows are produced by the subquery, and
it is false if no rows are produced. Conversely, the NOT EXISTS condition
is true when a subquery produces an empty table.

This example produces the same result as Correlated Subquery. EXISTS checks for the
existence of countries
that have oil reserves on the continent of Africa. Note that the WHERE clause
in the subquery now contains the condition Continent = 'Africa'
that was in the outer query in the previous example.

Subqueries
can be nested so that the innermost subquery returns a value or values to
be used by the next outer query. Then, that subquery's value or values are
used by the next outer query, and so on. Evaluation always begins with the
innermost subquery and works outward.

This example lists cities in Africa that are in countries with major
oil reserves.

The innermost query is evaluated first. It returns countries
that are located on the continent of Africa.

The outer subquery is evaluated. It returns a subset of African
countries that have major oil reserves by comparing the list of countries
that was returned by the inner subquery against the countries in OILRSRVS.

Finally, the WHERE clause in the outer query lists the coordinates
of the cities that exist in the WORLDCITYCOORDS table whose countries match
the results of the outer subquery.

proc sql;
title 'Coordinates of African Cities with Major Oil Reserves';
select * from sql.worldcitycoords
3 where country in
2 (select Country from sql.oilrsrvs o
where o.Country in =
1 (select Name from sql.countries c
where c.Continent='Africa'));

You
can combine joins and subqueries in a single query. Suppose that you want
to find the city nearest to each city in the USCITYCOORDS table. The query
must first select a city A, compute the distance from city A to every other
city, and finally select the city with the minimum distance from city A. This
can be done by joining the USCITYCOORDS table to itself (self-join) and then
determining the closest distance between cities by using another self-join
in a subquery.

This is the formula to determine the distance between coordinates:

SQRT(((Latitude2-Latitude1)**2) + ((Longitude2-Longitude1)**2))

Although the results of this formula are not exactly accurate because
of the distortions caused by the curvature of the earth, they are accurate
enough for this example to determine whether one city is closer than another.

The outer query joins the table to itself and determines the distance
between the first city A1 in table A and city B2 (the first city that is not
equal to city A1) in Table B. PROC SQL then runs the subquery. The subquery
does another self-join and calculates the minimum distance between city A1
and all other cities in the table other than city A1. The outer query tests
to see whether the distance between cities A1 and B2 is equal to the minimum
distance that was calculated by the subquery. If they are equal, then a row
that contains cities A1 and B2 with their coordinates and distance is written.