Filtering (WHERE) and Sorting Data (ORDER BY) in Oracle 9i

This chapter covers filtering of rows using the WHERE clause and sorting of retrieved data using the ORDER BY clause. The WHERE clause applies to both queries and most DML commands; the ORDER BY clause applies to queries only.

The use of LIKE, IN,EXISTS, BETWEEN, ANY, SOME, and ALL comparison conditions

Logical condition precedence: (), NOT, AND, OR

NULL values and sorting

Sorting methods

This chapter covers filtering of rows using the WHERE clause and sorting of retrieved data using the ORDER BY clause. The WHERE clause applies to both queries and most DML commands; the ORDER BY clause applies to queries only.

Filtering with the WHERE Clause

The WHERE clause extends the syntax of the SELECT
statement, allowing filtering of rows returned from a query.

NOTE

A WHERE clause is applied to a query during the initial reading
process, regardless of whether reading of rows involves I/O, reading from
database buffer cash, or both. Other clauses, such as the ORDER BY
clause, are applied after all rows are retrieved. Thus, applying WHERE
clause filtering first limits the number of rows sorted by an ORDER BY
clause, for instance. Proper use of WHERE clause filtering is good coding
practice, leading to better-performing queries in general.

WHERE Clause Syntax

The preceding chapter examined the basics of the SELECT statement
with respect to retrieving data from the database. The basic SELECT
statement is made up of a SELECT clause, with a list of items to be
retrieved, plus a FROM clause. In its simplest form the SELECT
clause specifies columns in a table, and the FROM clause specifies the
table from which column values are to be selected.

The WHERE clause allows inclusion of wanted rows and filtering out
of unwanted rows. The syntax for the WHERE clause is shown in Figure
3.1.

The WHERE clause is an optional addition to a SELECT
command. The WHERE clause can also be used in both UPDATE and
DELETE DML commands.

The WHERE clause is used to filter out unwanted rows from the
resulting row set or retain required rows.

The WHERE clause in its simplest form is a simple comparison
between two expressions. An expression can be a simple column, or include schema
and table or view names, aliases, and even another expression.

The WHERE clause is shown in the preceding example and in the
following example such that the two expressions RANK and 1000
are compared using the comparison condition <. RANK is a
column in the MOVIE table and 1000 is an expression:

WHERE RANK < 1000

It follows that both sides of the comparison condition can be table
columns:

SELECT TITLE, RANK, REVIEW_RANK FROM MOVIE WHERE RANK > REVIEW_RANK;

Additionally, both sides of the comparison condition can be expressions:

SELECT TITLE, RANK, REVIEW_RANK FROM MOVIE WHERE RANK/100 > 0.5;

There are a multitude of conditions, and it is necessary to examine all
possible comparison conditions available for use in the WHERE
clause.

Comparison Conditions

Comparison conditions allow for two expressions to be compared with each other in various ways. These different methods of comparison depend on the comparison condition used, as listed here:

Equi (=), Anti (!=, <>), and Range (<, >, <=, >=). Equi implies equality (=) between two expressions that are being compared. Anti implies that two expressions being compared are not equal to each other (!= or <>). Range implies that one expression is greater than (>), less than (<), less than or equal to (=>), or greater than or equal to (>=).

[ NOT ] LIKE.LIKE uses special wild card characters performing pattern matching between expressions. The % (percentage) character attempts to match zero or more characters in a string, and the _ (underscore) character matches exactly one character in a string.

NOTE

The underscore character _ is also known as the underbar character.

Syntax:

<column> | <expression> LIKE <column> | <expression>

For example, this query finds all movies with the vowel e anywhere in the movie title:

SELECT TITLE FROM MOVIE WHERE TITLE LIKE '%e%';

The next query finds all movies beginning with a capital letter A:

SELECT TITLE FROM MOVIE WHERE TITLE LIKE 'A%';

This query finds only movies with the vowel e in the second character position of their title:

SELECT TITLE FROM MOVIE WHERE TITLE LIKE '_e%';

In contrast, the following query finds all movies without the vowel e in the second character position of their title:

SELECT TITLE FROM MOVIE WHERE TITLE NOT LIKE '_e%';

NOTE

Queries using strings in expressions are case-sensitive, and thus uppercase A is different from lowercase a just as uppercase E is different from lowercase e.

[ NOT ] IN.IN is used to test for membership of an expression within a set or list of expressions.

The second of the preceding two queries uses the AND logical condition. Logical conditions (conjunctions) are discussed in the next section.

This next example will produce no rows at all because there is no such range beginning at 1000, counting upwards to 900:

SELECT TITLE, RANK FROM MOVIE WHERE RANK BETWEEN 1000 AND 900;

CAUTION

This is a potential trick question: BETWEEN 1000 AND 900 is invalid but does not cause an error in SQL*Plus.

This example finds rows between a range of string values:

SELECT NAME, TYPECAST FROM ACTOR WHERE TYPECAST BETWEEN 'Odd' AND
'Shakespearian';

NOTE

Oracle calls BETWEEN a range condition.

ANY, SOME, and ALL. These comparisons all check an expression for membership in a set or list of elements. ANY and SOME are the same and allow a response when any element in the set matches. ALL produces a response only when all elements in a set match an expression.

Both of the following two examples will produce the same result, returning all movies made in the years 1998, 1999, and 2000:

SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = ANY(1998, 1999, 2000);
SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = SOME(1998, 1999, 2000);

This example would produce no result because there are no movies made in all three of the years 1998, 1999, and 2000:

SELECT TITLE, YEAR FROM MOVIE WHERE YEAR = ALL(1998, 1999, 2000);

IS [ NOT ] NULL.NULL values are tested for using the IS NULL comparison. In testing for NULL, IS NULL implies equal to NULL and IS NOT NULL implies not equal to NULL. In other words, = NULL and != NULL are both syntactically invalid and will cause errors.

Syntax:

<column> | <expression> IS [ NOT ] NULL

The following two queries will include only NULL values and exclude only NULL valued TYPECAST actors, respectively:

SELECT GENDER, TYPECAST, NAME FROM ACTOR
WHERE TYPECAST IS NULL ORDER BY GENDER, TYPECAST;
SELECT GENDER, TYPECAST, NAME FROM ACTOR WHERE TYPECAST IS NOT NULL
ORDER BY GENDER, TYPECAST;

NOTE

The function NVL(<expression>, <replace>) is used to replace NULL expressions with the replacement value (see Chapter 5, "Single Row Functions"). The SQL*Plus environmental setting SET NULL has the same effect in SQL*Plus (see Chapter 9, "SQL*Plus Formatting").

Caution - Make sure you understand all comparison conditions. The preceding list covers WHERE clause comparison conditions. You can also join multiple comparisons together using conjunctions, otherwise known as logical conditions.

Different pairs of one or more expressions can be linked together in the
same WHERE clause using the logical operators NOT,
AND and OR.

NOT has highest precedence (is processed first), followed by
AND and then OR.

Logical conditions can be used to form conjunctions or concatenations between multiple comparisons in a WHERE clause. There can be any number of comparison conjunctions. As shown in Figure 3.2 there are two logical conditions: AND and OR. Both AND and OR can also have the optional NOT clause applied, resulting in the opposite.

NOTE

NOT by itself, as well and AND and OR, is sometimes classified as a logical condition, even though it only reverses AND and OR. AND requires that both of two comparison expressions must be true for a true result. OR requires that only one of two comparison expressions must be true for a true result.

The following lines are syntax examples of AND and OR:

AND requires that both <expression1> and
<expression2> are true:

<expression1> AND <expression2>

OR requires that either <expression1> or <expression2> is true:

<expression1> OR <expression2>

CAUTION

Make sure you understand the use of AND, OR, and NOT logical conditions.

The precedence of logical conditions by default is first left to right, followed by NOT, AND, and finally OR.

Precedence is explained from a mathematical perspective in Chapter 4, "Operators, Conditions, Pseudocolumns, and Expressions." Additionally, the rules of precedence apply where parentheses (round brackets) can be used to change the order of resolution of an expression, or increase the precedence of a bracketed part of an expression. Thus, the use of parentheses can change the order of evaluation of NOT, AND, and OR. The term precedence means that one part of an expression is forced to be executed before other parts.

The following syntax demonstrates precedence further.
<expression1> is evaluated first, followed by
<expression2> and finally <expression3>:

<expression1> OR <expression2> AND <expression3>

In the next example, <expression1> is still evaluated first,
but it is compared using OR with the result of
<expression2> and <expression3>, not simply
<expression2>, followed by a spurious AND conjunction
with <expression3>:

<expression1> OR (<expression2> AND <expression3>)
Questions on the precedence of logical conditions using parentheses are very
likely. This simple example returns movies with regular rankings of greater than
1000 that have a review ranking of greater than 4, two different types of
rankings:

Note in Figure 3.3 how the two different queries retrieve different numbers
of rows. This is a direct result of the use of parentheses, changing the
precedence (sequence of evaluation) of the logical conditions AND and
OR.

CAUTION

Don't get confused! Comparison and logical conditions are sometimes
known as comparison and logical operators. Oracle documentation uses the terms
comparison and logical conditions.

The Importance of Precedence

The syntax

<expression1> OR <expression2> AND <expression3>

evaluates differently from this:

<expression1> OR (<expression2> AND <expression3>)

The importance of precedence generally determines that a clause without
proper precedence such as p OR q AND r will produce a spurious result. On
the other hand, p OR (q AND r) forces q AND r to be evaluated
before comparison with p. This implies that p OR the result of
q AND r produces a true response. For example, if p=round,
q=large, and r=four-sided, then testing for a large rectangle will
succeed, and correctly so. On the other hand, p OR q AND r will fail
because a large four-sided rectangle cannot possibly be both round and
four-sided. Mathematically, p OR (q AND r) implies that either p
is true or the combination of q AND r is true; p does not have to
be true. On the contrary, p OR q AND r effectively implies (p OR q)
AND r, a completely different expression, stating that q can be false
if r is true and the expression will still yield a true result, which is
false. Fascinating, huh?

Top-N Queries

When database tables become extremely large, making estimates is sometimes
best done using a simple sampling method. Top-N queries can provide a measure of
sampling to avoid regularly reading millions of rows to answer simple questions.
This can be achieved using what is called an inline view and a pseudocolumn
called ROWNUM (see Chapter 4).

A ROWNUM simply returns the sequence number for each row returned in
a query, in the order in which rows are returned by that query. Thus, the 1st
row has a ROWNUM value of 1 and the 10th row has a
ROWNUM value of 10.

An inline view is a type of subquery in which the subquery is embedded in the
FROM clause of a calling query (see Chapter 8).

The following query is a Top-N query. All movie titles are selected in the
subquery. The query result is trimmed to only four rows before passing over the
network. The result is a small sample subset of the potentially much larger
inline query.

SELECT * FROM
(SELECT TITLE FROM MOVIE ORDER BY TITLE)
WHERE ROWNUM < 4;

There are two important points to remember about Top-N queries:

WHERE ROWNUM > n produces a NULL result. The
following example will return a result of "no rows selected"
(NULL).

SELECT * FROM (SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE
ORDER BY RANK DESC)
WHERE ROWNUM > 4;

CAUTION

WHERE ROWNUM > n will produce no rows. This is a likely trick
question!

Top-N queries can be confused by application of an ORDER BY
clause. The ROWNUM pseudocolumn filters out all rows but those
specified as being less than a specified value. If a sort order is applied to
the calling query containing the ROWNUM filter as opposed to the
subquery, a spurious result could occur. This is aptly demonstrated in Figure
3.4.

Figure
3.4 Top-N queries and placing an ORDER BY clause within an inline view
subquery.

CAUTION

Remember to place an ORDER BY clause in the inline view subquery
section of a Top-N query. You need to sort results before the ROWNUM
comparison.

Figure 3.4 shows a notable difference between placing an ORDER BY
clause in an inline view and placing it outside of an inline view. The reason is
that the ORDER BY clause in a query will always be executed on the
filtered result. In other words, the WHERE clause is always executed
before the ORDER BY clause. Obviously, placing an ORDER BY
clause inside the inline view resolves this issue.