Queries in SQL

As we've already mentioned, the heart of SQL is indeed the detailed and
satisfying queries that one can generate. Here we look at such queries
in all (well, quite a lot of) their magnificent complexity.

Simple select queries, where, and distinct

Look at the following table:

DrugDosing

Dosing

DoseMg

Frequency

D1

30

OD

D2

10

OD

D3

200

TDS

We have already shown a simple query that
'pulls out' all the data in a table:

select * from DrugDosing;

You can see that here the star (*) acts as a wild card, selecting
all of the columns in the table. It doesn't take a great leap of imagination
to work out how to select individual columns:

select DoseMg, Dosing from DrugDosing;

.. will select the two columns specified, and present the information
for all rows but just those two columns. Also see how we've specified
the columns in a different order from the one used in the original
data table definition. The information that is extracted will be presented
in the new column order:

DOSEMG

DOSING

30

D1

10

D2

200

D3

How do we select individual rows?
Twice before, we've already come across the where command - when
we talked about
deletion of rows (remember the catastrophe that results
if you leave out where?), and also in our
discussion of the update
command. Here's another use of where:

select DoseMg, Dosing from DrugDosing
where Frequency = 'OD' ;

DOSEMG

DOSING

30

D1

10

D2

See how the condition (Frequency = 'OD') didn't have to involve one of
the columns selected.

Distinct rows

What happens if we say ..

select Frequency from DrugDosing;

.. do we get two rows or three? Let's see:

FREQUENCY

OD

OD

TDS

.. three rows as you expected. But let's say we only want all the
distinct rows, with no duplication. Then we say:

SQL allows you to perform basic arithmetic on numeric rows.
You can add, subtract, multiply and divide either by the value in
another column, or by a constant. Let's say that we wanted to list
all the drug doses halved. We could say:

select DoseMg / 2 from DrugDosing;

.. and we would be given a table with dose values divided by two.
With more complex expressions, our "usual" precedence rules (BODMAS)
apply - when there is any possibility of doubt, use parentheses. There
is also a unary minus that gives the negative of a value
(e.g. select - DoseMg .. would give the values -30, -10 and -200).

Conditions and Logic

We've already encountered the where statment (in the previous section).
Let's look at it in a bit more detail.

First note that when we said

.. where Frequency = 'OD' ;

we used the "=" operator to test for equality (We've
previously mentioned
why some seasoned (heh) programmers might find this strange).
There is a number of other operators you will come across. Here they are:

Operator

Meaning

<>

inequality - the two items are not equal (avoid
operators like != and ^= which only work on some systems)

<

Less than

>

Greater than

<= >=

Less than or equal to, and greater than or equal to, respectively

betweenfirstvalueandsecondvalue

Is the value between the two stated limits? Note that
the limits are NOT included, i.e. 4 is not between four and ten.

The short answer is 'No' {check on how the different SQLs handle things}
Most SQL applications are, at least in part, case sensitive. It's always
best to assume case sensitivity, and compensate for this. SQL has at
least two functions available that allow case conversions:

upper (string) converts string to upper case;

lower (string) is similar, converting to lower case.

{check that these aren't Oracle pty ????????????}

To ensure that we picked up lower and upper case It's, we'd have
to say:

Most special characters can be put in between quotes, and are handled
just fine. The only two you really need to worry about are percent
and underscore. SQL has a clumsy convention for representing these.
The convention is as follows:

Choose some infrequently used character as an 'escape character';

Insert the escape character in the string just before the
special character (_ or %) you want to represent literally,
rather than using it as a wildcard;

Tell SQL that you have just used an escape character, using
the escape command.

An example will clear things up.
The phrase:

.. where Whatever like '%50^%' escape '^'

should match values of Whatever such as 'A 50% increase',
and '50% of them came' but not 'I turn 50 today'. The explanation is
that we chose '^' as the escape character, so like '%50^%'
translates as "give me anything at all, followed by the characters five
and then zero and then a percent sign" and not "give me anything
followed by five and then zero and then anything again".

SQL supports the usual Boolean logic that we associate with computers
(although it doesn't have an "exclusive or" function). You can therefore
combine multiple conditions in one where statement. In short,

A and B

will succeed (evaluate to true) if and only if both A and B are true;

A or B

will succeed if either A or B is true, or indeed if both are true;

not B

will evaluate to true only if B is not true. Not reverses the sense
(truth) of an assertion!

And that's really that. The only other thing you need to know is that
you can use parenthesis to group together logical statements into "compound
statements" such as:

A and ( C or D )

where (C or D) will be evaluated first, and then the result
will be anded together with A. A rather silly example is:

A note on having

There's another way of selecting things that is superficially similar
to where. It's called having and differs in that
it's only applied after the select has been performed.
We will only consider havinglater, because it's mainly used
with group functions, which we haven't discussed yet!

Selecting between multiple options - the SQL92 case statement

This has nothing whatsoever to do with case sensitivity! The SQL92 case
statement is vaguely similar to the switch statements of languages such
as C++ and Java - it allows you to make one of many choices, based on
a single datum value. (To confuse you further, single options within a switch
statement in these languages start with the keyword case. To really
confuse you, Visual Basic has a select .. case structure
that broadly corresponds to the switch statement)!

{Need to explore this in detail over here}

Ordering and grouping data

Sorting results

One can also sort the output from a select statement. The default
is to sort in ascending order - small numbers come before larger ones.
What happens with text? Well, as you might expect, the sort is alphabetical
(A..Z) but take note that all characters (including special ones like
percentage, full stops [periods], and blanks) are actually sorted according
to their ASCII values. This may cause confusion, for example a blank
(ASCII code 32 decimal) comes before 'A' (ASCII code 65 decimal).
{check up on details of SQL sorting algos}.

Practically, we say:

select DoseMg, Dosing from DrugDosing order by DoseMg;

and we would hope to obtain..

DOSEMG

DOSING

10

D2

30

D1

200

D3

One can also sort in descending order, using the desc
operator:

select DoseMg, Dosing from DrugDosing order by DoseMg desc;

(The default is asc for an ascending sort). There is something
to be strongly disparaged over here. You can actually specify the column
number (eg 1) instead of the column name, but there are two good reasons
why you shouldn't:

With most computer languages, we count starting from zero. In SQL,
the counting is from one - column number 1 is the leftmost column!!
A sad relic.

You can even sort on several columns (first sort by the first column
mentioned, then sort by the next one, and so on). All you need do is
list the columns after the order by statement..

select DoseMg, Dosing from DrugDosing order by Frequency, DoseMg ;

A further note..

Remember how we could say select .. where and mention columns in
the where statement even though they didn't appear in the final
output? The same applies to sort! You can sort on a column even
if the result doesn't contain that column.

We've already encountered the distinct command, which allows
us to look at all the rows we've generated, and select out those which
are distinct. There are other ways of grouping rows together. Before
we learn these, let's find out about a few functions that deal with
groups. Here they are:

Count the number of rows; note that you can pass any
column argument to count, and it still returns the number of rows;
here is one case where count(1) is of value, rather than annoying
our friend Dr Codd! Much more sexy is the ability to count distinct
rows, for example one might say count (distinct whatever ),
and the number of distinct items in the whatever column will be
returned to you!

sum

Add up the values in a column

(many other functions)

Beware of proprietary functions that lock you into
a particular vendor's product!

See how using group byFrequency allows us to break up
the table into sub-groups, and then (using our new-found knowledge of
group functions) extract summary information about each group!

Take note that if you have an 'single' column (one that doesn't
describe a group statistic) in the list you provide to
select (Frequency, in the example above) and you don't include this
column name in the group by section, then you'll get an error!
Our example above is trivial, but the group by function is
extremely powerful. You can even specify several columns, to aggregate
subsets of subsets..

Similar to where, having is only applied after the select
statement has been processed, and is used to ELIMINATE rows from the selection.
Generally you should use where, because it's more efficient. When,
you ask yourself, should you use having?

Think about the case where we need to make a decision based on an
aggregate function (performed on a group). You can only make the decision
to select certain records based on the result of the aggregate function
once it's been done, so where is useless for this purpose. Enter
(taraa!) having. Here's an example:

Although again trivial, this example illustrates how you can use
having to eliminate subgroups - if there is only one item in
a subgroup, it is now eliminated (presumably because the average, max, and
min of a single item is of little interest to us)! Note how we use
count(1) - this is convenient and quick, as discussed above.

Remember the tables we discussed when we were talking
about foreign keys, and even earlier
when we talked about common sense in normalising data,
(not that we are displaying conspicuous amounts of this with the following
tables which are, after all, only for demonstration purposes)! How
does one amalgamate the data in the tables? (In dataspeak, we call the
relationship between the tables master-detail, or sometimes,
parent-child). Well, let's look at the tables..

DrugRegimen

Regimen

Drug

Dosing

R1

Carbimazole

D1

R2

Carbimazole

D2

R3

Carbamazepine

D3

DrugDosing

Dosing

DoseMg

Frequency

D1

30

OD

D2

10

OD

D3

200

TDS

You might think that a natural extension of the good old select
statement is the following:

select * from DrugRegimen, DrugDosing;

and you would be perfectly correct, but what does the above statement
give us when we actually use it? Here we go..

REGIMEN

DRUG

DOSING

DOSING

DOSEMG

FREQUENCY

R1

CARBIMAZOLE

D1

D1

30

OD

R2

CARBIMAZOLE

D2

D1

30

OD

R3

CARBAMAZEPINE

D3

D1

30

OD

R1

CARBIMAZOLE

D1

D2

10

OD

R2

CARBIMAZOLE

D2

D2

10

OD

R3

CARBAMAZEPINE

D3

D2

10

OD

R1

CARBIMAZOLE

D1

D3

200

TDS

R2

CARBIMAZOLE

D2

D3

200

TDS

R3

CARBAMAZEPINE

D3

D3

200

TDS

Wooops! Every single row of the first table has been joined with
each and every row of the second table, not just the rows that
we think should correspond! (This is called a Cartesian join or
cross join, and can rapidly generate enormous tables - as
Ladányi points out, if you perform a cross join on three tables, each with a thousand
rows, then - voila - you have 1000 * 1000 * 1000 = one billion
rows, enough to bring most databases to their knees).

For our purposes, most of the rows in the above cross join are meaningless,
but we can easily reduce the rows to only those we are interested in.
We simply use a where statement to join the tables
on the Dosing column, thus:

It's so important to always have a where condition with your
Cartesian joins, let's make it into a rule:

If the from clause in a select statement has
a comma in it, check the where clause.
Then check the where clause again. And again! ... Rule #4.

Also note that the two tables each had a column with the
same name - "Dosing". We easily sidestepped this one by simply talking
about DrugRegimen.Dosing = DrugDosing.Dosing, rather than, say,
Dosing = Dosing, which would have forced an error!
Needless to say, you can select individual columns from the cross join,
rather than having to say select *.

Inner versus Outer Join

The above is an example of an inner join. What this means is
that if, for every value in the DrugRegimen.Dosing column, there's
a corresponding value in the DrugDosing.Dosing column,
and vice versa, then everything's fine.
However, if (due to some silly person not enforcing relational integrity)
there is no matching value in the corresponding column, the whole row with
its unmatched value will disappear from the final report - it
will softly and suddenly vanish away! Apart from being a goad to ensure
relational integrity in all of your databases, this should alert you
to the possibility that you might trustingly run a query on a database,
and get complete garbage out, because you used an inner join! The
solution is an outer join.

Needless to say, few vendors have stuck to the SQL-92 standard
as regards outer joins. For example, Oracle sneaks three tiny characters
into the where statement thus:

The (+) tells SQL to "join in a NULL row if you can't find
anything that matches a problem row in DrugDosing.Dosing " - all very convenient, but not standard SQL-92.
Also note that the (+) is on the same side of the equals sign as the
table that is 'augmented', not the one that's causing the problem.
It should be clear why this is called a left outer join, and

{Here explore the SQL-92 standard, and other ways it's not been implemented}

Many vendor SQLs also do not implement the SQL standard for a
full outer join, that lists all rows (whether matched or not) from
all tables. The SQL-92 syntax for the from clause is:

fromtable1FULL OUTER JOINtable2

There are
other ways of achieving a full outer join in most SQLs.
Remember that the way to avoid all this is to meticulously enforce
constraints on integrity! Also note that an outer join will NOT help you if
there are duplicate entries in one of the tables you are using
for the join (which can only occur in a 'relationally challenged' database).

How to combine queries using set operators

Early on in school you were probably encouraged to draw "Venn diagrams"
showing intersecting sets, and then colour in the union or intersection
of sets, or even the complement of a set (everything outside the set).

{Could here have pix}

The set operators in SQL are based on the same principles, except they
don't have a complement, and can determine the 'difference' between
two sets. Here are the operators which we apply to
combine two queries:

union - all elements in both queries are returned;

intersect - elements common to both queries are returned;

except - elements in the first query are returned excluding
any that were returned by the second query.

These are powerful ways of manipulating information, but take note:
you can only apply them if the results of the two queries (that are going
to be combined) have the same format - that is, the same number of columns,
and identical column types! (Although many SQLs try to be helpful by,
for example, coercing one data type into another, an idea which is
superficially helpful and fraught with potential for errors).
The general format of such queries is illustrated by:

selectcolumns1fromtable1unionselectcolumns2fromtable2

Different strokes..

Different vendor implementations of SQL have abused the SQL-92
standard in different ways. For example, Oracle uses minus where
SQL-92 uses except. {give other examples}.

An outer join could
be used (with modification for NULLs if these little monstrosities
are present) to achieve the same result as except.

Similarly, an inner join (with select distinct) can do what intersect
does.

Set operators can be combined (as you would expect when playing around
with sets) to achieve results that simply cannot be obtained using a
single set operator.

Note that there are some restrictions on using order by with
set operators - order by may only be used once, no matter how
big the compound statement, and the select list must contain the
columns being used for the sort.

Not content with implementing set operators, SQL database creators have
also introduced what are called "pseudoset operators". These operators
don't fit conveniently into set theory, because they allow multiple rows
(redundancies) which are forbidden in true sets.

We use the pseudoset operator union all to combine the
outputs of two queries (all that is done is that the results of the
second query are appended to the results of the first). Union all
does exactly what we required from a FULL OUTER JOIN, which as we've
already mentioned, is not implemented in many nominally "SQL-92
compliant" databases!

Subqueries

Wouldn't it be nice if you could perform one query, temporarily store
the result(s), and then use this result as part of another query? You
can, and the trickery used is called a subquery. The basic
idea is that instead of a 'static' condition, you can insert a
query as part of a where clause! An example is:

select * fromtablenamewherevalue >
( insert select statement here);

Note that in the above query, the inner select statement must return
just one value (for example, an average). There are other restrictions -
the subquery must be in parenthesis, and it must be on the right
side of the conditional operator (here, a greater than sign). You
can use such subqueries with =, >, <, >=, <= and <>, but
not {to the best of my knowledge?} with between .. and.

Multiple select subqueries can be combined (using
logical operators)
in the same statement, but avoid complex queries if you possibly can!

Subqueries that return multiple values

In the above, we made sure that our subquery only returned a single value.
Can you think of a way you might use a subquery that returns a list of
values? (Such a thing is possible)! Yes, you need an operator that
works on lists. An example of such an operator is in, which
we've encountered before. The query should look
something like:

select * fromtablenamewherevaluein ( insert select statement here);

The assumption is that the nested select statement returns
a list. The outer shell of the statement can then use in to get
cracking on the list, looking for a match of value within the
list! There is a surprisingly long list of operators that resemble in,
and can be used in a similar fashion. Here it is:

Operator

What it does

not in

There is no match with any value retrieved by
the nested select statement.

in

We know how this works. Note that = any is
a synonym for in that you'll sometimes encounter!

> any

The value is greater than any value in the
list produced by the inner submit statement. This is
a clumsy way of saying "Give me the value if it's bigger than
the smallest number retrieved"!

>= any
< any
<= any

Similar to >. Usage should be obvious.

> all

Compare this with > any - it should be
clear that the condition will only succeed if the value is
bigger than the largest value in the list returned by the
inner select statement!

>= all
< all
<= all

If you understand > all, these should present
no problem!

= all

You're not likely to use this one much. It implies that
(to succeed) all the values returned by the inner subquery are
equal to one another and the value being tested!

(It is even possible in some SQL implementations to retrieve a 'table'
(multiple columns) using the inner select statement, and then
use in to simultaneously compare multiple values with the
rows of the table produced by this inner select. You'll probably
never need to use something like this. Several other tricks are
possible, including the creation of virtual views by using a subquery
[See Ladányi p 409 if you're interested]. Views are discussed
in the next section.)

Correlated Subqueries

Whew, we're nearly finished with the subqueries, but there is one
more distinct flavour! The correlated subquery is a nested
select statement that can (using trickery) refer to the
outer select statement containing it. By so doing, we can successively
apply the inner select statement to each line generated by the
outer statement! The trick that we use is to create an alias
for the outer select statement, and then refer to this alias in the inner
select statement, thus constraining the inner select to dealing with
the relevant row. For an example, we return to our tedious drug table:

DrugDosing

Dosing

DoseMg

Frequency

D1

30

OD

D2

10

OD

D3

200

TDS

Let's say we wanted (for some obscure reason) all doses that are
greater than average dose, for each dosing frequency. [Meaningless, but
it serves the purposes of illustration].

The sense of this statement should be clear - we use the outer
select to choose a row (into which we put DoseMg, and Frequency).
We then check whether this row is a candidate (or not) using the
where statement. What does the where statement check?
Well, it makes sure that DoseMg is greater than a magic number. The
magic number is the average dose for a particular Frequency, the
frequency associated with the current row. The only real trickery
is how we use the label fred to refer to the current line
from within the inner select statement. This label fred is
called an alias. We'll learn a lot more about aliases later (Some
will argue that aliases are so important you should have encountered them
long before, but we disagree).

Correlated subqueries are not the only way of doing the above.
Using a temporary view is often more efficient, but it's worthwhile
knowing both techniques. We discuss views next.

As the name suggests, a view gives a particular user access to selected
portions of a table. A view is however more than this - it can limit
the ability of a user to update parts of a table, and can even amalgamate
rows, or throw in additional columns derived from other columns. Even
more complex applications of views allow several tables to be combined
into a single view!

How do we make a view?

Interestingly enough, you use a select statement to specify
the view you wish to create. The syntax is:

create viewnameofviewasselecthere have details of select statement

A variant that you will probably use rather often is:

create or replace viewnameofviewasselecthere have details of select statement

(Otherwise you have to explicitly destroy a view - SQL won't simply
overwrite a view without the or replace instruction, but will
instead give you an irritating error).

Remember that if you alter the view, you alter the underlying
table at the same time!

You cannot use an order by statement (or something else
called a for update clause) within a view.
There is a whole lot of other convenient things you can do to views.
Where you include summary statistics (eg count, sum, etc) in a view it
is termed an aggregate view. Likewise, using distinct,
you can have a view on the possible values of a column or grouping of columns.

You can even create a view that is derived from several tables (A
multi-table view). This is extremely sneaky, as you can largely
avoid complex join statements in code which pulls data out of several
tables! Ladányi puts things rather well:

"Pre-joined and tested views reduce errors .. that subtly or
obviously undermine the accuracy of reports, and thus the credibility
and subsequent professional well-being of the people creating them".

How do we limit access to a view?

This is implicit in the way we sneakily use a select statement -
to limit access to certain columns, for example,
we just select the column-names we want access to, and ignore the rest!
There is of course a catch (Isn't there always?) - if you insert
a row, then SQL doesn't know what to put into the column entry that's
not represented in the view, so it will insert either NULL, or the default
value for that column. (Likewise, with delete, the entire row
will be deleted, even the column entry that is invisible).

It is also obvious how we limit access to certain rows -
we use a where clause that only includes the rows we want in
the view. Note that (depending on your selection criterion) it is
possible to insert a row into a view (and thus the underlying database)
and then not be able to see this row in the view!
With (in)appropriate selection criteria for the view, one can also alter
the properties of rows visible in the view so that they now become hidden!

More draconic constraints are possible. The option

with read only

.. prevents any modifications to the view (or
the underlying database); while with check option prevents
you from creating rows in the view that cannot be selected (seen) in
the view itself. If you use the "with check option", then you should
follow this with a name, otherwise SQL will create an arbitrary and
quite meaningless name for the constraint that will only confuse you
when an error occurs!

How do we remove a view?

{check this out. presumably same as dropping a table}

More things we can do with views

One trick is to create a "virtual view" and then use this as a "table"
which you can update. Instead of specifying a table name, you specify
(in parenthesis) the select statement that defines the "virtual view",
and all actions are performed on this temporary table! This is particularly
useful where you don't have the system authority to create a view, yet
need the power of a view.

{Does SQL92/98 support the "comment on" facility for views??}

An under-utilised but rather attractive use of views is to make
them based on set (or pseudo-set) operators, for example the union
of two sets.

{ Note: have something on exists and not exists;
also need to talk about recursive joins and tree queries (a la SQL92 and
also eg Oracle) }