Queries

If you start up SQL*Plus, you can start browsing around immediately with
the SELECT statement. You don't even need to define a table; Oracle
provides the built-in dual table for times when you're
interested in a constant or a function:

but not everyone wants 55010 copies of the same result. The
dual table is predefined during Oracle installation and,
though it is just a plain old table, it is guaranteed to contain only
one row because no user will have sufficient privileges to insert or
delete rows from dual.

Getting beyond Hello World

To get beyond Hello World, pick a table of interest. As we saw in the
introduction,

select * from users;

would retrieve all the information from every row of the
users table. That's good for toy systems but in any
production system, you'd be better off starting with

SQL> select count(*) from users;
COUNT(*)
----------
7352

You don't really want to look at 7352 rows of data, but you would like
to see what's in the users table, start off by asking SQL*Plus to query
Oracle's data dictionary and figure out what columns are available in
the users table:

The data dictionary is simply a set of built-in tables that Oracle uses
to store information about the objects (tables, triggers, etc.) that
have been defined.
Thus SQL*Plus isn't performing any black magic when you type
describe; it is simply querying
user_tab_columns, a view of some of the tables in Oracle's
data dictionary. You could do the same explicitly, but it is a little
cumbersome.

Here we've had to make sure to put the table name ("USERS") in
all-uppercase. Oracle is case-insensitive for table and column names in
queries but the data dictionary records names in uppercase. Now that we
know the names of the columns in the table, it will be easy to explore.

Simple Queries from One Table

A simple query from one table has the following structure:

the select list (which columns in our report)

the name of the table

the where clauses (which rows we want to see)

the order by clauses (how we want the rows arranged)

Let's see some examples. First, let's see how many users from MIT are
registered on our site:

The email like '%mit.edu' says "every row where the email
column ends in 'mit.edu'". The percent sign is Oracle's wildcard
character for "zero or more characters". Underscore is the wildcard for
"exactly one character":

Now we can see that this users table was generated by grinding over
pre-ArsDigita Community Systems postings starting from 1995. In those
bad old days, users typed their email address and name with each
posting. Due to typos and people intentionally choosing to use
different addresses at various times, we can see that we'll have to
build some sort of application to help human beings merge some of the
rows in the users table (e.g., all three occurrences of "philg" are in
fact the same person (me)).

Restricting results

Suppose that you were featured on Yahoo in September 1998 and want to
see how many users signed up during that month:

OR and NOT are also available within the WHERE clause. For example, the
following query will tell us how many classified ads we have that either
have no expiration date or whose expiration date is later than the
current date/time.

select count(*)
from classified_ads
where expires >= sysdate
or expires is null;

Subqueries

You can query one table, restricting the rows returned based on
information from another table. For example, to find users who have
posted at least one classified ad:

Conceptually, for each row in the users table Oracle is
running the subquery against classified_ads to see how many
ads are associated with that particular user ID. Keep in mind that this
is only conceptually; the Oracle SQL parser may elect to
execute this query in a more efficient manner.

This may be more efficient for Oracle to execute since it hasn't been
instructed to actually count the number of classified ads for each
user, but only to check and see if any are present. Think of
EXISTS as a Boolean function that

takes a SQL query as its only parameter

returns TRUE if the query returns any rows at all, regardless of the
contents of those rows (this is why we can use the constant 1 as the
select list for the subquery)

JOIN

A professional SQL programmer would be unlikely to query for users who'd
posted classified ads in the preceding manner. The SQL programmer knows
that, inevitably, the publisher will want information from the
classified ad table along with the information from the users table.
For example, we might want to see the users and, for each user, the
sequence of ad postings:

Because of the JOIN restriction, where users.user_id =
classified_ads.user_id, we only see those users who have posted
at least one classified ad, i.e., for whom a matching row may be found
in the classified_ads table. This has the same effect as
the subquery above.

The order by users.email, posted is key to making sure that
the rows are lumped together by user and then printed in order of
ascending posting time.

OUTER JOIN

Suppose that we want an alphabetical list of all of our users, with
classified ad posting dates for those users who have posted
classifieds. We can't do a simple JOIN because that will exclude users
who haven't posted any ads. What we need is an OUTER JOIN, where Oracle
will "stick in NULLs" if it can't find a corresponding row in the
classified_ads table.

The plus sign after classified_ads.user_id is our
instruction to Oracle to "add NULL rows if you can't meet this JOIN
constraint".

Extending a simple query into a JOIN

Suppose that you have a query from one table returning almost everything
that you need, except for one column that's in another table. Here's a
way to develop the JOIN without risking breaking your application:

add the new table to your FROM clause

add a WHERE constraint to prevent Oracle from building a Cartesian
product

hunt for ambiguous column names in the SELECT list and other
portions of the query; prefix these with table names if necessary

test that you've not broken anything in your zeal to add additional
info

add a new column to the SELECT list

Here's an example from Problem Set 2 of a course that we give at MIT
(see http://www.photo.net/teaching/psets/ps2/ps2.adp).
Students build a conference room reservation system. They generally
define two tables: rooms and
reservations. The top level page is supposed to show
a user what reservations he or she is current holding:

Step 2: add a constraint to the WHERE clause

Step 3: look for ambiguously defined columns

Both reservations and rooms contain columns
called "room_id". So we need to prefix the room_id column
in the SELECT list with "reservations.". Note that we don't have to prefix
start_time and end_time because these columns
are only present in reservations.