SQL Server 2005 : Data Querying and Reporting (part 2)

Querying a Sampling of the Data Stored

In
some cases, you don’t want to run a query against all the data in a
table. In such a situation, you can use the new TABLESAMPLE clause to limit the number of rows that any query processes. Unlike TOP, which returns only the first rows from a result set, TABLESAMPLE returns rows selected randomly by the system from throughout the set of rows processed by the query.

Note

TABLESAMPLE
requires a sufficient amount of data to operate properly. In small
tables, where all the data fits onto a single data page, the only
possible returns are all (100%) and none (0%).

You can specify the conditions of the random selection by percentage or by number of rows, as in the two following examples:

Selecting Rows Based on NULL Values

A NULL value is a value given to a field that that has no value. Many people confuse NULL values with zero-length strings or the value zero, but they are not the same. NULL
is basically a fancy word for a value that is unknown. In SQL Server,
you can select the desired NULL values or reject them by using ISNULL,
as shown in the following query:

The ISNULL function operates on the basis of substitution. In the previous example, if a value for StreetAddress is not known for a particular customer, then within the query, it is replaced by and treated as the ‘#’ character, which matches the condition of the WHERE clause and returns customers with NULL addresses. NULLJOIN operations that formulate derived tables. values frequently show up as the result of

Relating Data from Multiple Tables

Joins
and derived tables figure prominently in the 70-431 exam. Joins are the
backbone of relational databases; they actually put the “relation” in
relational databases. They can be used in all the main SQL statements (SELECT, INSERT, UPDATE, and DELETE).
They are important. Also, derived tables tend to be overlooked, and
they’re perceived as complicated even though they’re not; so they are
also likely to show up on the 70-431 exam.

Joining
tables is a natural occurrence in a relational database system. Many of
the queries performed on a regular basis involve multiple tables.
Whenever you query data from two tables, you need to find some way to
relate the two tables. Connecting one table to another requires a
common element of the tables. You would use a JOIN operation in this manner whenever you want to see a result set that includes columns from several tables.

You can use three basic join types, as well as a union operation, to connect tables:

Inner join—
An inner join shows results only where there are matches between the
elements. An inner join leaves out all the records that don’t have
matches.

Outer join—An outer join can show all the records from one side of a relationship, records that match where they are available, and NULL
values for records that do not have matches. An outer join shows all
the same records as an inner join, plus all the records that don’t
match.

Cross join—
The cross join is less often used than the other two types of joins
because it returns all possible combinations of rows between the two
sides of the join. The number of records in the result set is equal to
the number of records on one side of the join multiplied by the number
of records on the other side of the join. No correlation is attempted
between the two records; all the records from both sides are returned.

Again,
cross joins are less frequently used than inner and outer joins. With
an outer join, you are guaranteed to have all records returned from one
side or the other. With inner joins, the only rows returned are those
that match in the joined tables. It is easier to contemplate the
overall processes if you consider join operations first. What you put
in the WHERE clause and other clauses
is applied after the joins are processed. So bear in mind that when a
join returns a specified set of records, the SQL statement may or may
not return all those records, depending on what you have specified in
the WHERE clause.

Now let’s look at each of the different join operators: INNER JOIN and OUTER JOIN.

Outputting Only Matches: INNER JOIN

The INNER JOIN
statement is the easiest and most often used join operator. For this
reason, when an inner join operation is performed, the INNER
portion of the syntax is optional. A rowset is returned from the
operation for the records that match up based on the criteria provided
through the ON clause. A one-to-many
relationship is handled inside an application with the inner join. To
show all orders for a particular customer, you could use the following
join operation:

The
results from this query may appear a little on the unusual side because
no sorting is performed. Typically, an order is specified or
information is grouped to make the output more usable.

Returning Output Even When No Match Exists: OUTER JOIN

You
can use an outer join when you want to return all of one entire list of
rows from one side of the join. There are three types of outer joins:
left, right, and full. The terms left and right
are used based on the positioning of the tables within the query. The
first table is the right; the second is the left. You may find it
easiest to draw a picture to represent the table when you are first
learning outer joins. A right outer join, often abbreviated as right join,
returns all the rows belonging to the table on the right side and only
the matching rows on the table on the left side. Conversely, a left
outer join returns all the rows from the table on the left side. A full
outer join returns all the rows from both sides that have correlations.

Left
and right outer joins, for all intents and purposes, are the same
operations; they are simply a matter of the position of the tables
within the queries. Therefore, the following examples use only the left
outer join syntax, which is the one typically used. In the current ANSI
standard, RIGHT is the table name on the right side of the JOIN keyword, and LEFT is the table being joined. The following query produces a listing of all customers and their orders:

SELECT * FROM Customers AS C LEFT JOIN Orders AS O ON C.CustomerID = O.CustomerID

Customers that have never placed an order would still be in the listing, accompanied by NULL for the OrderID.

A
cross join, also known as a Cartesian join, is rarely used because it
connects every element in one table with every other element of another
table. This has some bearing in statistical operations but is not
relevant in most business processing.

Applying Conditional Data Filtering

You
apply filtering to data to determine the data to be selected based on
conditional requirements. Essentially, all conditions come down to one
of three possible outcomes. If two values are compared, the result is
positive, negative, or equal (greater than, less than, or equal to).
Actually, filters always evaluate to a Boolean result, either True or False. BETWEEN 10 and 20 is either True or False. Even numeric tests, such as month > 0 and Price > 10.00, are either True or False.

BETWEEN, IN, and LIKE

With the help of the BETWEEN keyword, you can specify ranges when using the WHERE clause. Simply put, BETWEEN provides a range of values within which the data should lie; otherwise, the data does not meet the condition. BETWEEN
is inclusive, meaning that the range includes the lower value specified
and the upper value specified. For example, the following query would
have the values 10 and 20 as a possibility in the results:

SELECT * FROM Products WHERE UnitPrice BETWEEN 10 AND 20

If the intent is to exclude the value 20, the query would be written like this:

SELECT * FROM Products WHERE UnitPrice BETWEEN 10.00 AND 19.99

You can also incorporate something known as a list when using the WHERE
clause. Essentially, a list specifies the exact values a column may or
may not take. If the record does not contain the value for the column
specified in the IN list, it is not selected. IN determines whether a given value matches a set of values listed. Here is an example:

SELECT * FROM Customers WHERE Country IN ('UK', 'USA')

This example limits the values of Country to only UK and USA. Customers who live in the countries mentioned in the IN list are the only ones listed.

You can retrieve rows that are based on portions of character strings by using the LIKE predicate. The LIKE predicate determines whether a given character string matches a specified pattern. The data types a LIKE statement can work with are char, varchar, nvarchar, nchar, datetime, smalldatetime, and text. A pattern specified in the LIKE
predicate can include regular characters and wildcard characters.
During pattern matching, regular characters must exactly match the
characters specified in the column value. Wildcard characters can be
matched with any character or set of characters, according to the
wildcard character used, as shown in Table 1.

Table 1. Wildcard Characters Allowed in T-SQL

Character

Meaning

[]

Any single character within the specified range (for example, [f-j]) or set (for example, [fghij])

_ (underscore)

Any single character

%

Any number of zero or more characters

[ ^ ]

Any single character not in the specified range or set

If an application repeatedly calls the LIKE
predicate and performs numerous wildcard searches, you should consider
using the MS Search facility if it is installed and in use on the
server. Consider the value of the response time over the storage
resources that the MS Search service and full-text search capabilities
require. MS Search service is required to use a full-text search.
Full-text searching enables a variety of powerful wildcard searches.
You should avoid LIKE searches that have a % wildcard at both the beginning and the end. The following example shows how the LIKE clause uses the % wildcard to select all customers whose CustomerIDA: begins with the letter

SELECT CustomerID, ContactName FROM Customers WHERE CustomerID LIKE 'A%'

You can also use the NOT keyword with the LIKE predicate to simply retrieve a query that does not contain records matching the specified elements in the LIKENOT. clause. With character matching, it is sometimes more efficient to exclude characters by using