
SQL Performance and Tuning Considerations, concluded

This article, the second of two parts, will show you how to make the SQL statements you write run faster and more efficiently. It is excerpted from chapter 11 of the book SQL DeMYSTiFied, written by Andy Oppel (McGraw-Hill/Osborne, 2005; ISBN: 0072262249).

Oracle Considerations

This section covers some tuning considerations that are speciﬁc to Oracle databases.

Oracle Index Considerations

Oracle provides some additional indexing options that are worth consideration:

Function-based indexes Normally a predicate such as WHERE UPPER(MOVIE_TITLE) LIKE 'BIG%' precludes the use of an index. However, Oracle allows an index to be created on a function, such as UPPER(MOVIE_TITLE), which is then usable by queries that reference the function. There are several prerequisites for the use of this option, so you may need some help from your Oracle DBA before you can use it. The syntax for a function-based index simply uses the function specification instead of the column name in the ON clause of the CREATE INDEX statement:

CREATE INDEX IX_MOVIE_TITLE_UPPER ON MOVIE (UPPER(MOVIE_TITLE));

Bit-map indexes Oracle bit-map indexes are designed to handle columns where the cardinality is low (that is, where there are relatively few data values among many rows). A bit-map index contains records that have one bit for every row in the indexed table and one index record for every possible data value in the indexed column. A bit is “on” (a binary 1) if the corresponding row in the table contains the value that the index record represents, and “off” (a binary 0) if not. For example, if the MOVIE table contained 5000 rows and you built a bit-map index on the MPAA_RATING_CODE column, the index would contain records that were 5000 bits (roughly 625 bytes) in size, and there would be 6 such records (one for each value of MPAA_RATING_CODE). The ﬁrst index record would represent the ﬁrst MPAA_RATING_CODE value (“G”) and bits in that record would be “on” when the corresponding row in the MOVIE table had a rating of “G”. The DBMS can use matrix algebra to quickly ﬁnd desired data rows, particularly when the search predicates reference multiple columns that have bit-map indexes. A bit-map index is created using the normal CREATE INDEX syntax with the keyword BITMAP used instead of the keyword UNIQUE:

CREATE BITMAP INDEX IX_MOVIE_MPAA_RATING ON MOVIE (MPAA_RATING_CODE);

Index organized tables It’s a good practice to create an index on the primary key of every table. However, for tables with only a few columns, such as reference tables and small intersection tables, this always seems wasteful because most or all of the data in the table is repeated in the index. Oracle provides a nice solution that allows the entire table to be stored in the index. Essentially, an index organized table (IOT) is a primary key index and a table rolled into a single structure. While you can create additional indexes on an IOT, in those cases you might be better off using a conventional table. Here is an example that creates a reference table for media formats as an IOT:
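The example the paragraph refers to might look like the following sketch (the table and column names are illustrative; ORGANIZATION INDEX is the Oracle clause that makes the table index organized, and an IOT requires a primary key):

```sql
CREATE TABLE MEDIA_FORMAT (
    MEDIA_FORMAT_CODE         CHAR(2)       NOT NULL,
    MEDIA_FORMAT_DESCRIPTION  VARCHAR2(50)  NOT NULL,
    CONSTRAINT PK_MEDIA_FORMAT PRIMARY KEY (MEDIA_FORMAT_CODE)
) ORGANIZATION INDEX;
```

Because the entire row fits in the primary key index structure, no separate table segment is needed for this small reference table.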

In Oracle, the SQL EXPLAIN PLAN statement analyzes an SQL statement and posts analysis results to a special plan table. The plan table must be created exactly as specified by Oracle, so it is best to use the script Oracle provides for this purpose, which can be found under the Oracle Home directory as /RDBMS/ADMIN/catplan.sql. After running the EXPLAIN PLAN statement, you must then retrieve the results from the plan table using a SELECT statement. Fortunately, Oracle's Enterprise Manager has a GUI version available that makes query tuning a lot easier.
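The general form of the statement is a sketch like the following (the STATEMENT_ID value and the query after FOR are illustrative):

```sql
EXPLAIN PLAN
    SET STATEMENT_ID = 'movie_query_1'
FOR
SELECT MOVIE_TITLE
  FROM MOVIE
 WHERE MPAA_RATING_CODE = 'PG';
```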

STATEMENT_ID is any character string that the statement author wishes to use to identify the explain results. This feature allows multiple explains to be run with the STATEMENT_ID as the identiﬁer that keeps the information about each execution plan separate in the plan table.

The statement to be explained follows the FOR keyword and can be any valid SQL statement.

When the EXPLAIN PLAN statement is run, the SQL statement is not actually executed. Instead of a result set containing rows of data, only the message “Explained.” is returned to the user. This indicates that the explain plan information has been successfully written to the plan table.

Following is the statement commonly used to retrieve and display the execution plan. It is a complex SQL statement, but the only thing that has to be changed when it is run is the STATEMENT_ID. The CONNECT BY clause is an Oracle proprietary SQL extension that joins a recursive relationship in a table through all iterations (from child to parent to grandparent, and so forth). This SQL was included here to illustrate one method of viewing explain plan results. As mentioned before, there is also a GUI tool in Oracle’s Enterprise Manager.
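One commonly used form of that retrieval query, assuming the standard plan table columns, is sketched below (the STATEMENT_ID value 'movie_query_1' is illustrative and is the only part that changes between runs):

```sql
SELECT LPAD(' ', 2 * (LEVEL - 1)) || operation || ' ' ||
       options || ' ' || object_name AS query_plan,
       cost, cardinality
  FROM plan_table
 START WITH id = 0
        AND statement_id = 'movie_query_1'
CONNECT BY PRIOR id = parent_id
       AND statement_id = 'movie_query_1';
```

The LPAD expression produces the indentation that shows the nesting of the plan steps, and CONNECT BY walks the parent/child relationship between steps.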

Here are some key points regarding the query plan that appears in the result set:

The indentation shows the order of execution, with the innermost steps being performed ﬁrst.

The cost values show a relative cost. The numbers have no meaning beyond their relative differences. For example, a cost of 10 represents a step that uses twice the resources of a step that has a cost of 5.

The cardinality values show the estimated number of rows that are processed by the step.

“TABLE ACCESS FULL” indicates a full table scan where all rows in the table are read sequentially.

“INDEX RANGE SCAN” indicates the scan of a portion of the rows in an index.

“TABLE ACCESS BY INDEX” indicates access to a table using the index shown.

The sorts and joins should be self-evident from the step names.

Microsoft SQL Server Considerations

This section covers some tuning considerations that are speciﬁc to Microsoft SQL Server databases.

SQL Server Query Performance Considerations

Here are some considerations for running queries in Microsoft SQL Server databases:

Access via stored procedures. Placing frequently used SQL queries in stored procedures can yield signiﬁcant performance gains. In fact, in older versions of Microsoft SQL Server, it was essential to do so in order to achieve reasonable performance because there was no SQL cache to hold recently used SQL statements. In SQL Server 2005, the SQL cache allows the DBMS to recognize a statement similar to one that has already been run and to bypass statement preparation and optimization steps when the statement is reused from the cache. Nonetheless, any statement that is accessed through a stored procedure is precompiled and optimized, so it will always perform well (even if it is not found in the cache at the time it is run). This is one reason why seasoned developers strive to do most or all of their SQL Server database access using stored procedures.
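As a sketch, a frequently run lookup might be wrapped in a procedure like this (the procedure name, parameter, and MOVIE_ID column are illustrative, reusing the MOVIE table from the earlier examples):

```sql
CREATE PROCEDURE usp_GetMoviesByRating
    @RatingCode VARCHAR(5)
AS
BEGIN
    SELECT MOVIE_ID, MOVIE_TITLE
      FROM MOVIE
     WHERE MPAA_RATING_CODE = @RatingCode;
END;
```

An application then invokes it with EXEC usp_GetMoviesByRating @RatingCode = 'PG'; and benefits from the precompiled, pre-optimized plan.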

Avoid nulls in unique indexes. Unlike most other SQL implementations, unique indexes in SQL Server do not handle null column values well. You can deﬁne a unique index that includes a column that is allowed to be null, but the second index entry that contains a null is considered a duplicate of the ﬁrst, and therefore results in a duplicate key error. Most other SQL implementations avoid this issue by not indexing null values.
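A short sketch of the behavior described (the table and index names are illustrative):

```sql
CREATE TABLE MEMBER (
    MEMBER_ID  INT NOT NULL PRIMARY KEY,
    BADGE_NO   INT NULL
);
CREATE UNIQUE INDEX IX_MEMBER_BADGE ON MEMBER (BADGE_NO);

INSERT INTO MEMBER VALUES (1, NULL);  -- succeeds
INSERT INTO MEMBER VALUES (2, NULL);  -- fails with a duplicate key error
```

Later versions of SQL Server (2008 onward) offer filtered indexes (adding WHERE BADGE_NO IS NOT NULL to the CREATE UNIQUE INDEX statement) as one way around this limitation.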

Consider clustered indexes. A clustered index causes the table rows to be physically ordered by the indexed column(s). This can help considerably in search and join operations, but it’s at the expense of table maintenance, particularly when new rows force existing rows to be moved to maintain their sequence.
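Creating one uses the CLUSTERED keyword; for example (index name illustrative):

```sql
CREATE CLUSTERED INDEX IX_MOVIE_TITLE
    ON MOVIE (MOVIE_TITLE);
```

A table can have only one clustered index, and note that SQL Server clusters on the primary key by default, so choose deliberately which key the physical order should follow.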

Microsoft provides a facility for explaining query execution plans just as Oracle and DB2 do. In Microsoft SQL Server 2000, the SQL Query Analyzer tool has a button labeled “Display Estimated Execution Plan” that graphically displays how the SQL statement will be executed. This feature is also accessible from the Query menu as the option Show Execution Plan. These items may have different names in other versions of Microsoft SQL Server; in SQL Server 2005, SQL Query Analyzer's functionality was folded into SQL Server Management Studio, which offers an equivalent Display Estimated Execution Plan option.

In the Query Analyzer tool, the default mode is Object Browser. To display an execution plan, click the Display Estimated Execution Plan icon or press CTRL-L. The execution plan is displayed graphically as shown in Figure 11-1. Each step is shown as an icon, and the number near the icon shows the percentage of the total query cost that applies to that step. You can place your mouse pointer over any icon to see a pop-up window showing more detailed information for that step.

Figure 11-1. SQL Query Analyzer output

DB2 UDB Considerations

Like Oracle and Microsoft SQL Server, DB2 also provides a utility to display query execution plans. IBM provides a script called EXPLAIN.DDL that creates the tables needed to hold the output of the explain facility. Once the explain tables are created, the explain facility can be used to create execution plans for queries, and the plans can then be displayed using either Visual Explain in the Control Center or by querying the explain tables using SQL.
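As a sketch, the steps might look like the following (the script location shown is typical for DB2 LUW installations and the query is illustrative; verify both for your version):

```sql
-- Create the explain tables once per schema, e.g. from the command line:
--   db2 -tvf ~/sqllib/misc/EXPLAIN.DDL

-- Then capture a plan for a statement:
EXPLAIN PLAN FOR
SELECT MOVIE_TITLE
  FROM MOVIE
 WHERE MPAA_RATING_CODE = 'PG';
```

The captured plan can then be viewed with Visual Explain in the Control Center or by selecting from the populated explain tables.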

INSERT statements have the following considerations:

Ensuring adequate free space for new rows Tablespaces that are short on space present problems as the DBMS searches for free space to hold rows being inserted. Moreover, inserts do not usually put rows into the table in primary key sequence because there usually isn't free space in exactly the right places. Therefore, reorganizing the table, which is essentially a process of unloading the rows to a flat file, re-creating the table, and then reloading it, can improve both insert and query performance.

Index maintenance Every time a row is inserted into a table, a corresponding entry must be inserted into every index built on the table (although null values are usually not indexed). The more indexes, the more overhead every insert will require. Index free space can usually be tuned just as table free space can.
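Both free-space settings mentioned above are commonly controlled with a PCTFREE value. A sketch (this statement form is accepted by both Oracle and DB2, though defaults and details vary by version):

```sql
-- Reserve 20 percent of each data block/page for future row growth:
ALTER TABLE MOVIE PCTFREE 20;
```

Reserving more free space reduces row movement on later updates at the cost of somewhat larger tables.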

UPDATE statements have the following considerations:

Index maintenance If columns that are indexed are updated, the corresponding index entries must also be updated. In general, updating primary key values has particularly bad performance implications, so much so that some RDBMSs prohibit them.

Row expansion When columns are updated in such a way that the row grows signiﬁcantly in size, the row may no longer ﬁt in its original location, and there may not be free space around the row for it to expand in place (other rows might be right up against the one just updated). When this occurs, the row must either be moved to another location in the data ﬁle where it will ﬁt or be split with the expanded part of the row placed in a new location, connected to the original location by a pointer. Both of these situations are not only expensive when they occur but are also detrimental to the performance of subsequent queries that touch those rows. Table reorganizations can resolve the issue, but it’s better to prevent the problem by designing the application so that rows tend not to grow in size after they are inserted.

DELETE statements are the least likely to present performance issues. However, a table that participates as a parent in a relationship that is deﬁned with the ON DELETE CASCADE option can perform poorly if there are many child rows to delete. Moreover, index maintenance also comes into play here because index entries must be removed for any deleted rows.

Quiz

Choose the correct responses to each of the multiple-choice questions. Note that there may be more than one correct response to each question.

Performance requirements

a. Should be set after the SQL statements are tuned

b. Provide a way to identify statements that need tuning

c. Are best when they contain complex criteria

d. Provide a way to know when to stop tweaking a query

e. Are developed just to make the auditors happy

Disk reads and writes can be minimized by

a. Allocating buffers of sufﬁcient size

b. Placing all the database ﬁles on one disk drive

c. Putting the entire database in memory

d. Adding indexes for all the important table columns

e. Spreading ﬁles across all available disk drives

The computer system can be tuned by

a. Collecting database statistics

b. Applying available security patches

c. Selecting fast and reliable hardware

d. Following the DBMS tuning recommendations

e. Consulting a tuning guide for the operating system

Efﬁcient table design includes

a. Using VARCHAR for all variable-length character columns

b. Using the smallest possible numeric data type that holds the data values

c. Using triggers whenever possible

d. Using identical data types for primary keys and their matching foreign keys

e. Using identical data types for all primary key columns

A query execution plan

a. Describes how the DBMS will execute a query

b. Is stored in the SQL cache

c. Is created using the explain plan feature of the RDBMS

d. Requires a plan table to hold the explain results

e. Requires the use of a stored procedure

The query optimizer

a. Creates a query execution plan in the plan table

b. Determines the best way to execute an SQL statement

c. May use statistics gathered from the database

d. May use rules applied to the way the statement was written

e. Can be cost-based or rule-based

In order to design proper queries, the developer should

a. Know the characteristics of the data in the database

b. Match data types in predicates

c. Use hints as much as possible

d. Avoid unnecessary tables and columns

e. Maximize the number of rows in each result set

Table scans can be avoided by

a. Including a WHERE clause that references an indexed column

b. Using GROUP BY instead of DISTINCT

c. Making sure that statistics are up to date

d. Making sure that at least one predicate references the leading column of an index

e. Avoiding unnecessary columns

An index cannot be used when

a. The WHERE clause references the second column of an index

b. A LIKE clause references a comparison string that contains a wildcard (except in the ﬁrst position of the string)

c. The NOT operator is used in a predicate

d. An SQL function is included in a column comparison (except when it matches a function-based index)