Introducing Indexed Views

PLEASE NOTE: The authors based their SQL Server 2000 articles for this issue on pre-Beta 2 versions, so you might notice some differences between Beta 2 and the behavior or interfaces we describe in this issue. In particular, please note that the indexed views feature will be available only in SQL Server 2000 Enterprise Edition. However, you can install Enterprise Edition on NT 4 Server and Windows 2000 (Win2K) Server. You don't need NT 4.0 Enterprise or Win2K Advanced Server.

Server 2000's powerful new support for indexed views promises to make your life easier while making your applications and queries run faster. Sound too good to be true? Indexed views let you precompute all sorts of joins, aggregations, and computations so that you don't have to write such conditions into every query. And Microsoft is reporting gains of 10 to 100 times in the performance of applications and queries that access indexed views instead of base tables. Although Oracle supports a similar feature called materialized views, SQL Server's new indexed views go far beyond what the competition offers.

Views from the Top

SQL Server has supported views since its earliest incarnation. A view is essentially a stored SELECT statement that acts as a filter. For example, suppose you have a large table containing all of your company's customer records, but sometimes you're interested in only the names and phone numbers of customers who live in your ZIP code. You could create a view similar to the following:

You can now access this view as if it were a table but without requiring the separate storage space associated with a table. Whenever a query references the view, SQL Server merges the code that defines the view with any other conditions you've specified. SQL Server then optimizes, compiles, and executes the resulting query. Using some of SQL Server 7.0's special caching mechanisms, the database might be able to reuse the plan, but reuse isn't as likely as with SQL Server stored procedures. (For details about plan caching and reuse, see Inside SQL Server, "SQL Server 7.0 Plan Caching," September 1999.)

Views have two main benefits. First, they simplify query writing because they don't require you to directly access tables and because they often already include restrictions, computations, aggregations, and joins—saving you from retyping those conditions. Second, views provide a security mechanism, letting you give users access to an appropriate view of a table's data instead of access to the entire table.

A view contains no saved data; it's only a saved query. Therefore, a view's contents are dynamic. If you add rows to a table or change data values, the view will show the new information. The downside of this behavior is that if the view contains computations based on columns in the table, SQL Server must perform those computations whenever someone accesses the view. In addition, to help process the query, the database can't use an index on any column involved in a computation.

Consider a typical view that shows the day of the week on which each order was taken. The base table has an Orderdate column, but let's say you want to retrieve order information, too. You could create a view like this:

You can then use the view to retrieve all orders placed in December (to analyze how much your business increased during the holidays), but no index would help find these rows. An index on the Orderdate column would locate rows by the exact date, but not by the month, in which the order was placed.

With SQL Server 2000's indexed views, you can now build a clustered index on a view. A clustered index is the only type of SQL Server index that contains data; the clustered index on a view contains all the data that makes up the view definition. As soon as you create this clustered index, the view is materialized, meaning SQL Server allocates storage space for it. You can then treat the view like a table, building multiple nonclustered indexes on it. (I based the following explanations and examples on SQL Server 2000 Early Adopters Program 4—EAP4—build 047, a build between beta 1 and beta 2.)

Creating Your View

Before creating an indexed view, you need to guarantee the view will always return the same results for the same underlying data. To do so, make sure the following SET conditions are ON:

By default, SQL Server 2000's client tools set all these options except ARITHABORT to the correct values, making it easier for you to work with indexed views. To be safe, you can use sp_configure to set a bit in the user options value to set ARITHABORT ON. The following code finds the current value of user options by looking in the syscurconfigs table, then uses the bitwise OR operator to turn on the appropriate bit in addition to any other bits you've previously set:

Note that the options must be set to these values when you create the indexed view, when you modify any tables included in the indexed view, and when the optimizer decides to use the indexed view as part of a query plan. You can check which options are set for a particular connection by executing the query DBCC USEROPTIONS. You can also use the new property function SESSIONPROPERTY to test whether you have each option appropriately set (1 equals ON, 0 equals OFF):

SELECT SessionProperty('NUMERIC_ROUNDABORT')

You also need to keep a few other special requirements in mind when creating a view. First, all functions and expressions in the view definition must be deterministic. In general, any function that can return for the same arguments two different results in two separate invocations is nondeterministic. Two obvious examples of nondeterministic functions are getdate( ) and rand( ). Most parameterless system functions, such as @@spid, @@servername, and @@rowcount, are also nondeterministic. You might be surprised that datename( ) is nondeterministic; the values it returns depend on the language you've configured SQL Server to use with sp_configure. The language can change from user to user—even on the same server in the same database—so that one user can get error messages in English, for example, and another can get messages in Spanish. Datepart( ) is nondeterministic when the first argument is DW (day of week) because DW can vary based on the DATEFIRST setting. (The final version of SQL Server Books Online—BOL—for SQL Server 2000 will list which functions are and aren't deterministic.) In addition, any view that contains a column, constant, or expression of data type float in its definition is nondeterministic.

The second requirement to be aware of when creating a view is that you don't want any underlying object's schema definition to change. To prevent a change in schema definition, SQL Server 2000's CREATE VIEW statement allows the SCHEMABINDING option. When you specify with SCHEMABINDING, the SELECT statement defining the view must include the two-part names (owner.object) of all referenced tables. You can't drop or alter tables participating in a view created with the SCHEMABINDING clause unless you've dropped that view or changed the view so that it no longer has SCHEMABINDING. Otherwise, SQL Server raises an error. If someone else owns a table on which you're basing a view, you don't automatically have the right to create the view with SCHEMABINDING, which would restrict the table's owner from making changes to her own table. The table owner must grant the other user REFERENCES permission for that user to create a view with SCHEMABINDING.

Finally, SQL Server 2000 restricts the syntax you can use on the CREATE VIEW statement. The view definition can't contain the following:

TOP

text, ntext, or image columns

DISTINCT

MIN, MAX, COUNT(*), COUNT(), STDEV, VARIANCE, or AVG

SUM on a nullable expression

a derived table

the ROWSET function

another view (you can reference only base tables)

UNION

subqueries, OUTER joins, or self-joins

full-text predicates (CONTAIN or FREETEXT)

COMPUTE or COMPUTE BY

Also, if the view definition contains GROUP BY, you must include the new aggregate COUNT_BIG(*) in the SELECT list. COUNT_BIG returns a value of the new data type BIGINT, which is an 8-byte integer. A view that contains GROUP BY can't contain HAVING, CUBE, ROLLUP, or GROUP BY ALL. And all GROUP BY columns must appear in the SELECT list.

To verify that you've met all these requirements, use the ObjectProperty function's new 'IsIndexable' value. The following query tells you whether you can build an index on a view:

SELECT ObjectProperty(object_id('Product_Totals'), 'IsIndexable')

A return value of 1 means you've met all requirements and can build an index on the view.

Although these restrictions might seem severe, the benefits of indexed views outweigh the prerequisites you have to meet. Two other new SQL Server 2000 features—user-defined functions (UDF) and indexes on computed columns—also have many of the same requirements. Also, remember that these restrictions apply to the view definitions, not to the queries that might use the indexed views.

Creating an Indexed View

Now that you've seen what you can and can't have inside your view definition, you're ready to create an indexed view. The first step is to define the view, as in Listing 1. Note the with SCHEMABINDING clause and specification of the owner name (dbo) for the table. At this point, you have a typical view—a stored SELECT statement that uses no storage space. In fact, if you run the system stored procedure sp_spaceused on this view, you'll get the error message

Server: Msg 15235, Level 16, State 1,
Procedure sp_spaceused, Line 91Views do not have space allocated.

To make an indexed view, you need to create a unique clustered index on the view. This statement defines a unique clustered index, Product_Totals, for the view:

CREATE UNIQUE CLUSTERED INDEX PV_IDX on Product_Totals(productid)

After you create the index, you can rerun sp_spaceused. You'll receive results similar to those in Figure 1.

Data that comprises the indexed view is persistent, with the indexed view storing the data in the clustered index's leaf level. You could construct something similar by using temporary tables to store the data you're interested in. But a temporary table is static and doesn't reflect changes to underlying data. In contrast, SQL Server automatically maintains indexed views, updating information stored in the clustered index whenever anyone changes data that affects the view.

After you create the unique clustered index, you can create multiple nonclustered indexes on the view. By using the ObjectProperty function's 'IsIndexed' argument, you can determine whether a view is indexed. For the Total_Products indexed view, the following statement returns a 1 (is indexed):

SELECT ObjectProperty(object_id ('Product_Totals'), 'IsIndexed')

Note that at beta 1, executing system stored procedure sp_help on the view doesn't report the view as indexed and doesn't show any of the indexes. However, running system stored procedure sp_helpindex on the view returns complete information about any indexes on the view.

Using Indexed Views

One of indexed views' most valuable benefits is that your queries don't have to directly reference a view to use the index on the view. Consider the Product_Totals indexed view. Suppose you issue the following SELECT statement:

SQL Server's query optimizer realizes the precomputed sums of all the Quantity values for each productid are already available in the index for the Product_ Totals view. The optimizer will evaluate the cost of using that indexed view in processing the query. But just because you have an indexed view doesn't mean the query optimizer will always choose it for the query's execution plan. In fact, even if you reference the indexed view directly in the FROM clause, the optimizer might decide to directly access the base table instead.

To determine whether the optimizer is using the indexed view, you can look at the query plan in the Query Analyzer. The graphical Estimated Execution Plan in Screen 1 shows that the query is using the clustered index on the view.

You can compare the cost of using an indexed view's precomputed values with the cost of directly accessing the base order details table. A new SQL Server 2000 query hint, OPTION (EXPAND VIEWS), forces SQL Server to expand all indexed views into their underlying SELECT statements so that the optimizer won't consider indexes on the views. To compare the cost of accessing the same data with and without the indexed view, execute SET STATISTICS IO ON, run the code in Listing 2 (which executes one SELECT that uses the indexed view and one that uses EXPAND VIEWS), then compare the values returned for logical reads. When I ran this code, I received a value of 2 logical reads when the system used the indexed view and 19 logical reads when I forced SQL Server to expand the view.

A second new index hint, NOEXPAND, does the opposite of EXPAND VIEWS. You use NOEXPAND in your query's FROM clause to force SQL Server to use the indexed view, but only if you've referenced the view in your FROM statement. Listing 3 shows an indexed view on Northwind's Orders table. A simple query selecting from this view doesn't use the index on this view, as you can see if you display the query's execution plan. However, using the hint WITH (NOEXPAND) forces the query optimizer to consider only the view—and indexes on that view—in the execution plan. The optimizer won't consider the base table or its indexes. If you use the WITH (NOEXPAND) hint with a view that isn't indexed, you'll get the following

The database must always expand nonindexed views into their underlying SELECT statement and access data only from the base table.

Note that if you're accessing the underlying table in your FROM clause, the OPTION (EXPAND VIEWS) hint will keep SQL Server from using an indexed view on that table, but SQL Server 2000 doesn't have a hint to let you do the opposite. You can't force the database to use an indexed view without referencing that view in your query. The optimizer decides whether the view's query tree matches the submitted query's query tree, then determines which is more cost-effective: using the indexed view or the base table. In addition, the SQL Server 2000 optimizer might decide that using the base table is cheap enough and won't even bother determining a cost for using the indexed view.

As I noted earlier, you can force the optimizer to use the index on the view by referencing the indexed view in your FROM clause and using the NOEXPAND hint. However, you can't force the optimizer to use the index on the view by using the INDEX hint, which tells SQL Server to use a particular index. If you use INDEX=1 to try to force the optimizer to use the clustered index, SQL Server will use the clustered index on the base table instead. Even if you specify the name of the clustered index on the view in the INDEX hint, SQL Server will still use the clustered index on the base table. Unless you use the NOEXPAND hint, SQL Server will replace the view with its underlying definition by the time the optimizer starts processing the query, and the optimizer won't see the view.

For testing purposes, you can force SQL Server not to use your indexed views by simply changing a required SET option to the wrong value. For example, setting ANSI_NULLS OFF will disallow indexed views.

Relaxing the Restrictions

The laundry list of syntax restrictions for the CREATE VIEW statement applies only to creating the indexed view. When selecting from the view, you can use any valid SQL terms; the optimizer will then decide whether to include the indexed view in the query execution plan.

Not being able to include the average (AVG) aggregate in your view definition might seem like a major inconvenience. But keep in mind that you can still use SUM, and you must use the COUNT_BIG aggregate if you have any other aggregates, so you can compute the average yourself.

For example, the Product_Totals indexed view contains a total_qty and a number column for each productid. You just divide the former by the latter to get the average:

SELECT productid, average_qty = total_qty/number
FROM Product_Totals

The query execution plan and the STATISTICS IO values show that the optimizer used the indexed view to process this query.

To verify that this is the same result you would get if you used the AVG aggregate, run the following query:

Not only does this second query yield the same results as the first, but the optimizer chose to use the indexed view to process the second query also so that the database didn't have to compute the sum and the count during execution. Not all applications and queries, however, will benefit from indexed views. For tips about which applications and queries will benefit most from this new feature, see the sidebar "The Best Views for Indexes," at left.

Managing and Tuning Indexed Views

In many ways, you manage indexed views like you manage any other view. The sp_help procedure tells you which columns are part of the view, and sp_helptext shows you the view's definition, unless you create the view WITH ENCRYPTION.

DROP VIEW drops an indexed view as easily as it drops a non-indexed view. And if you drop the clustered index on a view, SQL Server automatically drops all nonclustered indexes. You can use ALTER VIEW to change an indexed view's definition, but be aware that if you alter an indexed view, all indexes on that view disappear. You have to rebuild the clustered index to make the view an indexed view again. As a precaution, make sure you save all your index definitions in script files.

SQL Server 2000 extends the Index Tuning Wizard (ITW), introduced in SQL Server 7.0, to recommend not only indexes on tables but to optionally build views and indexes on views. By default, this option is turned on in beta 1, but you can turn this option off if you don't want the ITW to consider indexed views.

If you do let the ITW define indexed views for you, be careful. Because the ITW provides correct values for all required SET options when it creates the indexed views, the views will always succeed during hypothetical analysis. However, if your SET options have different values, your applications won't be able to use the indexed views, and inserts, deletes, and updates on tables included in the indexed view definitions will fail.

Power Up Indexed Views

To get the most out of indexed views,

analyze your workload so that you're familiar with the types of queries your users run.

enable the proper SET options before creating any tables used in the view definition.

use the OBJECTPROPERTY function's 'IsIndexable' option to make sure you can index the view.

make sure the first index on the view is unique and clustered (other indexes don't have to be unique or clustered).

And remember that you don't need to change your applications or queries to take advantage of any indexed views you build. The optimizer determines whether an indexed view is available and useful, and your applications need never change.