Query Optimizer

Recently I have been asked for the code from my book Inside the SQL Server Query Optimizer, so I am including it in this post. The book contains a large number of example SQL queries, all of which are based on the AdventureWorks database; Chapter 6 additionally uses the AdventureWorksDW database. All code has been tested on both SQL Server 2008 and SQL Server 2008 R2. Note that these sample databases are not included by default in your SQL Server installation, but can be downloaded from the CodePlex website.

In a previous post I talked about the new columnstore indexes and their related processing algorithms which are available in SQL Server code-named Denali. In this post I will cover the query processing part of the technology in more detail and will show you some examples that you can test on the recently released CTP3 (Community Technology Preview) of the product.

As with previous versions of SQL Server, in Denali the query optimizer can choose between the available access methods, which now also include columnstore indexes, and as always, this will be a cost-based decision. A new choice the query optimizer will have to make is the selection of an execution mode. The new query processing algorithms mentioned in my previous post will run in what is called batch execution mode, which is different from the traditional processing mode, now called row mode.

In row execution mode, operators process data one row at a time. The new batch execution mode processes data in batches, which is more efficient for large amounts of data, like the workloads present in data warehouse queries. Each operator in an execution plan can use the row execution mode and, when columnstore indexes are available, some operators can also use the batch mode. There is both an estimated and an actual execution mode, and this information is displayed in the query execution plan, as I will show later. It is also worth mentioning that, although columnstore indexes can speed up the performance of data warehouse queries, they are not a good choice for very selective queries returning only a few records. In this case the query optimizer may have to rely on row stores, like clustered or regular nonclustered indexes, to find those records quickly; there are no seeks on columnstore indexes.

As with previous versions of SQL Server, you still have the choice of using a hint to force any index in cases where the query optimizer is not giving you a good execution plan. This can happen, for example, when the query optimizer is choosing a columnstore index when it shouldn't, or when you want to force a columnstore index when it is not being selected. You can also use the new IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint to ask the query optimizer to avoid using any columnstore index.

Let me show you an example which you can test on SQL Server Denali CTP3, currently available for download here. To follow this example you will also need the AdventureWorksDWDenali database, available at CodePlex. I will use the same example as Books Online (BOL) to skip the basics and go directly to analyzing the batch processing mode. (By the way, the BOL example did not work directly with the AdventureWorksDWDenali database, so I had to add a few more columns at the end of the CREATE TABLE statement.)

First, use the BOL code to create a partition function, a partition scheme, and a new partitioned table with a columnstore index. Then run the following query:
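The BOL setup code is not reproduced here, but a minimal sketch of what it does looks like the following. The partition boundaries, column list, and object names other than FactResellerSalesPtnd are placeholders; the actual BOL script defines many more columns:

```sql
-- Sketch only: the real BOL script has more columns and boundary values
CREATE PARTITION FUNCTION ByOrderDateMonthPF (int)
AS RANGE RIGHT FOR VALUES (20050701, 20060701, 20070701, 20080701);

CREATE PARTITION SCHEME ByOrderDateMonthRange
AS PARTITION ByOrderDateMonthPF ALL TO ([PRIMARY]);

-- Partitioned copy of FactResellerSales (column list abbreviated)
CREATE TABLE dbo.FactResellerSalesPtnd (
    ProductKey int NOT NULL,
    OrderDateKey int NOT NULL,
    SalesTerritoryKey int NOT NULL,
    ExtendedAmount money NOT NULL
    -- ... remaining FactResellerSales columns ...
) ON ByOrderDateMonthRange (OrderDateKey);

-- The columnstore index covering the table's columns
CREATE NONCLUSTERED COLUMNSTORE INDEX csindx_FactResellerSalesPtnd
ON dbo.FactResellerSalesPtnd
    (ProductKey, OrderDateKey, SalesTerritoryKey, ExtendedAmount);
```

After loading the data from FactResellerSales, the aggregation query below can use the columnstore index.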

SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
FROM FactResellerSalesPtnd
GROUP BY SalesTerritoryKey;

This will create the following plan, where you can see the new Columnstore Index Scan operator:

The properties of the Columnstore Index Scan operator are shown next:

You may notice that both the actual and the estimated execution mode is Row (lines 3 and 4 on the list of properties). Row execution mode was selected because the table is not large enough to require the batch execution mode. We can use the undocumented ROWCOUNT and PAGECOUNT options of the UPDATE STATISTICS statement to simulate a larger table, as shown next (for more information about how this works, see my post about the DTA here).
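The statement I used is along the lines of the following sketch; the row and page counts are arbitrary values, chosen only to be large enough to make batch processing attractive:

```sql
-- Undocumented options: simulate a large table without loading more data
UPDATE STATISTICS dbo.FactResellerSalesPtnd
WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000;
```

After updating the statistics, clear the plan cache and run the earlier aggregation query again; the Columnstore Index Scan properties should then show Batch as the execution mode.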

I am honored to be speaking at the PASS Summit again this year. I’ve been attending this SQL Server conference every year since 2003 and this will be my fourth year speaking. Same as last year I will again be presenting two sessions.

In my first session, Inside the SQL Server Query Optimizer, I will go into the internals of the Query Optimizer and show you the steps that it performs in the background, covering everything from the moment a query is submitted to SQL Server until an execution plan is generated. In my second session, Parameter Sniffing: the Query Optimizer vs. the Plan Cache, I will show you how the Query Optimizer uses parameter sniffing to produce a plan tailored to the current parameters of a query, why in some cases this can be a performance problem, and how to troubleshoot and solve those cases.

The PASS Summit is less than four months away and you can register here. I look forward to meeting lots of SQL Server professionals, including those whom I only know via Twitter. See you in Seattle in October.

Another interesting topic that I usually talk about in my presentations is statistics on computed columns, so I will use this post to show you how they work and how they can help you improve the performance of your queries.

A problem faced by some queries using scalar expressions is that they usually cannot benefit from statistics and, without them, the Query Optimizer will use the 30% selectivity guess on inequality comparisons. A solution to this problem can be the use of computed columns, as SQL Server can automatically create and update statistics on these columns, which can help the Query Optimizer to create better execution plans. An additional benefit of this solution is that you don't need to specify the name of the computed column in your queries for SQL Server to use its statistics. The Query Optimizer automatically matches the computed column definition to an existing scalar expression in a query, so your applications do not need to be changed. Although computed columns have been available in previous versions of SQL Server, the automatic matching feature was only introduced with SQL Server 2005.
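The query discussed below is not reproduced in this excerpt; based on the expression mentioned, it would look like this:

```sql
SELECT * FROM Sales.SalesOrderDetail
WHERE OrderQty * UnitPrice > 25000
```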

The estimated number of rows is 36,395.1, which is 30% of the total number of rows, 121,317, although the query returns only 5 records. SQL Server is obviously using a selectivity guess, as it cannot estimate the selectivity of the expression OrderQty * UnitPrice > 25000.

Now create a computed column:

ALTER TABLE Sales.SalesOrderDetail
ADD cc AS OrderQty * UnitPrice

Run the previous SELECT statement again and note that, this time, the estimated number of rows has changed to 84.3101, which is very close to the actual number of rows returned by the query, as shown in the following plan:

You can optionally test replacing the 25,000 in the query with some other value, like 1,000, 10,000, or 20,000, and verify that the estimate will again be close to the actual number of rows returned.

Note that creating the computed column does not create statistics; these statistics are created the first time that the query is optimized. You can run the next query to display the information about the statistics objects for the Sales.SalesOrderDetail table:
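The original query is not included in this excerpt; a sketch that lists the statistics objects on the table looks like this:

```sql
SELECT name, auto_created, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail');
```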

The newly created statistics object will most likely be at the end of the list. Copy its name and use the following command to display the details about the statistics object (I’ve used the name of my local object, but you should replace that as appropriate). You can also use "cc" as the name of the object to get the same results. In both cases, the "cc" column should be shown on the Columns field in the density section.
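The auto-generated statistics names (those starting with _WA_Sys_) will differ on your system; since the column name works just as well, the command can simply be:

```sql
-- Replace 'cc' with the generated statistics name if you prefer
DBCC SHOW_STATISTICS('Sales.SalesOrderDetail', cc);
```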

Unfortunately, for the automatic matching feature to work, the expression must be exactly the same as the computed column definition. So, if I change the query to UnitPrice * OrderQty, instead of OrderQty * UnitPrice, the execution plan will again show an estimate based on the 30% selectivity guess, as this query will demonstrate:
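This is the demonstration query, with the factors of the expression reversed:

```sql
SELECT * FROM Sales.SalesOrderDetail
WHERE UnitPrice * OrderQty > 25000
```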

As mentioned, the computed column provides statistics, so the Query Optimizer can try to get you a better execution plan. In addition, you can create an index on the existing computed column to provide a better navigational alternative. Create the following index:

CREATE INDEX IX_cc on Sales.SalesOrderDetail(cc)

By running the original SELECT statement again, the Query Optimizer will now choose the newly created index and produce a more efficient plan, using an Index Seek/Key Lookup instead of a Clustered Index Scan, as shown next.

As covered in my book Inside the SQL Server Query Optimizer, contradiction detection is a query rewrite (or, more precisely, a tree rewrite) performed at the simplification phase of the optimization process, in which query contradictions are detected and removed. Since these parts of the query are not executed at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query execute faster. For example, the Query Optimizer may know that no records can satisfy a predicate even before touching any page of data. A contradiction may be related to a check constraint, or may be related to the way the query is written. I will show you examples of both cases next.

First, I need to find a table with a check constraint in AdventureWorks and, handily, the Employee table has the following check constraint definition:

([VacationHours]>=(-40) AND [VacationHours]<=(240))

This check constraint makes sure that the number of vacation hours is a number between –40 and 240, so when I request

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 80

… SQL Server uses a Clustered Index Scan operator, as shown next

However, if I request all of the employees with more than 300 vacation hours then, because of this check constraint, the Query Optimizer immediately knows that no records qualify for the predicate. Run the following code:

SELECT * FROM HumanResources.Employee
WHERE VacationHours > 300

As expected, the query will return no records, but this time it will show the following execution plan. Note that, this time, instead of a Clustered Index Scan, SQL Server is using a Constant Scan operator. Since there is no need to access the table at all, SQL Server saves resources like I/O, locks, memory and CPU, making the query execute faster. Now, let's see what happens if I disable the check constraint.
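The disabling statement is not included in this excerpt; it would look like the following (the constraint name shown is the AdventureWorks default and may differ on your system):

```sql
ALTER TABLE HumanResources.Employee
NOCHECK CONSTRAINT CK_Employee_VacationHours;
```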

This time, running the last query once again uses a Clustered Index Scan operator, as the Query Optimizer can no longer rely on the check constraint to guide its decisions. Don't forget to enable the constraint again by running the following statement:
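A statement like this re-enables the constraint and revalidates the existing data (again, the constraint name may differ on your system):

```sql
ALTER TABLE HumanResources.Employee
WITH CHECK CHECK CONSTRAINT CK_Employee_VacationHours;
```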

The second case involves the way the query is written. Here there is no check constraint involved; both predicates are valid and each will individually return records, but they contradict each other when they are run together. As a result, the query returns no records and the plan again shows a Constant Scan operator, similar to the plan shown previously. This may just look like a badly written query, but remember that some predicates may already be included in, for example, view definitions, and the developer of the query may be unaware of those. For example, in our last query, a view may include the predicate ManagerID > 10 and a developer may call this view using the predicate ManagerID < 5. Since both predicates contradict each other, a Constant Scan operator will be used again.
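Reconstructed from the predicates mentioned above, the contradicting query would look like this:

```sql
-- Each predicate alone returns rows; together they can match nothing
SELECT * FROM HumanResources.Employee
WHERE ManagerID > 10 AND ManagerID < 5
```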

One of the most interesting and perhaps least known features of the Database Engine Tuning Advisor (DTA) is that you can use it with a test server to tune the workload of a production server. As I mentioned in the first part of this post, the DTA relies on the Query Optimizer to make its tuning recommendations, and you can have it make these optimizer calls against a test server instance without impacting the performance of the production server.

Information Required by the Query Optimizer

To better understand how this works, let us first review what kind of information the Query Optimizer needs to tune a workload. Basically, the most important information it needs to perform an optimization is:

1) The database metadata

2) Optimizer statistics

3) Hardware information, like the available memory and the number of processors

The DTA can gather the database metadata and statistics from the production server and use them to create a similar database, with no data, on a different server. This is called a shell database. The DTA can also obtain the available memory and number of processors on the production server, by using the extended stored procedure xp_msver, and use this information for the optimization process. It is important to note that no data is needed for the optimization process. This process is summarized in the following figure, taken from Books Online:

This process provides the following benefits:

1) There is no need to run an expensive optimization on the production server, which could impact its resource usage. The production server is only used to gather the initial metadata and the required statistics.

2) There is no need to copy the entire database to a test server either, which is especially important for big databases, saving both disk space and the time needed to copy the database.

3) It does not matter if the test server is not as powerful as the production server, since the DTA tuning session will consider the available memory and number of processors of the production server.

Running a Tuning Session

Now I am going to show an example of how to run a tuning session. First of all, the use of a test server is not supported by the DTA graphical user interface, so the use of the dta utility, the command-prompt version of the DTA, is required. Configuring a test server also requires an XML input file containing the dta input information. I am using the following input file for this example:
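The original input file is not reproduced in this excerpt; a minimal sketch, using hypothetical server names (production and test01) and a hypothetical workload file path, would look like this:

```xml
<?xml version="1.0" encoding="utf-16"?>
<DTAXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/dta">
  <DTAInput>
    <Server>
      <Name>production</Name>
      <Database>
        <Name>AdventureWorks</Name>
      </Database>
    </Server>
    <Workload>
      <File>workload.sql</File>
    </Workload>
    <TuningOptions>
      <TestServer>test01</TestServer>
      <FeatureSet>IDX</FeatureSet>
      <Partitioning>NONE</Partitioning>
      <KeepExisting>NONE</KeepExisting>
    </TuningOptions>
  </DTAInput>
</DTAXML>
```

You would then start the session with something like dta -ix input.xml -s session1, where session1 is the session name you later open in the DTA graphical user interface.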

The Server and Database elements of the XML file include the production SQL Server instance and database. The Workload element includes the definition of a script containing the workload to tune. TuningOptions includes the TestServer subelement which is used to include the name of the test SQL Server instance.

This example creates an entire copy of AdventureWorks (with no data) and performs the requested optimization. The shell database is automatically deleted after the tuning session is completed. Optionally, you can keep the shell database, for example if you want to use it again in another tuning exercise, by using the RetainShellDB element in the TuningOptions element, as in the following XML fragment.
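The TuningOptions fragment would then look like this sketch (server name as before):

```xml
<TuningOptions>
  <TestServer>test01</TestServer>
  <FeatureSet>IDX</FeatureSet>
  <RetainShellDB>1</RetainShellDB>
</TuningOptions>
```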

If the shell database already exists when you request a tuning session, the database creation process will be skipped. However, you will have to manually delete this database when it is no longer needed.

Once the tuning session is completed, you can use the DTA graphical user interface as usual to see the recommendations. To do this, open the DTA, open the session you used by double-clicking its session name (session1 in our example) and choose the Recommendations tab, if it is not already selected.

Scripting Statistics

Although the DTA automatically gathers the metadata and statistics to build the shell database, I am going to show you how to script the required objects and statistics to tune a simple query. This can be helpful in cases where you don't want to script the entire database. Scripting database objects is a fairly simple process well known to SQL Server professionals. Something that may be new to many, though, is how to script the statistics. The generated scripts make use of the undocumented STATS_STREAM, ROWCOUNT and PAGECOUNT options of the CREATE/UPDATE STATISTICS statements.

These UPDATE STATISTICS statements are used to update the statistics of existing indexes (obviously, the related CREATE INDEX statements were scripted as well). If the table also has column statistics, the script will include CREATE STATISTICS statements for those instead.
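A scripted statement looks roughly like the following sketch; the STATS_STREAM value is a long binary blob, truncated here, and the page count is illustrative:

```sql
-- Sketch of a scripted statistics statement; the real STATS_STREAM
-- blob is thousands of hexadecimal digits long
UPDATE STATISTICS Sales.SalesOrderDetail ([IX_SalesOrderDetail_ProductID])
WITH STATS_STREAM = 0x010000000300..., -- truncated
     ROWCOUNT = 121317, PAGECOUNT = 1237;
```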

Testing Scripted Statistics

Finally, I will show you an example of how to use the scripted statistics to obtain plans and cost estimates on an empty table. Running the following query on the regular AdventureWorks database creates the following plan with an estimated number of rows of 9 and a cost of 0.0296835.

SELECT * FROM Sales.SalesOrderDetail
WHERE ProductID = 898

Let us produce the same plan on an empty database. Following the procedure described before, you can script the Sales.SalesOrderDetail table. You will end up with multiple statements, including the following (again, shortened to fit in this post).

Create a new database and run at least the previous four statements from the script you got in the previous step (or use the script attached to this post, which contains the statements needed to reproduce the example). After running the script on the empty database and executing the sample query, you will again get the plan with a cost of 0.0296835 and an estimated number of rows of 9.

Since I am planning to publish a couple of DTA-related posts later this week, I thought it would be a good idea to start with this article, which explains how the DTA relies on the Query Optimizer to make its tuning recommendations. This article was previously published in my book Inside the SQL Server Query Optimizer.

Currently, all major commercial database vendors include a physical database design tool to help with the creation of indexes. However, when these tools were first developed, there were just two main architectural approaches considered for how these tools should recommend indexes. The first approach was to build a stand-alone tool with its own cost model and design rules. The second approach was to build a tool that could use the Query Optimizer cost model.

A problem with building a stand-alone tool is the requirement of duplicating the cost model. On top of that, having a tool with its own cost model, even if it is better than the optimizer's cost model, may not be a good idea, because the optimizer still chooses its plan based on its own model.

The second approach, using the Query Optimizer to help with physical database design, was proposed in the database research community as far back as 1988. Since it is the optimizer which chooses the indexes for an execution plan, it makes sense to use the optimizer itself to help find which missing indexes would benefit existing queries. In this scenario, the physical design tool uses the optimizer to evaluate the cost of queries given a set of candidate indexes. An additional benefit of this approach is that, as the optimizer cost model evolves, any tool using its cost model can automatically benefit from it.

SQL Server was the first commercial database product to include a physical design tool, in the shape of the Index Tuning Wizard which shipped with SQL Server 7.0, and which was later replaced by the Database Engine Tuning Advisor (DTA) in SQL Server 2005. Both tools use the Query Optimizer cost model approach and were created as part of the AutoAdmin project at Microsoft, the goal of which was to reduce the total cost of ownership (TCO) of databases by making them self-tuning and self-managing. In addition to indexes, the DTA can help with the creation of indexed views and table partitioning.

However, creating real indexes in a DTA tuning session is not feasible; its overhead could impact operational queries and degrade the performance of your database. So how does the DTA estimate the cost of using an index that does not yet exist? Actually, even during a regular query optimization, the Query Optimizer does not use indexes to estimate the cost of a query. The decision of whether to use an index or not relies only on some metadata and the statistical information regarding the columns of the index. Index data itself is not needed during query optimization but will, of course, be required during query execution if the index is chosen for the execution plan.

So, to avoid creating indexes during a DTA session, SQL Server uses a special kind of index, called a hypothetical index, which was also used by the Index Tuning Wizard. As the name implies, hypothetical indexes are not real indexes; they only contain statistics, and they can be created with the undocumented WITH STATISTICS_ONLY option of the CREATE INDEX statement. You may not be able to see these indexes during a DTA session because they are dropped automatically when they are no longer needed. However, you can see the CREATE INDEX WITH STATISTICS_ONLY and DROP INDEX statements if you run a SQL Server Profiler session to see what the DTA is doing.

Let's take a quick tour of some of these concepts. To get started, create a new table in the AdventureWorks database:

SELECT *
INTO dbo.SalesOrderDetail
FROM Sales.SalesOrderDetail

Copy the following query and save it to a file:

SELECT * FROM dbo.SalesOrderDetail
WHERE ProductID = 897

Open a new DTA session, and you can optionally run a SQL Server Profiler session if you want to inspect what the DTA is doing. On the Workload File option, select the file containing the SQL statement that you just created and specify AdventureWorks as both the database to tune and the database for workload analysis. Click the Start Analysis button and, when the DTA analysis finishes, run this query to inspect the contents of the msdb..DTA_reports_query table:
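The inspection query is not reproduced in this excerpt; it can be as simple as:

```sql
SELECT * FROM msdb..DTA_reports_query;
```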

Notice that the query returns information like the query that was tuned, as well as the current and recommended cost. The current cost, 1.2434, is easy to obtain by directly requesting an estimated execution plan for the query as shown next.

Since the DTA analysis has completed, the created hypothetical indexes have already been dropped. To obtain the indexes recommended by the DTA, click the Recommendations tab and look at the Index Recommendations section, where you can find the code to create any recommended index by clicking its Definition column. In our example, it will show the following code:

In the next statement, and for demonstration purposes only, I will go ahead and create the index recommended by the DTA but, instead of a regular index, I will create it as a hypothetical index by adding the WITH STATISTICS_ONLY clause:
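Using an index name of my choosing (the DTA generates _dta_-prefixed names, and the exact recommendation is not reproduced in this excerpt), the statement would look like this:

```sql
-- Hypothetical index: only statistics are created, no index data
CREATE CLUSTERED INDEX cix_ProductID
ON dbo.SalesOrderDetail (ProductID)
WITH STATISTICS_ONLY
```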

After implementing the recommendation and running the query again, the clustered index is in fact now being chosen by the Query Optimizer. This time, the plan shows a Clustered Index Seek operator and an estimated cost of 0.0033652, which is very close to the recommended cost listed previously when querying the msdb..DTA_reports_query table.

Finally, drop the table you just created by running the following statement:
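The cleanup statement is simply:

```sql
DROP TABLE dbo.SalesOrderDetail
```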

Recently I was asked to troubleshoot a performance problem with a query using linked servers. The problem was related to a well-known issue where the query processor is not able to get the required optimizer statistics from the remote server, due to the permissions of the user used by the linked server. This behavior is documented in the Books Online entry Guidelines for Using Distributed Queries, as shown next:

“To create the best query plans when you are using a table on a linked server, the query processor must have data distribution statistics from the linked server. Users that have limited permissions on any columns of the table might not have sufficient permissions to obtain all the useful statistics, and might receive a less efficient query plan and experience poor performance. If the linked server is an instance of SQL Server, to obtain all available statistics, the user must own the table or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role on the linked server.”

Basically, the problem is that if the user used by the linked server does not have the permissions described in the previous Books Online entry, SQL Server will not be able to execute the DBCC SHOW_STATISTICS statement on the remote server to obtain the required statistics. It is unfortunate that having access to the data does not also give you access to its statistics, as having to provide higher-level permissions could be a security concern. Let me show you the problem with an example.

Plan with read-only permissions

I've created a linked server between two SQL Server instances, each one hosting a copy of the AdventureWorks database. First, I grant read-only permissions to the user used by the linked server and run the following artificial query, just to demonstrate this behavior.

SELECT l.* FROM AdventureWorks.Sales.SalesOrderHeader l
JOIN remote.AdventureWorks.Sales.SalesOrderHeader r
ON l.SalesOrderID = r.SalesOrderID
WHERE r.CustomerID = 666

Running the previous query gets me the following plan:

In this case the Query Optimizer could benefit from knowing the cardinality estimate of the query executed on the remote server, that is, from knowing how many orders were placed by customer 666, but this information is not available for this plan. With a guessed estimate of 2,362.49 rows, the Query Optimizer decides to use a Merge Join plus a Clustered Index Scan when in fact the query returns only 8 records. You can run Profiler against the remote server to learn what kind of information the local query processor is requesting from it. I noticed that it executes five system stored procedures, which obtain information about the tables, columns, indexes, check constraints and statistics involved in the remote query.

The last stored procedure, sp_table_statistics2_rowset, tries to obtain the header and density information of the statistics object using the following statement, but it fails because of the lack of permissions mentioned earlier.
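Based on what Profiler captures, the failing statement would look like the following (the same statistics object as the later histogram request, but with the STATS_STREAM option):

```sql
DBCC SHOW_STATISTICS (N'"AdventureWorks"."Sales"."SalesOrderHeader"',
"IX_SalesOrderHeader_CustomerID") WITH STATS_STREAM
```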

Now grant db_owner permissions to the user used by the linked server and run the query again, forcing a new optimization (for example, by using DBCC FREEPROCCACHE to clear the plan cache on the local server). Profiler will show that the previous five system stored procedures were executed again, but this time the sp_table_statistics2_rowset procedure is able to successfully get the requested header and density information. In addition, the following statement is also issued to obtain the histogram from the statistics object.

DBCC SHOW_STATISTICS (N'"AdventureWorks"."Sales"."SalesOrderHeader"',
"IX_SalesOrderHeader_CustomerID") WITH HISTOGRAM_STEPS

This time the following execution plan is returned:

By using the histogram from the remote server, the local query processor is able to get an estimated number of rows of 6.65385 and decides to use a Nested Loops Join and a Clustered Index Seek instead, which is a better and more appropriate plan, as the actual number of records returned is only 8.

If you manually run the previous DBCC SHOW_STATISTICS statement on the remote server, you can see the histogram, an extract of which is shown next. The step corresponding to the value CustomerID 666 shows an estimate of 6.653846 records in the AVG_RANGE_ROWS column, which is also the estimate shown in the previous execution plan.

Remote plan

It is also worth mentioning that the query executed on the remote server is practically the same in both cases, except that an ORDER BY is needed in the first example, as the Merge Join in the local plan requires sorted data. The plan is otherwise the same, as shown next (both the query and the plan can be captured using Profiler).

SELECT "Tbl1003"."SalesOrderID" "Col1011"
FROM "AdventureWorks"."Sales"."SalesOrderHeader" "Tbl1003"
WHERE "Tbl1003"."CustomerID"=(666)
ORDER BY "Col1011" ASC

Again, notice that the estimated number of rows is 6.65385.

Conclusion

So we have seen how using a user with limited permissions to run a query through a linked server can, in some specific cases, cause a performance problem because of the lack of access to optimizer statistics. It seems inappropriate that having access to the data does not also give you access to its statistics, as having to provide higher-level permissions could be a security concern. In fact, there is a related Connect entry by Erland Sommarskog discussing the problem and suggesting that permission to access the data should also give access to its statistics. So let us hope a better solution for linked servers is provided in the near future.

I just found out today that my book, "Inside the SQL Server Query Optimizer", is finally available on Amazon. You can find the description of the book on the Amazon page, or a more detailed chapter-by-chapter description in one of my previous posts here. So I would like to use this post to thank the people who helped me or in some way influenced the content of the book.

Writing this book was a lot of fun, but also a lot of work; actually a lot more work than I originally expected. Fortunately I got help from several people. First of all, I would like to thank Chris Massey. Chris helped me as the technical editor of the book, and guided me through most of the writing process, doing absolutely outstanding work. Very special thanks also go out to Grant Fritchey who helped us as the technical reviewer, and went very much beyond just reviewing, as his comments provided invaluable feedback to improve the quality of this book. Thanks also go to Tony Davis for offering me this opportunity in the first place, and helping to get the project started.

Outside the Red Gate team, my deepest gratitude goes to Cesar Galindo-Legaria, Manager of the SQL Server Query Optimization team at Microsoft, for answering my endless list of questions about the Query Optimizer, and educating me through some of the information he has published, including numerous research papers and an excellent chapter of a SQL Server book. I had the opportunity to meet Tony, Cesar, and Grant at the PASS Summit back in 2009, when I first started this journey, and I hope to finally meet Chris at the same conference in October 2011.

Although Chris, Cesar, Grant and Tony have directly helped me to shape the content of this book, there’s also a host of other people who have indirectly influenced the book through their work, which helped me to learn about and better understand the SQL Server query processor. With that in mind, I would like to thank the authors who have discussed the query processor in some of the available SQL Server books, Microsoft white papers and blogs, including Kalen Delaney, Ken Henderson, Lubor Kollar, Craig Freedman, Conor Cunningham and Eric Hanson.

Research papers have provided me with an unlimited source of information, and helped me to understand the Query Optimizer at a significantly deeper level than ever before. So, thanks to all that amazing work by the many people in the research community including, in addition to Cesar, Goetz Graefe, Surajit Chaudhuri, Yannis Ioannidis, Vivek Narasayya, Pat Selinger, Florian Waas, and many, many more.

Finally, on the personal side, I would like to thank my parents, Guadalupe and Humberto, and my family: my wife Rocio, and my three sons, Diego, Benjamin and David. Thanks all for your unconditional support and patience.