Kalen Delaney : index hintshttp://sqlblog.com/blogs/kalen_delaney/archive/tags/index+hints/default.aspxTags: index hintsenCommunityServer 2.1 SP2 (Build: 61129.1)Geek City: Using a Nonclustered Index to Avoid a Sorthttp://sqlblog.com/blogs/kalen_delaney/archive/2008/12/02/using-a-nonclustered-index-to-avoid-a-sort.aspxWed, 03 Dec 2008 03:20:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:10245Kalen Delaney15http://sqlblog.com/blogs/kalen_delaney/comments/10245.aspxhttp://sqlblog.com/blogs/kalen_delaney/commentrss.aspx?PostID=10245<P>Most of you are probably aware that having a clustered index on the column(s) in an ORDER BY clause means that SQL Server can avoid having to sort your data, because it is already logically stored in order of the clustered index, and SQL Server can just access the data in order to get the sorted data .</P>
<P>For example, consider the <EM>SalesOrderHeader</EM> table in the <EM>AdventureWorks</EM> database. The clustered index is on <EM>SalesOrderID</EM>, so this query doesn't need to do a sort, just a clustered index scan:</P>
<P><FONT face="Courier New" size=1>SELECT * FROM Sales.SalesOrderHeader<BR>ORDER BY SalesOrderID</FONT></P>
<P><A href="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_2.png"><IMG style="BORDER-RIGHT:0px;BORDER-TOP:0px;BORDER-LEFT:0px;BORDER-BOTTOM:0px;" height=70 alt=image src="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_thumb.png" width=272 border=0></A> </P>
<P>But what about a nonclustered index? Its leaf level stores the index key values in order, so it can help avoid a sort if it completely covers the query, i.e. all the data your query needs is in the nonclustered index. The following query is covered by the nonclustered index on <EM>CustomerID</EM>, because the nonclustered index always includes the clustered key, in this case, <EM>SalesOrderID</EM>.</P>
<P><FONT face="Courier New" size=1>SELECT CustomerID, SalesOrderID <BR>FROM Sales.SalesOrderHeader<BR>ORDER BY CustomerID</FONT>
<P><A href="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_4.png"><IMG style="BORDER-RIGHT:0px;BORDER-TOP:0px;BORDER-LEFT:0px;BORDER-BOTTOM:0px;" height=57 alt=image src="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_thumb_1.png" width=244 border=0></A>
<P>But what about if the query is not covered? What if we wanted every column returned:</P><FONT face="Courier New" size=1>SELECT * FROM Sales.SalesOrderHeader<BR>ORDER BY CustomerID</FONT>
<P>The default plan for this query will be to perform a sort.</P>
<P><A href="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_6.png"><IMG style="BORDER-RIGHT:0px;BORDER-TOP:0px;BORDER-LEFT:0px;BORDER-BOTTOM:0px;" height=54 alt=image src="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_thumb_2.png" width=244 border=0></A> </P>
<P>However, the leaf level of the nonclustered index on <EM>CustomerID</EM> does have all the <EM>CustomerID</EM> values already sorted, so why can't that index be used? The answer is, it CAN be used, but it just isn't the default. SQL Server's optimizer tries to find the plan that will run to completion in the least amount of time. That sounds good, right? But with the SORT operator in the plan, everything stops while the sorting is taking place, and no data can be returned until all the data is sorted. </P>
<P>However,&nbsp; another alternative would be to scan the nonclustered index, where the <EM>CustomerID</EM> values are already in order. For each row, SQL Server would have to do a key lookup into the clustered index and the total time to do a key lookup for every row would probably be more than the time required to sort all the data. The first few rows can be returned very quickly. How can we get such a plan? SQL Server provides us with a hint called FASTFIRSTROW that tells the optimizer to come up with a plan that returns the first row in a minimum amount of time. It's a table hint, so it looks like this:</P><FONT face="Courier New" size=1>SELECT * FROM Sales.SalesOrderHeader WITH (FASTFIRSTROW)<BR>ORDER BY CustomerID</FONT>
<P>The plan looks like this:</P>
<P><A href="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_8.png"><IMG style="BORDER-RIGHT:0px;BORDER-TOP:0px;BORDER-LEFT:0px;BORDER-BOTTOM:0px;" height=66 alt=image src="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_thumb_3.png" width=244 border=0></A> </P>
<P>So you have to decide what is most important to you.&nbsp; Do you want the total processing time to be minimized, or do you want the time to have the first row returned to be minimized? It's up to you. The default is to minimize total processing time, but you can&nbsp; use the FASTFIRSTROW hint if you want to take advantage of the nonclustered index to avoid the sort, and have the first few rows returned quickly.</P>
<P>Be careful if you try to do a cost comparison of queries with and without the FASTFIRSTROW hint. Look at the two plans below:</P>
<P><A href="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_10.png"><IMG style="BORDER-RIGHT:0px;BORDER-TOP:0px;BORDER-LEFT:0px;BORDER-BOTTOM:0px;" height=106 alt=image src="http://sqlblog.com/blogs/kalen_delaney/WindowsLiveWriter/GeekCityUsingaNonclusteredIndextoAvoidaS_E2B4/image_thumb_4.png" width=244 border=0></A> </P>
<P>Comparing the two plans makes it look like the query with the hint is infinitely faster the query without the hint. However, if you look at the details for the nonclustered index scan, as shown in the pop-up properties box, you can see that the way the optimizer comes up with this plan is by assuming only one row will be returned. It optimizes as if only one row will be accessed, which is why the nonclustered index is chosen, but during execution it will retrieve all the rows. So of course a plan for accessing one row will be considered MUCH faster than a plan to access the entire table (31465 rows).</P>
<P>The documentation for this hint can be a little misleading. You might think using the hint indicates that the plan should get the first row quickly, and then get all the rest of rows using perhaps a different access method, so they will ALL come back as quickly as possible (which would then mean a sort), it actually means that the optimizer should just come up with a plan for getting the first row as quickly as possible. Period. (Then whatever plan was chosen for the first row will also be used for all the rest of the rows.)</P>
<P>This FASTFIRSTROW hint is listed in the SQL Server Books Online as a deprecated feature, which means it may be removed in a future version. However, it is still available in SQL Server 2008.&nbsp; Microsoft recommends using the FAST N query hint (in the OPTION clause): </P>
<P><FONT face="Courier New" size=1>SELECT * FROM Sales.SalesOrderHeader <BR>ORDER BY CustomerID<BR>OPTION (FAST 1)</FONT></P>
<P>With this hint, you can specify an value for N, and the optimizer just assumes there are N rows, and comes up with the best plan as if there were that number. As an exercise, you might want to try determining at what value for N the optimizer will switch from using an nonclustered index scan to using a clustered scan plus SORT.</P>
<P>Today, a reader asked me a question about the FASTFIRSTROW hint and wanted to know if we should "use such a hint for large data sets [e.g. OLTP queries]".</P>
<P>Of course, the answer is the usual: It depends.</P>
<P>On the one hand, using this hint to avoid a sort can be a good thing, because sorting a large data set can use a lot of system resources (time and <EM>tempdb</EM>). But on the other hand, if you're really running OLTP queries, there should only be a few rows you're dealing with in any query, and then the value of this hint might not be as noticeable. So you should run your own tests and see if you like the results. Note that FAST N or FASTFIRSTROW is not the default, and that is probably for a good reason. Try running your queries first with whatever plan the optimizer comes us with, and only if you're not satisfied with the performance, you can try using a hint.</P>
<P>Have fun!</P>
<P><FONT color=#ff00ff size=4>~Kalen</FONT></P><img src="http://sqlblog.com/aggbug.aspx?PostID=10245" width="1" height="1">index hintsnonclustered indexsortGeek City: Clustered or Nonclustered? Why not both?http://sqlblog.com/blogs/kalen_delaney/archive/2008/04/24/clustered-or-nonclustered-why-not-both.aspxThu, 24 Apr 2008 21:51:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:6399Kalen Delaney2http://sqlblog.com/blogs/kalen_delaney/comments/6399.aspxhttp://sqlblog.com/blogs/kalen_delaney/commentrss.aspx?PostID=6399<P>I had been thinking of this post all day, and then noticed that <A href="http://sqlblog.com/blogs/denis_gobo/archive/2008/04/24/6385.aspx" target=_blank>Denis wrote a post with almost the same name</A>. I was worried he might have written about something similar, but it turns out not to be the case. </P>
<P>A group of colleagues have been having a discussion about this topic recently, which was spurred by the fact that the&nbsp; Microsoft supplied <EM>Northwind</EM> database has duplicates of many of its indexes. If Microsoft does this, many new database users might assume it's a good idea. It's not! Having two identical indexes gives you no additional benefit, but does give you lots of additional overhead when those indexes need to be maintained. In SQL Server 2008, you will be able to create a policy to detect and prohibit this behavior if you choose. But that of course, is still in the future.</P>
<P>But what if the indexes are not quite identical? What if one is a clustered index and one is nonclustered? My colleagues report seeing this behavior frequently when someone declares a Primary Key on a column, and then doesn't realize that automatically builds an index (clustered by default), so she then builds a nonclustered on the same key.&nbsp; Is this necessarily all bad? The clustered index is useful when most of the columns of many of the rows need to be returned, based on the value in the PK column, or when the data needs to be returned sorted by the PK. But what if you only need a few rows, and you need only key columns? For example, what about a count(*) query?</P>
<P>In general, when satisfying a count(*) query, the SQL Server optimizer will choose the index with the smallest number of pages at the leaf level. A nonclustered index typically will have far fewer leaf level rows that a clustered, but still contains an entry for every single row, so the count(*) value will be accurate. So consider this example:</P>
<P><FONT face="Lucida Console" color=#0000a0>USE AdventureWorks;<BR>-- create a big table by copying another one <BR>IF EXISTS (SELECT * FROM sys.tables WHERE name = 'newdetails')<BR>&nbsp;&nbsp;&nbsp;&nbsp; DROP TABLE newdetails;<BR>SELECT * INTO newdetails FROM Sales.SalesOrderDetail; </FONT>
<P><FONT face="Lucida Console" color=#0000a0>-- Build a PK and a NC index on the same column<BR>ALTER TABLE newdetails <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ADD CONSTRAINT PK_Detail PRIMARY KEY (SalesOrderDetailID);<BR>CREATE UNIQUE INDEX UNQ_Detail ON newdetails (SalesOrderDetailID); </FONT>
<P><FONT face="Lucida Console" color=#0000a0>-- Look at the plan; the optimizer will choose the NC index<BR>SET SHOWPLAN_TEXT ON;<BR>SELECT count(*) FROM newdetails <BR>WHERE SalesOrderDetailID BETWEEN 1000 and 2000; <BR>SET SHOWPLAN_TEXT OFF;</FONT></P>
<P><FONT face="Lucida Console" color=#0000a0>-- Look at the performance; the nonclustered is performing better<BR>-- Even though the NC index will be chosen without the hint, <BR>--&nbsp;&nbsp; i included it here to make it more obvious which index is chosen<BR>SET STATISTICS IO ON;<BR>SELECT count(*) FROM newdetails WITH (INDEX = UNQ_Detail)<BR>WHERE SalesOrderDetailID BETWEEN 1000 and 2000<BR>SELECT count(*) FROM newdetails WITH (INDEX = PK_Detail)<BR>WHERE SalesOrderDetailID BETWEEN 1000 and 2000<BR>SET STATISTICS IO OFF;</FONT></P>
<P>I am absolutely not recommending that you always duplicate your PK index with another one, but rather illustrating that it is not always completely pointless or harmful to do so. Choosing the best indexes takes a lot of careful consideration of your data distribution and your query usage patterns, and there is no one-size-fits-all answer.</P>
<P>I hope this is useful to you,</P>
<P><FONT color=#ff00ff size=4>~Kalen</FONT></P><img src="http://sqlblog.com/aggbug.aspx?PostID=6399" width="1" height="1">index hintsindexesDid You Know? Everything's Up to Date in Kansas City!http://sqlblog.com/blogs/kalen_delaney/archive/2008/02/01/kansas-city-seminar.aspxFri, 01 Feb 2008 18:54:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:4826Kalen Delaney1http://sqlblog.com/blogs/kalen_delaney/comments/4826.aspxhttp://sqlblog.com/blogs/kalen_delaney/commentrss.aspx?PostID=4826<P>They are so up-to-date, I have chosen it as one of the handful of&nbsp; cities on my user group tour... </P>
<P>I'll be presenting a full day seminar on <EM>Controlling and Reusing Query Plans</EM> in SQL Server 2005. You can get more information from&nbsp; my schedule page at <A href="http://schedule.kalendelaney.com/" target=_blank>schedule.KalenDelaney.com</A> </P>
<P>Full details, including registration information, can be found here:<BR><A title=http://www.cleardata.biz/events/query-plans-2008.aspx href="http://www.cleardata.biz/events/query-plans-2008.aspx">http://www.cleardata.biz/events/query-plans-2008.aspx</A></P>
<P>If you live anywhere nearby, it would be great to see you there.</P>
<P><FONT color=#ff00ff>~Kalen</FONT></P><img src="http://sqlblog.com/aggbug.aspx?PostID=4826" width="1" height="1">trainingindex hintsexecution planplan guidesuser groupsseminarsDid You Know? Forcing a nonclustered index scan to avoid sortinghttp://sqlblog.com/blogs/kalen_delaney/archive/2007/07/22/did-you-know-forcing-a-nonclustered-index-scan-to-avoid-sorting.aspxSun, 22 Jul 2007 17:13:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:1876Kalen Delaney4http://sqlblog.com/blogs/kalen_delaney/comments/1876.aspxhttp://sqlblog.com/blogs/kalen_delaney/commentrss.aspx?PostID=1876<P>&nbsp;</P>
<P>Hopefully, if you know enough about&nbsp;SQL Server to understand most of my posts here, you're aware of the fact that using a&nbsp;nonclustered index is not always an optimal choice for accessing data, particularly if there are thousands&nbsp;(or more) rows of data you need to retrieve.&nbsp;&nbsp;However, we should never say 'never'. </P>
<P>In general, if you have a query that is sorting on a column that has a nonclustered index, and needs to retrieve columns that are not part of the index, the optimizer will choose to scan the table (or clustered index, which is the same thing) and sort the data. The optimizer usually estimates that the time needed to scan the table plus the time to sort is going to be less than the time to follow each bookmark individually from the nonclustered index. And in some cases it might be right. </P>
<P>In other cases, it might not be, and the only way to verify that is to force SQL Server to use the nonclustered index, with a Lookup into the table or clustered index for each row.&nbsp; However, if you try to force this behavior with an INDEX hint, you might be surprised. Using a hint that tells the optimizer what index to use does not tell the optimizer HOW to use that index.&nbsp; It might perform a scan with lookup for every single row, or it might scan the nonclustered index, sort the index rows in the order of the clustered index&nbsp;and then access the clustered index using the sorted data. I've seen other plans as well and in this post, I'm not going to what the different plans actually mean. </P>
<P>What I want to tell you now is a way to force SQL Server to perform a scan of the nonclustered index and then a bookmark lookup for every row. You can do that with a table hint called FASTFIRSTROW. This hint tells the optimizer to choose a plan that returns the first row as quickly as possible. The default plan has to sort the entire table before any rows are returned at all. However, the nonclustered index already has the nonclustered key values sorted, and if that is the column used in the ORDER BY, following the leaf level of the nonclustered index should give us the rows already in correct sorted order. Yes, in some cases it might take longer to get ALL the rows in the table, following the bookmark for each row, but the first row will always be available fast. </P>
<P>It turns out that there are some cases where the FASTFIRSTROW hint can actually return all the rows more quickly than the default SORT. If you have lots of rows in your table (hundreds of thousands, or millions) you might find the FASTFIRSTROW hint ends up being faster. The only way to know for sure is to test it with your data. </P>
<P>Itzik Ben-Gan did some testing with different ways of getting sorted data, and if you are a subscriber to SQL Server Magazine, you can read about his tests here:</P>
<P><A title=http://www.sqlmag.com/Articles/Index.cfm?ArticleID=94775&amp;DisplayTab=Article href="http://www.sqlmag.com/Articles/Index.cfm?ArticleID=94775&amp;DisplayTab=Article">http://www.sqlmag.com/Articles/Index.cfm?ArticleID=94775&amp;DisplayTab=Article</A></P>
<P>If you want to run some tests of your own, here is some code to get you started.</P>
<P>-- First, make a copy of the Sales.SalesOrderDetail table in the AdventureWorks database</P>
<P>USE AdventureWorks<BR>GO<BR>IF OBJECT_ID('dbo.details', 'U') IS NOT NULL<BR>&nbsp;&nbsp;&nbsp; DROP TABLE dbo.details <BR>GO<BR>CREATE TABLE details (<BR>[SalesOrderID] [int] NOT NULL,<BR>[SalesOrderDetailID] [int] NOT NULL,<BR>[CarrierTrackingNumber] [nvarchar](25) NULL,<BR>[OrderQty] [smallint] NOT NULL,<BR>[ProductID] [int] NOT NULL,<BR>[SpecialOfferID] [int] NOT NULL,<BR>[UnitPrice] [money] NOT NULL,<BR>[UnitPriceDiscount] [money] NOT NULL CONSTRAINT [DF_SalesOrderDetail_UnitPriceDiscount] DEFAULT ((0.0)),<BR>[LineTotal] AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),<BR>[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL CONSTRAINT [DF_SalesOrderDetail_rowguid] DEFAULT (newid()),<BR>[ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_SalesOrderDetail_ModifiedDate] DEFAULT (getdate()),<BR>)<BR>GO
<P>-- Next, insert data into the new table
<P>INSERT INTO details <BR>(SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,<BR>UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate) <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FROM Sales.SalesOrderDetail<BR>GO
<P>-- Now create a clustered and nonclustered index on the new details table
<P>CREATE CLUSTERED INDEX<BR>idx_cl_col1 ON dbo.details(SalesOrderID);<BR>CREATE NONCLUSTERED INDEX<BR>idx_nc_col2 ON dbo.details(ProductID);<BR>GO</P>
<P>-- First look at the plan with no hints</P>
<P>SELECT * FROM dbo.details&nbsp;&nbsp;<BR>&nbsp; &nbsp;ORDER BY ProductID </P>
<P>-- Now look at the plans with the two different index hints.</P>
<P>SELECT * FROM dbo.details <BR>WITH (FASTFIRSTROW) <BR>ORDER BY ProductID
<P>SELECT * FROM dbo.details <BR>WITH&nbsp; (INDEX (idx_nc_col2)) <BR>ORDER BY ProductID
<P>-- As you add more rows to your table, the plan for the query using the INDEX hint might change. <BR>-- You can add more rows by just rerunning the INSERT statement from above, as many times as you would like.
<P>Have fun!</P>
<P><FONT color=#ff00ff>~Kalen</FONT></P><img src="http://sqlblog.com/aggbug.aspx?PostID=1876" width="1" height="1">index hintsnonclustered indexsort