It was performing horribly, so I added an index on col1, col2, col3 (int, bit, and datetime). When I checked the query plan it was ignoring my index. I tried reordering the columns in the index in every possible configuration and it always ignored the index. When I run the query it does a clustered index scan (table size is between 700K and 800K rows) and takes 10-12 seconds. When I force it to use my index it returns instantly. I was careful to clear the cache and buffers between tests.

Am I missing anything here? I hate to put an index hint in a stored procedure, but SQL Server just can’t seem to get a clue on this one. Anyone know any other things that might prevent SQL Server from recognizing that using the index is a good idea?

EDIT: One of the columns being returned is a TEXT column, so using a covering index or an INCLUDE won't work :(

8 Answers
8

You have 800k rows indexed by col1, col2, col3. Col2 is a bit, so its selectivity is 50%. Col3 is a checked on a range (<=), so it's selectivity will be roughly at about 50% too. Which leaves col1. The query is compiled for the generic, parametrized plan, so it has to account for the general case. If you have 10 distinct values of col1, then your index will return approximately 800k /10 * 25% that is about ~20k keys to lookup in the clustered index to retrieve the '...' part. If you have 10k distinct col1 values then the index will return just 20 keys to look up. As you can see, what matters is not how you build your index in this case, but the actual data. Based on the selectivity of col1, the optimizer will choose a plan based on a clustered index scan (as better than 20k key lookups, each lookup at a cost of at least 3-5 page reads) or one based on the non-clustered index (if col1 is selective enough). In real life the distribution of col1 also plays a role, but going into that would complicate the explanation too much.

You can come with the benefit of hindsight and claim the plan is wrong, but the plan is the best cost estimate based on the data available at compile time. You can influence it with hints (index hint as you suggests, or optimize for hints as Quassnoi suggests) but then your query may perform better for your test set, and far worse for a different set of data, say for the case when @col1 = <the value that matches 500k records>. You can also make the index covering, thus eliminating the '...' in the projection list that require the clustered index lookup necessary, in which case the non-clustered index is always a better cost match than the clustered scan.

Kimberley Tripp has a blog article covering this subject, she calls it the 'index tipping point' which explains how come an apparently perfect candidate index is being ignored: a non-clustered index that does not cover the projection list and has poor selectivity will be seen as more costly than a clustered scan.

create index ix1 on My_Table(Col3, Col1) INCLUDE(Col2)
-- include other columns from the select list if needed

Also, you've left out the rest of the columns from the select list. You might want to consider including those if there aren't many either in the index or as INCLUDE statement to create a covering index for the query.

I bet SQL Server thinks the price of getting the rest of the columns (designated by ... in your example) from the clustered index outweighs the benefit of the index so it just scans the clustered key. If so, see if you can make this a covering index.

It's using the clustered key if I don't force the index use. The column list includes all of the columns in the table. While I could put a big covering index on that, I'd be effectively duplicating the table. I'll need to look into the INSERT/UPDATE/DELETE frequencies to see if the cost is warranted.
–
Tom H.Jul 10 '09 at 13:55

If your SELECT is returning columns that aren't in your index SQL my find that its more efficient to scan the clustered index instead of having to do a key lookup to find the other values that you are requesting.

If you have a TEXT column try switching the data type to VARCHAR(MAX) then including the values in the nonclustered index.