Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

See first query below.
Can NOT combine table hint Index and forceseek with two joins and the joins are not on the PK.
How to make the first query compile?

Interesting
- if just one join or the other then can combine index and forceseek hints
- if the index is the PK then can combine 2 joins and have both hints

-- compiler fails
-- Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.
Select Count(Distinct([docSVsys].[sID]))
From [docSVsys] with (nolock)
Left Join [docSVtext] with (nolock, Index(IX_docSVtext_value_sID), forceseek )
On [docSVtext].[sID] = [docSVsys].[sID]
Left Join [docMVtext] with (nolock, Index(ix_docMVtext_value_sID), forceseek)
On [docMVtext].[sID] = [docSVsys].[sID]
where [docSVtext].[value] = 'doug' or
[docMVtext].[value] = 'doug'
-- can do one join
Select Count(Distinct([docSVsys].[sID]))
From [docSVsys] with (nolock)
Left Join [docSVtext] with (nolock, Index(IX_docSVtext_value_sID), forceseek )
On [docSVtext].[sID] = [docSVsys].[sID]
where [docSVtext].[value] = 'doug'
-- can do the other join
Select Count(Distinct([docSVsys].[sID]))
From [docSVsys] with (nolock)
Left Join [docSVtext] with (nolock, Index(IX_docSVtext_value_sID), forceseek )
On [docSVtext].[sID] = [docSVsys].[sID]
where [docSVtext].[value] = 'doug'
-- if on the PK then can do forceseek on two join
Select Count(Distinct([docSVsys].[sID]))
From [docSVsys] with (nolock, INDEX(PK_docSVsys))
Left Join [docSVtext] with (nolock, Index(PK_docSVtext), forceseek )
On [docSVtext].[sID] = [docSVsys].[sID]
Left Join [docMVtext] with (nolock, Index(PK_docMVtext), forceseek)
On [docMVtext].[sID] = [docSVsys].[sID]
where [docSVtext].[value] = 'doug'
or [docMVtext].[value] = 'doug'

This does what I want and runs 100 times faster.
This has almost an identical query plan to the answer from ypercube.
But I cannot combine this with some other conditions I need.

Select count(distinct([docSVsys].[sID]))
From [docSVsys] with (nolock)
LEFT HASH JOIN [docSVtext] with (nolock)
on [docSVtext].[sID] = [docSVsys].[sID]
and [docSVtext].[value] = 'doug'
LEFT HASH JOIN [docMVtext] with (nolock)
on [docMVtext].[sID] = [docSVsys].[sID]
and [docMVtext].[value] = 'doug'
where [docSVtext].[sID] is not null
or [docMVtext].[sID] is not null

This also uses the proper indexes and is the fastest.
For now I don't have the option to rewrite the format.
The program builds up queries to other tables.
We incorrectly wanted the same format for where as an order by and that got us where we are.

SELECT count(distinct(c.sID))
from
( SELECT sv.[sID]
FROM [docSVtext] AS sv
WHERE sv.[value] = 'doug'
UNION ALL
SELECT mv.[sID]
FROM [docMVtext] AS mv
WHERE mv.[value] = 'doug'
) as c

This question came from our site for professional and enthusiast programmers.

What is interesting is if change the OR to an AND in the where then can specify both hints. On OR verses AND also results in a different join (OR HASH and the AND NESTED LOOP).
–
BlamOct 15 '12 at 16:17

Yes that is fast and uses the desired indexes. For now the easy fix is the two table hints as that is how the queries are being built up. This is a program that build up queries to many tables. In the next version we are probably going to use the format in your answer or union / intersect.
–
BlamOct 11 '12 at 14:04

4

I don't think it's a good idea to use hints. Unless you have performance problems and you are sure that the execution plan is not the best and you have no other way of getting around it (rewriting, different indexes, etc).
–
ypercubeOct 11 '12 at 14:07

But I can't answer on how the double hint can be enforced (and not limit the optimizer's options so a plan is produced). Perhaps you can flag your question for migration to the (sister) DBA.SE site. There are several users there that could answer this far better.
–
ypercubeOct 11 '12 at 14:09

This a case of can't get it to use the correct index without reformatting the query. We are going to reformat the queries without hints but for now I am hoping for a patch to the current format. Thanks again.
–
BlamOct 11 '12 at 14:24

The bad news is that this cannot be done today due to limitations in the SQL Server query optimizer.

SELECT
COUNT_BIG(DISTINCT(dsv.[sid]))
FROM dbo.docSVsys AS dsv
LEFT JOIN dbo.docSVtext AS dst ON
dst.[sid] = dsv.[sid]
LEFT JOIN dbo.docMVtext AS dmt ON
dmt.[sid] = dsv.[sid]
WHERE
dst.value = 'doug'
OR dmt.value = 'doug';

The logical query specification above is semantically the same as the EXISTS formulation proposed in ypercube's answer and the LEFT JOIN modification made in wBob's answer. The logical formulation says nothing about how SQL Server ought to physically execute the query. The results are correct (as Richard agrees) due to the DISTINCT, but that does not mean the query is 'wrong'. Given a query optimizer with every possible transformation and infinite time and resources, the text above could produce exactly the same optimal execution plan as the texts suggested by ypercube and wBob would.

By way of explanation, let's look at the way SQL Server finds an efficient physical plan for your first 'can do' example:

SELECT
COUNT_BIG(DISTINCT(dsv.[sid]))
FROM dbo.docSVsys AS dsv
LEFT JOIN dbo.docSVtext AS dst ON
dst.[sid] = dsv.[sid]
WHERE
dst.value = 'doug';

If the query optimizer performed only the most obvious implementation, the query plan would be:

As it is, the optimizer performs two important rewrites. First it sees that the WHERE clause predicate rejects any NULLs generated by the LEFT JOIN. A simplification rule called SimplifyLOJN turns the Filter + Left Outer Join into an Inner Join:

Second, the optimizer rewrites the (group by sid) Stream Aggregate + Inner Join as a Left Semi Join using a rule called GbAggJNtoLSJN:

The same sequence of events applies to your second example (which meant to reference the docMVtext table I am sure). As before, the LEFT JOIN is simplified to an INNER JOIN because of the NULL-rejecting WHERE predicate, then the resulting DISTINCT aggregate and INNER JOIN combination is simplified to a Left Semi Join.

If the optimizer contained logic to reason about the OR predicate in your original example more fully (considering also the DISTINCT), it could produce a query plan like:

A T-SQL formulation for that plan is close to the UNION rewrite you might have in mind:

SELECT COUNT_BIG(*) FROM
(
SELECT dsv.[sid]
FROM dbo.docSVsys AS dsv
LEFT JOIN dbo.docSVtext AS dst ON
dst.[sid] = dsv.[sid]
WHERE
dst.value = 'doug'
UNION
SELECT dsv.[sid]
FROM dbo.docSVsys AS dsv
LEFT JOIN dbo.docMVtext AS dmt ON
dmt.[sid] = dsv.[sid]
WHERE
dmt.value = 'doug'
) AS SubQuery;

If the optimizer truly contained all possible rewrites, even that formulation could be transformed to a more efficient form:

This is just the query plan for ypercube's suggested rewrite (though the exact shape of the 'ideal' plan will depend on data volumes and distribution, but you see my general point, I hope).

>The bad news is that this cannot be done today due to limitations in the SQL Server query optimizer. I was able to get the query to compile by swapping the order of the index columns round, as per my simple repro. I'm not saying this is a great solution because it isn't. Reads were sky-high in Profiler, however it did compile without changing the text of the query.
–
wBobOct 19 '12 at 9:31

@wBob Thanks yes I did read that. I hope my answer is clear in referring to the original question exactly as stated.
–
Paul WhiteOct 19 '12 at 9:35

Your query has significant problems. The only reason the answer is still correct is the use of DISTINCT. Otherwise you'd see a gigantic result set being generated if the tables were large due to the cartesian product between the 3 tables.

Your OR is basically saying that if the left join on docSVtext on SID+'doug' worked, give me all the records from docMVtext based on SID alone. Although the DISTINCT will render this a no-op, SQL Server cannot optimize this path out because on another SID value, it could be docMVtext that successfully matches on 'doug', so SQL Server needs to descend down both join paths.

When SQL Server says your hints don't make sense, in this case it's because they don't. It cannot use the 'doug' value against either of the LEFT JOIN-ed tables as a lead into the index, so it has to basically dig up each and every index key on the first column (value) to get to the 2nd column (sid) to perform the seek. However, because of the or condition, there isn't a legal master between the 2 LEFT JOIN-ed tables, so neither can be used to seek against the other.

I was still curious as to why that ugly query behaved that way. How is it that the PK resolves to a legal master - it is still an OR?
–
BlamOct 16 '12 at 17:48

Your OR is unrelated to either LEFT JOIN, therefore it cannot be used until after the joins are resolved. So consider the query without the WHERE. Now, you're asking it to SEEK on the (value)-leading index for the SID. This requires it to perform a bookmark-lookup back to the data page (along PK). Doing it twice is just too much work.
–
孔夫子Oct 16 '12 at 19:37

4

Richard I've edited your excellent and helpful post and the comments to make them a bit more 'community-friendly'. I'd be glad to discuss this with you in the site chat room if you have any objection at all to the changes I made. Thanks for contributing here and please keep up the good work :)
–
Jack Douglas♦Oct 17 '12 at 16:51

This was originally my question this is correct. But as stated in the question it also defaults to the proper index and seek with no hints in this format.
–
BlamOct 15 '12 at 16:15

This changes the semantics of the query, because each LEFT JOIN now has 2 conditions to work with (value,SID). The original query cannot perform the filter until after the JOINS have resolved.
–
孔夫子Oct 17 '12 at 4:10

+1 This formulation is semantically the same as the original query. It may be physically more efficient given the optimizer's abilities today, but as Richard says in his answer, the semantics (taking the query as a whole) are the same.
–
Paul WhiteOct 19 '12 at 7:45

I reversed and did not fix for me. As stated in the question everything fine with index on PK and your IX looks like my PK. Since it is no longer my question I cannot update the question. Both tables sID int, fieldID int, value string. SV has PK sID, fieldID. MV has PK sID, fieldID, value. Both ix (reverse lookup) value, sID. If there is something specific you would like me to test please post a comment.
–
BlamOct 16 '12 at 0:17

2

@Blam: You can edit and it is still your question.
–
ypercubeOct 16 '12 at 5:17