Implied Predicates and Query Hints

In this post, I want to take a look at how two seemingly unrelated features of SQL Server can interact to cause a problem. The idea for this post came from a question submitted by a reader. Let’s begin. Consider the following trivial schema and query:

First, the plan includes the predicate “T2.A = 0”. Although we did not specify this predicate in the original query, SQL Server derives it from the predicates that we did specify. This derivation is a good thing. It allows SQL Server to filter out rows from the scan of T2 earlier than would otherwise be possible.

Second, the original equijoin predicate “T1.A = T2.A” appears nowhere in this plan. This predicate is redundant with the original predicate “T1.A = 0” and the derived predicate “T2.A = 0” so SQL Server eliminates it. Normally, eliminating a redundant predicate is a good thing. By the time the rows from the scans reach the join, there is no reason to evaluate this predicate. It would be a waste of time. Unfortunately, in this case, the eliminated predicate also happens to be the only equijoin predicate and hash joins (and merge joins) require at least one equijoin predicate. Thus, the query with the HASH JOIN hint fails.

Note that the loss of the hash join and merge join alternatives for the above plan is not a big deal from a performance perspective. With or without the join predicate, the query is a cross join since all rows from T1 will join with all rows from T2.

If we throw in a third, briefly lived feature of SQL Server 2008, the situation gets even more complex. SQL Server 2008 RTM has an optimization that substitutes constants for parameter and variable values if we use the RECOMPILE hint. Note that this optimization was removed from SQL Server 2008 Cummulative Update 4 and Service Pack 1 to fix an issue. If your instance of SQL Server has the fix, you will not be able to reproduce the following scenario.

Let’s see a simple example of this optimization in action. Compare the plans for the following two identical queries:

DECLARE @P INTSET @P = 0

SELECT * FROM T1 WHERE T1.A = @P

|–Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=[@P]))

SELECT * FROM T1 WHERE T1.A = @POPTION (RECOMPILE)

|–Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

Observe how the WHERE clause in the first plan references the @P variable while the WHERE clause in the second plan which includes the RECOMPILE hint references the constant 0. This optimization is safe since the plan generated for the query with the RECOMPILE hint will be used only once and then discarded rather than cached.

The preceding plans are “actual” rather than “estimated” query plans. That is, I collected them using SET STATISTICS PROFILE ON rather than SET SHOWPLAN_TEXT ON. The parameter substitution optimization can only be applied when the query is recompiled immediately prior to execution and the actual parameter values are known.

SQL Server has transformed the predicate “T1.A BETWEEN @P1 AND @P2” first into “T1.A BETWEEN 0 AND 0” and then into “T1.A = 0”. Now we have exactly the same scenario that we had in the first failed query above.

Query hints can be powerful tools, but they can also backfire – often in mysterious ways. In these examples, two and even three separate features all interact to induce the error. Even a small change to a query or to a stored procedure’s parameters can be the difference between success and failure.

Hash and merge join both require equijoin predicates. (See the table in this post blogs.msdn.com/…/702828.aspx.) Since the above query does note have an equijoin predicate, you must remove the hash join hint and allow the optimizer to choose a nested loops join. Unfortunately, if the two tables are large, this query may take some time as it will effectively compare all rows in one table to all rows in the other table.

@ W Moore. Perhaps I chose my words poorly. In the last sentence of this post, I was referring to how hints can prevent the optimizer from finding a plan if the hints request a plan that is not valid for the given query. Small differences in a query (e.g., changing "T1.a = T2.b" to "T1.a > T2.b") can change the set of valid plans and cause a hint that worked into a hint that causes an error. In the absence of hints, SQL Server will find and execute a plan for any valid query. This is why hints should be used cautiously.

The first query succeeds because as noted in the above post, with the recompile hint, the optimizer substitutes the empty string constant for the variable @c. You can see the substitution by checking the compute scalar in the plan:

Without the recomile hint, the optimizer cannot do the substitution. I believe it then incorrectly concludes that @c is not a runtime constant and further concludes that it cannot safely use the hash join since it would need to reevaluate the orders subquery for each row from the customers table. I'm afraid that I'm not aware of any workaround other than to remove hash join hint.