LINQ: The Good, The Bad, and The Poorly Optimized (Part 1)

Part I: LINQ and sp_executesql

Numerous questions raised in numerous forums mention that LINQ to SQL wraps queries in a call to the stored procedure sp_executesql. There seems to be agreement that this is done to ensure the caching of the query and query parameterization (if, indeed, parameters are supplied). Some advantages to sp_executesql, like resistance to SQL injection attacks, are unrelated to performance and will not be discussed here. While it is true that part of the purpose of invoking sp_executesql is the caching or query plans, there also seems to be some fair consensus that caching is always good thing, since it avoids repetitive optimization of a query. This of course is not true. A cached plan can be good for some parameter values and very poor for others, making the use of sp_executesql problematic at best. Let’s examine this potential problem.

Caching Query Execution Plans

Some queries will never suffer from parameterization and caching. This includes searching for a single row based on a primary key. In such cases, the physical method for the search will always be the same, regardless of which individual key is sought. The problem queries are those where the number of rows retrieved can vary greatly depending on the key. This includes looking up rows in a foreign key table when doing a JOIN, and, of course, these queries are exceedingly common. For our examination, we’ll look at selecting customers based on region, the same query we used previously in our discussion of index selection. The database is Learning Tree’s Bigwind example database.

First we will need some linq code. The example used here is F# code, but the query optimization principles involving sp_executesql apply to LINQ to SQL, not to any particular language used to implement it.

execsp_executesqlN’SELECT [t0].[CustomerID] AS [Item1], [t0].[CompanyName] AS [Item2], [t0].[Region] AS [Item3]

FROM [dbo].[Customers] AS [t0]

WHERE [t0].[Region] = @p0′,N’@p0 nvarchar(4000)’,@p0=N’Delaware’

execsp_executesqlN’SELECT [t0].[CustomerID] AS [Item1], [t0].[CompanyName] AS [Item2], [t0].[Region] AS [Item3]

FROM [dbo].[Customers] AS [t0]

WHERE [t0].[Region] = @p0′,N’@p0 nvarchar(4000)’,@p0=N’California’

If examine the plan cache, we see there is a single prepared plan with an execution count of two.

The prepared plan has an execution count of two, since the same plan was used for both California and Delaware.

As a curious side note, we can run these same SQL commands from above using the Management Studio, and we will observer a second prepared plan appear. Why don’t the queries from the Management Studio use the same plan? The answer lies in a set option. Taking a peek at the actual query plan XML, we see that the plan generated by LINQ has ARITHABORT set to false, while the same SQL executed from the Management Studio has ARITHABORT set to true.

LINQ has set ARITHABORT false.By default, a connection made from the Management Studio has ARITHABORT set to true.

Using different values for standard session parameters such as ARITHABORT and CONCAT_NULL_YIELDS_NULL forces SQL Server to maintain different sets of plans, because it does not know how the different values might affect query results. Establishing consistency in the use of set options will help ensure that the same query uses the same plan, regardless of the client that submitted the query. This good practice is even more important for stored procedures.

Changing Session Parameters in LINQ

In the case of LINQ, we can modify our F# code to directly set the value of ARITHABORT.

Examining the Actual Plan

Since the query was first run with “Delaware” as the value for Region, it is the Delaware plan that appears in the cache. When California is used as the parameter value, it is still the Delaware plan that is applied. We can confirm the relative quality of the query plans by looking at IO in both ad hoc and prepared plans.

The first and third queries are the ad hoc and the prepared queries for Delaware. They both use the same optimized plan so it is not surprising they have the same IO, 130 pages. In contrast the second and fourth queries are the ad hoc and prepared queries for California. The optimal query for California, at least under the conditions for this test, requires 936 page reads. When LINQ parameterizes the query, it inadvertently forces SQL Server to use the suboptimal Delaware plan, which requires 4179 page reads when retrieving the more numerous California rows.

Conclusion

It is a simple fact that the same base query can require completely different physical query methods depending on parameter values. If this were not true, the job of the query optimizer would certainly be a lot simpler! An important consequence of this is that forcing a parameterized plan into the cache can be good for some parameter values and bad for others. In this example, there is a “double-whammy”, since the regions with large numbers of customers execute with the inferior plan, and it is those queries that are more likely to be run.

LINQ provides no means that I am aware of to prevent the use of sp_executesql and the caching of plans. However, in the next session we will see how to manually take charge of some of the detailed structure of the query plan.