Leonard Lobel

Leonard Lobel (Microsoft MVP, Data Platform) is the chief technology officer and co-founder of Sleek Technologies, Inc., a New York-based development shop with an early adopter philosophy toward new technologies. He is also a principal consultant at Tallan, Inc., a Microsoft National Systems Integrator and Gold Competency Partner.

Programming since 1979, Lenni specializes in Microsoft-based solutions, with experience that spans a variety of business domains, including publishing, financial, wholesale/retail, health care, and e-commerce. Lenni has served as chief architect and lead developer for various organizations, ranging from small shops to high-profile clients. He is also a consultant, trainer, and frequent speaker at local usergroup meetings, VSLive, SQL PASS, and other industry conferences.

Lenni has also authored several MS Press books and Pluralsight courses on SQL Server programming

I want to…

Rethinking the Dynamic SQL vs. Stored Procedure Debate with LINQ

August 1, 2009 — Leonard Lobel

Dynamic SQL is evil, right? We should only be using stored procedures to access our tables, right? That will protect us from SQL injection attacks and boost performance because stored procedures are parameterized and compiled, right?

Well think again! The landscape of this debate has changed dramatically, especially with the advent of Language-Integrated Query (LINQ) technologies against relational databases, such as LINQ to SQL (L2S) against SQL Server and LINQ to Entities (L2E) against the Entity Framework’s (EF) Entity Data Model (EDM) over any RDBMS with an ADO.NET provider for EF. This is because, despite the fact that both these ORM technologies support the use of stored procedures, their real intended use is to generate dynamic SQL.

For simplicity, the examples below only use LINQ to SQL, but the same principles apply to LINQ to Entities.

Comparing Dynamic SQL and Stored Procedures in LINQ Queries

Let’s compare the behavior of LINQ queries that generate dynamic SQL with LINQ queries that invoke stored procedures. In our example, we’ll use the Sales.Currency table in the AdventureWorks 2008 database that contains a complete list of global currency and exchange rates. The following L2S query generates dynamic SQL to select just those currencies with codes beginning with the letter B:

The ctx variable is the L2S DataContext, and its Currencies property is a collection mapped to the Sales.Currency table in the database (it’s common practice to singularize table names and pluralize collection names).

Modifying this query to use a stored procedure is easy. Assuming we create a stored procedure named SelectCurrencies that executes SELECT * FROM Sales.Currency and then import that stored procedure into either our L2S (.dbml) or EF (.edmx) model, the LINQ query requires only a slight modification to have it call the stored procedure instead of generating dynamic SQL against the underlying table:

// LINQ to SQL using a stored procedure
var q =
from currency in ctx.SelectCurrencies()
where currency.CurrencyCode.StartsWith("B")
select currency;

The only change made to the query is the substitution of the SelectCurrencies method (which is a function that maps to the stored procedure we imported into the data model) for the Currencies property (which is a collection that maps directly to the underlying Sales.Currency table).

Examining the Generated SQL

There is a major performance problem with this new version of the query, however, which may not be immediately apparent. To understand, take a look at the generated SQL for both queries:

Now the problem should be blatantly obvious. The first query executes the filter condition for currencies with codes that start with the letter B in the WHERE clause of the SELECT statement on the database server (by setting parameter @p0 to ‘B%’ and testing with LIKE in the WHERE clause). Only results of interest are returned to the client across the network. The second query executes the SelectCurrencies stored procedure which returns the entire table to the client across the network. Only then does it get filtered by the where clause of the LINQ query to reduce that resultset and obtain only currencies with codes that start with B, while all the other (“non-B”) rows that just needlessly traversed the network from SQL Server are immediately discarded. That clearly amounts to wasted processing, and is a serious performance penalty for the use of stored procedures with LINQ.

Of course, one obvious solution to this problem is to modify the SelectCurrencies stored procedure to accept a @CurrencyCodeFilter parameter and change its SELECT statement to test against that parameter as follows: SELECT * FROM Sales.Currency WHERE CurrencyCode LIKE @CurrencyCodeFilter. That will ensure that only the rows of interest are returned from the server, just like the dynamic SQL version behaves. The LINQ query would then look like this:

Performance problem solved, but this solution definitely begs the question “where’s the WHERE?” – in the stored procedure, or in the LINQ query? It needs to be in the stored procedure to prevent unneeded rows from being returned across the network, but then it’s not in the LINQ query any more. So LINQ queries that you need optimized for stored procedures won’t have a where clause in them, and in my humble opinion, that seriously undermines the effectiveness and expressiveness of LINQ queries because the query is now far less “language-integrated”.

Revisiting the Big Debate

Clearly LINQ to SQL and Entity Framework want us to embrace hitting the database server with dynamic SQL, but many database professionals live by the creed “dynamic SQL is the devil, and thou shalt only use stored procedures.” So let’s re-visit the heated “dynamic SQL vs. stored procedure” debate.

Proponents of stored procedures cite the following primary reasons against using dynamic SQL:

1) Security: Vulnerability to SQL injection attacks results from building T-SQL statements using string concatenation. Even stored procedures are vulnerable in this respect, if they generate dynamic SQL by concatenating strings. The primary line of defense against SQL injection attacks it to parameterize the query, which is easily done with dynamic SQL. That is, instead of concatenating strings to build the T-SQL, compose a single string that has one or more parameters, and then populate the Parameters collection of the SqlCommand object with the parameter values (this is how the L2S query above prepared the command for SQL Server, as evidenced by the output of the generated T-SQL that uses the parameter @p0 in the WHERE clause).

2) Performance: You’re a little behind the times on this one. It’s true that stored procedures would get partially compiled to speed multiple executions in SQL Server versions 6.5 (released in 1996) and earlier. But as of SQL Server 7.0 (released in 1999), that is no longer the case. Instead, SQL Server 7.0 (and later) compiles and caches the query execution plan to speed multiple executions of the same query (where only parameter values vary), and that’s true whether executing a stored procedure or a SQL statement built with dynamic SQL.

3) Maintainability: This remains a concern if you are embedding T-SQL directly in your .NET code. But with LINQ, that’s not the case because you’re only composing a LINQ query (designed to be part of your .NET code); the translation to T-SQL occurs on the fly at runtime when you execute your application.

Making a Good Compromise

These facts should change your perspective somewhat. But if you’re a die-hard stored procedure proponent that finds it hard to change your ways (like me), consider this compromise: Allow dynamic SQL for SELECT only, but continue using stored procedures for INSERT, UPDATE, and DELETE (which can be imported into your data model just like a SELECT stored procedure can). This is a good strategy because LINQ queries only generate SELECT statements to retrieve data. They can’t update data. Only the SubmitChanges method on the L2S DataContext (or the SaveChanges method on the L2E ObjectContext) generates commands for updating data, with no downside to using stored procedures over dynamic SQL like there is for SELECT.

So you can (and should) stick to using stored procedures for INSERT, UPDATE, and DELETE operations, while denying direct access to the underlying tables (except for SELECT). Doing so allows you to continue using stored procedures to perform additional validation that cannot be bypassed by circumventing the application layer and communicating directly with the database server.

Related

Great post, I totally agree with your compromise. Another big issue that we’ve experiencd with L2S and EF, is when you’re performing a SELECT procedure with a custom result set, niether object mapper can create the implied record set and types with out having placeholder tables or views. Which is a maintanence nightmare to keep synchronized.

This task is easily managed using anonymous types with LINQ. A major feature to lose when selecting data from your database using dynamic SQL.

Argument #3 for maintainability does not address the fact that if you use LINQ queries in your app code, you will have to recompile/redeploy your app if you need to change a query. That is another reason we use stored procedures.

I’d like to point out a few things regarding two of your sections – Security and Performance.

Security: SQL injection attacks do happen even with stored procedures, if not written correctly. Using the system stored procedure sp_executesql eliminates this from occurring. Also, it is best to limit the windows account that the application runs as to EXEC only. This way, SQL Server prohibits the application layer from building poorly written dynamic SQL statements and sending possible SQL injection attacks to the database.

Performance: Queries that access large tables or are particularly complex, including multiple join operators, the database could select a bad execution plan. Putting the query in the hands of Entity Framework will make optimization extremely difficult, to near impossible because the selection of tuning is not available. Even if it were, the application would need to be recompiled after the change and restarted. This is downtime for the application and creates work deploying the changes. However, if a query is created as a stored procedure (within SQL Server), a specific execution plan can be recreated outside of Entity Framework and the .NET application from having to be recompiled and deployed.

Entity Framework should be reserved for only the easiest data access requirements on approved circumstances, from a database administrators perspective.