This isn’t a controversial topic. It’s widely accepted that simplicity is a virtue. That’s the message; I hope to give some methods and motivation.

Write Simple SQL For Yourself

Revisit old code from time to time and write code that you won’t mind revisiting.

All SQL developers are translators. We translate English descriptions of what we want into SQL. Simpler queries mean less mental energy is required when “translating” these requirements into SQL. It also goes the other way: if I write some SQL and then revisit it a year later, I have to translate backwards in order to understand the query’s “intent”.

What? You never revisit old code? Whether it’s code or blog articles, when I look back at what I wrote two short years ago, I’m often not happy with what I read. I sometimes cringe a bit at what I’ve written and will often update old blog posts that need it. SQL is no exception. But reviewing old writing is a useful exercise. Joan Didion, a favorite author of mine, points out “We are well advised to keep on nodding terms with the people we used to be.”

Write Simple SQL For Others

Simpler SQL doesn’t need to be shorter SQL. Jeremiah Peschka pointed me at a blog post by Selena Deckelmann, “How I Write Queries Using psql: CTEs”.
Selena describes (better than I can) how useful CTEs can be when developing SQL. The CTEs provide a way to construct “a set of understandable ‘paragraphs’ of SQL” which can be used to “explain the logic of the query with others”.

Now that is how you do self-documenting code with SQL. When written with CTEs, your SQL will be much clearer than when you use subqueries. But don’t take my word for it. Check out a brilliant example in Brad Schulz’s “Playing Poker With SQL”. In that blog post he develops a single query which reports the results of a ten-hand poker game. Here’s an abbreviated version of his final query:
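Brad’s actual query is too long to reproduce here, but the general shape of the CTE-as-paragraphs style looks something like this sketch (the table and column names are invented for illustration; they are not from his post):

```sql
-- A hypothetical sketch of the CTE-as-paragraphs style.
-- Each CTE reads like a short paragraph with one responsibility.
WITH DealtCards AS
(
    -- Paragraph 1: the raw cards dealt to each hand
    SELECT HandId, CardRank, CardSuit
    FROM dbo.Cards
),
RankedHands AS
(
    -- Paragraph 2: score each hand (here, just count pairs)
    SELECT HandId, COUNT(*) - COUNT(DISTINCT CardRank) AS Pairs
    FROM DealtCards
    GROUP BY HandId
)
-- Final paragraph: report the winning hand
SELECT TOP (1) HandId, Pairs
FROM RankedHands
ORDER BY Pairs DESC;
```

Each “paragraph” can be tested on its own by selecting from it directly, which is exactly what makes this style useful when explaining a query to others.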

Read his tips. They’re very well explained. After his explanation, he mentions a strategy to deal with complex queries. He writes “My usual approach is to break the query into manageable parts, storing reasonably-sized intermediate results in #temporary tables.” I want to show an example demonstrating how something like that could work.

I’ll use a query I made up against AdventureWorks2012 which, for blogging purposes, we’ll call complicated:
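The original query isn’t reproduced here, but as a stand-in, here’s the kind of “complicated” query I mean: a multi-join aggregate against AdventureWorks2012 (this is my own invented example):

```sql
-- A made-up "complicated" query against AdventureWorks2012:
-- four joins, a filter, an aggregate, and a HAVING clause.
SELECT p.Name,
       st.Name AS Territory,
       SUM(sod.LineTotal) AS Revenue
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product p       ON p.ProductID     = sod.ProductID
JOIN Sales.SalesTerritory st    ON st.TerritoryID  = soh.TerritoryID
WHERE soh.OrderDate >= '20070101'
GROUP BY p.Name, st.Name
HAVING SUM(sod.LineTotal) > 10000
ORDER BY Revenue DESC;
```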

Most people’s intuition is that a single query is preferable. But just like Paul White, I have found that performance can sometimes be improved when the work is split into many queries. Here’s an example of what that might look like:
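As a sketch (again, my own illustration rather than the query from the post), the same kind of work can be split into steps that store intermediate results in #temporary tables:

```sql
-- Step 1: materialize a reasonably-sized intermediate result.
SELECT soh.SalesOrderID, soh.TerritoryID
INTO #RecentOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate >= '20070101';

-- Step 2: aggregate the detail rows for just those orders.
SELECT sod.ProductID, ro.TerritoryID, SUM(sod.LineTotal) AS Revenue
INTO #Revenue
FROM #RecentOrders ro
JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderID = ro.SalesOrderID
GROUP BY sod.ProductID, ro.TerritoryID;

-- Step 3: a final, simple join produces the report.
SELECT p.Name, st.Name AS Territory, r.Revenue
FROM #Revenue r
JOIN Production.Product p    ON p.ProductID    = r.ProductID
JOIN Sales.SalesTerritory st ON st.TerritoryID = r.TerritoryID
WHERE r.Revenue > 10000
ORDER BY r.Revenue DESC;
```

Each step produces a small, inspectable result, and the optimizer gets three simple queries to plan instead of one complicated one.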

I like to use this performance technique before I consider query hints (but after other simpler improvements like indexing). Even so, this technique is not always appropriate. I’ve seen it work best on complicated queries (how do you know when they’re complicated?) and against large datasets (processing millions of rows, for example). Complicated queries carry a higher risk of generating poor query plans, and breaking these huge queries into smaller parts mitigates that risk.

In my example, I’ve split the original query into seven. That was just for illustration; it may be better to split your monster query into only three. Always test.

Empirical evidence tells me that simpler SQL performs better. When I’ve split up complicated queries, they often become much easier to maintain, and almost as often I’ve never needed to revisit them at all!

Really?

I’ve added a few comments below that qualify some of the things I’ve said here.

I had a couple of conversations with work colleagues who questioned me about this article. The first conversation was about a pitfall of CTEs that I didn’t mention.

Can CTEs sometimes mask complexity?
It involves the problem of using a subquery twice. When a CTE is written once, but referenced twice, it’s not always clear that it’s being evaluated twice. The same query written using derived tables will be written with two identical subqueries. This makes it obvious that the subquery is being evaluated twice.
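A minimal illustration of the pitfall (using an AdventureWorks2012 table for concreteness; the query itself is hypothetical):

```sql
-- The CTE is written once but referenced twice below, so the
-- aggregation may be evaluated twice -- a cost that's easy to miss.
WITH TerritorySales AS
(
    SELECT TerritoryID, SUM(TotalDue) AS Sales
    FROM Sales.SalesOrderHeader
    GROUP BY TerritoryID
)
SELECT ts.TerritoryID, ts.Sales
FROM TerritorySales ts
WHERE ts.Sales = (SELECT MAX(Sales) FROM TerritorySales);
-- Written with derived tables, the GROUP BY query would appear
-- twice verbatim, making the duplicated work visually obvious.
```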

I think I still recommend using CTEs. I acknowledge there’s a small chance that CTE usage masks complexity – complexity which would be obvious otherwise. But I’ll take that risk if it means easier-to-understand SQL. I’m on board with Selena Deckelmann’s approach of “designing with CTEs and optimizing for performance only if needed.”

The other conversation I had was surrounding the usage of temp tables.

Why are you advocating #temp tables? I thought you avoided those.
I advocate breaking up large, complex queries into smaller chunks using #temporary tables. But what about the tempdb woes described in my recent blog posts:

I seem to be saying two things: in those articles I advocate reduced usage of things that hit tempdb, but in this article I’m unashamedly pushing #temp tables again.

The difference is in how these queries are being used. Pick your favorite query to tune and try to answer these questions:

Are you running this query hundreds of times per second? (as opposed to daily)

Are you hoping to tune this query from 5 seconds down to 5 milliseconds? (as opposed to 5 minutes to 5 seconds)

Does this query get triggered by frequent end-user activity? (as opposed to infrequent business analyst activity)

Does your query plan fit on one screen?

If you answered mostly “yes”, then it seems you’re tuning a query in an OLTP system, and I would not recommend using temporary tables in this case. You probably won’t be able to reach acceptable query durations using this technique anyway. For OLTP systems, avoid tempdb.

But for infrequent reporting use cases, this technique may help you. Remember, most tempdb problems are caused by frequent tempdb use.

“When a CTE is written once, but referenced twice, it’s not always clear that it’s being evaluated twice. The same query written using derived tables will be written with two identical subqueries. This makes it obvious that the subquery is being evaluated twice.”
That’s an interesting perspective, and not one I really subscribe to. If one were to write the same subquery twice, then that violates the principle of DRY (http://en.wikipedia.org/wiki/Don%27t_repeat_yourself), and I consider DRY to be *the* most important guiding principle when I write any code (not just T-SQL code).
I love me some good encapsulation 🙂

You bet, Jamie. But I think my colleague was thinking that when one writes a subquery twice, it acts as a visual prompt so that the developer considers rewriting the whole thing to avoid the duplication. The argument is that a developer might not see or recognize the duplicated work when a CTE is used.

I like the DRY principle in theory, but in practice I’ve seen it betray us quite badly. In order to encourage code re-use, I’ve seen views (and views within views) that sweep complexity under the rug. One such query had a dozen or so nested views, but no single SQL statement ever indicated what a monster this query actually was (of course, the query plan doesn’t lie).

I happen to have Martin Fowler’s Refactoring on my desk. He has Duplicated Code as his #1 code smell, but it is still just one of 22.

I’m just saying that I can understand and sympathize with both points of view, but I’ll reiterate that “I think I still recommend using CTEs”.

Use of temp tables is not really a bad idea in OLTP systems. We use lots of temp tables within stored procedures to break up large queries into several small queries. The main advantage is that a large query with five or six JOINs to live tables places locks on those live tables until the statement completes. Since the query is large, it may take a long time to execute and so hold those locks without releasing them, which creates blocking in busy OLTP systems. The recommendation would be to use temp tables and then use those temp tables in the large queries so that blocking is minimized.
It is clear that use of temp tables will increase the use of tempdb, and it might create a bottleneck. To avoid that you need to optimize your tempdb. Follow this link for more details (http://technet.microsoft.com/en-us/library/ms175527(v=sql.105).aspx)
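The commenter’s approach might look something like this sketch (table names are hypothetical):

```sql
-- Read from the busy live table quickly, into a #temp copy,
-- so locks on the live table are held only briefly.
SELECT OrderID, CustomerID, OrderDate
INTO #Orders
FROM dbo.LiveOrders
WHERE OrderDate >= DATEADD(DAY, -1, SYSDATETIME());

-- The long-running aggregation now touches only the #temp copy,
-- blocking no one on dbo.LiveOrders while it runs.
SELECT o.CustomerID, COUNT(*) AS RecentOrders
FROM #Orders o
GROUP BY o.CustomerID;
```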

I understand where you’re coming from.
The tempdb bottleneck I fear most is PAGELATCH contention. But the link you provided mostly deals with tempdb capacity planning and placement. Microsoft does have a tempdb concurrency article here (http://support.microsoft.com/kb/328551). I’ve found a lot of their advice helps, but as a last resort they suggest “making changes to the workload/code”. And that’s why I avoid using temp tables frequently.

You talk about the trouble you have with regular concurrency (locking and excessive blocking) that is alleviated with the use of temp tables. In my experience, I have often been able to tackle such queries on a case-by-case basis, mostly by tuning (via rewrites, indexing, application caching, etc.).

Basically, when presented with such scenarios, I like to highlight the contrast between a “query that is large” and a “busy OLTP system”. Those don’t usually go hand in hand.

Hi Michael,
Thanks for the feedback.
I have experienced LATCH contention on tempdb due to frequently creating and dropping temp tables in stored procedures that have very high execution counts (in my case, more than 3000 calls per second). I agree with you that in these cases you may avoid temp tables or consider an application rewrite to reduce the number of stored procedure calls. There are some other solutions suggested for this type of contention using trace flags (same link you mentioned).
Finally, you may sometimes have stored procedures with complex business logic that contain large queries with many joins. These can be found in busy OLTP systems, and that is not uncommon.