I learned a lot of these techniques over time from trial an error, attending presentations, reading blog posts, speaking to other dbas and developers, etc… but never knew of a good resource that summarized these techniques in one place.

This post will be a quick round-up of everything I’ve covered so far, as well as 8 additional techniques that I use occasionally but don’t necessarily require a full detailed post to explain them.

Why Rewrite Queries?

I often find myself working in environments where modifying indexes or changing server settings is out of the question when performance tuning. I usually run into these scenarios when dealing with:

Vendor databases

“Fragile” systems

Not enough disk space

Limited tooling/ad hoc analysis

Features limited by security software

While solving the root cause of a performance problem is always preferable, sometimes the only way I’m able to fix problems in these environments is by rewriting the queries.

I decided to write this summary post because it is a resource I would have loved to have when starting out. Sometimes it can be easy to get “writer’s block” when trying to think of ways to rewrite a SQL query, so hopefully this list of techniques can provide ideas and get your creative juices flowing.

So, without further ado, here is a list of 12 techniques in no particular order that you can use to rewrite your queries to change their performance.

12 Ways to Refactor a Query to Change Performance

Sometimes window functions rely a little too much on tempdb and blocking operators to accomplish what you ask of them. While using them is always my first choice because of their simple syntax, if they perform poorly you can usually rewrite them as an old-fashioned GROUP BY to achieve better performance.

When filtering rows of data on multiple values in tables with skewed distributions and non-covering indexes, writing your logic into multiple statements joined with UNION ALLs can sometimes generate more efficient execution plans than just using IN or ORs.

Sometimes the query optimizer struggles to generate an efficient execution plan for complex queries. Breaking a complex query into multiple steps that utilize temporary staging tables can provide SQL Server with more information about your data. They also cause you to write simpler queries which can cause the optimizer to generate more efficient execution plans as well as allow it to reuse result sets more easily.

Using the DISTINCT operator is not always the fastest way to return the unique values in a dataset. In particular, Paul White uses recursive CTEs to return distinct values on large datasets with relatively few unique values. This is a great example of solving a problem using a very creative solution.

8. Create UDFs

Sometimes a poorly configured server will parallelize queries too frequently and cause poorer performance than their serially equivalent plan. In those cases, putting the troublesome query logic into a scalar or multi-statement table-valued function might improve performance since they will force that part of the plan to run serially. Definitely not a best practice, but it is one way to force serial plans when you can’t change the cost threshold for parallelism.

9. Data Compression

Not only does data compression save space, but on certain workloads it can actually improve performance. Since compressed data can be stored in fewer pages, read disk speeds are improved, but maybe more importantly the compressed data allows more to be stored in SQL Server’s buffer pool, increasing the potential for SQL Server to reuse data already in memory.

12. Copy the data

If you can’t get better performance by rewriting a query, you can always copy the data you need to a new table in a location where you CAN create indexes and do whatever other helpful transformations you need to do ahead of time.

…And more

By no means is this list exhaustive. There are so many ways to rewrite queries, and not all of them will work all the time.

The key is to think about what the query optimizer knows about your data and why it’s choosing the plan it is. Once you understand what it’s doing, you can start getting creative with various query rewrites that address that issue.

SQL Server’s cost-based query optimizer does a pretty good job of figuring out what order to filter your data to get fast query executions. It considers things like index coverage, data distribution, and much more to decide how to retrieve your query’s data.

However, these good intentions can become problematic in certain situations where you know more about your data than SQL Server does. When this happens, the order SQL Server chooses to execute predicates is important not just for performance of your query but for the business logic as well.

This table stores data for an application that has many different types of Pages. Each Page stores different types of data, but instead of creating a separate table for each type, we store all the different data in the varchar DataValue column and maintain the original data type in the DataType column.

This structure reduces the complexity required for maintaining our database (compared to creating possibly hundreds of tables, one for each PageName) and makes querying easier (only need to query one table). However, this design could also lead to some unexpected query results.

Filtering Mixed Data Values

Let’s say we want to retrieve all data from one table with where the value is 1.2:

This query runs fine. The problem is since our original data type was a decimal with a value of 1.20, this string-based comparison doesn’t work. What we really want to have happen is a numeric comparison in our predicate:

While the implicit conversion occurring on the table’s DataValue column is not ideal, if the number of rows it needs to convert is small it’s not so bad (plus, this isn’t the point of today’s post, so try and look past it for a few more moments).

Here comes the fun: what if we want to check all our Pages that contain numeric data for values of 1.2? We could write this query in a couple of different ways:

For both queries, we receive the error “Error converting data type varchar to numeric”.

Why? In this case SQL Server decides to do the implicit conversions on the DataValue column first before filtering on our PageName columns.

Up until this last query, SQL Server was deciding that it would be more efficient to filter the rows down to the specific Page first and then do the implicit conversions on the DataValue column. However, now that we are selecting more than one table, SQL Server says determines it has to scan everything anyway, it might as well do all of the implicit conversions first and filter on table names later.

The problem of course is that all our DataValue values are not numeric. In this case the order of the predicates does matter, not for performance but to be able to correctly execute the business logic that we defined as part of our query.

Not Good Solutions

One way we can fix this is to tempt SQL Server to filter on PageName first by adding an index:

This works great. SQL Server decides that since this index covers all the fields in our query, and because the index key is PageName, it will filter the rows on PageName first and perform the implicit conversions on the remaining rows.

The problem with this is that it’s not guaranteed. Something may happen that will cause SQL Server not use this index in the future: our index doesn’t cover our query anymore, we add some additional filtering, the index is removed so it can be replaced by a different index that will no longer be selected for this particular query, etc…

It just isn’t a reliable option.

Plus it doesn’t work in all scenarios. Let’s say we parameterize the PageName and use the STRING_SPLIT() function to filter our Pages to only those passed in:

We are back to square one since in this case STRING_SPLIT() needs to parse the PageName data first and then join it in with the rest of the data, causing our original failure scenario (this is the estimated execution plan):

Other Options

So while indexing seems to fix the solution, it’s not guaranteed to work 100% of the time.

Obviously we could not store data in this format, but that would add complexity to the database and app.

We could try to add the PageName filter into a derived table and force the join order, but that’s ugly and will force us to read the table multiple times.

Since we also have data type information available for each row, we might consider utilizing that information:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2 AND DataType = 'decimal'

But once again if this works it’s through sheer luck.

TRY_CONVERT() is another option. This function returns nulls if it can’t convert to a decimal:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND TRY_CONVERT(decimal(2,1),DataValue) = 1.2

This is actually a pretty good option since it’s guaranteed to work regardless of which column SQL Server filters on first. If the number of DataValues you have to TRY and CONVERT is relatively small though, this may be your best choice.

For better performance, you can create a second column that contains data in decimal (or any other type) format:

You could index both DataValue* columns and your performance would be pretty good. The downside here of course is that your app queries will have to change to match the new table structure:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND CASE WHEN DataType = 'decimal' THEN DataValueDecimal ELSE DataValue END = 1.2

In conclusion, it’s tough to say what the best option is for this type of scenario. However, it’s important to keep in mind that if you decide to structure and write your queries in this format, you need to plan for order of operation issues and handle errors gracefully.

This post is a response to this month’s T-SQL Tuesday #114 prompt by Matthew McGiffen. T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month. This month Matthew asks us to write about puzzles, so I decided to recreate a childhood favorite in SQL Server.

As a kid, I found Magic 8 Balls alluring. There is something appealing about a who-knows-how-many-sides die emerging from the depths of a mysterious inky blue fluid to help answers life’s most difficult questions.

I never ended up buying a magic eight ball of my own though, so today I’m going to build and animate one in SQL Server Management Studio.

Fun and Valuable? Signs point to yes.

While building a magic eight ball in SQL Server is not the most useful project in the world it is:

Fun

A great way to learn lots of cool SSMS and SQL tips and tricks to use in more useful situations.

Here’s an example of the finished project followed by all of the components that make this project work. The full code for this solution can be found at the bottom of this post.

ORDER BY NEWID()

After we create our data set of static messages, we need to randomly return 1 message for every shake of the eight ball. My favorite way to return one random record is to order the data by NEWID() (creating a random order for values) and then using TOP 1 to return only the first random record:

Table Driven Animation

While I never have used this technique for animating an image before, I have used a control table to drive what data should get processed in an ETL.

In today’s case, instead of saving the values of what data was last manipulated in an SSIS package, I’ll be storing what each action each frame of animation should display, as well as how much delay to put in between each frame:

WAITFOR DELAY

I wanted there to be a different delay between certain animation frames (I believe the final message reveal deserves a slightly more dramatic pause), so I’m using WAITFOR DELAY to achieve that.

WAITFOR DELAY @DelayToTake;

PRINT

The goal here is to print this ascii 8 ball shaking left and right before displaying the message. We do this using good old fashioned PRINT(). After printing a particular frame we update our control table to indicate that particular frame has been drawn.

GO

You might be wondering why I decided to use a control table to dictate what images to animate. The trouble was that in order to get the PRINT to actually display our ascii images on screen in SSMS, the batch needed to finish submitting. So each frame we print needs to be part of its own batch.

Since we have 7 frames in our animation, we need to execute our procedure 7 times.

Alternatively we can use GO 7, but then we get that ugly batch execution completed message which I don’t think there is anyway to hide:

Completely Useless? I guess not

I’ve always been a fan of occasionally taking a break to build things for pure fun. It’s a good way to apply lesser known features to your code, stretch your creativity for solving problems, and of course push software functionality to their limits through feature abuse.

Here is the full set of code if you want to run it for yourself (note, this works on a 1920×1080 resolution monitor with SSMS at full screen…your results may vary):