There have been a fair few forum questions in recent months asking for help in removing index scans, loop joins, sorts or other presumed-slow query operators. There have been just as many asking how to replace a subquery with a join, or a join with a subquery, usually for performance reasons.

The first question that I have to ask when looking at requests like that is “Why?”

Why is removing a particular query operator the goal? Why is changing a WHERE clause predicate the goal? If it’s to make the query faster, has the query been examined, and has it been confirmed that the query operator or predicate really is the problem?

The title of this post refers to a comment I’ve seen again and again in blogs or articles about front-end development. “Premature optimisation is the root of all evil.” It’s true in the database field as well.

While optimisation is very important in database development, trying to optimise queries without any idea of where the problem with the query is, or even whether the query is a problem at all, is about as effective in fixing a database performance problem as using a shotgun from 100 metres is in killing mosquitoes. If you hit the problem, it’s by sheer luck and nothing else.

There are two sides to this problem.

The first is spending time, during development, optimising a query (or stored procedure) with no idea whether the query is inefficient, and no way of knowing whether the changes made are an improvement.

First, this is a waste of time that could be better spent developing other queries. Second, it creates the incorrect impression that the queries have been optimised when in fact nothing of the sort has been done.

The second is when, on a production database that is performing badly, queries are modified almost at random in an attempt to fix the performance problem quickly.

This almost never works. It wastes time fixing things that very likely aren’t broken in the first place, all the while the database performance deteriorates and management curses SQL Server as ‘non-scalable’.

So, what is the right approach for the above two scenarios?

Don’t optimise queries without knowing if they need it.

Don’t optimise queries without knowing if they need it. (1)

New development

When writing queries and stored procedures, test them against a representative data set on a server with a representative workload, and evaluate their performance characteristics to see whether they are acceptable. If the query’s performance characteristics are acceptable, then that query requires no optimisation. (2)

This doesn’t mean write bad code and push it to production. It means write good, solid code, following accepted coding standards, ensure that it runs acceptably against production-volumes of data, and do not spend hours or days trying to get it running a couple of milliseconds faster.

And if the query doesn’t perform acceptably, identify the problematic portion and fix that; don’t flail around rewriting bits of the query in the hope that the problem will magically go away.

The execution plan is the primary tool here, along with the output of Statistics IO.
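As a sketch of what that measurement might look like (the table, column and variable names here are hypothetical, not from any real schema):

```sql
-- Turn on I/O and timing statistics before running the query,
-- then check the actual execution plan and the reads reported
-- before deciding whether the query needs optimisation at all.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @CustomerID int = 42;  -- hypothetical parameter

SELECT OrderID, OrderDate, TotalDue
FROM dbo.Orders                -- hypothetical table
WHERE CustomerID = @CustomerID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then shows logical and physical reads per table, which gives a concrete number to compare before and after any change, rather than guessing.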

Fixing existing code

When evaluating existing databases with known performance problems, limit the performance tuning to queries that really are performing badly and need optimisation. It’s often true that fixing the top 5-10 worst-performing queries will have a massive effect on overall system performance, far more than tuning twice that number of queries that aren’t really a problem.

The best tool for finding which queries really are the worst offenders is SQL Trace.
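A trace captures the actual workload over time. As a quick supplementary sketch, on SQL Server 2005 and later the plan cache can also be queried for the cumulative worst offenders; ordering by total logical reads is one reasonable choice here, not the only one:

```sql
-- Top 10 cached statements by cumulative logical reads.
-- Only covers plans still in cache, so a trace remains the
-- more complete picture of the workload.
SELECT TOP (10)
    qs.total_logical_reads,
    qs.total_worker_time,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```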

When looking at queries that are a problem, identify the portions that are inefficient and target attempts at optimisation towards those problems.

In conclusion

Measure Twice.
Optimise if necessary.

(1) No, that wasn’t a typo.

(2) At that time. Later changes to schema or data volume may require existing queries to be revised.

I’m sorry, I totally disagree with your comments about acceptable performance. As a practicing and long time database architect, when I do NOT tune the production code, it inevitably comes back to bite me in the ass. I just got done tuning some ‘production’ code that was missing about 30 critical indexes. Had this code been tuned properly to begin with, I would NOT HAVE HAD THIS ISSUE. ‘Good performance’ isn’t good enough.

I never said don’t tune. I said that if the query performs acceptably under production load and production volumes then don’t waste time tuning that, focus on other code which is not performing acceptably under production volumes and production load. Don’t obsess over a couple ms, focus time and resources on real problems.

30 missing indexes means performance was never good enough. 2 missing indexes that turn a 20 ms query into a 15 ms query is what I was talking about. That was a waste of time. Time that could far better have been spent turning a 15 second query into a 50 ms query.

“Good performance” is by definition good enough. If it’s not good enough then it’s “Unacceptable performance”, not “Good performance”.