Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

We have a web application that delivers training to thousands of corporate students running on top of SQL Server 2005. Recently, we started seeing that a single specific query in the application went from 1 second to about 30 seconds in terms of execution time. The application started throwing timeouts in that area.

Our first thought was that we may have incorrect indexes, so we reviewed the tables and indexes. However, similar queries elsewhere in the application also run quickly. Reviewing the indexes showed us that they were configured as expected.

We were able to narrow it down to a single query, not a stored procedure. Running this query in SQL Studio also runs quickly. We tried running the application in a different server environment. So a different web server with the same query, parameters and database. The query still ran slow.

The query is a fairly large one related to determining a student's current list of training. It includes joins and left joins on a dozen tables and subqueries. A few of the tables are fairly large (hundreds of thousands of rows) and some of the other tables are small lookup tables. The query uses a grouping clause and a few where conditions. A few of the tables are quite active and the contents change often but the volume of added rows doesn't seem extreme.

These symptoms led us to consider the execution plan. First off, as soon as we reset the execution plan cache with the SQL command 'DBCC FREEPROCCACHE', the problem went away. Unfortunately, the problem started to reoccur within a few days.

The problem has continued to plague us for awhile now. It's usually the same query, but we did appear to see the problem occur in another single query recently. It happens enough to be a nuisance.

We're having a heck of a time trying to fix it since we can't reproduce it in any other environment other than production. I have downloaded the High Availability guide from Red Gate and I read up more on execution plans. I hope to run the profiler on the live server, but I'm a bit concerned about impact. I would like to ask - what is the best way to figure out what is triggering this problem? Has anyone else seen this same issue?

This question came from our site for professional and enthusiast programmers.

The first thing I can think of is that your statistics may not be updated. Can you check on the strategy for statistics on your server?
–
Raj MoreJan 14 '11 at 20:53

@Raj - That wouldn't explain why clearing the procedure cache makes any difference - that won't automatically update the statistics and when the plan gets recompiled it will still be using the same statistics as before. Probably parameter sniffing.
–
Martin SmithJan 14 '11 at 21:05

What a fantastic suggestion. I'll try it right away and mark as answer shortly. I'm a little concerned about how I would interpret the results of the comparison but I'll cross that bridge when I come to it.
–
Tony BrandnerJan 17 '11 at 17:29

That helped a lot. I was able to view the execution plan when it was timing out. Then I ran it in SQL Studio and grabbed that execution plan. I saved both execution plans (they were in XML) with the file extension *.sqlplan and then was able to review them in a graphical way.
–
Tony BrandnerJan 18 '11 at 17:36