You have a stored procedure that runs well most of the time but sometimes it’s WAYYYYY off. It’s almost as though its performance went from great to horrible in a split second (like falling off a cliff). You don’t know why, but someone says – it’s got to be the statistics. In fact, if you have the luxury of time (which most folks don’t), you execute it yourself and you check the plan – WOW, the estimated number of rows is WAY off from the actual rows. OK, it’s confirmed (you think); it’s statistics.

But, maybe it’s not…

See, a stored procedure, a parameterized statement executed with sp_executesql, and a prepared statement submitted by a client ALL reuse cached plans. These plans were compiled using something called parameter sniffing. Parameter sniffing is not a problem in itself – but it can become a problem for later executions of that same statement/procedure. If a plan for one of these statements was created for a parameter that returns only 1 row, then the plan might be simple and straightforward – use a nonclustered index and then do a bookmark lookup (that’s about as simple as it gets). But if that same statement/procedure/prepared statement runs again later with a parameter that returns thousands of rows, then the plan created by sniffing that earlier parameter might not be good. And this might be a rare execution.

OR, it could be even more strange. These plans are not stored on disk; they are not permanent objects. They are created any time there is not already a plan in the cache. So there are a variety of reasons why these plans can fall out of cache. And if it just so happens that an atypical set of parameters is the first one used after the plan has fallen out of cache (better described as “has been invalidated”), then a very poor plan could end up in cache and cause subsequent executions with typical parameters to be way off. Again, if you look at the actual plan you’ll probably see that the estimate is WAY off from the actual. But it’s NOT likely to be a statistics problem.
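Here’s a minimal sketch of how sniffing can backfire. The table, procedure, and parameter values are all invented for illustration:

```sql
-- Hypothetical objects for illustration only.
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;
GO

-- First execution: suppose customer 42 has 1 order. The sniffed plan is a
-- nonclustered index seek + bookmark lookup, and that plan is cached.
EXEC dbo.GetOrdersByCustomer @CustomerID = 42;

-- Later execution: suppose customer 7 has thousands of orders. The cached
-- seek + lookup plan is REUSED - potentially far worse than a scan.
EXEC dbo.GetOrdersByCustomer @CustomerID = 7;
```

If the order of those two executions were reversed (say, right after the plan was invalidated), the scan-friendly plan would get cached instead and the selective executions would suffer.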

But, let’s say that you think it is a statistics problem. What do you do?

You run UPDATE STATISTICS tablename, or UPDATE STATISTICS tablename indexname (for an index that you specifically suspect is out of date).
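Spelled out, with placeholder object names:

```sql
-- Placeholder names; substitute your own table/index.
UPDATE STATISTICS dbo.Orders;                       -- all statistics on the table
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID;  -- just one index's statistics
```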

And, then you execute the procedure again and yep, it runs correctly this time. So, you think, yes, it must have been the statistics!

However, what you may have seen is a side-effect of having updated statistics. When you update statistics, SQL Server usually* does plan invalidation. Therefore, the plan that was in cache was invalidated. When you executed again, you got a new plan. This new plan used parameter sniffing to see the parameters you used and then it came up with a more appropriate plan. So, it probably wasn’t the statistics – it was the plan all along.

So, what can you do?

First, do not use UPDATE STATISTICS as your first response. If you have a procedure that’s causing you grief, consider recompiling it to see if you can get a better plan. How? Use sp_recompile procedurename. This will cause any plan for it in cache to be invalidated. It’s a quick and simple operation, and it will tell you whether you have a recompilation problem (and not a statistics problem). If you get a good plan, then what you know is that your stored procedure might need some “tweaking” to its code. I’ve outlined a few things that can help you here: Stored procedures, recompilation and .NetRocks. If that doesn’t work, then you MIGHT need to update statistics. What you should really do first, though, is make sure that the compiled value of the parameters IS the same as the execution value. If you use “show actual plan” you can see this by checking the properties window (F4) while hovering over the plan’s root (output/select) operator.
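As a concrete sketch (the procedure name is a placeholder):

```sql
-- Invalidate any cached plans for the procedure:
EXEC sp_recompile N'dbo.GetOrdersByCustomer';

-- Then execute with "Include Actual Execution Plan" enabled. In the root
-- (SELECT) operator's properties (F4), the Parameter List shows each
-- parameter's "Parameter Compiled Value" next to its "Parameter Runtime
-- Value" - in the plan XML these appear as ParameterCompiledValue and
-- ParameterRuntimeValue.
EXEC dbo.GetOrdersByCustomer @CustomerID = 42;
```

If the compiled and runtime values differ wildly in selectivity, you’re looking at a sniffed-plan problem, not a statistics problem.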

This will confirm that the execution did (or did not) use those values to compile the plan. If they were the correct values then you might have a statistics problem. But, it’s often blamed and it’s not actually the problem. It’s the plan.

So, what’s the problem? Ironically, I kind of like this. I think that statistics have been blamed all too often for statement/plan problems when it’s not the statistics, it’s the plan. So, I like that there will be fewer false positives. But, at the same time, if I update statistics off hours, I DEFINITELY want SQL Server to invalidate plans and re-sniff my parameters (especially if the data HAS changed) and possibly get new plans from my updated stats.

If the plan should NOT be invalidated (because, even though statistics were updated, the data has NOT changed) then it won’t be. But…
If the plan should be invalidated (statistics have been updated AND data has changed) then it will be.

The key point is “data changed.” An UPDATE STATISTICS alone will not cause plan invalidation (which is STILL different behavior from 2005/2008/2008R2), but it’s the best of both worlds IMO. Only if at least ONE row has been modified WILL the UPDATE STATISTICS cause plan invalidation.

UPDATE 2: The key point is that there might still be some false positives, and I’d still rather people try sp_recompile first, but it’s good that UPDATE STATISTICS will cause plan invalidation. Still, it’s a tad different from prior versions… interesting for sure.

A simple workaround is to run sp_recompile tablename at the end of your maintenance script, but be aware that running sp_recompile against a TABLE requires a schema modification lock (SCH-M). As a result, it can cause blocking. If you don’t have any long-running reports (or long-running transactions) at that time, though, it should be quick and simple.
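Roughly like this (placeholder table name):

```sql
-- ...end of a maintenance script, after the UPDATE STATISTICS steps...
-- NOTE: sp_recompile against a TABLE takes a schema modification (SCH-M)
-- lock, so it can block (and be blocked by) long-running transactions.
EXEC sp_recompile N'dbo.Orders';
```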

And, stay tuned on this one. In a later CU you should be able to remove the sp_recompile AND you won’t need to worry about the database option either (yeah!).

32 Responses to What caused that plan to go horribly wrong – should you update statistics?

Very timely post. A friend is having a similar problem, also fixed by updating stats. We too suspect it is being "fixed" by the side-effect of dropping query plans. Fixing the code (reparameterizing) is best, but not always a quick turnaround from the vendor.

I’m not as big of a fan of plan guides. I do really like plan guide templates, but those are more for statements than procedures, and a procedure (even if it has EXACTLY that statement in it) will not use a plan guide template; it will use sniffing.

Having said that, if you know that there’s ONE plan that you do want to force (similar to using OPTION (OPTIMIZE FOR…)) then yes, you could use a guide. Ideally, you’ll be able to change the code (or, let the software vendor know that they might want to change the code itself to handle the skew/volatility).

Hi Kimberly,
Great article. A couple of years ago parameter sniffing drove me nuts. I had this issue on both my production and dev servers; the same query was showing three different execution times. If I ran it ad hoc or parameterized it would run in under 8 seconds, but once I put it in an SP, it would take more than 25 seconds. I tried "with recompile", "optimize for" and even "optimize for unknown" – nothing worked. So on the dev server I went commando with it: did everything – DBCC FREEPROCCACHE, DROPCLEANBUFFERS, updated stats, rebuilt indexes and so on – and still the ad hoc query was fastest, followed by the parameterized one, with the SP a distant third.
So what do you think was happening? Did the server have memory pressure? Were the SP and the parameterized query suffering because the plan was dropped "before they could even use it"??? (I don't think that can happen, though, because once created a plan is used right away.) Thankfully Paul and Jonathan came to the rescue, did a server audit on production, and pointed to different configuration changes for overall optimization. I applied the same on development and staging as well, and the problem went away.

Also, this link is not working: "Stored procedures, recompilation and .NetRocks".

For the typo in the link to Erin’s blog… got that! Whoops! It would have been easier to remember if it had been on SQLskills. But, alas, she wasn’t working with us at that point (YET! :)).

And, Rizwan – really hard to say after the fact. But part of what you said doesn’t really make sense to me – you said ad hoc OR parameterized and they gave the same results. What do you mean (they should not have given the same results)?
Adhoc (do you mean just a statement, OR a statement executed with EXEC(@string))?
Parameterized (do you mean a statement with parameters in a proc, OR a statement executed with sp_executesql)?

Adhoc and EXEC(@string) statements have to be analyzed at run time. These will (very likely) generate a new plan for each execution because they are NOT likely to become parameterized statements (they are unlikely to be deemed safe). Parameterized statements, though, will be sniffed, and a plan will be stored on the first execution – with the parameters stated.
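A quick sketch of the two submission styles being contrasted (invented table/column names):

```sql
DECLARE @sql NVARCHAR(200);

-- EXEC(@string): the literal value is embedded in the text, so each
-- distinct value tends to compile (and cache) its own plan.
SET @sql = N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42;';
EXEC (@sql);

-- sp_executesql: properly parameterized, so the FIRST execution's sniffed
-- parameter value determines the cached plan that later executions reuse.
EXEC sp_executesql
    N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID;',
    N'@CustomerID INT',
    @CustomerID = 42;
```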

Then, in the SP, it took 25 seconds. Was that on the first execution or on subsequent ones? Did you try the specific procedure with EXEC proc WITH RECOMPILE?

And, possibly, did that stored procedure obfuscate the parameter with a variable? That could have significantly changed the plan from sniffing to using the density vector.

My only thought is that you were getting a sniffed plan for everything but the SP, and the SP had the parameter obfuscated by a variable?
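For readers unfamiliar with this pattern, here is a hypothetical sketch of the obfuscation being described (names are invented):

```sql
-- Assigning the parameter to a local variable hides it from the optimizer.
-- At compile time @cid has no sniffable value, so the estimate comes from
-- the density vector (an "average" row count) instead of the histogram.
CREATE PROCEDURE dbo.GetOrdersByCustomer_Obfuscated
    @CustomerID INT
AS
BEGIN
    DECLARE @cid INT = @CustomerID;  -- optimizer can't sniff @cid

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @cid;
END;
```

This produces the same kind of "averaged" plan as OPTION (OPTIMIZE FOR UNKNOWN), which can look very different from the sniffed ad hoc or sp_executesql plans.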

But, again, really hard to say without being able to see the code. My guess is the variable (getting a plan tied to the average – same as OPTIMIZE FOR UNKNOWN).

Finally, I’m glad that Paul/Jon were able to help. Would be nice to go back and see the code?!

I love the timing of this post. I was troubleshooting a SharePoint performance issue (Index Crawler) over the weekend. The statistics sampling level was extremely low so I updated with the expectation of an automatic recompile. Nope: SharePoint defaults auto_update_statistics = off. sp_recompile on the referenced tables didn’t work either. I think it’s a side-effect of "auto_update_statistics = off" but I haven’t verified.

If query optimization produces a bad plan due to an estimation failure based on "parameter sniffing", would it help to add "optimize for" hints in the code, so the optimizer can make a more informed choice?

Good article, by the way – and fortunately for me, it came exactly when I’m experiencing this problem :)

Hey there SQLJunkie@Sanbeiji – Yes, it’s very likely a side effect of having auto_update_statistics turned off. That’s the behavior that SQL has in versions prior to 2012. So, while I know that a lot of people recommend turning this off for Sharepoint – I don’t. I strongly recommend that my customers leave BOTH auto create and auto update ON. It does mean that you have to know a bit more about these effects but these effects are generally more desirable ESPECIALLY when combined with regular and automated maintenance (which should be the PRIMARY way that stats get updated but just in case…).

To Theo – Yes, I’m a strong believer in writing code that will behave based on my knowledge of the data and the optimizer. I just recently wrote a post that will help you. I don’t want you to use OPTION (RECOMPILE) everywhere, but there are cases where it’s definitely warranted!
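The main hint options being discussed look roughly like this (the variable, table, and chosen value are placeholders):

```sql
DECLARE @CustomerID INT = 42;

-- OPTION (RECOMPILE): re-optimize for the actual value on EVERY execution
-- (at the cost of compile time/CPU each time).
SELECT OrderID FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (RECOMPILE);

-- OPTIMIZE FOR: compile once for a chosen "typical" value...
SELECT OrderID FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR (@CustomerID = 42));

-- ...or for the density-vector average, ignoring the sniffed value:
SELECT OrderID FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR UNKNOWN);
```

Which one fits depends on the data skew: RECOMPILE when every execution must get its own plan, OPTIMIZE FOR when one plan is known to be acceptable for everyone.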

This is a really great post.
Many thanks for your work. I agree with you: we shouldn’t confuse a statistics problem with parameter sniffing (which is in fact not a real problem – I think it’s a natural effect of the plan cache instead…).
I often encounter statistics problems, and I identify them by an Estimated Number of Rows = 1 (magic number again!) on an awful nested loop (awful because it runs on hundreds of thousands of rows). Parameter sniffing never gave me an estimated number of rows = 1.
I have solved my statistics problems by using trace flag 2371 in one case, and most often trace flags 2389 and/or 2390. But I still have a parameter sniffing case I can’t kick: my stored procedure is called hundreds of times per second (recompiling takes too much time and too much CPU) and the data is not distributed in a regular way (so I can’t use an OPTIMIZE FOR query hint). I read your "Stored procedures, recompilation and .NetRocks" article carefully. I didn’t know about OPTION (OPTIMIZE FOR UNKNOWN), so many thanks for this information – I’m going to try it now. (By the way, sp_executesql didn’t help either.)

We have a production system where we have set asynchronous statistics updates ON, so that none of our incoming queries have to wait when an auto stats update occurs. We update stats manually by looking at the value of rowmodctr.

Given that you’re manually updating stats, it’s less likely that the threshold will be reached between those manual updates. However, it really depends on how frequently you’re updating them and how volatile your data is. Having async on just means that the first query that hits the invalidated plan will still use it but will trigger the update to run (without waiting for its results). If the threshold is not being reached between your manual updates, then auto updating (with or without async) won’t ever really happen.
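As an aside for anyone doing this kind of threshold monitoring: on newer builds (SQL Server 2008 R2 SP2 / 2012 SP1 and later), sys.dm_db_stats_properties exposes a per-statistic modification counter, which is a more reliable alternative to the older sysindexes.rowmodctr. A sketch (placeholder table name):

```sql
-- Per-statistic freshness and modification counts for one table.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');
```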

We are having a similar problem. However, even when we clear all cached plans the optimizer still gives the wrong plan. If we update stats, it returns the correct plan. This table has about 16 million rows and we refresh about 3% of it every day. Does it make sense to automate the stats update every day?

Hey there Ali – How are you clearing cached plans? And, how frequently is this being executed? Have you tried sp_recompile procname BEFORE updating statistics? I’d *always* try sp_recompile on the procedure name before updating stats. The reason that updating stats MIGHT work is because it does Plan Invalidation. But, if/when that doesn’t work (which might really be your case) and it really is a stats problem – then yes, updating stats might be necessary. Ideally, doing this off hours (and, yes, automating this process) can be ideal!

Thanks for the quick reply! This is definitely a weird one to me :). It is actually not a stored proc. It’s a query called from a .NET application with one parameter. It is called once a day, and for the past few days we have been timing out on the query. When I look at the cached plans using sys.dm_exec_query_plan, I see the bad plan. Basically it is using an index seek on a column in the GROUP BY clause, which it is not supposed to. The strange part is that when we run it interactively using a SQL parameter it gives the correct plan, but when we run it interactively hard-coding the parameter it uses the bad plan – the same one used when the .NET code calls it. I deleted the cached plan using FREEPROCCACHE but we still get the wrong plan when running it either from the app or with the hard-coded parameter. I even tried OPTION (RECOMPILE) but no luck. As soon as I update stats or disable the index it should not use, it is fine. I guess one option would be to drop that index.

Can you show me the exact way the app is submitting it? Is it sp_executesql (by any chance), or is it completely ad hoc? And can you do a DBCC FREESYSTEMCACHE ('SQL Plans') instead and see if that changes things (it would be weird if FREEPROCCACHE didn’t get it)? Definitely sounds like a plan problem.

Is there any chance that other people are executing this in the same environment – while you’re troubleshooting it? And, again, most important is HOW you’re executing it!

SELECT SUM(RealCurrencyAmountUSD) AS LTV, DimGlobalUserID
FROM Fact.WalletFunding
WHERE
ETLRunID = @ETLRunID AND
DimGlobalUserID IN
(SELECT DimGlobalUserID FROM Fact.WalletFunding
WHERE ETLRunID = @ETLRunID)
GROUP BY DimGlobalUserID
HAVING SUM(RealCurrencyAmountUSD) > 0;

I have noticed that in SQL Server 2008, an SP is not marked for recompile after the stats on a table are updated unless the SP is a complex one. Any SP whose XML plan shows that the optimizer treated it as TRIVIAL will not be recompiled after updating the stats.

I am running UPDATE STATISTICS with a 30% sample on a 1.5 TB table that has 5 indexes. It has been running for approximately 24 hours and is still going. Is there any way to get the status, like what % is complete?

Unfortunately, not all operations in SQL Server write their progress to the percent_complete column in sys.dm_exec_requests. So, no – there’s no good way to know that except by studying past executions. And even that’s not guaranteed. But at least it can help as an estimate.

Thanks for your blog, it has helped me a lot!!
Question:
DB: SQL Server 2005 Ent with SP2
We update statistics on some tables on the last Friday of every month.
Then we run a query on the first day of every month, and the updated table still gets a table scan. Could the execution plan be invalid? Why? It seems like a bug?
We also restored the database to another server to test, and there it doesn’t get a table scan. Why? Is it because “restore” and “attach” clear the execution plan cache?
Sorry for my poor English, and thanks!!

Check the database option “auto update statistics.” When this option is OFF, manual updates of statistics do NOT cause recompilations (and therefore changes) to plans. The reason it does get a new plan on a restore or an attach is that the entire plan cache for that database is cleared when a database is restored / dropped/created / attached.
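You can check that option quickly from sys.databases:

```sql
-- Which databases have auto update statistics (and async updates) on?
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases;
```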

It could be that someone else beat you to it and their execution required the other plan (which is bad for you but not bad for them). This is known as a parameter sniffing problem. I suspect you’re running into that in your environment.

Hi Kimberly,
I was surprised to see the new behavior in SQL 2012+ regarding plan invalidation when statistics are updated.

If there are no data changes in the table, a simple UPDATE STATISTICS by itself should not invalidate cached plans – this is clear to me and it makes sense.
But… what if data changes occurred?

I tested the following situation:
- A simple query SELECT COUNT(*) FROM Tbl1 WHERE Name = 'John' is cached
- The table contains 4000 rows with Name = 'John'
- The plan shows that Actual and Estimated are almost equal
- 5000 new rows are inserted into the table
- UPDATE STATISTICS is run manually against the heap

After the UPDATE STATISTICS, the statistics were updated (I can see new values in the histogram) but the plan is still in cache.
Estimated rows vs. actual rows shows that the plan was not invalidated.
If I run DBCC FREEPROCCACHE, the plan is recreated and estimated vs. actual reflects the new situation.
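A rough script of the scenario I described (all names invented), in case it helps anyone reproduce it:

```sql
-- Build the test heap and cache a plan for the selective query.
CREATE TABLE dbo.Tbl1 (Name VARCHAR(50));

INSERT dbo.Tbl1 (Name)
SELECT TOP (4000) 'John'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

SELECT COUNT(*) FROM dbo.Tbl1 WHERE Name = 'John';  -- plan now cached

-- Modify data (below the auto-update threshold), then update stats manually.
INSERT dbo.Tbl1 (Name)
SELECT TOP (5000) 'Mary'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

UPDATE STATISTICS dbo.Tbl1;  -- data changed, so on 2012+ the cached plan
                             -- should be invalidated

-- Re-run and compare Estimated vs. Actual rows in the actual plan.
SELECT COUNT(*) FROM dbo.Tbl1 WHERE Name = 'John';
```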

I can’t repro your scenario on 2014 (I tried with both a clustered table and a heap). I modified enough data to warrant changing the plan but less than the auto update threshold, and I was able to see a plan change after updating the statistics manually in both cases (which is the behavior I’ve also seen in 2012).

If you can create a script that repros this, please post it. Otherwise, I agree with you – that doesn’t make sense at all! There must be something else to it. Are you sure this was 2012? If it was 2008 (or R2) then you’d also have to check your database setting for “auto update statistics.” In 2008 (or R2), plans are NOT invalidated when you manually update statistics IF the database setting for auto update statistics is OFF. This is the only thing I can think of. Hmm… also check the database’s compatibility level (sp_dbcmptlevel).

[…] And, this is how it goes. Poor performance occurs and often a variety of things are blamed (cache or I/O or statistics or indexes) and some quick fix (updating statistics is another common one) SEEMS to solve the problem. But, ultimately, it’s probably a side-effect of what you’ve done that’s solved the problem (NOTE: Updating statistics often “invalidates” plans so the side-effect of having updated statistics might be what solved your problem and NOT the updating of statistics itself). I talked about this in a related post: What caused that plan to go horribly wrong – should you update statistics? […]