
Sometimes I run into a query plan that just shocks me. It’s like looking up and suddenly seeing an adult dressed as a squirrel riding a unicycle down a busy city street. You have to stop and ask yourself, “Did that really just happen?” (I live in Portland, Oregon, so yeah, I’ve seen that.)

Shazam, it finishes using only 63 ms of CPU time. It asks for a 27MB memory grant and estimates 290 rows (quite accurately).

The execution plan contains warnings that "Type conversion in expression … may affect 'CardinalityEstimate' in query plan choice", but wow, it really did a remarkably good job with this!

Using OPTION(QUERYTRACEON 9481) to test Compatibility Level 110

You can go back to the old cardinality estimator in SQL Server 2014 in two ways. First, you could change the whole database's compatibility level back to 110 like this:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=110;
GO

But that changes it for everything in the whole database. We might just want to see how THIS query would behave under the old cardinality estimator while still running on SQL Server 2014. You can do that by adding OPTION(QUERYTRACEON 9481) to the very end of our gruesome query:
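As a sketch (the tables here are hypothetical stand-ins for the original gruesome query), the hint just gets appended after the rest of the statement:

```sql
-- Hypothetical query; the tables stand in for the original example.
-- QUERYTRACEON 9481 tells SQL Server 2014 to use the legacy (pre-2014)
-- cardinality estimator for this statement only, leaving the database's
-- compatibility level untouched.
SELECT p.FirstName, p.LastName
FROM Person.Person AS p
JOIN Sales.SalesOrderHeader AS soh
    ON soh.CustomerID = p.BusinessEntityID
OPTION (QUERYTRACEON 9481);
```

One caveat: QUERYTRACEON requires sysadmin-level permissions, so in many environments you'd need to apply it via a plan guide rather than letting application code run it directly.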

Retesting the query with the old cardinality estimator… Ouch! The query uses 84,109 ms of CPU time and the execution plan is back to a world of confusion, thinking it’s going to have to handle a kazillion rows:

Hey there, new cardinality estimator, I'd like to get to know you better.

I’m not saying the new cardinality estimator will be better at every query, that it won’t have any regressions, or that you should start putting functions around all your joins.

But it’s pretty remarkable when the optimizer takes code that you wrote to be as terrible as possible, and suddenly makes it fast. Sign me up for more of that.

Joe Sack covered the new CE at #SQLSaturday in Madison – quite interesting to say the least. I can’t believe it’s the first major update to the CE engine since the 7.0 days! The one takeaway I got from his presentation was the last sentence on the last slide (as you too alluded to): “If you don’t have time to test, don’t enable the new CE in production until you have.” While the vast majority of performance should improve, there undoubtedly will be some portions of your workload that will benefit from the old CE.

I’m always a fan of testing, but just as a counterpoint: doing a full code regression isn’t a simple task, and there are some applications with a higher tolerance of risk than others.

The fact that you can use trace flags with QUERYTRACEON to control the behavior of certain queries to use the lower compatibility level (or higher) is a great thing, and I think it’ll make the transition less difficult for many people than, say, the transition from compat 80 to higher levels. Man, that wasn’t fun.

I agree – the biggest issue (one of many) was with the use of NOLOCK w/out WITH. Many developers used the old syntax, and all heck broke loose when I changed the compatibility level one fine day. At least it was in Dev.

In previous versions of the cardinality estimator, SQL Server punted on this estimation and it used some wildly bad guesses about the number of rows being returned out of the function.

In terms of estimating cardinality, a nested scalar function call is no different from a single function call. Mathematically speaking, f(g(x)) is the same as some z(x) that produces the same result; nesting built-in calls in an RDBMS is no different, because the composition is just as opaque to the estimator as a single call.
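To make that concrete (with a made-up join just for illustration), both predicates below are equally opaque to the estimator; once the column is wrapped in a function, it can't use the column's histogram, whether there's one built-in or several nested ones:

```sql
-- Single built-in around the join column: f(x)
SELECT COUNT(*)
FROM dbo.Orders AS o                -- hypothetical tables
JOIN dbo.Customers AS c
    ON UPPER(c.CustomerCode) = o.CustomerCode;

-- Nested built-ins: f(g(h(x))) -- no worse and no better for estimation,
-- the expression is a black box either way.
SELECT COUNT(*)
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON UPPER(LTRIM(RTRIM(c.CustomerCode))) = o.CustomerCode;
```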

The advantage SQL Server 2014 has is that it immediately punts to using hash joins instead of merge and nested loop joins. I'm not sure why, under the old estimator, SQL Server picks merge or nested loop joins given the terrible estimates it's working with. (Note that UDFs don't behave in the same way, because they are pure evil.)

There really are two things at play:

1) The Cardinality Estimator is making much more accurate estimates about the rows being returned.
2) The Query Optimizer is making much more interesting guesses about how to deal with these potentially poorly estimated data sets.

With the choice of a hash join, SQL Server can take advantage of a number of optimizations that hold up even when the row estimates are badly off: a hash join scales to large inputs, whereas a nested loop join driven by an underestimate can fall apart completely.

As far as figuring out the trick, I suspect you’d want to start your search by looking for papers coming out of the University of Wisconsin’s computer science department and/or Microsoft Research.

That second paper makes reference to a faster, improved join model estimation when the CE assumes that all distinct values from the child table are contained in the parent set. It seems to me that SQL Server may have this simple join model optimization built in, but that's just speculation.

Are UDFs in 2014 still as bad as they used to be? Especially table-valued UDFs (and table variables as well), where cardinality estimation is awful and performance degrades as soon as more than just a few rows are returned?
Thanks

The problem with any UDF is that it’s not a physical object – it’s a function. Therefore there is no data to create histograms or statistics on – they are a black box from a cardinality estimation perspective. If you are returning a lot of rows from a UDF and joining to it, etc. – you can safely assume it won’t perform very well.
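A rough illustration (all names hypothetical): a multi-statement table-valued function gives the optimizer nothing to estimate from, so it falls back on a fixed guess (1 row before SQL Server 2014, 100 rows under the 2014 CE) no matter how many rows actually come back:

```sql
-- Hypothetical multi-statement TVF: a black box to the optimizer.
CREATE FUNCTION dbo.GetRecentOrders (@CustomerID int)
RETURNS @result TABLE (OrderID int, OrderDate datetime)
AS
BEGIN
    INSERT @result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.Orders                 -- hypothetical table
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO

-- The optimizer assigns a fixed row estimate to r (1 row pre-2014,
-- 100 rows in 2014), regardless of what the function actually returns,
-- so a join like this can get a badly wrong plan when the TVF returns
-- thousands of rows.
SELECT c.CustomerName, r.OrderID
FROM dbo.Customers AS c
CROSS APPLY dbo.GetRecentOrders(c.CustomerID) AS r;
```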

Adam Machanic demonstrated a really cool way to trick the query optimizer at SQL Saturday 291 in Chicago a couple weekends ago. You can find the scripts in his blog: http://sqlblog.com/blogs/adam_machanic/. I think his method could be used to make the query optimizer consider a more realistic number of rows returned from a UDF.

This is an example where 2014’s CE makes things better, but we’ve got an example where it makes it worse.

We have a statement which works just fine on SQL 2005, 2008, 2008R2 and 2012 (and 2014 with “OPTION (QUERYTRACEON 9481)”) but which grinds to a halt on 2014. i.e. from a second or two to nearly 2 minutes.

Has anybody else experienced a worsening of performance in 2014? We’re considering what to advise customers when they upgrade their servers to 2014.

Oh, absolutely people have had results where it hasn’t been better. For any application, you want to test it carefully before turning it on, and you might want to only enable or disable it for specific queries.

Ha! In my case a simple query (much simpler than the example above) went from 0 seconds to 20 minutes. Because the new estimator seems to think that doing 2000 full table scans on 400K rows is a very fast operation. The estimated operator cost for it went from 9.8 with the old approach to 0.18. And the magic trigger that makes it go crazy seems to be an OR in your WHERE clause. Remove the OR and it regains its senses. Might be a bug. Microsoft rule of thumb seems to be still working: never use a Microsoft product until the first service pack release.
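If it helps anyone else who hits this, one common workaround for an OR that derails the estimator (sketched here with made-up tables and parameters) is to split the predicate into a UNION ALL so each branch gets its own estimate and can use its own index:

```sql
-- Original shape: the OR across two different columns is what confuses
-- the estimator.
SELECT OrderID
FROM dbo.Orders                      -- hypothetical table
WHERE CustomerID = @CustomerID
   OR SalesPersonID = @SalesPersonID;

-- Rewritten: each branch is estimated (and seekable) independently.
SELECT OrderID
FROM dbo.Orders
WHERE CustomerID = @CustomerID
UNION ALL
SELECT OrderID
FROM dbo.Orders
WHERE SalesPersonID = @SalesPersonID
  AND CustomerID <> @CustomerID;     -- avoid double-counting rows matching both
```

This isn't always a drop-in replacement (watch the de-duplication predicate, especially around NULLs), but it's a cheap thing to test before blaming the CE outright.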