
Now this isn't as much a question of how to fix this - I can either add an OPTION (HASH JOIN) hint or create a temp table manually. I'm more wondering why the query optimizer would ever use the plan it currently does.

Since the QO doesn't have any stats on the BrowserID column, I'm guessing it's assuming the worst - 50 million distinct values, thus requiring quite a large in-memory/tempdb worktable. As such, the safest way is to perform scans for each row in tblFEStatsBrowsers. There is no foreign key relationship between the BrowserID columns in the two tables, so the QO can't deduce any info from tblFEStatsBrowsers.
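For reference, the two workarounds look roughly like this - sketches only, since the original statement isn't quoted here; the query shape is pieced together from the predicates discussed in the answers below, and the temp table name is made up:

-- Workaround 1: force a hash join instead of the nested loops plan
DELETE b
FROM dbo.tblFEStatsBrowsers AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.tblFEStatsPaperHits AS h
    WHERE h.BrowserID = b.BrowserID
)
OPTION (HASH JOIN);

-- Workaround 2: materialize the distinct BrowserID values manually first
SELECT DISTINCT BrowserID
INTO #UsedBrowserIDs
FROM dbo.tblFEStatsPaperHits
WHERE BrowserID IS NOT NULL;

DELETE b
FROM dbo.tblFEStatsBrowsers AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM #UsedBrowserIDs AS u
    WHERE u.BrowserID = b.BrowserID
);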

Update 2
Excellent answers, all of you - thanks! Tough to pick just one. Though Martin was first and Remus provides an excellent solution, I have to give it to the Kiwi for going mental on the details :)

Did you try removing your DISTINCT from the first query? I'm not sure if it is optimized out or not since it's unneeded.
– JNK♦ May 31 '12 at 14:52

Yep, I tried removing the DISTINCT, as well as the NOLOCK hint and the predicate - just to see if it made a difference. It did not.
– Mark S. Rasmussen May 31 '12 at 14:54

I think there should still be some stats on that column. Have you done an sp_updatestats on your hits table?
– JNK♦ May 31 '12 at 14:55

You're right, there are unindexed column statistics. The density of the BrowserID column is 0.33 and there are 163 steps, each having a large number of rows - thus making the choice of a nested loop join even more odd. Basically the QO should be able to recognize that there are few distinct values of BrowserID, making a temporary worktable way more efficient, given it knows the row counts of each table.
– Mark S. Rasmussen May 31 '12 at 15:37

The estimated cost is now 0.00452 units. The addition of the Top physical operator sets a row goal of 1 row at the Top operator. The question then becomes how to derive a 'row goal' for the Clustered Index Scan; that is, how many rows should the scan expect to process before one row matches the BrowserID predicate?

The statistical information available shows 166 distinct BrowserID values (1/[All Density] = 1/0.006024096 = 166). Costing assumes that the distinct values are distributed uniformly over the physical rows, so the row goal on the Clustered Index Scan is set to 166.302 (accounting for the change in table cardinality since the sampled statistics were gathered).
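The density figure can be seen directly in the statistics object's density vector - for example (the statistics name below is hypothetical; auto-created column statistics get system-generated _WA_Sys_ names, so look up the real one in sys.stats first):

-- Show the density vector for the column statistics on BrowserID
DBCC SHOW_STATISTICS ('dbo.tblFEStatsPaperHits', '_WA_Sys_BrowserID') WITH DENSITY_VECTOR;

-- 1 / [All density] = estimated number of distinct values
SELECT 1.0 / 0.006024096 AS DistinctBrowserIDs;  -- ≈ 166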

The estimated cost of scanning the expected 166 rows is not very large (even executed 339 times, once for each change of BrowserID) - the Clustered Index Scan shows an estimated cost of 1.3219 units, showing the scaling effect of the row goal. The unscaled operator costs for I/O and CPU are shown as 153.931 and 52.8698 respectively.

In practice, it is very unlikely that the first 166 rows scanned from the index (in whatever order they happen to be returned) will contain one each of the possible BrowserID values. Nevertheless, the DELETE plan is costed at 1.40921 units total, and is selected by the optimizer for that reason. Bart Duncan shows another example of this type in a recent post titled Row Goals Gone Rogue.

It is also interesting to note that the Top operator in the execution plan is not associated with the Anti Semi Join (in particular, the 'short-circuiting' Martin mentions). We can start to see where the Top comes from by first disabling an exploration rule called GbAggToConstScanOrTop.

The plan produced with that rule disabled has an estimated cost of 364.912, and shows that the Top replaced a Group By Aggregate (grouping by the correlated column BrowserID). The aggregate is not due to the redundant DISTINCT in the query text: it is an optimization that can be introduced by two exploration rules, LASJNtoLASJNonDist and LASJOnLclDist. Disabling those two as well yields a plan with neither the Top nor the aggregate.
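For anyone wanting to reproduce this, optimizer rules can be switched off per-query with the undocumented QUERYRULEOFF hint - strictly an investigative tool for a test system, and the query shape here is illustrative rather than the original statement:

SELECT b.BrowserID
FROM dbo.tblFEStatsBrowsers AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.tblFEStatsPaperHits AS h
    WHERE h.BrowserID = b.BrowserID
)
OPTION
(
    QUERYRULEOFF GbAggToConstScanOrTop,
    QUERYRULEOFF LASJNtoLASJNonDist,
    QUERYRULEOFF LASJOnLclDist
);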

Another way to 'fix' the original query would be to create the missing index on BrowserID that the execution plan reports. Nested loops work best when the inner side is indexed. Estimating cardinality for semi joins is challenging at the best of times. Not having proper indexing (the large table doesn't even have a unique key!) does not help at all.
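A minimal sketch of that index (the index name is mine; the key column comes from the plan's missing-index suggestion):

CREATE NONCLUSTERED INDEX IX_tblFEStatsPaperHits_BrowserID
ON dbo.tblFEStatsPaperHits (BrowserID);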

I bow to thee; you just introduced me to several new concepts I'd never encountered before. Just when you feel you know something, someone out there will put you down - in a good way :) Adding the index would definitely help. However, besides this one-time operation, the table is never queried or aggregated by the BrowserID column, so I'd rather save those bytes, as the table is quite large (this is just one of many identical databases). There is no unique key on the table as there is no natural uniqueness to it. All selects are by PaperID and optionally a period.
– Mark S. Rasmussen Jun 3 '12 at 12:57

When I run your script to create a statistics-only database and the query in the question, I get the following plan.

The table cardinalities shown in the plan are:

tblFEStatsPaperHits: 48063400

tblFEStatsBrowsers: 339

So it estimates that it will need to perform the scan on tblFEStatsPaperHits 339 times. Each scan has the correlated predicate tblFEStatsBrowsers.BrowserID=tblFEStatsPaperHits.BrowserID AND tblFEStatsPaperHits.BrowserID IS NOT NULL that is pushed down into the scan operator.

The plan doesn't mean that there will be 339 full scans, however. As it is under an anti semi join operator, as soon as the first matching row on each scan is found, it can short-circuit the rest of it. The estimated subtree cost for this node is 1.32603 and the entire plan is costed at 1.41337.
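Logically, each of the 339 outer rows drives a probe equivalent to something like this correlated lookup (a sketch only - the real plan pushes the predicate into the scan rather than running separate queries, and the column type is assumed):

DECLARE @OuterBrowserID int = 1;  -- stands in for the correlated tblFEStatsBrowsers.BrowserID value

SELECT TOP (1) 1 AS Matched      -- the anti semi join only needs to know whether one match exists
FROM dbo.tblFEStatsPaperHits AS h
WHERE h.BrowserID = @OuterBrowserID
  AND h.BrowserID IS NOT NULL;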

For the Hash Join, it gives the plan below.

The overall plan is costed at 418.415 (about 300 times more expensive than the nested loops plan) with the single full clustered index scan on tblFEStatsPaperHits costed at 206.8 alone. Compare this with the 1.32603 estimate for 339 partial scans given earlier (Average partial scan estimated cost = 0.003911592).

So this would indicate that it is costing each partial scan as being 53,000 times less expensive than a full scan. If the costings were to scale linearly with row count then that would mean that it is assuming that on average it would only need to process 900 rows on each iteration before it finds a matching row and can short circuit.
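Working those figures through (just the arithmetic on the estimates above):

SELECT 206.8 / 0.003911592 AS FullToPartialScanCostRatio;              -- ≈ 52868 (the '53,000 times less expensive')
SELECT 48063400 / (206.8 / 0.003911592) AS ImpliedRowsPerPartialScan;  -- ≈ 909 (the '900 rows' figure)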

I don't think the costings do scale in that linear way, however; I think they also incorporate some element of fixed startup cost. Trying various values of TOP in the following query:

SELECT TOP 147 BrowserID
FROM [dbo].[tblFEStatsPaperHits]

147 gives the closest estimated subtree cost to 0.003911592, at 0.0039113. Either way, it is clear that it is basing the costing on the assumption that each scan will only have to process a tiny proportion of the table: on the order of hundreds of rows rather than millions.

I'm not sure exactly what maths it bases this assumption on, and it doesn't really add up with the row count estimates in the rest of the plan (the 236 estimated rows coming out of the nested loops join would imply that there were 236 cases where no matching row was found at all and a full scan was required). I assume this is just a case where the modelling assumptions made fall down somewhat and leave the nested loops plan significantly under-costed.

Great work Martin. Does this make sense for the missing piece of the puzzle? tblFEStatsPaperHits has 47976215 rows, and the average density for BrowserID is 0.006024096, which yields 289013 rows per BrowserID. tblFEStatsBrowsers has 339 distinct BrowserID values. If the optimiser were to assume an even distribution across tblFEStatsPaperHits... 289013 / 339 = 852.55, i.e. an expectation of a short circuit occurring after 853 rows.
– Mark Storey-Smith Jun 2 '12 at 21:01

@MarkStorey-Smith - I'm not sure how valid my 900-row figure actually is, as the plan doesn't expose anywhere how many rows it expects the storage engine to process, so I had to get it by the calculation in my answer. I think there might be some fixed cost per operator it adds on as well, rather than scaling linearly with row count. From playing around with various numbers, SELECT TOP 147 BrowserID FROM [dbo].[tblFEStatsPaperHits] gives the closest subtree cost to 0.003911592, at 0.0039113.
– Martin Smith Jun 2 '12 at 21:24

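You can create an indexed view that materializes the distinct BrowserID values, along these lines (a sketch - the view and index names are mine, and the COUNT_BIG(*) column is required by the indexed-view rules for a GROUP BY view):

CREATE VIEW dbo.vwPaperHitsByBrowser
WITH SCHEMABINDING
AS
SELECT BrowserID, COUNT_BIG(*) AS HitCount
FROM dbo.tblFEStatsPaperHits
GROUP BY BrowserID;
GO

CREATE UNIQUE CLUSTERED INDEX cdx_vwPaperHitsByBrowser
ON dbo.vwPaperHitsByBrowser (BrowserID);
GO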
This gives you a materialized index with one row per BrowserID, eliminating the need to scan 50M rows. The engine will maintain it for you and the QO will use it 'as-is' in the statement you posted (w/o any hint or query rewrite).

The downside is of course contention. Any insert or delete operation in tblFEStatsPaperHits (and I guess it is a logging table with heavy inserts) will have to serialize access to a given BrowserID. There are ways to make this workable (delayed updates, two-staged logging, etc.) if you're willing to buy into it.

+1 though the "w/o any hint or query rewrite" bit does require an Enterprise SKU (and an optimizer in co-operative mood).
– Paul White Jun 3 '12 at 10:41

@SQL Kiwi. True. When in doubt (or on less than EE), select straight from the view with NOEXPAND :)
– Remus Rusanu Jun 3 '12 at 11:13
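For example, against the sketched view above (names illustrative):

SELECT BrowserID, HitCount
FROM dbo.vwPaperHitsByBrowser WITH (NOEXPAND);  -- forces the view's index to be used, even off Enterprise Edition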

I hear you; any scan that large is generally unacceptable. In this case it's for some one-time data cleanup operations, so I'm opting not to create additional indexes (and can't do so temporarily as it'd interrupt the system). I don't have EE, but given that this is one-time, hints would be okay. My main curiosity was how the QO came up with the plan though :) The table is a logging table and there are heavy inserts. There is a separate asynchronous logging table though that later updates the rows in tblFEStatsPaperHits, so I could manage it myself if necessary.
– Mark S. Rasmussen Jun 3 '12 at 12:50