Fun with Aggregates
http://sqlblog.com/blogs/paul_white/archive/2012/03/12/fun-with-aggregates.aspx

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but ...

re: Fun with Aggregates (PassedBI, Sun, 11 Mar 2012 18:28:50 GMT)
<p>GJ! It was interesting to follow your examples. A dangerously subtle nuance, though I never skip the GROUP BY clause.</p>

re: Fun with Aggregates (Alejandro Mesa, Mon, 12 Mar 2012 14:05:50 GMT)
<p>Paul,</p>
<p>Nice reading for a Monday morning :)</p>
<p>Talking about fun, I wonder why the cardinality estimate coming out of the filter changes if we use SUM(1) instead of COUNT_BIG(*) in the following query.</p>
<p>SELECT</p>
<p> p.Name</p>
<p>FROM</p>
<p> (</p>
<p> SELECT th.ProductID </p>
<p> FROM Production.TransactionHistory AS th </p>
<p> GROUP BY th.ProductID</p>
<p> HAVING SUM(1) &lt; 10</p>
<p> ) AS q1</p>
<p> JOIN </p>
<p> Production.Product AS p</p>
<p> ON p.ProductID = q1.ProductID;</p>
<p>--</p>
<p>AMB</p>

re: Fun with Aggregates (Paul White, Mon, 12 Mar 2012 14:26:02 GMT)
<p>PassedBI,</p>
<p>I'm not sure I see what you mean about never skipping the GROUP BY clause. &nbsp;There are certainly times when a scalar aggregate is what is required (counting all the rows in a table for example). &nbsp;There are also times when the scalar semantic is necessary for correct results. &nbsp;It's all about being aware of the differences.</p>
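<p>The difference is easy to see in miniature. A minimal sketch (using an in-memory SQLite database via Python rather than SQL Server, purely to illustrate the standard semantics; the table name is invented):</p>

```python
import sqlite3

# Hypothetical single-column table standing in for any history table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (product_id INTEGER)")
# Note: the table is deliberately left empty.

# Scalar aggregate (no GROUP BY): always returns exactly one row,
# even over an empty input -- the COUNT of nothing is 0.
scalar = conn.execute("SELECT COUNT(*) FROM history").fetchall()
print(scalar)  # [(0,)]

# Vector aggregate (with GROUP BY): one row per group, so an empty
# input produces no groups and therefore no rows at all.
vector = conn.execute(
    "SELECT product_id, COUNT(*) FROM history GROUP BY product_id"
).fetchall()
print(vector)  # []
```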
<p>Paul</p>
re: Fun with Aggregates (Paul White, Mon, 12 Mar 2012 14:33:47 GMT)
<p>Hi AMB,</p>
<p>With COUNT_BIG(*), cardinality estimation calculates an estimate from (derived) statistics. &nbsp;Substituting SUM(1) results in a guess of 132.3 rows (30% of the estimated 441 rows from the aggregate).</p>
<p>Paul</p>
re: Fun with Aggregates (Alejandro Mesa, Mon, 12 Mar 2012 15:29:10 GMT)
<p>Thanks, Paul!</p>

re: Fun with Aggregates (PassedBI, Mon, 12 Mar 2012 18:48:02 GMT)
<p>Paul White,</p>
<p>"I never skip the GROUP BY clause when using a HAVING condition" is what I should have written.</p>
<p>I have just never had a case where I needed HAVING without GROUP BY.</p>

re: Fun with Aggregates (Rishabh K, Tue, 13 Mar 2012 11:57:22 GMT)
<p>Hi Paul,</p>
<p>I really enjoy reading your posts. I think that with SUM(1) the cardinality estimate is calculated as the 'All density' for ProductID multiplied by the number of rows after the Stream Aggregate, i.e. 0.002267574 * 441 ≈ 1. That is fine, but I wonder why there is a difference between COUNT_BIG and SUM with regard to cardinality estimation. I tried COUNT(1), but got the same estimates:</p>
<p>SELECT th.ProductID, COUNT_BIG(1), SUM(1)</p>
<p>FROM Production.TransactionHistory AS th</p>
<p>GROUP BY th.ProductID</p>
<p>HAVING COUNT_BIG(1) &lt; 10</p>
<p>Thanks for all your great articles.</p>

re: Fun with Aggregates (AlexK, Tue, 13 Mar 2012 14:01:23 GMT)
<p>Paul,</p>
<p>I enjoyed reading your post, but I think the HAVING clause is redundant. As such, it should be eliminated; that would simplify and improve SQL. There would be fewer things to learn and fewer chances to shoot ourselves in the foot.</p>
<p>What do you think?</p>

re: Fun with Aggregates (Paul White, Tue, 13 Mar 2012 14:05:58 GMT)
<p>Rishabh K,</p>
<p>Thanks. &nbsp;It's the other way around: the cardinality estimate for COUNT is derived from statistics but SUM(1) results in a 30% guess.</p>
<p>Anyway, the reason for the difference is just that cardinality estimation knows how to predict a value for COUNT, but not for SUM(1), so it guesses.</p>
<p>I suppose cardinality estimation could treat the special case of SUM(1) as COUNT (though they produce different results on empty sets) but it doesn't. &nbsp;More generally, SUM(1) is just a special case of SUM(constant) and there's no sensible way to handle statistics given a comparison on that result.</p>
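<p>The empty-set difference mentioned above can be shown directly (again a sketch in SQLite via Python, not SQL Server; the table name is invented):</p>

```python
import sqlite3

# Why SUM(1) cannot simply be treated as COUNT(*): over an empty
# (scalar-aggregated) input the two disagree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # deliberately empty

count_result = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
sum_result = conn.execute("SELECT SUM(1) FROM t").fetchone()[0]

print(count_result)  # 0
print(sum_result)    # None -- SQL NULL: SUM over no rows is NULL, not 0
```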
<p>Finally, COUNT(1) and COUNT(*) are the same. &nbsp;When you write COUNT(1) in a query, the query plan will show the aggregate as count star.</p>
<p>Paul</p>
re: Fun with Aggregates (Paul White, Tue, 13 Mar 2012 14:15:11 GMT)
<p>Hi Alex,</p>
<p>Good to hear from you. &nbsp;I'm starting to think I over-emphasised the HAVING examples :) &nbsp;Forgive me if I am misunderstanding you, but this post is not about HAVING - it is about the subtle semantic differences between aggregate queries with and without GROUP BY:</p>
<p>-- Try with the GROUP BY commented and uncommented</p>
<p>-- 23 versus 86 rows</p>
<p>-- No HAVING in sight!</p>
<p>SELECT</p>
<p> &nbsp; &nbsp;p.Name</p>
<p>FROM Production.Product AS p</p>
<p>WHERE </p>
<p>(</p>
<p> &nbsp; &nbsp;SELECT COUNT_BIG(*) </p>
<p> &nbsp; &nbsp;FROM Production.TransactionHistory AS th</p>
<p> &nbsp; &nbsp;WHERE th.ProductID = p.ProductID</p>
<p> &nbsp; &nbsp;--GROUP BY p.ProductID</p>
<p>) &lt; 10;</p>
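<p>A miniature version of the same two-query experiment (hypothetical tables and rows, run in SQLite via Python purely to illustrate the standard NULL-comparison semantics):</p>

```python
import sqlite3

# One product with transaction history, one without.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER, name TEXT);
    CREATE TABLE history (product_id INTEGER);
    INSERT INTO product VALUES (1, 'HasHistory'), (2, 'NoHistory');
    INSERT INTO history VALUES (1), (1), (1);
""")

# Without GROUP BY the subquery is a scalar aggregate: it returns 0 for
# 'NoHistory', and 0 < 10 is TRUE, so both products qualify.
without_group_by = conn.execute("""
    SELECT p.name FROM product AS p
    WHERE (SELECT COUNT(*) FROM history AS h
           WHERE h.product_id = p.product_id) < 10
    ORDER BY p.name
""").fetchall()

# With GROUP BY the subquery returns no row at all for 'NoHistory', so
# the comparison becomes NULL < 10 (unknown) and that row is filtered out.
with_group_by = conn.execute("""
    SELECT p.name FROM product AS p
    WHERE (SELECT COUNT(*) FROM history AS h
           WHERE h.product_id = p.product_id
           GROUP BY h.product_id) < 10
    ORDER BY p.name
""").fetchall()

print(without_group_by)  # [('HasHistory',), ('NoHistory',)]
print(with_group_by)     # [('HasHistory',)]
```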
<p>Anyway, yes I agree HAVING is pure sugar; we can always replace it with a filter over a derived table. &nbsp;Nevertheless, it is standard SQL and hardly likely to be removed, whatever I think about it.</p>
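<p>As a concrete illustration of that rewrite (a sketch over invented data, again using SQLite via Python rather than SQL Server):</p>

```python
import sqlite3

# HAVING rewritten as a WHERE filter over a derived table;
# the two queries are equivalent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (product_id INTEGER);
    INSERT INTO history VALUES (1), (1), (2);
""")

with_having = conn.execute("""
    SELECT product_id FROM history
    GROUP BY product_id
    HAVING COUNT(*) < 2
""").fetchall()

rewritten = conn.execute("""
    SELECT q.product_id FROM (
        SELECT product_id, COUNT(*) AS cnt
        FROM history
        GROUP BY product_id
    ) AS q
    WHERE q.cnt < 2
""").fetchall()

print(with_having)  # [(2,)]
print(rewritten)    # [(2,)]
```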
<p>Paul</p>
re: Fun with Aggregates (Praveen kumar pddi, Mon, 19 Mar 2012 13:45:32 GMT)
<p>Hi Paul,</p>
<p>Can you try this variation of the solution and provide your valuable input by comparing it with the existing solutions?</p>
<p>SELECT p.Name--,rn,cnt </p>
<p>FROM</p>
<p>(</p>
<p>select *</p>
<p>from</p>
<p>(</p>
<p>select *, row_number() over (partition by ProductID, cnt order by ProductID, cnt) as rn</p>
<p>from </p>
<p>(</p>
<p> &nbsp; &nbsp;SELECT </p>
<p> &nbsp; &nbsp; &nbsp; &nbsp;th.ProductID, </p>
<p> &nbsp; &nbsp; &nbsp; &nbsp;cnt = COUNT_BIG(th.ProductID) over (partition by th.ProductID)</p>
<p> &nbsp; &nbsp;FROM Production.TransactionHistory AS th </p>
<p>) AS q1</p>
<p>WHERE</p>
<p> &nbsp; &nbsp;q1.cnt &lt; 10 -- here we are interested only in counts below 10</p>
<p>) AS q1</p>
<p>WHERE</p>
<p>--the inner query returns multiple rows per ProductID, as we are fetching the data from the transaction table</p>
<p> &nbsp; &nbsp;rn = 1</p>
<p>-- rn = 1 will fetch one row per product; in effect, we are removing duplicates</p>
<p>) as q1</p>
<p>JOIN Production.Product AS p </p>
<p> &nbsp; &nbsp;ON p.ProductID = q1.ProductID;</p>