T-SQL statements with multiple Common Table Expressions: Yes, we can.
Kent Tegels -- Wed, 06 May 2009
<P>Yesterday I was mulling over the SQL Server tweets and read one by Michelle Ufford (aka SqlFool). She was answering another developer's question about the use of Common Table Expressions; the developer had asked if it was possible to have multiple CTEs in a single query.</P>
<P>Yes and No.</P>
<P>No in the sense that you cannot nest CTEs. That is, a CTE's definition cannot itself contain a WITH clause whose results feed into another CTE, which in turn feeds the query in question.</P>
<P>Yes in the sense that you can define multiple CTEs as "peers." That is, you can have as many CTEs as you like -- each separated by a comma -- which feed a query.</P>
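As a sketch of the shape (written against a hypothetical Sales table, not one from this post), two peer CTEs separated by a comma both feed the final query. Note that while you cannot nest a WITH inside a CTE, a later CTE in the same WITH list may reference an earlier peer:

```sql
-- two peer CTEs, separated by a comma, both visible to the final query;
-- "recent" references its peer "totals" -- peers may chain forward
with totals(custid, total) as (
    select custid, SUM(amount) from Sales group by custid
), recent(custid, total) as (
    select custid, total from totals where total > 1000
)
select t.custid, t.total
from totals t
join recent r on r.custid = t.custid;
```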
<P>This is more than just syntactic and semantic sugar. Remember that one of the use cases for a CTE is to write an executable query where you can re-use a resolved result set over and over again without rewriting its defining query each time as a sub-query. There may be cases where you need to do that with more than one feeding query, too.</P>
<P>Here's an example. Sure, there are other ways to write this, but it demonstrates the use of peered CTEs. Let us say we have a table that represents students' scores on an exam. Each time the class runs, a new generation of the test is issued. Ideally, the student scores on the test should be statistically similar for each class, assuming a normal distribution of students. You decide to see if that's really the case: you would like to know how many student scores from the generations after the first fall between the first generation's average score less one standard deviation and the second generation's average score plus one standard deviation.<SUP>1</SUP></P>
<P>Let us start by generating some data in a temporary table:</P><CODE>create table #t(id int not null primary key clustered,score decimal(5,2) not null,generation tinyint not null);<BR>declare @score float=1.0;<BR>declare @group tinyint = 0;<BR>declare @index smallint = 1000;<BR>while @index &gt; 0 begin<BR>set @score = 100*RAND();<BR>set @group = RAND()*5+1;<BR>insert into #t values (1000-@index,@score,@group);<BR>set @index -= 1;<BR>end;</CODE>
<P>You've heard about CTEs and wonder, "Might they help in this case?" That really is an interesting question. Here is one way to solve the problem using CTEs:</P><CODE>-- find the means and S for the first two generations of data<BR>-- (shows using peer CTEs)<BR>-- on my test machine, total sub-tree cost is 0.02304924<BR>with g0(a,s) as (<BR>select AVG(score),STDEVP(score) from #t where generation=1<BR>), g1(a,s) as (<BR>select AVG(score),STDEVP(score) from #t where generation=2<BR>)<BR>-- find the number of criteria matching records<BR>select COUNT(score),avg(score),stdevp(score) from #t,g0,g1<BR>where generation &gt; 1<BR>group by g0.a,g0.s,g1.a,g1.s<BR>having avg(score) between g0.a-g0.s and g1.a+g1.s;</CODE>
<P>And another using sub-queries:</P><CODE>-- with subqueries instead<BR>-- on my test machine, total sub-tree cost is 0.0197593<BR>select COUNT(score),avg(score),stdevp(score) from #t<BR>where generation &gt; 1<BR>having AVG(score) between (<BR>select AVG(score)-STDEVP(score) from #t where generation=1<BR>) and (<BR>select AVG(score)+STDEVP(score) from #t where generation=2);</CODE>
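A side note on the sub-query version: it has a HAVING clause with no GROUP BY. That is legal in T-SQL; with no GROUP BY, the whole table is treated as a single group, so the aggregates and the HAVING predicate are evaluated once over all qualifying rows. A minimal sketch against the same #t table:

```sql
-- with no GROUP BY, aggregates and HAVING apply to the table as one group;
-- this returns a single summary row only if the aggregate filter passes
select COUNT(score), AVG(score)
from #t
having AVG(score) > 0;
```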
<P>So here's where this gets interesting. Which of these two queries has a lower query cost, and why? If the hairs on the back of your neck started rising at the CTE-based query after reading the where clause... GOOD, they should have. What we have done there is create a Cartesian product. Most DBAs and developers would cringe at that. After all, aren't you bringing as many rows into scope as the product of the sizes of the base table and the two result sets from the CTEs? Yes, you are. But even so, 1000x1x1 is still just one thousand. The problem is that the query engine does not really grok that. If you execute the two plans sequentially with "show actual plan" turned on, you'll see that the CTE-driven plan has a considerably higher cost than the sub-query plan. Why? You have probably already guessed! It is the expense of doing the Cartesian product.</P>
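One way to keep the peer CTEs while sidestepping the grouping over the Cartesian product is to pre-aggregate the generation &gt; 1 rows in a third peer CTE, so the final join only ever combines three one-row sets. This is a sketch, not the post's original query, and plan costs will vary by machine:

```sql
-- pre-aggregate into a third peer CTE so each input to the join is one row;
-- cross joining three one-row sets is cheap and needs no GROUP BY
with g0(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 1
), g1(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 2
), agg(n,a,s) as (
    select COUNT(score), AVG(score), STDEVP(score) from #t where generation > 1
)
select agg.n, agg.a, agg.s
from agg cross join g0 cross join g1
where agg.a between g0.a - g0.s and g1.a + g1.s;
```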
<P>The actual plan shows us this nicely:</P>
<P><A href="http://lh3.ggpht.com/_eKmnrAqsp-E/SgGubL6ZAEI/AAAAAAAAG1k/c1qlmYcZlvg/s1600/qp_cte.png" target=_BLANK>Click here to see the picture. For some reason, inserting the IMG tag here causes IE to freak out. Yuck!</A></P>
<P><SUP>1</SUP>: Yes, yes, I know, I know. Statistically this is not really meaningful; you should be using Student's t-test, comparing each group individually. It is just an example used here to help people understand the peer CTE concept. Before sending me a whiny email, statboy, just keep that in mind.</P>