SQLServerCentral.com / SQL Server 2008 - General / SQL Server 2008 / Join yeilds different results if used with CTE or a real table / Latest PostsInstantForum.NET v2.9.0SQLServerCentral.comhttp://www.sqlservercentral.com/Forums/notifications@sqlservercentral.comSun, 02 Aug 2015 15:09:06 GMT20RE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspx[quote][b]Evil Kraig F (3/21/2013)[/b][hr][quote][b]Jeff Moden (3/21/2013)[/b][hr]You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".{Edit} Heh... beat out by 2 minutes. :-)[/quote]Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?EDIT: I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.The predicate filter is... Oh god bless rubber ducky coding... of course.Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.[/quote]Correct. The only difference is that a lack of a predicate doesn't make a FULL OUTER JOIN. It makes a CROSS JOIN.Heh... that's one of my favoritie interview questions, BTW. :-PSat, 23 Mar 2013 19:29:44 GMTJeff ModenRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxIt's down to an optimisation introduced in SQL Server2005 often called "Deferred expression evaluation". [url=http://sqlblog.com/blogs/paul_white/archive/2011/02/27/bug-slow-sums-and-averages.aspx]Here's a great article by Paul White [/url]to add to others already posted.The original "failing" query can be modified so that the calculation is written as a row-level CROSS APPLY:[code="sql"]WITH bbb AS ( SELECT TOP 3 x.RandomRow FROM AAA CROSS APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x ORDER BY RowNumber)SELECT a.Col1, bbb.RandomRowFROM AAA AS aJOIN bbb ON a.RowNumber = bbb.RandomRow;[/code]The execution plan is identical to Query No 1:[img]http://s20.postimg.org/oh7dsnhh9/CROSS_APPLY.png[/img]Changing the CROSS APPLY to an OUTER APPLY [code="sql"]WITH bbb AS ( SELECT TOP 3 x.RandomRow FROM AAA OUTER APPLY (SELECT RandomRow = ABS(checksum(newid())) % 6 + 1) x ORDER BY RowNumber)SELECT a.Col1, bbb.RandomRowFROM AAA AS aJOIN bbb ON a.RowNumber = bbb.RandomRow;[/code] is sufficient to prevent the optimiser from implementing deferred expression evaluation and it generates a completely different plan where RandomRow is evaluated prior to the join between the CTE and the table:[img]http://s20.postimg.org/seuni24al/OUTER_APPLY.png[/img]With RandomRow evaluated prior to the join, it's now available to use as a join predicate and the query returns the expected results.Fri, 22 Mar 2013 04:43:35 GMTChrisM@WorkRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxThanks for help.Thu, 21 Mar 2013 19:48:06 GMTitlkRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspx[quote][b]Jeff Moden (3/21/2013)[/b][hr]You're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".{Edit} Heh... beat out by 2 minutes. :-)[/quote]Hey Jeff, I know that no join predicate basically turns it into a FULL OUTER JOIN, but why is the filter failing on its predicate upstream?EDIT: I want to clarify this question. To my understanding there's a few possibilities of how this plan actually results, but I'm not sure which is right and none of them show the logic that my conclusions reach.First, we go out to a table of 3 and 6 rows, and turn it into a FULL OUTER JOIN query, returning that dataset back to the Compute Scalar request. The 18 rows in the actual confirm this as a reasonable pathing.Now, the Compute Scalar occurs for each row (non-deterministic). So, one of two things happen. Either each row gets the exact same NEWID() (which isn't true) or they all get different NEWIDs. In either case, the result is the same... a value between 1 and 6.The predicate filter is... Oh god bless rubber ducky coding... of course.Because each of these rows are getting unique ABS(Checksum results, and they are then compared INTERNALLY to the row, instead of between the rowsets... *headsmack*... of course.Answer: The Join is evaluated long after any join is involved, and the newID result is checked on each row uniquely. Whatever comes out of there is just luck that they matched up.Thu, 21 Mar 2013 18:56:01 GMTEvil Kraig FRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxEDIT: Please ignore this entire post. One dog, wrong tree, right forest.Thu, 21 Mar 2013 18:54:25 GMTEvil Kraig FRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxYou're joining to an indeterminent value and at execution plan time, it can't determine what to join on and so it ignores the predicate. Look at the execution plan for the "Nested Loops (Inner Join)" symbol. Notice the little yellow alert symbol on it. If you right click on the symbol and bring up "Properties", you'll see a warning message that says "No Join Predicate".{Edit} Heh... beat out by 2 minutes. :-)Thu, 21 Mar 2013 18:47:28 GMTJeff ModenRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxInteresting. Looks like you hit a similar problem to this:http://www.glorf.it/blog/2008/05/16/sql-talk/sql-server-is-not-aware-of-nondeterministic-functionsIn query #1, the non-deterministic function in the subquery is applied [i]after[/i] the join (also note the "No Join Predicate" warning in the join operator.[img]http://s9.postimg.org/io72pzz4f/Untitled.png[/img]The non-deterministic subquery has already been materialised in query #2, therefore is guaranteed to be correct.Thu, 21 Mar 2013 18:45:52 GMTMansfieldRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxThanks for answer.I am not concerning about values per se. Sure, the numbers are random, so values will be different each time I run the query.My concern is, that I should get each time exactly 3 rows (I am not voried about the content of rows). With [u]CTE [/u]I get [u]less [/u]or [u]more [/u]rows, each time I run the query.With the [u]real table[/u], the same query returns [u]exactly [/u]3 rows.Why is the difference between CTE and real table?Thanks.Thu, 21 Mar 2013 17:55:50 GMTitlkRE: Join yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspx[quote][b]itlk (3/21/2013)[/b][hr]A/ Why the join yields different results for CTE and for the real table?B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA. Thank you for help.[/quote]A) You're connecting Rownumber to randomRow, which changes EACH time you run the cte, as you generate a newID each run. You'll be very variable there. You need to connect rownumber to rownumber for consistency.B) Right outer join is going to return you the three rows from BBB, sure, with varying connectivity.It all revolves around you using a randomized value in the cte for your join.Thu, 21 Mar 2013 17:30:09 GMTEvil Kraig FJoin yeilds different results if used with CTE or a real tablehttp://www.sqlservercentral.com/Forums/Topic1434084-391-1.aspxHi,Here is the SQL I used as an example:[code="sql"]--CREATE AND POPULATE TABLE AAA ------------------------------------------CREATE TABLE [dbo].[AAA]( [Col1] [nchar](10) NOT NULL, [RowNumber] [int] NOT NULL) ON [PRIMARY]GOINSERT INTO [dbo].[AAA] ([Col1] ,[RowNumber]) VALUES ('a', 1), ('s', 2), ('d', 3), ('e', 4), ('f', 5), ('g', 6)GO--QUERY No 1 ----------------------------------------------------------------;with BBB as( select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, Col1 from AAA order by RowNumber )select A.Col1, A.RowNumberfrom AAA as A join BBB On A.RowNumber = BBB.RandomRow; --QUERY No 2 ----------------------------------------------------------------select top 3 ABS(checksum(newid())) % 6 + 1 as RandomRow, RowNumber, AAA.Col1into BBB from AAAorder by RowNumber; select A.Col1, A.RowNumberfrom AAA as A join BBB On A.RowNumber = BBB.RandomRow; --Cleaning ---------------------------------------------------------------------drop table BBB; --drop table AAA; [/code]I have a table AAA with just two columns, one of which is a row number. I use CTE to create another table “BBB”, which based on the AAA. The BBB table contains the same columns as the AAA table, plus a new column RandomRow.The RandomRow is populated by random numbers, generated from the range 1 to @N, where @N is the number of rows in the table AAA – in this case 6. The table BBB contains 3 rows only: the row numbers will be 1,2,3.The Select statement should return rows from AAA, which are identified by the number in the [BBB].[RandomRows].The Select statement contains the join, which is expected to return exactly 3 rows, because the BBB is subset of AAA. However, each time I run the query the Select returns different numbers of rows. To see the effect, please try to re-run the query No 1 several times, and you should get different numbers of returned rows each time.If I replace the CTE with real table (see the query 2/) I get expected results, exactly 3 rows each time I run the query.My questions are:A/ Why the join yields different results for CTE and for the real table?B/ If I replace the join in the query No 1 with RIGHT OUTER JOIN, the query works fine and returns just 3 rows. However, I think it should work with “join” also, because the BBB is subset of AAA, and number in RandomRows column will always identify the valid row in AAA. Thank you for help.Thu, 21 Mar 2013 17:22:22 GMTitlk