paulsasik (3/18/2008)The query provided by Tao is fine and was the first solution i thought of when i read the example problem. However, the CTE example will effortlessly resolve the issue of two or more departments having the same number of employees:

If two departments have n employees and are tied for having the most in the company, both should be displayed. Tao's use of the TOP 1 clause in this case will choose one of the departments arbitrarily for display and produce a less-than desirable result. ...

hi, hi don't see the advantage of CTE to temp Tables like in the example below. Is there any advantage?

WITH CountEmployees(dept_id, n) AS ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ), DeptPayroll( dept_id, amt ) AS ( SELECT dept_id, sum(salary) AS amt FROM employee GROUP BY dept_id )

select v.* into #CountEmployees from(SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id)vselect v.* into #DeptPayroll from(SELECT dept_id, sum(salary) AS amt FROM employee GROUP BY dept_id)v

To me, CTE's have the advantage of Derived Tables because you can reference them many times in the same query at different levels in the query whereas a Derived table can only be referenced one level up in the same query. It's not a huge advantage until you actually need it. CTE's also (to me, anyway) present themselves nicer for readability and documentation purposes. They can also call themselves (recursion) like a function might.

Temporary tables have the advantage over both because they persist for longer than just one query. I used (and still use) temp tables in place of CTEs ("sub-query refactoring" in Oracle) long before they were available in MSSQL. In fact, if I know a result set must be used across many queries in a sproc, I'll still use TempTables instead of CTE's. I like "local" temp tables a lot... I wish Oracle understood the concept as well. Their "global" temp tables don't work out quite so well.

--Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T."--22 Aug 2013

The clarity thing is totally a personal preference. To me, being able to look at the query at the end, especially in things like complex updates, and have that query be only a couple of lines, is more clear than even a well-formatted derived table. But that's for me. I can read either one, I just find the CTE easier. And, as you mention, that's mainly because I've been using them for a while now.

The other point, using them for consistency, is where I think it really matters. If you use them at all, use them as exclusively as possible. Same as any other standard.

fenomenales:

In the example you give, no, there isn't a clear advantage that I'm aware of for the CTE over the temp table.

As with derived tables, the real difference is when you want one that's recursive (like a hierarchy or bill of materials).

Temp tables, like CTEs and unlike derived tables, can be referenced more than once in a query. Temp tables also have the advantage that they can be referenced in more than one query, unlike both CTEs and derived tables. That's significant in many cases.

CTEs, like temp tables, table variables, and derived tables, all reside in memory till they get too big, then they get dumped into the tempdb. (I've tested all of these and they do. You should have seen the 200 Gig tempdb I ended up with from one runaway test. Which is why I do that testing on a desktop box instead of a production server.:) ) So there's no advantage of one over the other in that regard.

Temp tables can also generate statistics, and can be indexed. Again, that can be an important advantage.

Temp tables, on the other hand, aren't recursive. They also require more code to set up, and can easily result in procs that have to recompile every time they are run. (In either the case of mixing DDL and DML, or the case of running a cursor on a temp table, the proc will have to recompile every time it's called. This results in a compile lock for every run, which can result in serious delays for user queries.) CTEs have the advantage there.

It's all a matter of knowing which tool to use for which job.

Use a temp table if more than one query in a proc/script will reference the data. (A table variable works the same way, but table variable vs temp table is a whole discussion all by itself.) Use a CTE if it will be recursive, or only referenced once. Use a CTE to populate a temp table if you need both recursion and multiple references. And so on.

For me a CTE is a lot easier to follow. And the CTE is reusable within the same statement, where a derived table is not. (I love CTE btw) I also have seen indexes being used in a CTE when they were not used in a derived table (but that could have been a side effect of rewriting the query better I suppose?).

Changing the simple example given to use a CTE shows an identical execution plan for me.

;with Data as ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id)select top 1 dept_id, nfrom Dataorder by n DESC

And one little (overlooked?) thing about a multiple CTE, if a resultset is not referenced it appears to not be ran (according to the execution plan). Could I get some feedback on this?

Example, with this I only see TableB being shown in the execution plan :

Based on what I've coded, I would strongly urge the move to CTE instead of derived tables. :)

Tao Klerks (3/18/2008)I see the value of CTEs, but I'm not sure I agree with the example... wouldn't it be better written (more efficient?) using a single aggregate query and a TOP clause?

SELECT TOP 1 dept_id, nFROM ( SELECT dept_id, count(*) AS n FROM employee GROUP BY dept_id ) AS aORDER BY n DESC

I guess the question for me is: while CTEs provide a nice syntax for repeated and recursive expressions, is the SQL optimizer actually able to use them as efficiently as a statement designed to avoid repetitive expressions in the first place?

I tried this on a very simple dataset (comparing all three statements), and found that the total cost of the queries, in all three cases, was the same. The query plans were slightly more complicated for the original statement and CTE statement, and slightly simpler for the statement above (single Sort/TopN Sort instead of Sort -> Filter -> Top) - unfortunately I don't have a large dataset to test on!

My instinct would be: Try to rewrite your statement to use joins and derived tables rather than subqueries, before moving to CTEs as a way of simplifying your subqueries.

Excellent article!Will put to use immediately!BTW, in the sentence "I have found them particularly handy when cleaning up demoralized tables. ",did you mean "I have found them particularly handy when cleaning up denormalized tables."?