You write that recursive operations are row-based in Oracle and set-based in SQL Server.

Which approach is better in your opinion?

First, a quick reminder for those who have no idea what's this all about.

Recursive queries are useful to build the hierarchy trees, traverse graphs, generate arbitrary rowsets etc. Usually this involves joining a set with itself an arbitrary number of times.

A recursive query is usually defined by the anchor part (the initial rowset, or the base of recursion) and the recursive part (some operation over the previous resultset).

This is same as with the plain recursion: take some initial data as a parameter, modify it in some way and then pass the result of the modification as a new parameter to the same function.

Since SQL operates with sets, the input and the output if this function should also be the sets. And here is the main differences between the approaches used by Oracle on one side, and SQL Server, PostgreSQL and some more systems on the other side.

A set_operation can be any set operation: a JOIN, a UNION, an EXCEPT, whatever. It may involve any number of other tables, functions, in other words everything — as long as it's result (also a set, of course) has the same number and datatypes of columns as the anchor part does.

and, unlike the CONNECT BY, it is returned at once. The recursive part is only applied when the previous resultset is known, and it's applied to the resultset as a whole:

ID

Parent

1

NULL

2

NULL

Recursive part: a JOIN with (1, 2)

, which yields all records of the original table (though not in the tree order)

CTE recursion takes a set on input and returns a set on output. It is set-based by nature.

Conclusions

From this, two importation conclusions can be drawn.

CONNECT BY approach splits the sets returned by the recursion and applies the recursion to each individual record. This is less powerful, since a record is just a special case of a set, but the results naturally come out in the recursion order.

CTE approach allows the recursion to process the set returned on the previous step as a whole. This is more powerful, since a set is a general case of a record, but it takes some extra effort to sort the resultset in the tree order had the need arose.

Everything that can be done using a CONNECT BY approach can be done with a recursive CTE as well, but not vice versa. That's why it's safe to say that recursive CTE approach is more powerful.

However, what's good in theory, is not always as good in implementation.

SQL Server's implementation

SQL Server uses recursive CTE approach which is set based and more powerful. But SQL Server places some limitations on what you can use in the recursive part of the CTE. And severe limitations they are. You cannot use aggregates, cannot use OUTER JOINs, cannot use TOP. Here's the complete list.

In the CROSS APPLY, each row of the previous is returned, then the scalar condition which returns true or false is checked for each row of mytable and current row of the previous_result. This can be rewritten as a JOIN.

The reverse is not always true.

Under which conditions it is possible for a recursive CTE query to be rewritten as a CROSS APPLY query?

For this to be possible, the set_operation used in the second query should distribute over the UNION ALL operation.

What does it mean?

The distributivity is a well-known concept in algebra.

For instance, the multiplication is distributed over addition, this means that this condition:

x × (y + z) = (x × y) + (x × z)

is always true.

But further generalization, power, does not distribute over multiplication. The following condition:

x ^ (y × z) = (x ^ y) × (x ^ z)

does not always hold.

In other words, distributivity means that the order of operations is not important. You can add the results of multiplication just as well as multiply the results of addition, the final result will always be the same.

This was arithmetics, but concept of distributivity can be applied to relational algebra just as well. A set operation distributes over UNION ALL if the following condition always holds:

It's easy to see that if the condition above is true, the two approaches to the recursion become the same.

Indeed, the operation over the UNION ALL is the same as the UNION ALL over the operations, which is the same as the operation applied to each record of the previous_result. And the latter is exactly what CROSS APPLY does: applies an operation to each record of the set and returns the resultsing sets combined together (as if with UNION ALL).

A JOIN distributes over UNION ALL. However, DISTINCT, aggregates, TOP and some other operations do not.

Maximum over a UNION ALL is not the same as the UNION ALL of the maximums. TOP of a UNION ALL is not the same as the UNION ALL of TOPs, etc.

If we look closer into the restrictions put on the recursive operations by SQL Server we will see that placing these restrictions have one ultimate goal: to forbid all operations that do not distribute over UNION ALL.

Aggregates are forbidden, TOP is forbidden, DISTINCT is forbidden. This is a conspiracy.

However, they cannot lie to the world all the time. They cannot conceal every fact. There are two operations they failed to forbid. And now it's a moment to reveal the truth.

The setname above is for the sake of clarity only, to distinguish between the sets combined with UNION ALL.

Here's what this query returns:

id

setname

1

anchor

2

anchor

3

anchor

1

recursive2

2

recursive2

3

recursive2

1

recursive4

2

recursive4

3

recursive4

9 rows fetched in 0.0003s (0.0006s)

We see that the query returns the whole anchor part which is just a SELECT * FROM mytable:

id

setname

1

anchor

2

anchor

3

anchor

The next part, recursive1, applies the EXCEPT operator to the mytable and anchor:

SELECT *
FROM mytable
EXCEPT
SELECT *
FROM anchor

But since mytable and anchor actually match, the result of this operation is an empty rowset. And that's why we don't see recursive1 in the resultset above.

The next part, recursive2, applies EXCEPT again:

SELECT *
FROM mytable
EXCEPT
SELECT *
FROM recursive1

This time, the second argument to EXCEPT is empty, and each record from mytable is returned:

id

setname

1

recursive2

2

recursive2

3

recursive2

, etc.

EXCEPT applied to mytable and an empty set yields a copy of mytable. EXCEPT applied to the mytable and a copy of mytable yields an empty set. Therefore, the final resultset will be endless, consisting of alternating empty sets and copies of mytable.

The original recursive query, in theory, should return the same rowset. Let's check if it's really so:

The next set (recursive1) should have been empty. But it has 2 rows instead:

id

1

2

This could only happen it the EXCEPT clause was applied to the last record of the anchor set rather than to the whole anchor:

SELECT *
FROM mytable
EXCEPT
SELECT TOP 1 *
FROM anchor
ORDER BY
id DESC

The query above would yield the same results.

In recursive2, the EXCEPT operator, had it been applied to the whole resultset returned by the previous query, would yield a single row 3. However, it returns 1 and 3, which means that EXCEPT clause was again applied only to the last row.

This is not how a properly designed set-based recursive CTE should behave. But this is exactly how CONNECT BY query behaves. It splits the recordset into separate records and applies the recursive part to each record individually.

ROW_NUMBER

Analytical functions without PARTITION BY clause do not distribute over UNION ALL as well.

A UNION ALL of two queries containing the ROW_NUMBER's is not the same as a ROW_NUMBER applied to the results of a single UNION ALL query. The latter would return a single consecutive counter, while the former would count the records in each query separately.

The ROW_NUMBER's are not consecutive. Instead of a single sequence from 1 to 4, there are two sequences from 1 to 2.

Again, that's exactly what a CONNECT BY query (or a CROSS APPLY) would return.

Summary

There are two approaches to implementing a recursive query:

One approach is record-based. A recursive part takes a record as a parameter and returns a set. It is then separately applied to each record of the set returned . This approach is implemented by Oracle in CONNECT BY queries.

Another approach is set-based. A recursive part takes a set as a parameter, processes it as a whole and returns a new set. It is then passed as a parameter to the next iteration. This approach can be implemented by recursive CTE's

The record-based approach is better for building hierarchies because it naturally returns the records in the tree order. However, it's less powerful in theory.

The set-based approach is more powerful in theory, since it can process a set as a whole and more complex conditions can be applied. Extra effort is required to sort the resulting recorset in the tree order but it is still possible though less efficient.

In theory, recursive CTE's are more powerful and therefore should be preferred.

SQL Server nominally impelements recursive CTE's, but in fact they are no more than CONNECT BY queries in disguise. No real set-based operations can be performed over these CTE's, only the record-based ones.

SQL Server efficiently combines limitations of the record-based approach used by CONNECT BY with the absence of the natural tree ordering implied by the set-based approach used by recursive CTE's.

That's why I'd say that Oracle as of 10g implements recursive queries better than SQL Server as of 2005.