My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.

I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table. Let's see if I can explain that .... A derived table is "self-contained", in that all tables and columns in the parent SELECT are not accessible (though variables and parameters can be referenced). For example, consider:

That is not legal because A.Val is out of scope within the derived table; this is because the derived table is evaluated independently of the other tables in the SELECT. To limit the rows in table B so that B.Val = A.Val, we must do that outside of the derived table via a join or in the criteria:

(Of course, the above is equivalent to doing an INNER JOIN to the derived table, or just joining to the table B.)

Also, keep in mind that the scope-of-derived-tables rule isn't just for CROSS JOINS, it's for all JOINS -- CROSS, INNER, OUTER and even UNION; they all use "self-contained" derived tables.

This is in contrast to a correlated sub-query, where the parent SELECT is in scope for the sub-query; the sub-query is evaluated for each row in the query, so the other tables and columns in the SELECT are all available:

select A.*, (select B.X from B where B.Val=A.Val) as Xfrom A

(Note: I am ignoring for now the fact that returning multiple rows in a sub-query will return an error.)

This is an easy way to think of the difference between CROSS JOIN and CROSS APPLY. CROSS JOIN, as we saw, joins to a derived table; however, CROSS APPLY, despite looking like a JOIN, actually is applying a correlated sub-query. This imposes both the advantages of a correlated sub-query but also the performance implications.

So, we can simply rewrite our first example using CROSS APPLY like this:

is not legal; once again, A.Val is not in scope for the user-defined function. The best we can do before SQL 2005 was to use a correlated sub-query:

select A.*, (select X from dbo.UDF(A.Val)) Xfrom A

However, that is not logically equivalent; the UDF cannot return more than 1 row or it will result in an error, and wouldn't make logical sense anyway if it did.

Starting with SQL 2005, we can now use CROSS APPLY and it will work fine:

select A.*, b.Xfrom Across apply dbo.UDF(A.Val) b

So, that is one way to think of the difference between a JOIN and an APPLY; a JOIN combines two separate result sets, but APPLY is more of a loop that evaluates one result set over and over for each row in another. This means that, in general, APPLY will be less efficient than a JOIN, just as, in general, correlated sub-queries are less efficient than derived tables. (The optimizer, however, is generally quite good at optimized correlated sub-queries when possible.)

So, why use CROSS APPLY instead of a correlated sub-query? What's the advantage? Actually, quite a lot -- it is much more powerful!

CROSS APPLY can return multiple rows

Unlike correlate sub-queries, CROSS APPLY works with multiple rows. This allows us to do things like "joining" a table to a function that parses a CSV column in that table into multiple rows:

select A.ID, b.Valfrom Across apply dbo.ParseCSV(A.CSV) b

When the ParseCSV() function returns multiple rows, it simply acts as if we have joined Table A to the function's return table, duplicating the rows in Table A for each row in the joined table. This is not possible with a correlated sub-query, and will result in an error. This is a quick and easy way to parse a table of data into multiple rows in an efficient "set-based" manner when the algorithm requires a complex User-Defined Function. (The CSV is not a great example, but other parsing routines are not as easily accomplished via a JOIN to a numbers table.)

CROSS APPLY can return multiple columns

Again, in a correlated sub-query, we can only return a single value. If we write a SQL statement that returns a running sum, we can use a correlated sub-query like this:

select o.*,
(select sum(Amount) from Order o
where p.OrderDate <= o.OrderDate) as RunningSumfrom Order o

However, what if we'd like to return an additional running sum of Orders based on some other criteria (e.g., for orders with the same "OrderCode")? We'd need another correlated sub-query, greatly reducing the efficiency of our SELECT:

select o.*,
(select sum(Amount) from Order o
where p.OrderDate <= o.OrderDate) as RunningSum, (select sum(Amount) from Order o where p.OrderCode = o.OrderCode and p.OrderDate <= o.OrderDate) as SameCodefrom Order o

We can also CROSS APPLY to a Table-Valued User Defined Function that returns exactly one row, but with multiple columns, to return separate pieces of data from a single function call. For example, we can parse an email address into separate username and domain columns. A few months back I wrote a SQL Team article that discusses that concept.

Summary

CROSS and OUTER APPLY are very powerful and can be very useful, but we must be careful to use them only when necessary; I am still testing the possibilities, but in general an APPLY will tend to be not as efficient as a JOIN. In addition, APPLY is mostly demonstrated by applying table-valued user-defined functions, but it can be used with in-line SELECT statements as well.

I happened to use a CROSS APPLY for the first time a couple of days ago. I wanted to produce a table of messages where the message may be sent to one or more recipients. Using the CROSS APPLY I was able to generate a delimited list of recipients in the subquery and join it to the main table.

Bye bye cursor.

If you are interested I adapted the final query in this example: http://zulfiqar.typepad.com/zulfiqars_web/2005/04/tsql_concatenat.html

"This is an easy way to think of the difference between CROSS JOIN and CROSS APPLY. CROSS JOIN, as we saw, joins to a derived table; however, CROSS APPLY, despite looking like a JOIN, actually is applying a correlated sub-query. This imposes both the advantages of a correlated sub-query but also the performance implications."

I tried to use the examples and got in trouble because “cross apply” only works on SQL 90 compatibility mode – not in SQL 2005 with mode 80 enabled (my case). Since I can’t change the database, I create the following work around:-- CodeUSE tempdb – tempdb is a native SQL 2005 DBSELECT tb.field1, tb.field2, fc.* FROM #tempTB as tbCROSS APPLY OficialDB_Compatibility80.dbo.myUDF (tb.field1, tb.field2, tb.field3, @dinamicVAR) AS fcUSE OficialDB_Compatibility80-- Code