@Karla if you're just selecting a count(*) from the overall select, the whole calcNumber thing is unnecessary, so it's probably optimized away. In the other CTE you're filtering based on it, so it does have to be calculated.

What are the query plans? I wonder if SQL Server is smart enough to first filter based on the equality comparisons, and only then apply the calculation.

I'm doing way more than count. That was just to isolate the problem.

I'll see if I can spot the difference in the query plans. The problem is the actual queries are way more complicated than this.

How much I/O are those operations pulling? Which tables are the big ones?

I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
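The temp-table variant looks something like this (table and column names here are made up for illustration):

```sql
-- Materialize the expensive intermediate result once...
select o.customer_id, o.order_total
into #filtered_orders
from dbo.orders o
where o.order_date >= '20240101';

-- ...then aggregate against the small temp table, which has real
-- statistics the optimizer can use for cardinality estimates,
-- unlike a CTE that gets re-expanded into the outer query.
select customer_id, sum(order_total) as total_spend
from #filtered_orders
group by customer_id;

drop table #filtered_orders;
```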

Base table has a couple hundred thousand rows.

First CTE in the chain brings that down to a few thousand.

Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an APPLY clause or a join easily, or hoist it to your outer CTE?

Can you give me an example of how to do that?

Would be happy to when I get home. Not doing this on mobile.

Also, the function returns a decimal if that has any bearing.

Shouldn't, but it gives me some idea that there are some calculations going on in the background, which may or may not be expensive. But I've found many a bottleneck that was caused by a scalar function against a large result set. It's also possible that its evaluation is getting delayed to the second CTE, which would explain why the first CTE is fine by itself.

Understood.

OK that at least makes sense then.

In the past I've had success in moving scalar functions out of the where clause into the select which is why this one seems so weird.
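The kind of rewrite that usually helps, sketched with made-up names (`dbo.CalcAttendancePct`, `dbo.enrollments` are placeholders, not the actual objects in question):

```sql
-- Compute the scalar function in the SELECT list so the WHERE clause
-- filters on plain, indexable columns first:
select e.student_id,
       dbo.CalcAttendancePct(e.student_id) as attendance_pct
from dbo.enrollments e
where e.term_id = @term_id;

-- ...instead of filtering on the function itself:
--   where dbo.CalcAttendancePct(e.student_id) >= 0.75
-- which forces the function to run against every candidate row
-- and prevents an index seek on the filtered column.
```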

Sort of. I meant expanding the guts of that function into the APPLY. Since the calls remain, we don't see a change.

I started looking at the function and it was doing some stupid joins. Removing them reduced the runtime to 8 seconds.

But I had already started turning the function into a table value function that I will join and I think that will be even faster.

That's usually the best way to attack a poorly-performing scalar function.

One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (returns table as return select...) as they are basically parameterized views and the most performant.
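A minimal sketch of the inline shape (all object names here are illustrative):

```sql
-- Inline TVF: a single RETURN (SELECT ...), no BEGIN/END block.
-- The optimizer expands it into the calling query like a
-- parameterized view, so it participates in normal set-based plans.
create function dbo.OrderTotals (@customer_id int)
returns table
as return
(
    select o.order_id, sum(d.line_total) as order_total
    from dbo.orders o
    join dbo.order_details d on d.order_id = o.order_id
    where o.customer_id = @customer_id
    group by o.order_id
);
```

Contrast with a multi-statement TVF (`returns @t table (...) ... begin ... end`), which materializes its result into a table variable and hides the cardinality from the optimizer.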

@Karla
The biggest reason that scalar functions are often a bogeyman is that the optimizer can't fold them down into a set-based operation, so their cost goes up linearly (or worse, if you wind up memory constrained and have to spill to TempDB) as more rows are affected. And, to compound the problem, the optimizer can't estimate effectively for scalar functions either, so it's very likely that you'll wind up memory constrained in large operations, because the optimizer's guess will be too low. (And in small operations, you'll get too much memory and potentially impact other operations on the server that have to wait for memory or work with less.)

How grungy is the calculation? Can it just be inlined within the SELECT statement directly?

It is grungy...it is calculating a percentage of attendance based on several different dates. I think I am getting to the point I have to let it go for tonight. Right now, I know my logic is wrong in calculating it as a table value function. It is much easier to do it per record.

I just looked that up and based on the complicated calcs, I am not sure inline is possible.

Everything is possible when you let the mystic power of the Dragonflame burn in your heart.

The result may be a little ugly, though.

OK, I see that I can use CTEs inside an inline table-valued function.
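Right, a CTE works fine as long as the body is still a single `RETURN (SELECT ...)`. A rough sketch of how the attendance calculation might look in that shape (names and columns are stand-ins, not the real schema):

```sql
-- CTE inside an inline TVF: legal because the whole body is still
-- one parenthesized SELECT expression.
create function dbo.AttendancePct (@student_id int)
returns table
as return
(
    with sessions as
    (
        select count(*) as held,
               sum(case when a.attended = 1 then 1 else 0 end) as attended
        from dbo.attendance a
        where a.student_id = @student_id
    )
    select cast(attended as decimal(9, 4)) / nullif(held, 0) as attendance_pct
    from sessions
);
```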

I find it odd that the function calls don't show any table hits, but as you can see, the time results are dramatically different. I also find the "physical reads 0" to be interesting, but I attribute that to this rig having 32GB of RAM.

And yes, I ran these multiple times to rule out caching.

Edit: That database took up a whole 900MB (on a drive that has 2.41 TB free). Good thing it can be expressed more compactly, like in the script above.

-- Only allow a single discount for now. This makes the example simpler, but if you really wanted to have cascading discounts, you'd need a recursive CTE to accumulate them, and I don't feel like writing one of those right now given that I have to pack for a trip tonight.
select top 1 @price = (@price * (1.0 - percent_discount)) - straight_discount
from order_discounts
where order_id = @order_id
order by order_discount_id -- Added for determinism and to ward off undefined behavior
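For what it's worth, the cascading version I waved off above would look roughly like this with a recursive CTE, assuming the same `order_discounts` columns (untested sketch, written quickly, so treat accordingly):

```sql
-- Rough sketch: apply each discount in order_discount_id order,
-- feeding the running price through each step.
with ranked as
(
    select percent_discount, straight_discount,
           row_number() over (order by order_discount_id) as rn
    from order_discounts
    where order_id = @order_id
),
cascade as
(
    -- Anchor: the starting price before any discount.
    select cast(@price as decimal(19, 4)) as price, 0 as rn
    union all
    -- Recursive step: apply the next discount to the running price.
    select cast((c.price * (1.0 - r.percent_discount)) - r.straight_discount
                as decimal(19, 4)),
           r.rn
    from cascade c
    join ranked r on r.rn = c.rn + 1
)
select top 1 @price = price
from cascade
order by rn desc;  -- take the price after the last discount applied
```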

For the most part, there isn't. In the query plan, it will boil down to a join. However, APPLY lets you do a few things you can't do with a join to a derived table:

select f.*, baz.quux
from foo f
outer apply
(
-- Notice how we're referencing outside columns as if we were in a subquery. Can't do this with a derived table.
select sum(b.price) * f.multiplier as quux
from bar b
where b.foo_id = f.foo_id
) baz

Of course, the intended purpose of APPLY is supplying a column value to a table function:
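Something along these lines, where the function and table names are placeholders for whatever inline TVF you're calling:

```sql
-- CROSS APPLY invokes the table function once per outer row,
-- passing that row's column value in as the argument.
select c.customer_name, t.order_id, t.order_total
from dbo.customers c
cross apply dbo.OrderTotals(c.customer_id) t;
```

Use OUTER APPLY instead if you want to keep outer rows for which the function returns no rows.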

This is just to ensure that the column names will be included in the output, because the original author apparently didn't know about the "Copy with headers" option in SSMS or its keyboard shortcut Ctrl+Shift+C.

Now, how they could guarantee that this row would always be at the top......
Manual editing of the output?
