Yes, I was indeed talking about two different things. One is the CLR-in-SQL function, and the other is a pure C# application.

I think the C# application was fairly well protected against being optimized out. I didn't realize I never attached it; I can see if I still have it around. I inspected the generated IL and the runtime execution traces, and I observed that it scaled linearly with the number of iterations. Just to be certain, I also accumulated a value in the loop body and printed that accumulated value at the end.

In my experience, the CLR JIT optimizer rarely optimizes out entire loop bodies. It simply does not do the analysis required, likely on the assumption that it has little benefit for real-world (non-benchmark) cases. (The Oracle JVM actually does perform such eliminations fairly often. In fact, I find the Oracle JVM to perform stronger optimizations across the board. However, with C# and the CLR, you have pointers, non-heap value types, and similar things, and thus more ability to optimize where you find it necessary. Basically, as a rule of thumb, performance almost always follows: simple C# < Java < optimized C#.)
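To make the parenthetical concrete: the kind of low-level control C# offers looks roughly like the sketch below (hypothetical illustration, not the code under discussion). A value type array is stored contiguously with no per-element heap objects, and an `unsafe` block can walk it with raw pointers, skipping bounds checks.

```csharp
using System;

struct Point   // a value type: elements are stored inline, no heap allocation per element
{
    public double X, Y;
}

static class PointerDemo
{
    // Sums the X fields via raw pointers; requires compiling with /unsafe.
    static unsafe double SumX(Point[] points)
    {
        double sum = 0;
        fixed (Point* p = points)          // pin the array so the GC cannot move it
        {
            for (int i = 0; i < points.Length; i++)
                sum += p[i].X;             // direct pointer arithmetic, no bounds check
        }
        return sum;
    }

    static void Main()
    {
        var pts = new Point[1000];
        for (int i = 0; i < pts.Length; i++) pts[i].X = i;
        Console.WriteLine(SumX(pts));      // sum of 0..999
    }
}
```

Nothing here is exotic; it is simply the sort of escape hatch the JVM does not expose, which is why hand-optimized C# can come out ahead even when the JIT itself is weaker.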

It seems like there are further limits on really making the SQL CLR functionality match the performance of pure CLR code for numerical computation. For example, when passing over a large set of primitive values, it is not fast to make any call at all for each and every entry. It would commonly be faster to first shuttle the values into a primitive array and then execute on that, for reasons of memory locality and call overhead. Further, SQL Server could in theory persist such copies at will, as long as it treated them as a cache and invalidated them. For data warehousing scenarios this would work well, since updates are rare, and that is where high-performance numerical computation is common. Unfortunately, this kind of usage has not been a priority. I suppose there just isn't much customer demand for a unified system... As it is, I would be cautious about overdoing any complex computations that are not core SQL Server scenarios. For many such cases of numerical computation, the optimal design (often by orders of magnitude) will be: dump the data from SQL Server, compute, spew data into SQL Server.
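The "shuttle the values into a primitive array first" approach might look like the following sketch. This is plain ADO.NET rather than anything SQL CLR specific, and the connection string, table, and column names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class ArrayFirst
{
    static void Main()
    {
        var values = new List<double>();

        // Step 1: one streaming pass to copy the column out into managed memory.
        using (var conn = new SqlConnection("Server=.;Database=Warehouse;Integrated Security=true"))
        using (var cmd = new SqlCommand("SELECT Amount FROM dbo.Facts", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    values.Add(reader.GetDouble(0));   // typed accessor avoids per-row boxing
        }

        // Step 2: compute over a contiguous primitive array --
        // good memory locality, no per-element call overhead.
        double[] data = values.ToArray();
        double sum = 0, sumSq = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
            sumSq += data[i] * data[i];
        }
        double mean = sum / data.Length;
        Console.WriteLine("mean={0}, variance={1}", mean, sumSq / data.Length - mean * mean);
    }
}
```

Writing results back would follow the same pattern in reverse, for example via `SqlBulkCopy` rather than row-at-a-time inserts.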

Now, optimizing large-scale extraction and import is a different topic, and is also far from ideal performance with standard mechanisms...

I wish we didn't have to consider all of this complexity. There is no reason we should have to worry about hand-writing CLR versions of T-SQL functions, which could be generated by a template transform, just to get high performance. We also shouldn't have to worry that the best we can do for some large-scale computation on a data warehouse might be 1000x slower than what we could achieve by dumping the data, even when highly optimized. This really reduces the flexibility and simplicity of systems built around SQL Server in a wide variety of cases. Of course, again, these are not the core customer cases, so...

I assume the standalone C# code I was measuring was something like (although more JIT warming, etc. would be desirable):
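The attachment never made it into the thread, so the following is only a reconstruction of the general shape such a benchmark usually takes (names and constants are hypothetical): a `Stopwatch` around the loop, a warm-up call so the method is JIT-compiled before timing, and a running accumulator printed at the end so the JIT cannot eliminate the loop body.

```csharp
using System;
using System.Diagnostics;

static class MultBenchmark
{
    static void Main()
    {
        const int iterations = 10000000;

        // Warm-up pass so RunLoop is JIT-compiled before we start timing.
        RunLoop(1000);

        var sw = Stopwatch.StartNew();
        double result = RunLoop(iterations);
        sw.Stop();

        // Printing the accumulated value keeps the loop from being optimized away.
        Console.WriteLine("result={0}, elapsed={1} ms", result, sw.ElapsedMilliseconds);
    }

    static double RunLoop(int iterations)
    {
        double acc = 0;
        for (int i = 1; i <= iterations; i++)
            acc += i * 0.000001 * 3.0;   // stand-in for the Mult being timed
        return acc;
    }
}
```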

Unlike the C# code, this always takes 0 ms to execute. It appears to simply analyze the code and conclude that only the last iteration of the loop matters. If you change the Mult function to accumulate instead, it can no longer do that. Still, it then takes only 6 ms to execute, which is much faster than the C# version.

rragno (9/23/2012): Unlike the C# code, this always takes 0 ms to execute. It appears to simply analyze the code and conclude that only the last iteration of the loop will matter. If you change the Mult function to accumulate instead, it no longer can do that. Still, it then takes only 6 ms to execute, which is much faster than the C# version.

So the Java optimiser is somewhat better than the C# optimiser. Of course, I would have expected pre-compiled code in either of these languages to run in 0 ms for the version with the accumulation as well, so these numbers suggest that optimisation techniques (for procedural languages like these) have deteriorated badly in the last four decades, or everything is doing recompilation all the time, or people have decided that some of the possible optimisations are rather pointless. And it seems pretty clear that the C# optimisation is really bad.

mmilodragovich (8/1/2012): Thanks for the detailed and METHODICAL analysis. I was once troubleshooting a performance problem in a large, complicated application, and it came down to a "SET QUOTED_IDENTIFIER ON" statement in a scalar function. The statement was not needed and was probably generated by a developer who used right-click-on-object to generate the "create" statement. The function was being called a massive number of times, so even the slightest performance difference was hugely magnified. I've been wary ever since, but knowledge is power, so thanks for the article.

You're welcome and thanks for the feedback. I've had similar not-easy-to-explain things happen in this wonderful world of T-SQL.

Actually, it's not so hard to explain. We all forget from time to time that SQL is an interpreter, not a compiler. Therefore any piece of code is at least re-evaluated (worst case scenario: recompiled) every time it's invoked.

A scalar function is applied to the final recordset: once for each returned row.

If you try to trace a query with a scalar function in it, you'll see this:

At the same time, a table function (a parametrised view) in a JOIN statement is invoked ONCE, so it is re-evaluated (recompiled) once.

That's where the performance difference comes from. SQL Server is just busy creating and re-creating the scalar function in memory, that's all.

In real life it should not make much difference, unless your system (querying) design is really stupid. There is no way a user wants to see more than a hundred rows on a UI, so you should never have a million-row scenario in a real-life application. And if you apply a WHERE clause to all those queries, leaving under 1000 rows in the final recordsets, you'll see that the difference is really minor, less than 5%. Not worth any attention, really.

It's the same case as explicitly dropping #tables in SP code: if you do a test run of only the create/drop operations in an SP and execute it 10000 times, the overhead will be really noticeable. But in real-life procedures the impact will be totally unnoticeable. Well, there will be a negative impact on performance when you explicitly drop a # table at the end of an SP, but it's totally forgivable for the sake of one's style (not my style! )

In real life you should not have million-row SELECT statements, so the overhead of a scalar function can be ignored if it's more comfortable to code. Unless you're doing some kind of data transformation task, but even then it's one-off, and even a big overhead can be tolerated.

Views used for reporting - yes, that's where you should be concerned. And not using UDFs in the SELECT part of queries must be a mandatory rule.

Ah, Sergiy... my ol' friend. Long time no see. Thank you for stopping by and thanks for the great explanations.

Yeah... I agree... most folks using front-end code aren't going to notice any difference between using a Scalar UDF or an iTVF. Chances are, many won't even be able to measure the differences because of the extremely low row counts, especially since I showed them why not to use SET STATISTICS to measure Scalar UDF performance.

I do try to avoid Scalar UDFs, though, because most of the stuff I do is on staging tables, where the cumulative time of multiple functions across many columns does start to make a difference, especially when loading hundreds of large files in such a manner. 5% performance differences can really start to stack up in batch code.

For GUI code, I write it as if it were large volume batch code because so many people will "leverage" any code they can find in a schedule pinch for their own batch code. If the iTVFs turn out to be a bit more complicated than some will understand, a well placed comment here and there in the code takes care of the problem. If not, then they probably shouldn't be trying to modify the code to begin with.

Of course, it's still possible to write crap code in an iTVF. iTVFs aren't a panacea.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

I know this is an old article reprinted and you might not see this, but I thought I'd ask anyway.

If you're using a Scalar UDF to define a Computed Column, can you still use an iTVF?

--------------------------------------
When you encounter a problem, if the solution isn't readily evident, go back to the start and check your assumptions.
--------------------------------------
"It's unpleasantly like being drunk."
"What's so unpleasant about being drunk?"
"You ask a glass of water." -- Douglas Adams