Inline Scalar Functions

User defined functions give you great benefits in terms of encapsulation and code reusability. Unfortunately though, when you invoke a scalar UDF as part of a query, and pass the function a column from the table as input, the function is invoked separately for each row. This is true even when all the function has is a RETURN clause with a single expression that theoretically could have been inlined in the query. This is how SQL Server always handled scalar UDFs from the moment those were introduced in the product (version 2000) and still does today (version 2008). The result is that a query that uses such a function would typically run significantly slower than a query that embeds the original expression inline instead of invoking the function.

I’ll first demonstrate the problem and then provide an alternative that would allow you to use functions without sacrificing the performance of your queries.

To demonstrate the problem first create the table T1 and populate it with 1,000,000 rows by running the following code:

-- Create and populate T1 with 1,000,000 rows

SETNOCOUNT ON;

IFOBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE T1;

GO

WITH

L0 AS (SELECT 0 AS c UNION ALL SELECT 0),

L1 AS (SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B),

L2 AS (SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B),

L3 AS (SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B),

L4 AS (SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B),

L5 AS (SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B),

Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)

SELECT n INTO dbo.T1 FROM Nums WHERE n <= 1000000;

GO

Next, create the scalar UDF AddOne by running the following code:

-- Create scalar function AddOne

IFOBJECT_ID('dbo.AddOne') IS NOT NULL DROP FUNCTION dbo.AddOne;

GO

CREATEFUNCTION dbo.AddOne(@n AS BIGINT) RETURNS BIGINT

AS

BEGIN

RETURN @n + 1;

END

GO

This particular function simply adds 1 to the input value, but the idea of course is to use an example for a calculation that can be expressed as a single expression. Just the same the function could have been one that calculates the number of working days between two dates, the last month day corresponding to the input value, or any other single-expression calculation.

Now run the following query which invokes the function:

-- Query 1

SELECTTOP (1) n, dbo.AddOne(n) AS r

FROM dbo.T1

ORDERBY r;

The purpose of the TOP (1) and the ORDER BY here is just to prevent the query from returning all 1,000,000 rows in the output. It doesn’t really add much to the cost of the query. It is treated pretty much similar to a MAX aggregate since the query is asking for one top row. Observe the execution plan of the query:

Notice in the Compute Scalar operator that the expression in the function wasn’t inlined in the query, rather the function was in fact invoked separately for each row. Here are the measures I got from STATISTICS TIME for this query:

SQL Server Execution Times:

CPU time = 9266 ms, elapsed time = 14206 ms.

It took 14 seconds for the query to complete on my system, and it consumed quite significant CPU time! Compare this to embedding the original expression inline in the query:

And as you can see, the run time dropped to under a second. Does this mean that you are doomed to having to choose between the benefits of UDFs and performance, and not being able to enjoy both? Not necessarily…

SQL Server does support inline table-valued UDFs. Those are truly inlined in the outer query much like any other table expression (derived table, CTE, view) is. That is, when querying such a UDF, SQL Server internally expands the UDF’s definition, and rearranges the query such that it directly accesses the underlying objects. With this in mind, consider the following alternative…

Instead of creating a scalar UDF with a RETURN clause that looks like this:

RETURN <expression>

Create an inline table-valued UDF with a RETURN clause that looks like this:

RETURN SELECT <expression> AS val

And then in the query, instead of the expression:

dbo.MyFunction(col1) AS result

Use:

(SELECT val FROM dbo.MyFunction(col1)) AS result

For example, to implement an alternative to our AddOne function, create the following AddOneInline function:

Discuss this Article 6

"Hi Brian,
The function is invoked separately for each row of the table you are querying. As for telling how many times the function was invoked, this can be achieved with a simple trace. Capture the SP:Completed (or SP:Starting) event. To limit the size of the trace, you might want to test the query against a small number of rows—for example, with the filter n <= 10. Following is a screen capture of Profiler showing the events I got when I tracing the query that invokes the UDF: http://www.windowsitpro.com/Common/adforceimages/Inline Scalar Functions comment fig1.jpg.
Cheers,
Itzik"
Posted by Megan Keller
Associate Editor, SQL Server Magazine
mkeller@sqlmag.com

I had always thought subselects were a thing to avoid - who knew that they could actually improve things if constructed properly! This brought my query from a duration of 493,000,000 ms to 308. Thank you thank you thank you! I am floored by this.

I want to understand something better. The article says: "...Unfortunately though, when you invoke a scalar UDF as part of a query, and pass the function a column from the table as input, the function is invoked separately for each row. "
My question is for each row of what? Of the table you're querying? Of the table that the UDF queries? Of the resultset? And how can I see exactly how many times the function was actually called during the query processing?
Thanks,
Brian

From the Blogs

Don’t let bad data sneak up on you when and where you least expect it. Ferret out bad data with Melissa Data’s newest Profiling Component for SSIS. Learn how to take control of your data using knowledge-base-driven metadata. The truth shall set you free!...More

Now that we’ve outlined the process to let servers in a SQL Server AlwaysOn Availability Group "talk to each other" by means of setting up linked servers, it’s possible to set up some additional or improved checks on Availability Group Health....More

In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks....More