Another reason why scalar functions in computed columns are a bad idea

As if there weren’t enough reasons

In my last blog post I talked about different things that cause plans, or zones of plans, to execute serially. One of the items I covered was computed columns that reference scalar functions. We know they'll make queries go serial, but what about other operations?

Oh no my index is fragmented

If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.

That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.

Transact-SQL

CREATE FUNCTION dbo.PIDMultiplier (@pid INT)
RETURNS INT
WITH RETURNS NULL ON NULL INPUT,
     SCHEMABINDING
AS
BEGIN
    DECLARE @Out BIGINT;
    SELECT @Out = @pid * 2;
    RETURN @Out;
END;
GO

ALTER TABLE dbo.PostLinks
    ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.

Transact-SQL

ALTER TABLE dbo.PostLinks REBUILD WITH (ONLINE = ON);

ALTER TABLE dbo.PostLinks DROP COLUMN Multiplied;

ALTER TABLE dbo.PostLinks REBUILD WITH (ONLINE = ON);

Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.

Hi, I’m garbage.

Parallel, sans computed column:

Dude, you’re getting a parallel.

But there’s a bigger fish in the pond

Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.
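If you haven't set his scripts up before, a minimal sketch of scheduling integrity checks with Ola's solution looks something like this (procedure and parameter names are from his maintenance solution; the database list is an assumption for your environment):

```sql
-- Sketch: run CHECKDB-style integrity checks against all user databases
-- using Ola Hallengren's maintenance solution (assumes it's installed in master)
EXECUTE dbo.DatabaseIntegrityCheck
    @Databases = 'USER_DATABASES',
    @CheckCommands = 'CHECKDB';
```

Wrap that in an Agent job on whatever schedule your restore-time tolerance allows.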

Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.

The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.
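For anyone following along on a test box, the setting change is a sketch like this (it's an advanced option, so you have to flip that on first, and you really shouldn't do this in production):

```sql
-- Test boxes only: let every query qualify for parallelism
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism', 0;
RECONFIGURE;
```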

With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.
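As a sketch, the three calls look something like this (the session id and target path here are assumptions; plug in your own):

```sql
-- 1. Start an Extended Events trace on your session
EXEC dbo.sp_BlitzTrace @SessionId = 52, @Action = 'start', @TargetPath = 'C:\temp\';

-- 2. ...run your DBCC commands in that session...

-- 3. Stop the trace, then read back what it captured
EXEC dbo.sp_BlitzTrace @Action = 'stop';
EXEC dbo.sp_BlitzTrace @Action = 'read';
```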

With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINT. Everything else is included.

Transact-SQL

DBCC CHECKTABLE ('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS;

ALTER TABLE dbo.PostLinks
    ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;

DBCC CHECKTABLE ('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS;

Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.

Hell Yeah.

Hell No.

So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.
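For comparison, a sketch of the same column with the function unwrapped (the expression is still deterministic, so it can still be persisted, and the DBCC check stayed parallel for me):

```sql
-- Same math, no scalar UDF in the way
ALTER TABLE dbo.PostLinks
    ADD Multiplied AS (PostId * 2) PERSISTED;
```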

I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.

Scalar Functions ALWAYS cause any queries using them to run single-threaded. It’s horrific in computed columns because the function basically causes the query to run RBAR. Why would anyone think this is a smart idea?

Agreed but it’s either going to happen from the table or it’s going to happen in code. iSFs will certainly make such a thing faster but if the column is supposed to change when someone updates other columns, it’s one way to ensure that happens.

I DO agree that computed columns can be a royal PITA when it comes to table changes, especially if you've made the mistake of using functions that call functions, but sometimes the promise of future pain is worth simplicity in the present. 😉 It's a whole lot better than some newb who wants to build the formula into a regular view and then query on it. 😉

Understood. I was replying to Haris. Computed columns have served me well when I needed them to. It’s like voting for POTUS… sometimes (all the time with POTUS), you have to choose between the lesser of two evils. 😉

As a bit of a sidebar, and except when you make the mistake of shrinking a database, fragmentation doesn't matter. I've not rebuilt my indexes since 17 Jan 2016 and performance actually got better over the first 3 months of that.

I generally agree that rebuilds shouldn’t be run as regular maintenance for performance, but I would say that for good reasons to rebuild (like changing a setting, definition, or fill factor of an index), this will be a problem.

Heh… “Fill Factor”. Prior to 2016, I would have agreed. Now, not so much. I’ve found that indexes will create an effective “natural fill factor” due to page splits if you just leave them alone. The extra space produced by the page splits is normally used quite nicely when other data is added to the table because NCIs are normally not temporal or otherwise ever increasing in nature. Although I have a personal hatred for GUIDs (actually a misnomer since the advent of Type 4 GUIDs), even those benefit from just leaving the indexes alone to do what they were designed to do.

Again, the exception to that rule is the horrific method they used to shrink a database. I don’t know what else they did there to screw things to the floor so badly but that’s the only time that I’ll rebuild or reorganize indexes anymore and seems to work very nicely whether you have a heavy OLTP, batch, or combination of predominate queries. Brent was definitely sucking the right kind of bong water when he first made that suggestion for SAN based systems. 😉

Is there a way to create a computed column that queries another table without using a UDF? I realize it would not be deterministic and could not be persisted. I’m on SQL Server 2016 if it makes any difference.