02 February 2015

PRINT Statement "Churn"

So I was sitting in one of those post-mortem type meetings, the sort fraught with tearful recriminations and teeth-gnashing (or at least, perhaps, they ought to be) and I was amused to observe the birth of a silly superstition. It was a mix of technical and nontechnical folk, business analysts, developers, business VPs, etc. We were discussing the performance of a particular SSIS package that runs a stored proc, appears to be complete in terms of data, but then sits and runs on the server for hours afterward, when it should have completed and closed. One developer coined the term "churn", as in, how butter is made, to describe what SQL Server was doing.

It was baffling to see how readily and happily the non-technical folk (and some of the technicals, for shame) accepted this as a good enough answer. Instead of asking, why is this happening and how do we fix it, the execs just came to view "The Churn" as a necessary part of the data processing. "So, how long do we expect The Churn to take with this large a file?" "Can you get me an ETA for when The Churn will be finished?" Soon everybody was getting in on it...even people on the business side could understand The Churn. I mean, what could be more natural than that, you run an SSIS package, it works, and then you have a few hours for the DTEXEC process to sit and "churn"?

This kind of silly superstition, where we take something we don't understand and invent a name for it as a kind of acceptance, hobbles an IT organization, papering over ignorance instead of exposing it and digging deeper. It turns out in this case, as our appdev manager surmised, it was massively looped PRINT statements (completely unnecessary of course) in the stored procedures.

As is, with the PRINT statement commented out, this runs very fast (subsecond), as you would expect. The INSERT just lets you know when the proc is actually done.

So I created a DTSX package that does nothing more than run this process, and from a different server, created a batch file that outputs the time, then runs the package, then outputs the time again. Running the proc as is, well, as I say, there's nothing to it, it's just memory getting moved around. But when I uncomment the PRINT statement, things take a bit of a nosedive.

Comparing the package start time (10:41:33.49) to the timestamp logged in the table (10:41:35.010) we see that less than 2 seconds elapsed in the actual execution of the procedure. However, the package completed at 10:45:23.38. So two seconds to do the work, and almost four minutes for SSIS to handle all those PRINT statements getting pushed back up the chain. Now, imagine that scaled up to millions or billions, and you can understand why we had an apparently complete SSIS package chewing up one of the vCPUs on a box for days. Yes, days. And taking the PRINT statements out fixed it.

Tactical moral of the story: be very sparing and cautious when sticking PRINT statements into heavily looped sections of your code, unless absolutely necessary. Strategic moral of the story: stop accepting surface level explanations and lazy superstitions...dig in and figure it out.