Tables on the Fly

You might have heard of Common Table Expressions (CTEs), T-SQL’s tool for building tables on the fly in the single statement. They’re a lot like Derived Tables, in that they’re SELECT statements which can be used by other portions of an encapsulating SELECT statement. But unlike Derived Tables, they can be used multiple times (in the same statement). It’s not accurate to think of them as statement-level temporary tables, because they aren’t filled up with data as a discrete step. Except for their recursive call feature, they are tantamount to repeating the contained SELECT statement multiple times in the main SELECT statement.

For example, this SELECT statement repeats the same Derived Table multiple times.

They have the same query plan. And there is no separate step in the query plan for the “population” of the supposed temporary table.

So if you’re going to repeatedly use the same *data* multiple times in a query, the Common Table Expression is your tool of choice. But what if you want to use the same logic?

Pump it up.

Unfortunately, there’s no similar syntactical sugar to create a statement-level user-defined function. There’s no direct way to have the same calculation applied to different columns of data without creating an actual user-defined function or simply repeating the logic. But there is a neat trick which in the right circumstances can accomplish the same goal without costing you any performance.

The goal, as with any function, is not repeating yourself. It’s not just a bunch of extra characters in your code. It’s an accident waiting to happen– two representations of the same logic which are just bound to diverge due to our cursed human imperfections.

But how can you run different columns of data through the same calculation in a query? Turn them into rows!

Calculations are already performed repeatedly for every row in a query. So all you have to do is take your columns and explode them up into multiple rows. In order to apply a calculation to Column1 and Column2, first change each row of the data into two more rows with a new, computed Column3 which has the original row’s data for Column1 first and the data for Column2 second.

In other words, you need to convert this:

Column1

Column2

ABC

DEF

GHI

JKL

Into this:

Column1

Column2

Counter

Column3

ABC

DEF

1

ABC

ABC

DEF

2

DEF

GHI

JKL

1

GHI

GHI

JKL

2

JKL

Once you have that vertical list of the columns’ values, you can APPLY your calculation to it, once, and it will operate on all of the values.

Column1

Column2

Counter

Column3 – Altered

ABC

DEF

1

CBA

ABC

DEF

2

FED

GHI

JKL

1

IHG

GHI

JKL

2

LKJ

All that’s left is to smoosh those vertically-represented values back into a single horizontal row. You could use PIVOT, but that syntax is a little tricky. Instead, first duplicate the calculated column and then selectively suppress the values you don’t want (by using a CASE expession). You’ll end up with this:

Column1

Column2

Counter

Column1 – Altered

Column 2 – Altered

ABC

DEF

1

CBA

ABC

DEF

2

FED

GHI

JKL

1

IHG

GHI

JKL

2

LKJ

If you take the MAX of each of those calculated columns, you end up with a single row for each one, with your final results:

Admittedly, there is more overall code in this simplified example, but imagine that the logic was more complex. And also, remember that it’s not about character count, but rather removing duplication because of the potential for different behaviors if the copied code isn’t perfectly maintained.

Unfortunately, unlike with Common Table Expressions, this is not mere syntactic sugar. This is semantically different T-SQL which makes the engine do more work– first, spreading apart the columns into rows before shoving them through your calculation and then squishing them back down again. But in the right circumstances, it might add only trivial overhead, as in this example.

Still, use this technique with care. It’s almost always better to just make actual inline user-defined functions, which often have identical performance with the benefit of code encapsulation, as they do in this example.

Ceci est une pipe

This isn’t the only way to reduce code duplication in your complex T-SQL statements. If you have two calculations which have similar beginnings but differ in the final steps, break them apart into their separate steps so that you can re-use the first steps of the calculation.

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

Department of Redundancy Department

You are going to have business logic in your database, whether you like it or not. Use these and other techniques to make sure it’s only in there once.