SQL - easiest way to keep DRY

There are about 13 stored procedures that have the same basic structure, joining the same 5 tables, etc.

Each does a grouped count (a list of X with a count of Y) plus a details listing of the Ys. Both the grouped result set and the detail result set contain unions, because the date math varies by the type of X. This means there are at least 4 queries in each sp.

Each one varies in the conditions it requires, various other joins, and subqueries. Each one sets a date to use in the WHERE clause.

I have to change all of them in the same way (a design bug that has only recently been noticed). They are all relatively slow, which isn't the issue (though I can't help myself when I see functions in the WHERE clauses).

Here is what I did so far:

Create a view with the basic joins in common, adding columns to replace the functions in the WHERE clauses and/or simplify the WHERE clauses so they're easier to follow

Turn the date calculation into a function so it is just one call
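A sketch of what that can look like (table, column, and view names here are invented for illustration; the dialect is SQL Server-flavored):

CREATE VIEW v_core AS
SELECT x.id,
       x.x_type,
       y.amount,
       CAST(y.created_at AS DATE) AS created_date  -- precomputed so WHERE clauses stay simple and sargable
FROM X AS x
JOIN Y AS y ON y.x_id = x.id;

Each sp can then filter on created_date directly instead of wrapping a function around the raw column in its WHERE clause.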

I did this for the first sp. Now I am looking to do the remaining ones...and I just can't copy/pasta > 40 more times. I. Just. Can't.

@Karla I've heard some DBMSes will ruin your DRY by, for example, being notoriously bad at planning whenever your queries involve views in any form. For many of them, a view might be an optimization fence. You'll end up with a large view being materialized before anything else even kicks in.

I know, I know. But abstractions leak, a fact of life.

Procedural SQL languages are, by design, a poor fit for modern coding techniques. The only way to cope is to make your code editor-friendly and master find-and-replace. That's usually what I do.

@Karla Also, if you invent a kind of preprocessor toolchain with macros to facilitate SQL reuse, this might be cute and clever, but the next person to maintain the shit will hate you. Better master the power of your editor.

Another common query planning antipattern:
If you include a function in the WHERE clause along with several other conditions, too fucking bad, it's still going to execute that function on every goddamned row, including the ones excluded by the other conditions. Oh, and the overhead for calling a function is INSAAAAAAAAAAAANE.

Scalar-valued functions are basically considered harmful anywhere but the SELECT clause.

Views for common joins. Maybe functions in places, but they tend to do unfortunate things to the query planner.

I understand not making the view do too much.

Yes, this is why I felt the need to remove them even though performance isn't the complaint. It bothers me.

Let me know what you think of this idea:

Have a master sp that does all of the common stuff. That stores data in a temp table.
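One way to sketch that in SQL Server (all names invented): a temp table created by the caller is visible to any proc the caller EXECs, so each specific sp can own #common and let a shared proc fill it.

CREATE PROCEDURE FillCommon @AsOf DATE
AS
BEGIN
    INSERT INTO #common (x_id, x_type, amount, created_date)
    SELECT x_id, x_type, amount, created_date
    FROM v_core
    WHERE created_date <= @AsOf;
END;

-- inside each specific sp:
CREATE TABLE #common (x_id INT, x_type INT, amount DECIMAL(12,2), created_date DATE);
EXEC FillCommon @AsOf = '2016-06-30';
SELECT x_type, COUNT(*) AS y_count
FROM #common
GROUP BY x_type;

One caveat: the optimizer can't push an sp's specific predicates into the shared query, so #common may hold more rows than any one sp actually needs.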

It does depend strongly on the optimizer and the query engine, but some functions can be folded, which can make it tolerable.

I use DB2, which has explicit types for TIMESTAMP (date plus time) and DATE (no time component). Suppose column T is a TIMESTAMP and the variables dtlow and dthigh are pure dates. This is horribly inefficient:

SELECT *
FROM MYTAB
WHERE DATE(T) BETWEEN :dtlow AND :dthigh

That's because DATE() must be applied to each row to convert it for the comparisons, which basically means no indexing on T. That has to happen because T is not constant across all rows.

Now consider:

SELECT *
FROM MYTAB
WHERE T BETWEEN TIMESTAMP(:dtlow,'00.00.00') AND TIMESTAMP(:dthigh,'24.00.00')

The variables dtlow and dthigh are constant for the life of the query. Therefore, the TIMESTAMP functions can be pre-folded, which means this is equivalent to T between constant times. DB2 considers that eminently indexable, and efficiency will be high.

(As an aside, though, guess which form is preferred by my coworkers.)

I will keep this in mind. I think it will help as we are using a lot of date ranges.

@Weng
So, then, what's the right way to do this if you basically need a CASE expression and then to filter on its result? Or are you just SOL from a performance standpoint?

A CASE in the WHERE clause is pretty much always going to perform badly. But sometimes you can work around it with UNION, especially if you can use UNION ALL. Consider a stupid example:

WHERE LIM < CASE
              WHEN REVERSE='Y' THEN -AMT
              ELSE AMT
            END

Since a row is either REVERSE='Y' or not, this would be a candidate for UNION ALL:
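A sketch of the split (SELECT list abbreviated to *):

SELECT *
FROM MYTAB
WHERE REVERSE = 'Y'
  AND LIM < -AMT
UNION ALL
SELECT *
FROM MYTAB
WHERE REVERSE <> 'Y'
  AND LIM < AMT

The branches are disjoint, so UNION ALL returns the same rows as the CASE version without a dedup step, and each branch now has a plain, potentially indexable predicate. One hedge: if REVERSE is nullable, the second branch needs OR REVERSE IS NULL to match the CASE's ELSE behavior.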

Hm, yeah, I'm kind of doing that. I guess a lot of my problem with the queries I'm working on comes from the fact that my function basically goes out to one or more other tables (some remote) to try to translate a friendly name into an actual server name, or vice versa. So not being able to filter before applying the (multiple) functions is really fugly for performance.

Well, YMMV, but if the function is foldable (which requires that it be deterministic) and the engine is paying attention, then it still should work.

Consider my TIMESTAMP() function above: it is deterministic. Since the input is constant for the life of the query, and the same input always yields the same output, the function can be prefolded.

Addendum: deterministic means a given input yields the same output for the life of the query. It's a contract that lets the engine assume the result doesn't have to be recomputed for each row. It sounds like your functions should meet that requirement... unless you routinely flip servers mid-query. So I would declare the functions deterministic.
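In DB2 that declaration is a clause on CREATE FUNCTION; a sketch with an invented lookup function:

CREATE FUNCTION FRIENDLY_TO_SERVER (FRIENDLY VARCHAR(64))
    RETURNS VARCHAR(64)
    LANGUAGE SQL
    DETERMINISTIC
    NO EXTERNAL ACTION
    READS SQL DATA
    RETURN (SELECT SERVER_NAME
            FROM SERVER_MAP
            WHERE FRIENDLY_NAME = FRIENDLY);

Note that DETERMINISTIC is a promise you make, not something the engine verifies, so only declare it if the function really is stable for the life of a query.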

Hm, yeah, I think that sounds right. Basically the functions parse the result of column A to pull a couple of pieces of the connection string out, then reference a local table to see if that server name makes sense; if not, they look at a remote table that's more complete. (I probably should just pull a cached copy of the remote table, tbh, since it doesn't change that often...)

Eh, it's functional, it's temporary, and it's not impacting operations, but it makes me cry a little on the inside to see 30+ second execution times with hundreds of remote server executions.

@Karla Also, if you invent a kind of preprocessor toolchain with macros to facilitate SQL reuse, this might be cute and clever, but the next person to maintain the shit will hate you. Better master the power of your editor.

That was my first thought too.

Why not set up a quick build system in the repository where you store your SQL code, and use something like EJS to easily crank out many similar SPs?

That way you get DRY, without compromising efficiency or having too many strange internal utilities inside a database.

This is where things like CTEs come in handy, because you can do that initial filtering and then call the function in a later CTE that uses the original CTE as a source.
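A sketch of that staging (names invented; how strictly the planner keeps the stages separate depends on the engine and version):

WITH filtered AS (
    SELECT id, conn_str
    FROM connections
    WHERE created_date >= '2016-01-01'
      AND status = 'OPEN'
)
SELECT f.id,
       friendly_to_server(f.conn_str) AS server_name  -- expensive call runs only on the filtered rows
FROM filtered AS f;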

I've also been bitten multiple times by views screwing over my performance. I hate the repetition, but not as much as the slow performance.

@Karla While I make liberal use of views and stored procedures, be very careful when introducing them. They create chains that the query optimizer has difficulty with, can drastically slow down your query time, and can become a difficult chain to follow if you start going deep. Functions also tend to have awful performance in the long run.

Consider creating stored procs that accept variables to define what type of query they run, so you can query as little as possible and keep things snappy. Minimize joins in your views whenever possible. Conditional IF statements mostly only work in procs; views tend to suck for that.

It can be any templating language, really; EJS is just my favourite. It's pretty straightforward and fits well with a JS-based build system.

Like most templating languages, EJS was meant to be used with HTML. I never actually did this in practice, but I don't see why it couldn't work with SQL too.

It would involve setting up a frontend-like build system (Grunt or Gulp), except that instead of JavaScript and CSS, you would convert your source procedure.sql.ejs files into procedure_v1.sql, procedure_v2.sql, etc.

@cartman82 I was joking... I think Jade is shit because the whitespace can break it.
EDIT: before any Pythonistas come for me, it isn't obvious when you're using something like Express as a framework.

I did some things in Jade. It's OK for little HTML things. But in the end, whatever gains you make by not having to type as much are lost as you're googling every 5 minutes how to achieve this or that.

Maybe if I was working in it ALL the time... But probably not worth it.

I know ColdFusion (and the web part is in that).
I have generated SQL (and other CF files) with it before.

Also, I remember when CTEs started to be a thing in PostgreSQL; blogs were all like, "be cautious! CTEs are an optimization fence in PostgreSQL!" They still are: PostgreSQL still executes CTEs separately from everything else. And yet I wrote SQL that was orders of magnitude faster with CTEs than without. Apparently, querying against a large dataset, making a small joinable table that fits in RAM, and then running the main query against that did the trick, as opposed to doing it all at once.

I know this is not how it should be done. The query planner should, in the end, be smart enough to deduce such things; I shouldn't have to estimate that in 99% of cases I'll only need 100 records from that billion-record table. In the end, CTEs should just be syntactic sugar for added readability (except maybe WITH RECURSIVE). But the reality is what it is, and abstractions still leak.
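A sketch of the pattern described (table and column names invented):

WITH hot_users AS (
    SELECT user_id
    FROM events
    WHERE event_time >= now() - interval '1 day'
    GROUP BY user_id
    HAVING COUNT(*) > 100
)
SELECT h.user_id, SUM(o.total) AS day_total
FROM hot_users AS h
JOIN orders AS o ON o.user_id = h.user_id
GROUP BY h.user_id;

Because the CTE is materialized first, the join runs against a small in-memory result instead of the full events table.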