New whitepaper: Database DevOps – 6 Tips for Achieving Continuous Delivery. Discover 6 tips for continuous delivery with Database DevOps in this new whitepaper from Redgate. In 9 pages, it covers version control for databases and configurations, branching and testing, automation, using NuGet packages, and advice for how to start a pioneering Database DevOps project. Also includes further research on the industry-wide state of Database DevOps, how application and database development compare, plus practical steps for bringing DevOps to your database. Read it now free.

I’m pretty sure you’ve gotten it wrong in numerous ways, so far. And you probably won’t get it right any time soon. So why waste your precious time on SQL tweaking, when you could be implementing business logic?

Let me explain…

It hasn’t been until the recent SQL:2008 standard that what MySQL users know as LIMIT .. OFFSET was standardised into the following simple statement:

SELECT *
FROM BOOK
OFFSET 2 ROWS
FETCH NEXT 1 ROWS ONLY

Yes. So many keywords.

SQL is indeed a very verbose language. Personally, we really like the conciseness of MySQL’s / PostgreSQL’s LIMIT .. OFFSET clause, which is why we chose that for the jOOQ DSL API

In SQL:

SELECT * FROM BOOK LIMIT 1 OFFSET 2

In jOOQ:

select().from(BOOK).limit(1).offset(2);

Now, when you’re a SQL framework vendor, or when you’re rolling your own, in-house SQL abstraction, you might think about standardising this neat little clause. Here’s a couple of flavours from databases that natively support offset pagination:

So far, so good. These can all be handled. Some databases put offsets before limits, others put limits before offsets, and the T-SQL family puts the whole TOP clause before the SELECT list. This is easy to emulate. Now what about:

So, that’s more like it, right? But since you’re not writing this manually every time, you’re about to start creating your own nifty in-house SQL framework covering the 2-3 use cases that you’ve encountered so far, right?

You can do it. So you’ll regex-search-replace column names automagically to produce the above.

So now, it is correct?

Of course not! Because you can have ambiguous column names in top-level SELECTs, but not in nested selects. What if you want to do this:

Nope. And the trick of manually patching ID columns from the previous example doesn’t work, because you have multiple ID instances. And renaming the columns to random values is nasty, because the user of your home-grown in-house database framework wants to receive well-defined column names. I.e. ID and… ID.

So, the solution is to rename the columns twice. Once in each derived table:

But now, we’re done?

Of course not! What if you doubly nest such a query? Will you think about doubly renaming ID columns to synthetic names, and back? … ;-) Let’s leave it here and talk about something entirely different:

Does the same thing work for SQL Server 2008?

Of course not! In SQL Server 2008, the most popular approach is to use window functions. Namely, ROW_NUMBER(). So, let’s consider:

So that’s it, right?

Of course not! ;-)

OK, we’ve already had this issue. We should not select *, because that would generate too many columns in the case that we’re using this as a subquery for an IN predicate. So let’s consider the correct solution with synthetic column names:

Now, that’s correct SQL according to SQL Server, although you do not have a guarantee that the ordering of the derived table will survive after query execution. It may well be that the ordering is changed again by some influence.

If you wanted to order by SOME_COLUMN in the outer query, you’d have to again transform the SQL statement to add another synthetic column:

Now, what happens if an author has written several books? Yes, the DISTINCT keyword should remove such duplicates, and effectively, the PostgreSQL query will correctly remove duplicates first, and then apply LIMIT and OFFSET.

However, the ROW_NUMBER() predicate always generates distinct row numbers beforeDISTINCT can remove them again. In other words, DISTINCT has no effect.

Watch out that the ORDER BY clause must contain all columns from the SELECT field list. Obviously, this will limit the acceptable columns in the SELECT DISTINCT field list to columns that are allowed in a window function’s ORDER BY clause (e.g. no other window functions).

We could of course try to fix that as well using common table expressions, or we consider

Yet another issue??

Yes, of course!

Do you even know what the column(s) in the window function’s ORDER BY clause should be? Have you just picked any column, at random? What if that column doesn’t have an index on it, will your window function still perform?

The answer is easy when your original SELECT statement also has an ORDER BY clause, then you should probably take that one (plus all the columns from the SELECT DISTINCT clause if applicable).

Are we done yet!?!?

Probably not ;-) But we have probably covered around 99% of the common and edge cases. We can sleep nicely, now.

Note that all of these SQL transformations are implemented in jOOQ. jOOQ is the only SQL abstraction framework that takes SQL seriously (with all its warts and caveats), standardising over all of this madness.

With jOOQ, you can just write your Oracle SQL or Transact SQL as if it were as awesome as PostgreSQL! … without jumping the SQL ship entirely, and moving on to JPA.

Keyset paging

Now, of course, if you have been reading our blog, or our partner blog SQL Performance Explained, you should know by now that OFFSET pagination is often a bad choice in the first place. You should know that keyset pagination almost always outperforms OFFSET pagination.

New whitepaper: Database DevOps – 6 Tips for Achieving Continuous Delivery. Discover 6 tips for continuous delivery with Database DevOps in this new whitepaper from Redgate. In 9 pages, it covers version control for databases and configurations, branching and testing, automation, using NuGet packages, and advice for how to start a pioneering Database DevOps project. Also includes further research on the industry-wide state of Database DevOps, how application and database development compare, plus practical steps for bringing DevOps to your database. Read it now free.