July 10, 2013

12c First N

There have been a couple of nice posts about the “Top N” (or First N / Next N) syntax that has appeared in 12c, here and here, for example. I particularly like the first set of examples because they include some execution plans that give you a good idea of what’s going on under the covers. “Under the covers” is important, because if you don’t actually have a large data set to test on you might not realise what impact a “Top N” query might have on a production data set.

You can do just a little better than inference from the execution plan, so I thought I’d push the envelope slightly with a small example and appeal (once again, sorry Doug) to the 10053 trace file. Here’s my test query, with resulting execution plan:

Note that I’ve asked Oracle to skip the first ten rows then report the next 1 percent of the data – based on a given ordering – but to include any rows beyond the 1 percent where the ordering values still match the last row of the 1 percent (Tim Hall’s post includes an example showing the difference between “with ties” and “rows only”).

Line 2 of the execution plan is very revealing – we have a “Window Sort” operation – which tells you that the optimizer has to be using some sort of analytic function to get the result set.

So here’s the “Unparsed Query” (extracted with a bit of re-formatting from the 10053 trace file) that Oracle generated from the query:

As expected, we can see the query has been transformed to include analytic functions – three of them. Oracle is basically taking a nice, easily readable syntax, and converting it behind the scenes to the sort of SQL we would once have written by hand. (That’s a good thing, of course – clarity of code does protect you from accidents, and it’s best if messy rewrites are handled by machines rather than error-prone people).

Although we have three analytic functions, we need only one window sort since the count(*) is over() the whole data set (no partitioning or ordering required), and the rank() and row_number() functions are partitioned and ordered by the same expression. Nevertheless, behind the scenes, my particular query has had to sort the entire data set to get the 1% that I wanted.

Footnote: the rank() function appears because of the “with ties” option. If we had chosen the “rows only” option Oracle wouldn’t have needed to discover rows with the same rank() as the last of the 1 percent.

Note that you don’t need to gather stats – 12c does that automatically on CTAS and “insert as select” into an empty table – though the stats collection doesn’t include histograms (rarely a great loss) or index stats.

Update – Oct 2013

In response to a comment below, it’s worth mentioning that some simple forms of the “fetch first” can use the operation “WINDOW SORT PUSHED RANK”, this operation (like SORT ORDER BY STOPKEY) is able to limit the data volume sorted, but does the older, less efficient, “Version 1″ sort to achieve this. If you don’t have the PUSHED RANK (or STOPKEY) then I believe the entire incoming rowsource has to be sorted, but Oracle will be using the newer (10g), more efficient, “Version 2″ sort.

Got it. I created the table in SYS schema and that is the reason it has not gathered stats. Following limitations appy for online statistics gathering for bulk loads.

Currently, statistics gathering does not happen for bulk load statements when any of the following conditions apply to the target table
It is in an Oracle-owned schema such as SYS.
It is a nested table.
It is an index-organized table (IOT).
It is an external table.
It is a global temporary table defined as ON COMMIT DELETE ROWS.
It has virtual columns.
It has a PUBLISH preference set to FALSE.
It is partitioned, INCREMENTAL is set to true, and extended syntax is not used

Let me reiterate: the problem is with passing “NEXT N” as a variable and not as a constant (5 in your case) – it is a bug #17404511 (I logged it later than made this comment here). Just checked its status on MOS – it had not been delivered in 12.1.0.2 (scheduled for 12.2)

Another update. For unknown reasons I entered #17404511 in the MOS search – and it came back to me with the pointer to “12.1.0.2 Patch Set – List of Bug Fixes by Problem Type (Doc ID 1683802.1)” – it seems to be that the mentioned bug is listed under the category “Undocumented Oracle Server”. H-m-m-m… Maybe they’ve pushed a fix earlier than expected and just didn’t update the bug itself? I don’t have 12.1.0.2 installed yet, but maybe somebody can do a quick check and post results?

Curiously the execution plan produced by the new row limiter is different to the previous “SORT ORDER BY STOPKEY” method. I wonder which one is faster, the old method, or the analytic functions. Answer: The old method by a hair.

WordPress messed up your example because of the “less than” symbols. I think I’ve fixed it up correctly.

I’m slightly surprised that you see such a difference in your example: 3/100 of a second is quite large if that’s a consistent difference over such a small amount of work. In your example Oracle is able to use WINDOW SORT PUSHED RANK for the 12c variant, so both versions of the SQL need only sort a very small part of the data, and the 10032 trace shows that the both use a version 1 sort, with the same number of comparisons and output rows (See this post: https://jonathanlewis.wordpress.com/2009/12/28/short-sorts/ ) . I guess that means the generic strategy for the analytic approach is the main culprit.

It’s not a hint, it’s a piece of text that I can use to search v$sql if I need to. I tend to use the hint form (rather than the comment form) because some platforms strip out comments before passing the SQL to the database.