Amazing optimization of getting distinct values from the index, and TopN for each of them

A couple of days ago someone posted a question on the forum which at the first glance seemed old, boring, beaten up and down. Here it is:

There is a news feed. All news are divided into 10 categories (Politics, sport, auto, real estate, etc).
I need to get top 4 news sorted by time descending for each category with 1 query.
If you sort the results – you get 4 politics news, then 4 sport news etc.

But the task was to make it optimal, and the standard solution with usual TopN using row_number can not be called optimal in any way, especially in case of big tables, relatively small number of categories and uneven distribution or just overall low selectivity.

So after getting several more or less acceptable variants, and spotting the solution using PostgreSQL (I didn’t dig deep into it, as I saw recursion, min and predicate, and that was enough), I got a great variant.

We also could go along the tree entering only the required blocks, but not all leaf blocks! However, Oracle can’t manage this on its own so we have to make a certain twist: aside from IFS(min/max) Oracle also has IRS(min/max) which works well with ranges and boundaries. We can use recursive query to make it read only what we need!

The difference is obvious: 36 consistent gets for 30 values, instead of 135. Note that this is a very small table, and we will have а much notable difference for millions and billions of entries!
Here is the explanation of the algorithm:

In the first part of union all (3-4 strings of plan) we specify where to start the recursion, and more specifically we choose a minimal (first) the value from the index.

After that we choose the first value that is bigger than the one chosen in the previous step, using IRS(min/max) (7-6-5 stings of the plan).

Repeat the recursion while we find anything

Proceed to the next:

2. TopN entries for every key value
Now as we are armed with an easy tool to get every initial value, we can easily get Top N for each of them. The only problem that remains is that, we can not use inline view with row_number/rownum, as the predicate from higher level won’t be pushed there, and we will have to use simple restriction by count stop key (by rownum) with required access by IRS descending (order by is generally unnecessary there, but it further reduces reading costs of IRS descending, which is necessary for implicit sorting) with the index_desc hint, to nail it dead, otherwise sorting may break. So to make this happen we either have to use an undocumented Lateral() with a corresponding event turned on, or use a simpler and standard table(multiset(…)) or a little harder with xmltable() – but it is not so dangerous. Yet another variant is the use cursor() with pushed predicates:

With cursor()

with t_unique( a ) as (
select min(t1.a)
from xt_test t1
union all
select (select min(t1.a) from xt_test t1 where t1.a>t.a)
from t_unique t
where a is not null
)
select cursor(
select rid from(
select/*+ index_desc(tt ix_xt_test_ab) */
tt.a
,tt.rowid rid
,row_number()over(partition by a order by b desc) rn
from xt_test tt
order by tt.b desc
)
where a=v.a and rn<=5
)
from t_unique v

alter session set events '22829 trace name context forever';
with t_unique( a ) as (
select min(t1.a)
from xt_test t1
union all
select (select min(t1.a) from xt_test t1 where t1.a>t.a)
from t_unique t
where a is not null
)
select/*+ use_nl(rids tt) */ *
from t_unique v
,lateral(
select/*+ index_desc(tt ix_xt_test_ab) */ tt.*
from xt_test tt
where tt.a=v.a
and rownum<=5
order by tt.a, b desc
) r
order by r.a,r.b desc

[collapse]

In general, we could do without the dangerous sorting, using “xmltable” and dbms_xmlgen instead of “table” sending a parameter directly to the internal subquery, but this is a bit harder than the regular ”table”