Alex R recently discovered interesting thing: in SQL pipelined functions work much faster than simple non-pipelined table functions, so if you already have simple non-pipelined table function and want to get its results in sql (select * from table(fff)), it’s much better to create another pipelined function which will get and return its results through PIPE ROW().

A bit more details:

Assume we need to return collection “RESULT” from PL/SQL function into SQL query “select * from table(function_F(…))”.
If we create 2 similar functions: pipelined f_pipe and simple non-pipelined f_non_pipe,

we’ll find that the function with simple “return result” works at least twice slower than pipelined function:

Function

1 000 000 elements

100 000 elements

F_PIPE

2.46

0.20

F_NON_PIPE

4.39

0.44

F_PIPE_FOR_NONPIPE

2.61

0.26

F_UDF_PIPE

2.06

0.20

F_UDF_NON_PIPE

4.46

0.44

I was really surprised that even “COLLECTION ITERATOR PICKLER FETCH” with F_PIPE_FOR_NONPIPE that gets result of F_NON_PIPE and returns it through PIPE ROW() works almost twice faster than F_NON_PIPE, so I decided to analyze it using stapflame by Frits Hoogland.

I added “dbms_lock.sleep(1)” into both of these function after collection generation, to compare the difference only between “pipe row” in loop and “return result”:

And stapflame showed that almost all overhead was consumed by the function “kgmpoa_Assign_Out_Arguments”:
I don’t know what this function is doing exactly, but we can see that oracle assign collection a bit later.
From other functions in this stack(pmucpkl, kopp2isize, kopp2colsize, kopp2atsize(attribute?), kopuadt) I suspect that is some type of preprocessiong of return arguments.What do you think about it?

But it works fine in any of the following cases:
* replace 268435456 with power(2, 28), or
* replace 268435456 with small literal like 268, or
* move this function to the beginning of package body

The one of the interesting findings was that the returned value is equal to the one of literals in another function.
We can reproduce this bug even with an anonymous pl/sql block. The following test case uses 32768 integer literals from 1000001 to 1032768 and prints 5 other integers:

This test case well demonstrates wrong results:
* instead of 100000 we get 10000001, which is the value from first line after “begin”, ie 1st integer literal in the code,
* for 32766 and 32767 oracle returns right values
* instead of 32768 (==32767+1) it returns 10000002, which is the integer from 2nd line, ie 2nd integer literal in the code,
* instead of 32769 (==32767+2) it returns 10000003, which is the integer from 3rd line, ie 3rd integer literal in the code
After several tests I can make a conclusion:

It doesn’t matter what plsql_optimize_level or plsql_code_type you set, was debug enabled or not, the behaviour is the same.

It seems that this is a kind of PL/SQL optimization: during parsing, oracle leaves integer literal in place if its value is in range -32768..32767 (16bit signed int), but if its value is out of this range, oracle adds this value into array of integers’ constants and replaces the value with the index of this element in this array. But because of index value overflow in cases when a count of such integer literals becomes larger than 32768, instead of Nth element of this array, oracle returns Mth element, where M is mod(N,32767).

Unfortunately associative arrays still require more “coding”:
we still can’t use “indices of” or “values of” in simple FOR(though they are available for FORALL for a long time), don’t have convinient iterators and even function to get all keys…
That’s why I want to show my templates for such things like iterator and keys function. You can adopt these functions and create them on schema level.

I see quite often when developers ask questions about connected components:

Table “MESSAGES” contains fields “SENDER” and “RECIPIENT”, which store clients id.
How to quickly get all groups of clients who are connected even through other clients if the table has X million rows?
So for this table, there should be 4 groups:

(1, 2, 4, 8, 16)

(3, 6, 12)

(5, 10, 20)

(7, 14)

(9, 18)

SENDER

RECIPIENT

1

2

2

4

3

6

4

8

5

10

6

12

7

14

8

16

9

18

10

20

Of course, we can solve this problem using SQL only (model, recursive subquery factoring or connect by with nocycle), but such solutions will be too slow for huge tables.

In such situations it’s much better to adopt standard algorithms like Quick-find or Weighted quick-union for PL/SQL.
The first time I wrote such solution about 5 years ago and I even posted here one of the latest solutions, but all of them were not universal, so I’ve created the package today with a couple of functions for most common problems: XT_CONNECTED_COMPONENTS

It contains 2 functions based on Weighted quick-find quick-union algorithm:

function get_strings(cur in sys_refcursor, delim varchar2:=’,’) return strings_array pipelined;
It accepts a cursor and returns found connected components as table of varchar2(v_size). You can change v_size in the package definition.
Input cursor should contain one Varchar2 column with linked strings, for example: ‘a,b,c’.
You can also specify list delimiter, by default it is comma.
Examples:

How to install:
Download all files from Github and execute “@install” in SQL*Plus or execute them in another tool in the following order:xt_connected_components_types.sql
xt_connected_components_pkg.sql
xt_connected_components_bdy.sql

But we all know that “Every rule has an exception”
There are many different cases when pl/sql with sql can be more efficient than only sql, and i dont want to catalog them. I just want to show a couple examples of such exceptions:

1. Running totals by several dimensions

select dt,
dim1,
dim2,
val,
sum(val) over(partition by dim1 order by dt) dim1_cumulative_sum,
sum(val) over(partition by dim2 order by dt) dim2_cumulative_sum,
sum(val) over(partition by dim1, dim2 order by dt) dim1_dim2_cumulative_sum
from mg_t
order by dt;

This query will be very hard for big data sets, so we can do it efficiently with pl/sql:

This pure SQL solution is for the cases when ClientId and AccountId are different entities. If they are the same entities in your case, you need to use UNION ALL:
select min(group_member_id) as group_max_id, accountid, clientid
from (select clientid as group_member_id
, connect_by_root accountid as accountid
, connect_by_root clientid as clientid
from test
connect by nocycle decode(accountid, prior accountid, 1, 0)
+ decode(clientid, prior clientid, 1, 0)
= 1
) a
group by accountid, clientid
order by group_max_id, accountid
/

Now you see that inner view was merged, and the function was executed 20 times in the filter and 6 times on the fetch after filtering(6 rows – 6 calls).
I see that often in such cases “no_merge” hint is suggested, but let’s test it:

Now you see, that main problem is the “filter pushdown” transformation. Previously, if we were to disable the “filter pushdown” operation, we had to use a variety of tricks, such as “materialize” hint or adding the “rownum” in expession, etc. But all these solutions require rewriting the query.
But from 11.2.0.3 we can use “_optimizer_filter_pushdown” parameter, for example:

But since 11g Oracle also can optimize code with deterministic functions too. For this to happen, code must meet the following conditions:
1. PLSQL_OPTIMIZE_LEVEL greater or equal 2
2. Parameters should not be changed in the loop body. Strictly speaking, there should not be any assignment of parameters.
3. Should not be any implicit conversions of parameters in function call, i.e. all variables/literals must be the same type as input parameters declared.
4. Should not be any call of non-deterministic functions (except some standard sql functions like to_date, to_char, nvl) or procedures in the loop

Note that this rules concerns only same scope level as this loop and not inner loops or another blocks.