Networks or hierarchies of arbitrary depth are difficult to traverse in SQL without using recursion. However, there also exist hierarchies of fixed and fairly small depths, and these can be traversed either recursively or by a sequence of joins for each of the levels. In this article I compare the performance characteristics of three traversal methods, two recursive and one non-recursive, using my own benchmarking package (A Framework for Dimensional Benchmarking of SQL Performance), on a test problem of a fixed level organization structure hierarchy, with 5 levels for performance testing and 3 levels for functional testing.

The three queries tested were:

JNS_QRY: Sequence of joins

PLF_QRY: Recursive pipelined function

RSF_QRY: Recursive subquery factors

Fixed Level Hierarchy Problem Definition

A hierarchy is assumed in which there are a number of root records, and at each level a parent can have multiple child records and a child can also have multiple parents. Each level in the hierarchy corresponds to an entity of a particular type. Each parent-child record is associated with a numerical factor, and the products of these propagate down the levels.

The problem considered is to report all root/leaf combinations with their associated products. There may of course be multiple paths between any root and leaf, and in a real world example one would likely want to aggregate. However, in order to keep it simple and focus on the traversal performance, I do not perform any aggregation.
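To make the 'sequence of joins' idea concrete, here is a minimal sketch of the kind of query JNS_QRY represents for a 3-level hierarchy. It assumes a single links table, links (level_id, parent_id, child_id, fact), which is an illustrative simplification rather than the article's actual schema:

SELECT lnk_1.parent_id          root_id,
       lnk_2.child_id           leaf_id,
       lnk_1.fact * lnk_2.fact  fact_product
  FROM links lnk_1
  JOIN links lnk_2
    ON lnk_2.parent_id = lnk_1.child_id
   AND lnk_2.level_id  = 2
 WHERE lnk_1.level_id  = 1;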

To simplify functional validation a 3-level hierarchy was taken, with a relatively small number of records. The functional test data were generated by the same automated approach used for performance testing. The factor was obtained as a random number between 0 and 1, and, to keep it simple, duplicate parent-child pairs were permitted.

The test data were parametrised by width and depth as follows (the exact logic is a little complicated, but can be seen in the code itself):

Width corresponds to a percentage increase in the number of child entities relative to the number of parents

Depth corresponds to the proportion of the parent entity records a child is (randomly) assigned. Each child has a minimum of 1 parent (lowest depth), and a maximum of all parent entities (highest depth)

It is interesting to note that all joins in the execution plan are hash joins, and in the sequence you would expect. The first three are in the default join 'sub-order' that defines whether the joined table or the prior rowset (the default) is used to form the hash table, while the last two are in the reverse order, corresponding to the swap_join_inputs hint. I wrote a short note on that subject, A Note on Oracle Join Orders and Hints, last year, and have now written an article using the largest data point in the current problem to explore performance variation across the possible sub-orders.

For simplicity a stand-alone database function was used here. The query execution plan was obtained by the benchmarking framework, and the plan for the highest data point is listed. The query within the function was extracted and an Explain Plan performed manually, which showed the expected index range scan.

JNS_QRY is faster than RSF_QRY, which is faster than PLF_QRY at all data points

PLF_QRY tracks the number of output records very closely. This is likely because the function executes a query at every node in the hierarchy that uses an indexed search.

The pure SQL methods scale better through being able to do full table scans, and avoiding multiple query executions
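For reference, an RSF_QRY-style traversal of the same assumed links table can be sketched as below; unlike the join sequence it does not need a join per level, which is what lets the same pattern handle arbitrary depth. This is an illustrative sketch only, not the article's query:

WITH rsf (root_id, child_id, fact_product, lvl) AS (
SELECT parent_id, child_id, fact, 1
  FROM links
 WHERE level_id = 1
UNION ALL
SELECT r.root_id, l.child_id, r.fact_product * l.fact, r.lvl + 1
  FROM rsf r
  JOIN links l
    ON l.parent_id = r.child_id
   AND l.level_id  = r.lvl + 1
)
SELECT root_id, child_id leaf_id, fact_product
  FROM rsf
 WHERE lvl = :n_link_levels;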

Deep Slice Elapsed - CPU Times

The elapsed times minus the CPU times are shown in the first graph below, followed by the disk writes. The disk writes (and reads) are computed as the maximum values across the explain plan at the given data point, and are obtained from the system view v$sql_plan_statistics_all. The benchmarking framework gathers these and other statistics automatically.

The graphs show how the elapsed time minus CPU times track the disk accesses reasonably well

RSF_QRY does nearly twice as many disk writes as the other two

Wide Slice Results [w=180, d=(100, 120, 140, 160, 180)]

The performance characteristics of the three methods across the wide slice data points are pretty similar to those across the deep slice. The graphs are shown below.

Conclusions

For the example problem taken, the most efficient way to traverse fixed-level hierarchies is by a sequence of joins

Recursive methods are significantly worse, and the recursive function is especially inefficient because it performs large numbers of query executions using indexed searches, instead of full scans

I noticed an interesting thread on OTN recently, Matching ( in a string. It's about using SQL to find matching bracket pairs (technically 'parentheses' but 'brackets' is shorter, and it makes no difference to the SQL). Incidentally, I found recently that nested bracket expressions are a nice way of compactly representing the structure of complex hash join execution plans, A Note on Oracle Join Orders and Hints.

I thought it would be interesting to run some of the solution queries through my benchmarking package to test performance (A Framework for Dimensional Benchmarking of SQL Performance). I decided to consider only the queries that addressed multiple records, and the form of the problem that requires returning record uid, opening and closing bracket positions, plus the substring enclosed. These were by 'mathguy' and 'Peter vd Zwan', and I made very minor tweaks for consistency. I also wrote a query myself using PL/SQL in an inline SQL function using the new (v12.1) 'WITH Function' functionality, and copied this to a version using a pipelined database function to check for any performance differences. The four queries tested were then:

CBL_QRY, mathguy: Connect By, Analytics, Regex

MRB_QRY, Peter vd Zwan: Connect By, Match_Recognize

WFB_QRY, me: With PL/SQL Function, Arrays

PFB_QRY, me: Pipelined PL/SQL Function, Arrays

Bracket Pair Definition

Consider a function (BrDiff) defined at each character position as the difference between the number of opening and closing brackets to the left of, or at, that position.

A closing bracket closes an opening bracket if it is the first closing bracket where BrDiff is 1 less than BrDiff at the opening bracket. If all brackets are in some pair, then the expression can be considered well-formed.

This can be illustrated with a diagram for the fourth functional example below.
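As an aside, BrDiff itself is easy to compute in SQL with an analytic running sum. The following is a minimal sketch for a single string passed as a bind variable; it is not one of the benchmarked queries:

WITH chars AS (
SELECT LEVEL pos, Substr (:str, LEVEL, 1) ch
  FROM dual
CONNECT BY LEVEL <= Length (:str)
)
SELECT pos, ch,
       Sum (CASE ch WHEN '(' THEN 1 WHEN ')' THEN -1 ELSE 0 END)
         OVER (ORDER BY pos) brdiff
  FROM chars;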

Test Problem

BRACKET_STRINGS Table

CREATE TABLE bracket_strings (id NUMBER, str VARCHAR2(4000))
/

Functional Test Data

I took four of mathguy's test records, excluding the (deliberately) badly-formed strings; the records taken included some embedded returns:

Each test set consisted of 100 records with the str column containing the brackets expression dependent on width (w) and depth (d) parameters, as follows:

Each str column contains w bracket pairs

The str column begins with a 3-character record number

After the record number, the str column begins with d opening brackets with 3 characters of text, like: '(001', etc., followed by the d closing brackets, then the remaining w-d pairs in an unnested sequence, like '(001)'

When w=d the pairs are fully nested, and when d=0 there is no nesting, just a sequence of '(123)' strings.

This choice of test data sets allows us to see whether the number of brackets and the degree of bracket nesting each have an effect on performance.
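The exact generation code is not reproduced here; the following PL/SQL fragment is just a hedged sketch of how one such string could be built for given w and d (variable names and the example values are illustrative):

DECLARE
  w     PLS_INTEGER := 5;          -- number of bracket pairs
  d     PLS_INTEGER := 3;          -- nesting depth (d <= w)
  l_str VARCHAR2(4000);
BEGIN
  l_str := Lpad ('1', 3, '0');                         -- 3-character record number
  FOR i IN 1..d LOOP
    l_str := l_str || '(' || Lpad (i, 3, '0');         -- d opening brackets with 3 characters of text
  END LOOP;
  l_str := l_str || Rpad (')', d, ')');                -- the d closing brackets
  FOR i IN d + 1..w LOOP
    l_str := l_str || '(' || Lpad (i, 3, '0') || ')';  -- remaining w-d unnested pairs
  END LOOP;
  DBMS_Output.Put_Line (l_str);
END;
/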

The output from the test queries therefore consists of 100*w records with a record identifier and a bracketed string. For performance testing purposes the benchmarking framework writes the results to a file in csv format, while counting only the query steps in the query timing results.

All the queries showed strong time correlation with width, with smaller correlation with depth.

CBL_QRY is significantly faster than MRB_QRY, which appears to be rising quadratically

Both WFB_QRY and PFB_QRY are much faster than the non-PL/SQL queries

PFB_QRY is slightly faster than WFB_QRY. This could be regarded as too small a difference to be significant, but is consistent across the data points, despite the context switching with database function calls

In March 2013 I wrote an article on the use of SQL to group network-structured records into their distinct connected subnetworks, SQL for Network Grouping. I looked at two solution approaches commonly put forward on Oracle forums for these types of problem, using Oracle's Connect By recursion, and the more recent recursive subquery factoring, and also put forward a new solution of my own using the Model clause. I noted however that SQL solutions are generally very inefficient compared with a good PL/SQL solution, such as I posted here, PL/SQL Pipelined Function for Network Analysis. For the first two methods, I noted:

Non-hierarchical networks have no root nodes, so the traversal needs to be repeated from every node in the network set

Hierarchical queries retrieve all possible routes through a network

I also noted that Connect By is more inefficient than recursive subquery factoring, but did not say why, promising a more detailed explanation at a later date. In this article I illustrate the behaviour of both recursive SQL methods through a series of five elementary networks, followed by a simple combination of the five. I then use the foreign key network from Oracle's HR demo (v12 version, with OE and PM schemas included) as a final example.

In this article I consider traversal of a single connected network from a given root node (or several if each root node is specified).

It is shown that the behaviour of Connect By can be understood best by considering it to traverse all paths through a network that is dual to the original network.

Dual Networks

Dual network definition

The dual network consists of a set of nodes and links (d-nodes and d-links say) defined thus:

the d-nodes correspond to each link in the original network that is adjacent (via a node) to at least one other link, including itself if its start and end nodes are the same

the d-links correspond to each pair of adjacent links where the 'from' link identifier is alphabetically smaller than that of the 'to' link, except for the case of links that are adjacent to themselves where a single d-link has the same 'from' and 'to' link

Dual network SQL

The d-node identifiers are just the link identifiers, while the d-link identifiers use the adjacency-defining node identifiers with a sequential number (partitioned by node) attached.
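A hedged sketch of how the d-links could be derived is below; it assumes a links table, links (link_id, node_fr, node_to), ignores the special case of a link that is adjacent to itself, and is not the article's actual query:

WITH adjacency AS (
SELECT l1.link_id link_fr, l2.link_id link_to,
       CASE WHEN l1.node_fr IN (l2.node_fr, l2.node_to) THEN l1.node_fr
            ELSE l1.node_to END node_id
  FROM links l1
  JOIN links l2
    ON (l2.node_fr IN (l1.node_fr, l1.node_to) OR
        l2.node_to IN (l1.node_fr, l1.node_to))
   AND l1.link_id < l2.link_id
)
SELECT node_id || '-' ||
       Row_Number () OVER (PARTITION BY node_id ORDER BY link_fr, link_to) dlink_id,
       link_fr, link_to
  FROM adjacency;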

Dual networks defined as above are generally larger than the original networks and are usually more heavily looped, which explains the inferior performance of Connect by compared with recursive subquery factor solutions. The PL/SQL solution mentioned above, while traversing the entire network, does not traverse all possible routes through it and its performance is thus not adversely affected by the degree of looping.

SQL Queries

The recursive SQL queries return all routes through the network from the roots supplied. In my attached script I also have versions that filter out repeated links. The pipelined function query returns a single, exhaustive route through the network, distinguishing a set of tree links from loop-closing links; it also returns all subnetworks without requiring input roots.

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor. Otherwise it returns 0

Connect By queries do not return loop-closing nodes, and the prior node is marked as the cycle node.

Recursive Subquery Factor Cycles

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

Recursive Subquery Factor queries do return loop-closing nodes, and these nodes are marked as the cycle nodes.
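To make the two cycle-marking mechanisms concrete, here is a minimal sketch on an assumed links table, links (node_fr, node_to), with the root passed as a bind variable; node identifiers are assumed to fit VARCHAR2(30), and these are not the article's queries:

SELECT LEVEL lvl, node_to, CONNECT_BY_ISCYCLE is_cycle
  FROM links
 START WITH node_fr = :root
CONNECT BY NOCYCLE PRIOR node_to = node_fr;

WITH rsf (node_id) AS (
SELECT Cast (:root AS VARCHAR2(30))
  FROM dual
UNION ALL
SELECT l.node_to
  FROM rsf r
  JOIN links l
    ON l.node_fr = r.node_id
) SEARCH DEPTH FIRST BY node_id SET ord
  CYCLE node_id SET is_cycle TO '*' DEFAULT ' '
SELECT node_id, is_cycle
  FROM rsf
 ORDER BY ord;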

We will see this differing behaviour clearly in the following examples. We will also see that the Connect By output on the original network has exactly the same structure as recursive subquery factor output on the dual network if the loop-closing rows are disregarded. Cycle nodes on both definitions are marked with a '*' in the outputs below.

In the output below I deleted all the loop rows from the RSF output for the dual network and placed the result beside the output for CBY for the original network, using a column-wise copy and paste. It is then easy to see their equivalent structure. Both have 69 rows.

Neither of the two SQL recursion methods completed within a period of an hour and had to be terminated. The result for CBY on the original network suggests that RSF on the dual network should return somewhere above 4,414,420 rows.

Conclusions

We have shown by examples how network traversal by the Connect By (CBY) approach in SQL corresponds to traversal of all routes in a type of dual version of the original network

This dual version, which has forks converted to loops, tends to be larger and more heavily looped, resulting in worse performance compared with solution by recursive subquery factors (RSF)

The examples illustrate the different treatment of loop-closing links between the two types of SQL recursion

The RSF solutions on the dual network in the simpler examples where it completes is seen to be equivalent to the CBY solution on the original network, after allowing for the different treatment of loop-closing links

In March 2013 I wrote the article SQL for Network Grouping, describing some options in SQL for solving a general class of network problems, while noting that these would have serious performance issues, and that the most efficient approach would involve PL/SQL. I had already published a Scribd article, in June 2010, on using PL/SQL to traverse a single connected network, An Oracle Network Traversal PL SQL Program, but had not at that time extended the approach to cover all networks.

This network comes from Oracle's foreign key constraints as specified in its ALL_CONSTRAINTS view. I copied the constraints of type foreign key from the view into my own table, created a primary key and indexes that I deemed appropriate, and gathered statistics on the table.
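A hedged sketch of the kind of copy described (the exact columns chosen and the table name are assumptions) is:

CREATE TABLE fk_link AS
SELECT c.constraint_name link_id,
       c.table_name      node_fr,
       r.table_name      node_to
  FROM all_constraints c
  JOIN all_constraints r
    ON r.owner           = c.r_owner
   AND r.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R';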

Output for Foreign Key Network
The detailed output with 815 records took 1.5 seconds, while the two summary outputs took a fraction of a second each.

Diagram of a Foreign Key Subnetwork
This diagram shows the trajectory that the algorithm took through the subnetwork of HR tables that includes the COUNTRIES table, with tree links in red, and loop closing links in blue. It may help to understand the working of the algorithm.

The page describes the network as having 58,228 nodes and 214,078 links, but the data set has 428,156 links, with each pair of linked nodes having links provided in both directions. My package traverses links in either direction, so I did not require the second link of each pair, and copied only one of the links into a table for the network analysis. I created a primary key and indexes that I deemed appropriate and gathered statistics on the table.
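A hedged sketch of the de-duplication described, assuming the raw edge list has been loaded into a staging table with node_1 and node_2 columns (names are assumptions), is:

CREATE TABLE brightkite_link AS
SELECT node_1 node_fr, node_2 node_to
  FROM brightkite_edges_raw
 WHERE node_1 < node_2;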

Output for Brightkite Network
The detailed output with 214,625 records took 103 seconds, while the two summary outputs took 27 and 23 seconds. Most of the detailed execution time is of course due to the writing of the records to file. The output is too large to embed in full, so I cut out most of the detailed output.

I noticed a post on AskTom recently that referred to an SQL solution to a version of the so-called Bin Fitting problem, where even distribution is the aim. The solution, How do I solve a Bin Fitting problem in an SQL statement?, uses Oracle's Model clause, and, as the poster of the link observed, has the drawback that the number of bins is embedded in the query structure. I thought it might be interesting to find solutions without that drawback, so that the number of bins could be passed to the query as a bind variable. I came up with three solutions using different techniques, starting here.

An interesting article in American Scientist, The Easiest Hard Problem, notes that the problem is NP-complete, or certifiably hard, but that simple greedy heuristics often produce a good solution, including one used by schoolboys to pick football teams. The article uses the more descriptive term for the problem of balanced number partitioning, and notes some practical applications. The Model clause solution implements a multiple-bin version of the main Greedy Algorithm, while my non-Model SQL solutions implement variants of it that allow other techniques to be used, one of which is very simple and fast: this implements the team picking heuristic for multiple teams.

Another poster, Stew Ashton, suggested a simple change to my Model solution that improved performance, and I use this modified version here. He also suggested that using PL/SQL might be faster, and I have added my own simple PL/SQL implementation of the Greedy Algorithm, as well as a second version of the recursive subquery factoring solution that performs better than the first.

This article explains the solutions, considers two simple examples to illustrate them, and reports on performance testing across dimensions of number of items and number of bins. These show that the solutions exhibit either linear or quadratic variation in execution time with number of items, and some methods are sensitive to the number of bins while others are not.

After I had posted my solutions on the AskTom thread, I came across a thread on OTN, need help to resolve this issue, that requested a solution to a form of bin fitting problem where the bins have fixed capacity and the number of bins required must be determined. I realised that my solutions could easily be extended to add that feature, and posted extended versions of two of the solutions there. I have added a section here for this.

Updated, 5 June 2013: added Model and RSF diagrams

Greedy Algorithm Variants

Say there are N bins and M items.

Greedy Algorithm (GDY)
Set bin sizes zero
Loop over items in descending order of size

Add item to current smallest bin

Calculate new bin size

End Loop

Greedy Algorithm with Batched Rebalancing (GBR)
Set bin sizes zero
Loop over items in descending order of size in batches of N items

Assign batch to N bins, with bins in ascending order of size

Calculate new bin sizes

End Loop

Greedy Algorithm with No Rebalancing - or, Team Picking Algorithm (TPA)
Assign items to bins cyclically by bin sequence in descending order of item size
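The team-picking variant is simple enough to express directly in SQL. The following sketch assumes a table items (item_name, item_value) and the number of bins passed as a bind variable; the names are assumptions, not the article's code:

SELECT item_name, item_value,
       Mod (Row_Number () OVER (ORDER BY item_value DESC) - 1, :n_bins) + 1 bin
  FROM items;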

Two Examples

Example: Four Items
Here we see that the Greedy Algorithm finds the perfect solution, with no difference in bin size, but the two variants have a difference of two.

Example: Six Items
Here we see that none of the algorithms finds the perfect solution. Both the standard Greedy Algorithm and its batched variant give a difference of two, while the variant without rebalancing gives a difference of four.

SQL Solutions

Original Model for GDY
See the link above for the SQL for the problem with three bins only.

The author has two measures for each bin and implements the GDY algorithm using CASE expressions and aggregation within the rules. The idea is to iterate over the items in descending order of size, setting the item bin to the bin with current smallest value. I use the word 'bin' for his 'bucket'. Some notes:

Dimension by row number, ordered by item value

Add measures for the iteration, it, and number of iterations required, counter

My Model solution works for any number of bins, passing the number of bins as a bind variable. The key idea here is to use values in the first N rows of a generic bin value measure to store all the running bin values, rather than as individual measures. I have included two modifications suggested by Stew in the AskTom thread.

Dimension by row number, ordered by item value

Initialise a bin measure to the row number (the first N items will remain fixed)

The idea here is that in the initial RSF query a subquery factor of items was joined on a calculated field, so the whole record set had to be read, and performance could be improved by putting that initial record set into an indexed temporary table ahead of the main query. We'll see in the performance testing section that this changes quadratic variation with problem size into linear variation.
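A hedged sketch of the staging-table idea (table and column names are assumptions): materialise the ordered items with their row numbers into an indexed temporary table, and have the recursive query join it on the indexed rank rather than on a calculated field.

CREATE GLOBAL TEMPORARY TABLE items_tmp (
  itm_rank    NUMBER PRIMARY KEY,
  item_name   VARCHAR2(30),
  item_value  NUMBER
) ON COMMIT DELETE ROWS;

INSERT INTO items_tmp
SELECT Row_Number () OVER (ORDER BY item_value DESC), item_name, item_value
  FROM items;

The recursive branch can then join items_tmp on itm_rank equal to the previous rank plus one, using the primary key index.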

The idea here is that procedural algorithms can often be implemented more efficiently in PL/SQL than in SQL.

The first parameter to the function is a strongly-typed reference cursor

The SQL call passes in a SELECT statement wrapped in the CURSOR keyword, so the function can be used for any set of records that returns name and numeric value pairs

The item records are fetched in batches of 100 using the LIMIT clause to improve efficiency
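Putting those points together, a minimal sketch of such a pipelined function is below. The SQL-level object and collection types, the package and function names, and the bin-total bookkeeping are all assumptions made for illustration; the article's own function will differ:

CREATE OR REPLACE TYPE bin_fit_obj AS OBJECT (item_name VARCHAR2(30), item_value NUMBER, bin NUMBER)
/
CREATE OR REPLACE TYPE bin_fit_list AS TABLE OF bin_fit_obj
/
CREATE OR REPLACE PACKAGE bin_fit AS
  TYPE item_rec IS RECORD (item_name VARCHAR2(30), item_value NUMBER);
  TYPE item_cur IS REF CURSOR RETURN item_rec;
  FUNCTION Items_To_Bins (p_cur item_cur, p_n_bins PLS_INTEGER) RETURN bin_fit_list PIPELINED;
END bin_fit;
/
CREATE OR REPLACE PACKAGE BODY bin_fit AS

  FUNCTION Items_To_Bins (p_cur item_cur, p_n_bins PLS_INTEGER) RETURN bin_fit_list PIPELINED IS
    TYPE item_tab IS TABLE OF item_rec;
    l_items     item_tab;
    l_bin_vals  DBMS_SQL.Number_Table;  -- running bin totals, indexed 1..p_n_bins
    l_min_bin   PLS_INTEGER;
  BEGIN
    FOR i IN 1..p_n_bins LOOP
      l_bin_vals(i) := 0;
    END LOOP;
    LOOP
      FETCH p_cur BULK COLLECT INTO l_items LIMIT 100;   -- batched fetch
      FOR i IN 1..l_items.COUNT LOOP                     -- items assumed supplied in descending order of value
        l_min_bin := 1;                                  -- find the current smallest bin
        FOR j IN 2..p_n_bins LOOP
          IF l_bin_vals(j) < l_bin_vals(l_min_bin) THEN
            l_min_bin := j;
          END IF;
        END LOOP;
        l_bin_vals(l_min_bin) := l_bin_vals(l_min_bin) + l_items(i).item_value;
        PIPE ROW (bin_fit_obj (l_items(i).item_name, l_items(i).item_value, l_min_bin));
      END LOOP;
      EXIT WHEN p_cur%NOTFOUND;
    END LOOP;
    CLOSE p_cur;
    RETURN;
  END Items_To_Bins;

END bin_fit;
/

A call might then look like: SELECT * FROM TABLE (bin_fit.Items_To_Bins (CURSOR (SELECT item_name, item_value FROM items ORDER BY item_value DESC), 3));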

Performance Testing
I tested performance of the various queries using my own benchmarking framework across grids of data points, with two data sets to split the queries into two sets based on performance.

Query Modifications for Performance Testing

The RSF query with staging table was run within a pipelined function in order to easily include the insert in the timings

A system context was used to pass the bind variables as the framework runs the queries from PL/SQL, not from SQL*Plus

I found that calculating the bin values using analytic sums, as in the code above, affected performance, so I removed this for clarity of results, outputting only item name, value and bin

Test Data Sets
For a given depth parameter, d, random numbers were inserted within the range 0-d for d-1 records. The insert was:
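The actual insert statement is not reproduced here; the following is a hedged sketch consistent with the description, with the table and column names assumed:

INSERT INTO items (item_name, item_value)
SELECT 'ITEM-' || Lpad (LEVEL, 4, '0'), DBMS_Random.Value (0, :d)
  FROM dual
CONNECT BY LEVEL < :d;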

Little variation of CPU time with number of bins for POSS and RSFT, but roughly linear variation for PLFN

These linear methods are much faster than the earlier quadratic ones for larger numbers of items

PLFN's approximate proportionality of time to number of bins means that, while it is faster than RSFT for small numbers of bins, it becomes slower from around 50 bins for our problem

The proportionality to number of bins for PLFN presumably arises from the step to find the bin of minimum value

The lack of proportionality to number of bins for RSFT may appear surprising, since it performs a sort of the bins iteratively. However, while the work for this sort is likely to be proportional to the number of bins, the number of iterations is inversely proportional, and so the variation cancels out

Solution Quality

The methods reported above implement three underlying algorithms, none of which guarantees an optimal solution. In order to get an idea of how the quality compares, I created new versions of the second set of queries using analytic functions to output the difference between minimum and maximum bin values, with percentage of the maximum also output. I ran these on the same grid, and report below the results for the four corners.

The results indicate that GDY (Greedy Algorithm) and GBR (Greedy Algorithm with Batched Rebalancing) generally give very similar quality results, while TPA (Team Picking Algorithm) tends to be quite a lot worse.

Extended Problem: Finding the Number of Bins Required

An important extension to the problem is when the bins have fixed capacity, and it is desired to find the minimum number of bins, then spread the items evenly between them. As mentioned at the start, I posted extensions to two of my solutions on an OTN thread, and I reproduce them here. It turns out to be quite easy to make the extension. The remainder of this section is just lifted from my OTN post and refers to the table of the original poster.

Start OTN Extract
So how do we determine the number of bins? The total quantity divided by bin capacity, rounded up, gives a lower bound on the number of bins needed. The actual number required may be larger, but mostly it will be within a very small range from the lower bound, I believe (I suspect it will nearly always be the lower bound). A good practical solution, therefore, would be to compute the solutions for a base number, plus one or more increments, and this can be done with negligible extra work (although Model might be an exception, I haven't tried it). Then the bin totals can be computed, and the first solution that meets the constraints can be used. I took two bin sets here.
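For reference, the lower bound described can be computed in one line of SQL (the table and column names are assumptions):

SELECT Ceil (Sum (item_value) / :bin_capacity) min_bins
  FROM items;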

In my last article, A Simple SQL Solution for the Knapsack Problem (SKP-1), I presented an SQL solution for the well known knapsack problem in its simpler 1-knapsack form (and it is advisable to read the first article before this one). Here I present an SQL solution for the problem in its more difficult multiple-knapsack form. The solution is a modified version of one I posted on OTN, SQL Query for mapping a set of batches to a class rooms group, and I describe two versions of it, one in pure SQL, and another that includes a database function. The earlier article provided the solutions as comma-separated strings of item identifiers, and in this article also the solutions are first obtained as delimited strings. However, as there are now containers as well as items, we extend the SQL to provide solutions with item and container names in separate fields within records for each container-item pair. The solution is presented, as before, more for its theoretical interest than for practical applicability. Much research has been done on procedural algorithms for this important, but computationally difficult class of problems.

We will consider the same simple example problem as in the earlier article, having four items, but now with two containers with individual weight limits of 8 and 10. As noted in the earlier article, the problem can be considered as that of assigning each item to one of the containers, or to none, leading directly to the expression (m+1)^n, here 3^4 = 81, for the number of not necessarily feasible assignment sets for the example. We can again depict the 2^4 = 16 possible item combinations in a diagram, with the container limits added.

We can see that there is one optimal solution in this case, in which items 1 and 3 are assigned to container 1, while items 2 and 4 are assigned to container 2, with a profit of 100. How to find it using SQL?

SQL Solution
The solution to the single knapsack problem worked by joining items recursively in increasing order of item id, accumulating the total weights and profits, and terminating a sequence when no more items can be added within the weight limit. The item sequences were accumulated as comma-separated strings, and the optimal solutions obtained by analytic ranking of the profits.

For the multiple knapsack problem, it's not quite as simple, but a similar approach may be a good starting point. Previously our anchor branch in the recursion selected all items below the single maximum weight, but we now have containers with individual weights. If we now join the containers table we can find all items falling within the maximum weights by container. The recursion can then proceed to find all feasible item combinations by container. Here is the SQL for this:

Looking at this, we can see that the overall solution will comprise one feasible combination of items for each container, with the constraint that no item appears in more than one container. This suggests that we could perform a second recursion in a similar way to the first, but this time using the results of the first as input, and joining the feasible combinations of containers of higher id only. If we again accumulate the sequence in a delimited string, regular expression functionality could be used to avoid joining combinations with items already included. The following SQL does this recursion:

Notice the use of RegExp_Instr, which takes the current sequence with potential new combination appended as its source string, and looks for a match against the search string ',(\d+),.*?,\1,'. The function returns 0 if no match is found, meaning no duplicate item was found. The sequence includes the container id using a different delimiter, a colon, at the start of each combination. The search string can be explained as follows:

,(\d+), = a sequence of one or more digits with a comma either side, and the digit sequence saved for referencing
.*?,\1, = a sequence of any characters, followed by the saved digit sequence within commas. The ? specifies a non-greedy search, meaning stop searching as soon as a match is found
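A quick illustration of the duplicate check (not part of the solution queries themselves): item 3 appears in both combinations below, so RegExp_Instr returns a non-zero position.

SELECT RegExp_Instr (':1:,1,3,' || ':2:,3,4,', ',(\d+),.*?,\1,') match_pos
  FROM dual;
-- returns a position > 0, flagging the duplicate item 3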

We can see that the optimal solutions can be obtained from the output again using analytic ranking by profit, and in this case the solution with a profit of 100 is the optimal one, with sequence ':1:,1,3,:2:,2,4,'. In the full solution, as well as selecting out the top-ranking solutions, we have extended the query to output the items and containers by name, in distinct fields with a record for every solution/container/item combination. For the example problem above, the output is:

SQL-Only Solution - XSQL
There are various techniques in SQL for splitting string columns into multiple rows and columns. We will take one of the more straightforward ones that uses the DUAL table with CONNECT BY to generate rows against which to anchor the string-parsing.
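A minimal sketch of that row-generation technique, splitting a comma-delimited item list into rows for a single hard-coded string (not the article's exact query):

WITH solutions AS (
SELECT ',1,3,' itm_list FROM dual
)
SELECT LEVEL itm_index,
       RegExp_Substr (itm_list, '[^,]+', 1, LEVEL) itm_id
  FROM solutions
CONNECT BY LEVEL <= RegExp_Count (itm_list, '[^,]+');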

SQL with Function Solution - XFUN
The SQL techniques for string-splitting are quite cumbersome, and a better approach may be the use of a pipelined function that allows the string-parsing to be done in PL/SQL, a procedural language that is better suited to the task.

Query Structure Diagram (embedded directly)
The QSD shows both queries in a single diagram as the early query blocks are almost the same (the main difference is that the strings contain a bit more information for XSQL to facilitate the later splitting). The directly-embedded version shows the whole query, but it may be hard to read the detail, so it is followed by a larger, scrollable version within Excel.

The graphs tab below shows 3-d graphs of the number of rows processed and the CPU time for XFUN.

Notes

There is not much difference in performance between the two query versions, no doubt because the number of solution records is generally small compared with rows processed in the recursions

Notice that the timings correlate well with the rows processed, but not so well with the numbers of base records. The nature of the problem means that some of the randomised data sets turn out to be much harder to solve than others

Notice the estimated rows on step 36 of the execution plan for the pipelined function solution. The value of 8168 is a fixed value that Oracle assumes since it has no statistics to go on. We could improve this by using the (undocumented) cardinality hint to provide a smaller estimate

I extended my benchmarking framework for this article to report the intermediate numbers of rows processed, as well as the cardinality estimates and derived errors in these estimates (maximum for each plan). It is obvious from the nature of the problem that Oracle's Cost Based Optimiser (CBO) is not going to be able to make good cardinality estimates

Conclusions
Oracle's v11.2 implementation of the ANSI SQL feature recursive subquery factoring provides a means for solving the knapsack problem, in its multiple knapsack form, in SQL. The solution is not practical for large problems, for which procedural techniques that have been extensively researched should be considered. However, the techniques used may be of interest for combinatorial problems that are small enough to be handled in SQL, and for other types of problem in general.

A poster on OTN (Combination using pl/sql) recently asked for an SQL solution to a problem that turned out to be an example of the well known Knapsack Problem, for the case of a single knapsack. I posted an SQL query as a solution, and also a solution in PL/SQL because the SQL solution uses a feature only available in Oracle v11.2. In this article I explain how the solutions work and provide the results of a performance analysis that involved randomised test problems of varying computational difficulty. I have taken a more general form of problem than the original poster described, and the solutions here have been improved.

Knapsack Problem (1-Knapsack)
The various forms of knapsack problem have been studied extensively. The problems are known to be computationally difficult and many algorithms have been proposed for both exact and approximate solutions (see reference above). The SQL solution in this article is quite simple and will not be competitive in performance for larger problems in the form described here, but may be interesting for being implemented in pure SQL (and without using Oracle's Model clause, or a purely brute force approach). However, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems (see links at the top).

The problem can be stated informally, as follows: Given a set of items, each having positive weight and profit attributes, and a weight limit, find the combinations of items that maximise profit within the weight limit. Variant versions include the addition of multiple constraints (easy to handle), and inclusion of multiple knapsacks (more difficult). I also have a solution for the multiple knapsacks version described here (An SQL Solution for the Multiple Knapsack Problem (SKP-m)).

The difficulty of the problem arises from the number of possible combinations increasing exponentially with problem size. The number of these (not necessarily feasible) combinations, N(n,1), can be expressed in terms of the number of items, n, in two ways. First, we can use the well known binomial expression for the number of combinations of r items, summed from r = 0 to n:

N(n,1) = C(n,0) + C(n,1) + ... + C(n,n)

where C(n,r) = n!/(r!(n-r)!)

Second, and more simply, we can observe that including an item in the combination, or not, is a binary choice, leading to:

N(n,1) = 2^n

This generalises easily to the expression for the multiple knapsack problem, with m knapsacks:

N(n,m) = (m+1)^n

This can also be expressed using a binomial series as

N(n,m) = C(n,0)*m^0 + C(n,1)*m^1 + ... + C(n,n)*m^n

Here, C(n,r) represents the number of combinations of r items from n, and m^r is the number of assignments of the r items to m containers.

Let's look at a simple example problem having four items, with a weight limit of 9, as shown below:

There are 2^4 = 16 possible combinations of these items, having from 0 to 4 items. These are depicted below:

We can see that there are two optimal solutions in this case. How to find them using SQL?

SQL Solution

Oracle's v11.2 implementation of the ANSI standard Recursive Subquery Factoring can be used as the basis for an SQL solution. This works as follows: Starting from each item in turn, add items recursively while remaining within the weight limit, and considering only items of id greater than the current id. The SQL looks like this, where a marker is added for leaf nodes, following an approach from the Amis technology blog:
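(The original query itself is not reproduced here; below is a hedged reconstruction of the recursion described, without the leaf-node marker, and with assumed table and column names.)

WITH rsf (nxt_id, lev, tot_weight, tot_profit, path) AS (
SELECT id, 0, item_weight, item_profit,
       Cast (',' || id || ',' AS VARCHAR2(4000))
  FROM items
 WHERE item_weight <= :max_weight
UNION ALL
SELECT i.id, r.lev + 1, r.tot_weight + i.item_weight,
       r.tot_profit + i.item_profit, r.path || i.id || ','
  FROM rsf r
  JOIN items i
    ON i.id > r.nxt_id
   AND r.tot_weight + i.item_weight <= :max_weight
)
SELECT path, tot_weight, tot_profit
  FROM rsf
 ORDER BY tot_profit DESC;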

The output contains 8 records, as opposed to the total of 2^4 - 1 = 15 non-null combinations, because only feasible items are joined, and permutations are avoided by the constraint that item ids increase along the path. Given positivity of weight and profit, we know that all solutions must be leaves, and we can represent the tree structure above in the following diagram:
We can now use the recursive subquery factor as an input to a main query that selects one of the most profitable solutions, or alternatively to a further subquery factor that ranks the solutions in order of descending profit. In the latter case, the main query can select all the most profitable solutions.

In the solution I posted on the OTN thread, I included a subquery factor to restrict the final query section to leaf nodes only. This was because we know that the solutions must be leaf nodes, and usually it is more efficient to filter out non-solution records as early as possible. However, I later realised that the work involved in the filtering might outweigh the saving for the final section, and this turned out to be the case here, as shown in the performance analysis section below. Here are the two queries, without the leaf node filtering:

Query Structure Diagram
It's worth noting that Oracle's proprietary recursive syntax, Connect By, cannot be used in this way because of the need to accumulate weights forward through the recursion. The new ANSI syntax is only available from v11.2 though, and I thought it might be interesting to implement a solution in PL/SQL that would work in earlier versions, following a similar algorithm, again with recursion.

PL/SQL Recursive Solution

This is a version in the form of a pipelined function, as I wanted to compare it with the SQL solutions, and for it to be callable from SQL.

SQL

SELECT COLUMN_VALUE sol
FROM TABLE (Packing_PLF.Best_Fits (100))
ORDER BY COLUMN_VALUE

It will be interesting to see how the solution methods perform as problem size varies, and we will use my own performance benchmarking framework to do this. As the framework is designed to compare performance of SQL queries, I have converted the PL/SQL solution to operate as a pipelined function, and thus be callable from SQL, as noted above. I included a version of the SQL solution, with the leaf filtering mentioned above, XKPLV - this was based on XKEEP, with filtering as in the OTN thread.

Test Data Sets

Test data sets are generated as follows, in terms of two integer parameters, w and d:

Insert w items with sequential ids, and random weights and profits in the ranges 0-d and 0-1000, respectively, via Oracle's function DBMS_Random.Value. The maximum weight is fixed at 100.

Test Results

The embedded Excel file below summarises the results obtained over a grid of data points, with w in (12, 14, 16, 18, 20) and d in (16, 18, 20).

Notes

The two versions of the non-leaf SQL solution take pretty much the same time to execute, and are faster than the others

The leaf version of the SQL solution (XKPLV) is slower than the non-leaf versions, and becomes much worse in terms of elapsed time for the more difficult problems; the step-change in performance can be seen to be due to its greater memory usage, which spills to disk above a certain level

The pipelined function solution is significantly slower than the other solutions in terms of CPU time, and elapsed time, except in the case of the leaf SQL solution when that solution's memory usage spills to disk. The pipelined function continues to use more memory as the problem difficulty rises, until all available memory is consumed, when it throws an error (but this case is not included in the result set above)

Conclusions

Oracle's v11.2 implementation of the ANSI SQL feature recursive subquery factoring provides a simple solution for the knapsack problem that cannot be achieved with Oracle's older Connect By syntax alone.

The method has been described here in its exact form that is viable only for small problems; however, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems.

In my last article, Grouping by Unique Subsequences in Oracle, I compared three solutions to a querying problem in Oracle. I found that a solution using a pipelined function was fastest across a range of test data sets, while another using Oracle’s Model clause turned out to be extremely inefficient and unscalable, owing to quadratic variation in execution times.

I mentioned in the article the issue of possible poor cardinality estimates by Oracle’s Cost-Based Optimiser (CBO) for pipelined functions, referencing an article by Adrian Billington, setting cardinality for pipelined and table functions, that considers four techniques for improving these estimates.

In this article, I want to use another, very common, querying problem, to see in more detail how one of these techniques works, namely the dynamic sampling hint, and to compare the performance again of pipelined functions against the Model clause on a second example amenable to both methods.

In previous articles I have generally focussed on elapsed and CPU times to measure performance, but Oracle provides a range of metrics in instrumenting query execution. My benchmarking framework captures many of these, and we'll use them to try to understand the performance variation, again using a 2-dimensional domain of test data. We also capture the execution plans used across the domain and display visually the changes across the domain.

The problem is that of list aggregation, and various solution methods are available depending on one’s Oracle version. In Oracle v11.2 a specific built-in function has been provided, Listagg, and Adrian Billington has compared it with earlier SQL techniques (listagg function in 11g release 2), including a Model solution. He looks only at pure SQL solutions, but we will take both the Model and Listagg solutions, and add in a pipelined function solution. We’ll take a simple test problem using Oracle’s demo HR schema, which will be: Return, for each department, its id, manager name, and a comma-separated, ordered list of its employee names.

Test Data
The HR tables employees and departments were copied structurally to test versions with _t suffixes and records were inserted programmatically by the performance testing packages.

ERD

Listagg Solution

How It Works
The first solution for this problem uses the aggregation function Listagg, which is new in Oracle v11.2. The query groups employees by department, joins departments to get the name, and employees again to get the manager’s name.

Pipelined Function Solution

How It Works
This approach is based on pipelined database functions, which are specified to return array types. Pipelining means that Oracle transparently returns the records in batches while processing continues, thus avoiding memory problems, and returning initial rows more quickly. Within the function there is a simple cursor loop over the employees, joining departments to get the manager id. A string variable accumulates the list of employees, until the department changes, when the record is piped out, and the string reset to the new employee. The last record has to be piped after exiting the loop.

Types
Two database types are specified, the first being an object with fields for the department and manager ids and the employee name list; the second is an array of the nested table form with elements of the first type.

Function Pseudocode

Loop over a cursor selecting the records in order
If the department changes or first record then
If the department changes then
Pipe the row out
End if
Reset variables to current record values
Else
Append the current employee name to the name list
End if
End loop
If the last department is not null then
Pipe the row out using saved values
End if
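A hedged sketch implementing this pseudocode is below; the SQL-level object and collection types are as described under Types above, but their names (dept_emp_obj, dept_emp_list_type), the field names, and the function name are assumptions:

CREATE OR REPLACE FUNCTION Dept_Emp_Lists RETURN dept_emp_list_type PIPELINED IS
  l_dept_id    departments_t.department_id%TYPE;
  l_mgr_id     departments_t.manager_id%TYPE;
  l_name_list  VARCHAR2(4000);
BEGIN
  FOR rec IN (
      SELECT e.department_id, d.manager_id, e.last_name
        FROM employees_t e
        JOIN departments_t d ON d.department_id = e.department_id
       ORDER BY e.department_id, e.last_name) LOOP
    IF l_dept_id IS NULL OR rec.department_id != l_dept_id THEN
      IF l_dept_id IS NOT NULL THEN
        PIPE ROW (dept_emp_obj (l_dept_id, l_mgr_id, l_name_list));  -- department changed: pipe the completed row
      END IF;
      l_dept_id   := rec.department_id;                              -- reset variables to current record values
      l_mgr_id    := rec.manager_id;
      l_name_list := rec.last_name;
    ELSE
      l_name_list := l_name_list || ',' || rec.last_name;            -- append the current employee name
    END IF;
  END LOOP;
  IF l_dept_id IS NOT NULL THEN
    PIPE ROW (dept_emp_obj (l_dept_id, l_mgr_id, l_name_list));      -- pipe the last department using saved values
  END IF;
  RETURN;
END Dept_Emp_Lists;
/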

SQL
Just select the fields named in the record from the function wrapped in the TABLE keyword, and join employees to get the manager name.

Performance Analysis
As in the previous article, I have benchmarked across a 2-dimensional domain, in this case width being the number of employees per department, and depth the number of departments. For simplicity, a single department, ‘Accounting’, and employee, ‘John Chen’, were used as templates and inserted repeatedly with suffixes on names and new ids.

The three queries above were run on all data points, and in addition the function query was run with a hint: DYNAMIC_SAMPLING (d 5).

Record Counts (total employees and employees per department)

Timings

In the embedded Excel file above, the four solutions are labelled as follows:

F = Pipelined Function solution

D = Pipelined Function with Dynamic Sampling hint solution

L = Listagg solution

M = Model solution

For the data points, font colour and fill colour signify:

Fill colours correspond to distinct execution plans whose hash values can be found later in the same tab. The formatted outputs can be found in the Plans tab for all distinct plans for selected data points (with hyperlinks)

White font signifies that the smallest elapsed time for the given data point occurred for the given solution, for all the tables except CPU time

For the CPU time table higher in the tab, white font signifies that the smallest CPU time for the given data point occurred for the given solution

Red font indicates that the solution incurred more than 500 disk reads (see the disk reads table later in the tab)

Graphs

Comparison
The CPU time for all four queries increases approximately in proportion with either width or depth dimension when the other is fixed, which is not surprising. The elapsed times are very similar to the CPU times for all except the larger problems using Model, which we’ll discuss in a later section. For the largest data point, the times rank in the following order:

The following points can be made:

The function query without the dynamic sampling hint was fastest for the largest data point, and by a significant margin over the next best, Listagg

The earlier detailed tables show that this was also true for the triangle of data points starting three points back in each dimension, indicating that this is the case once the problem size gets large enough

Similarly, the function query with dynamic sampling was in third place for all these larger problems

Model was always the slowest query, generally by a factor of about 8 over the fastest in terms of CPU time, but very much worse in elapsed time for problems above a certain size, and we’ll look at this in more detail next.

Model Performance Discontinuity
In Adrian Billington’s article mentioned above (listagg function in 11g release 2), he took a single large-ish data point for his problem and found that his Model query took 308 seconds compared with 6 seconds for Listagg. He puts the Model performance down to 'an enormous number of direct path reads/writes to/from the temporary tablespace'. In my last article, I also quoted the anonymous author of MODEL Performance Tuning: 'In some cases MODEL query performance can even be so poor that it renders the query unusable. One of the keys to writing efficient and scalable MODEL queries seems to be keeping session memory use to a minimum'.

My benchmarking framework records the metrics from the view v$sql_plan_statistics_all, which is used by DBMS_XPlan.Display_Cursor to write the execution plan, and prints to a CSV file aggregates of several of them over the plan, for example: Max (last_disk_reads). These are shown for the Model solution in the tab displayed below of the embedded Excel file (the next tab has 3-d graphs, but they may not display in a browser).

The tables show that memory increases with the problem size in each direction up to a maximum, at which points the number of disk reads jumps from a very low level and then rises with problem size. The maximum memory points have red font. These transitions represent discontinuities where there is a jump in the elapsed times, although CPU times continue to rise smoothly. So while Model is always slower than the other solutions, its much greater use of memory causes the discrepancy to increase dramatically when the processing spills to disk, which is consistent with, and extends, the observations of the authors mentioned.

Model and Listagg Cardinality Estimates
Here is the execution plan for the last data point:

Notice that at line 6 the cardinality estimate is 260K while the actual rows returned was 1024: The CBO has simply ignored the filtering down to department level by row number, and assumed the number of employees! We may ask whether this mis-estimate has affected the subsequent plan. Well the result set at line 6 makes the second step in a hash join to another row set of actual cardinality 1024, so probably it has not made a significant difference in this case. It’s worth comparing the execution plan for Listagg:

Notice that the cardinality estimates are accurate apart from at line 1 for the Sort Group By, where a similar error has been made in not allowing for the reduction in rows caused by the grouping. Again it doesn’t seem to have affected the plan adversely.

Memory Usage and Buffers
A few observations can be made about these statistics:

Buffers is sometimes seen as a good, fundamental measure of performance, but we can see that both function solutions have figures of about 9K, while the other two have very similar figures of about 6K, and these do not correlate well with actual time performance here

The buffers figure for Model at the minimum data point is more than half that for the maximum, unlike the other solutions where it is 1-3%. The ratio of records is only 0.2%, so this is hard to understand

Similarly, the memory usage for Model starts very high, 3.1M, before rising to its maximum of 54M. For pipelined functions the figures go from 6K to 2.8M, and for Listagg from 29K to 15M

Dynamic Sampling Effects
Here is the execution plan for the pipelined function solution at point (W256-D512), with my own timing output from 'Timer Set...' on:

Notice that in the plan without dynamic sampling the cardinality estimate for the row set returned from the function is 8168, nearly 8 times the actual rows. When dynamic sampling is added the cardinality estimate is exactly right. Also the times reported in the plan are similar but show dynamic sampling to be faster. How can that be, when the table of results earlier showed the query with dynamic sampling taking nearly twice as long?

My framework breaks down the times for the steps in running the query and writing the results out. From these we see that the difference is largely accounted for by the function query taking only 0.01 seconds to open the cursor, while with dynamic sampling it takes 0.55 seconds. Evidently the dynamic sampling hint causes a call to the function at the query parsing stage which is not accounted for in the execution plan statistics. (Notice, incidentally, that Oracle is apparently performing a deferred open in both cases, where the actual cursor opening is performed at the time of first fetching.)

If one looks at the colour-coded table of elapsed times above, it can be seen that dynamic sampling causes a single plan to be chosen for all data points with depth below 1024, whereas without the hint two plans are chosen; the second of these plans is chosen with dynamic sampling only at depth 1024. It can also be seen that the dynamic sampling version is faster overall in most cases, but at the highest depth is slower because the non-hinted plan is the same. The default cardinality estimate (8168) was too high until that point.

Conclusions
We have applied three techniques for list aggregation to one specific problem, and for that problem the following concluding remarks can be made:

The Model solution is much inferior in performance to both the new Listagg native function, and custom pipelined function solutions

The variation in performance has been analysed and for the model solution shown to become dramatically worse when problem size causes earlier in-memory processing to spill to disk

The dynamic sampling hint has been applied to the pipelined function solution and shown to give correct cardinality estimates. In our example, the performance effect was negative for the largest problem sizes owing to the overhead involved, but in other cases it was positive

We have observed that both Model and Listagg solutions also have cardinality estimation problems, but have not analysed these further

The native Listagg solution is significantly, and surprisingly, slower than the custom pipelined function solution at the largest data points considered

We may say more generally:

Performance analysis across a 2-dimensional domain of data sets can provide more insight than just looking at one large zero-dimensional case

Microsoft Excel graphs and other functionality have proved very useful in helping to visualise what is happening in terms of performance

Our findings tend to bear out a common suspicion of Model clause on performance grounds, and further support the view that pipelined functions can be very performance-effective, used appropriately

Recently a deceptively simple question was asked on Oracle’s OTN PL/SQL forum that most posters initially misunderstood, myself included (Grouping Non Consecutive Rows). The question requested a solution for the problem of assigning group numbers to a sequence of records such that each group held no duplicate values of a particular field, and each group was as long as possible. One of the posters produced a solution involving the Model clause and I tried to find a solution without using that clause, just to see if it could be done. Having found such a solution, not without some difficulty, I then solved the problem via a pipelined PL/SQL function having felt that that would be more efficient than straight SQL.

I think all three solutions have their own interest, especially since forum posters frequently assert, incorrectly, that SQL solutions are always faster than those using PL/SQL. I’ll explain how each solution works in this article and will run them through my benchmarking framework. We’ll see that the PL/SQL solution time increases linearly with sequence length, while the others increase quadratically, and PL/SQL is orders of magnitude faster for even small problem sizes. I used Oracle Database v11.2 XE running on a Samsung X120 dedicated PC.

Functional Test Data
The problem data structure is quite simple, and I have defined my own version to emphasise generality, and to include partitioning, which was not part of the initial posting. There is one table with a two-column primary key, and I added an index on the partition field plus the value field.

Table Value_Subseqs

Column       Type
---------    ---------
part_key*    Char(10)
ind*         Number
val          Char(10)

(* = primary key column)

Index VSB_N1 (part_key, val)

Test Cases
Functional testing was done on a number of simple sequences, including the sequence of single characters, ABABB, which is used to illustrate the first solution query.

SQL Pairs Solution

How It Works
The first solution for this problem starts by obtaining the possible start and end points of the groups: all pairs that contain no duplicate. In the table below, the rows represent the possible pairs with X marking each value in the ranges. It’s easy to see the solution for this small test problem and the correct pairs are highlighted in yellow.

From the table it’s possible to work out a solution in SQL, given the pair set. The first correct pair is the longest starting from 1, while each subsequent correct pair is the longest beginning one element after the previous ends. This is of course a kind of tree-walk, which leads to the following solution steps (where everything is implicitly partitioned):

Within a subquery factor, self-join the table with the second record having a higher ind than the first

Add a NOT EXISTS subquery that checks for duplicate val in any pair between the outer potential start and end point pair, using another self-join

Add in the single element pairs via a UNION

Within another subquery factor, obtain the rank of each pair in terms of its length descending, as well as the starting ind value (if it might not always be 1)

In another subquery factor perform the tree-walk as mentioned, using the condition ‘rank = 1’ to include only the longest ranges from any point

Include the analytic function Row_Number in the Select list, which will be the group number

The main query selects from the last subquery factor, then joins the original table
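A hedged partial sketch of the first three steps (the duplicate-free pairs, including the single-element pairs) on the value_subseqs table above is shown here; the ranking, tree-walk and main query steps are omitted, and this is not the article's full query:

WITH pairs AS (
SELECT v1.part_key, v1.ind ind_start, v2.ind ind_end
  FROM value_subseqs v1
  JOIN value_subseqs v2
    ON v2.part_key = v1.part_key
   AND v2.ind > v1.ind
 WHERE NOT EXISTS (
       SELECT 1
         FROM value_subseqs w1
         JOIN value_subseqs w2
           ON w2.part_key = w1.part_key
          AND w2.ind > w1.ind
          AND w2.val = w1.val
        WHERE w1.part_key = v1.part_key
          AND w1.ind BETWEEN v1.ind AND v2.ind
          AND w2.ind BETWEEN v1.ind AND v2.ind)
UNION ALL
SELECT part_key, ind, ind
  FROM value_subseqs
)
SELECT part_key, ind_start, ind_end
  FROM pairs
 ORDER BY part_key, ind_start, ind_end;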

Notes
The query is of course likely to be performance-intensive owing to the initial subquery factor with its nested self-joins across ranges.

Query Diagram

Notes
The diagram notation follows and extends notation developed earlier, including my previous blog article, and is intended to be largely self-explanatory. In this diagram, I have added a new notation for joins that are not simple foreign key joins, in which I label the join and use a note to explain it.
Oracle v11.2 introduced Recursive Subquery Factors that extend tree-walk functionality. I used the older Connect By syntax in the query since it works on older versions, but found it easier to represent in the diagram as though it were the newer implementation – the diagram shows what’s happening logically.

How It Works
In the OTN thread several solutions were proposed that used Oracle’s Model clause or recursive subquery factoring, but I think only one was a general solution, since the others used string variables to successively concatenate strings over arbitrary numbers of rows and would break when the 4000 character SQL limit is hit.
The general Model solution (which was not by me, but I’ve reformatted it and applied it to my table) worked by defining two measures, for group start indices and group number. The rules specified two passes for the two measures: The first pass counts the distinct values and compares with the count of all values, between the previous group start and the current index; the second uses the group starts to set the group numbers.

Form the basic Select, with all the table columns required, and append a group number placeholder

Add the Model keyword, partitioning by part_key, dimensioning by analytic function Row_Number, ordering by ind within part_key, with val, group start and group number as measures

Initialise group start and group number to 1 in the measures clause

Define the first rule to obtain the group start index for all rows after the first as the previous group start, unless there is a difference between the two counts, in which case take the new index.

Define the second rule to obtain the group number for all rows as the previous group number, unless the group start has changed, in which case take the previous group number + 1.
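
By way of illustration, the shape of such a Model query might be as below. This is a reconstruction from the steps above rather than the exact query benchmarked; the measure names gs (group start) and gn (group number) are just labels, and the rules are applied only to rows after the first, the first row keeping its initialised values of 1.

SELECT part_key, ind, val, gn group_no
  FROM value_subseqs
 MODEL
    PARTITION BY (part_key)
    DIMENSION BY (Row_Number() OVER (PARTITION BY part_key ORDER BY ind) rn)
    MEASURES (ind, val, 1 gs, 1 gn)
    RULES (
       -- Rule 1: the group start stays put unless the two counts over the
       -- range from the previous group start to the current index differ
       gs[rn > 1] = CASE WHEN Count(DISTINCT val)[rn BETWEEN gs[cv() - 1] AND cv()]
                            = Count(val)[rn BETWEEN gs[cv() - 1] AND cv()]
                         THEN gs[cv() - 1]
                         ELSE cv()
                    END,
       -- Rule 2: the group number increments whenever the group start changes
       gn[rn > 1] = CASE WHEN gs[cv()] = gs[cv() - 1]
                         THEN gn[cv() - 1]
                         ELSE gn[cv() - 1] + 1
                    END
    )
 ORDER BY part_key, ind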

Query Diagram

Notes
Queries with the Model clause have a structure that is rather different from other queries, and the diagram attempts to reflect that structure for these problems. The main query feeds its output into an array processing component with a set of rules that specify how any additional data items (called measures) are to be calculated, in a mostly declarative fashion.
The model box above contains 4 specification types:

Partition - processing is to be performed separately by one or more columns; the same meaning as in analytic functions

Dimension - columns by which the array is dimensioned; can include analytic functions, as here

Measures - remaining columns that may be calculated or updated by the rules, possibly including placeholders from the main query

Rules - a set of rules that specify measure calculation; rules are processed sequentially, unless otherwise specified; in the diagram:

n - the current dimension value

F(n-1,n) - denotes that the value depends on values from previous and current rows (and so on, ‘..’ denotes a range)

^ - denotes that the calculation progresses in ascending order by dimension; this is the default so does not have to be coded

Pipelined Function Hash Solution

How It Works
This approach is based on pipelined database functions, which are specified to return array types. Pipelining means that Oracle transparently returns the records in batches while processing continues, thus avoiding memory problems, and returning initial rows more quickly. See Pipelined Functions (AskTom) for some other examples of use.
Within the function there is a simple cursor loop over the ordered sequence. A PL/SQL index-by array stores values for the current group, and allows duplicate checking to take place without any additional searching or sorting. The array is reset whenever the group changes.

Types
Two database types are specified, the first being an object with fields for the table columns and an extra field for the group to be derived; the second is an array of the nested table form with elements of the first type.
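
For example (the type names are illustrative only):

CREATE OR REPLACE TYPE value_subseq_obj AS OBJECT (
    part_key    CHAR(10),
    ind         NUMBER,
    val         CHAR(10),
    group_no    NUMBER
);
/
CREATE OR REPLACE TYPE value_subseq_arr AS TABLE OF value_subseq_obj;
/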

Function Pseudocode

Loop over a cursor selecting the records in order
  If the partition field value changes then
    Reset group and index-by array
  Else if the current value is already in the index-by array then
    Increment group number and reset index-by array
  End if
  Add the value to the index-by array
  Pipe the row out
End loop
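
A sketch of this logic in PL/SQL, using the illustrative types above (again, not necessarily the exact function benchmarked):

CREATE OR REPLACE FUNCTION subseq_groups RETURN value_subseq_arr PIPELINED IS
  TYPE hash_type IS TABLE OF PLS_INTEGER INDEX BY VARCHAR2(10);
  l_vals       hash_type;                        -- values seen in the current group
  l_group_no   PLS_INTEGER;
  l_prior_part value_subseqs.part_key%TYPE;
BEGIN
  FOR rec IN (SELECT part_key, ind, val FROM value_subseqs ORDER BY part_key, ind) LOOP
    IF l_prior_part IS NULL OR rec.part_key != l_prior_part THEN
      l_group_no := 1;                           -- new partition: reset group and array
      l_vals.DELETE;
    ELSIF l_vals.EXISTS (rec.val) THEN
      l_group_no := l_group_no + 1;              -- duplicate found: start a new group
      l_vals.DELETE;
    END IF;
    l_vals (rec.val) := 1;
    l_prior_part := rec.part_key;
    PIPE ROW (value_subseq_obj (rec.part_key, rec.ind, rec.val, l_group_no));
  END LOOP;
  RETURN;
END subseq_groups;
/

The function would then be queried along the lines of SELECT * FROM TABLE(subseq_groups).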

Pipelined Function Sim Solution
This solution was added after performance testing on the first data set, below. It is a hybrid of the Model and pipelined function approaches, created to try to understand the apparent quadratic variation of Model CPU time with sequence length noted in testing.

How It Works
This function uses the same database types as the Hash solution, but applies the same counting idea as the Model solution within an old-fashioned nested cursor structure that one would not expect to perform efficiently.

Function Pseudocode

Loop over a cursor selecting the records in order
  If the partition field value changes then
    Reset group and group starting index
  Else
    Select counts of val and distinct val from the table between the group starting and current indices
    If difference in counts then
      Increment group number and reset group starting index
    End if
  End if
  Pipe the row out
End loop
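
A corresponding sketch of such a function, reusing the illustrative types (the real function may differ in detail):

CREATE OR REPLACE FUNCTION subseq_groups_sim RETURN value_subseq_arr PIPELINED IS
  l_group_no    PLS_INTEGER;
  l_group_start value_subseqs.ind%TYPE;
  l_prior_part  value_subseqs.part_key%TYPE;
  l_n_vals      PLS_INTEGER;
  l_n_distinct  PLS_INTEGER;
BEGIN
  FOR rec IN (SELECT part_key, ind, val FROM value_subseqs ORDER BY part_key, ind) LOOP
    IF l_prior_part IS NULL OR rec.part_key != l_prior_part THEN
      l_group_no    := 1;                        -- new partition: reset group and start index
      l_group_start := rec.ind;
    ELSE
      -- nested query counts all and distinct values in the current range
      SELECT Count(val), Count(DISTINCT val)
        INTO l_n_vals, l_n_distinct
        FROM value_subseqs
       WHERE part_key = rec.part_key
         AND ind BETWEEN l_group_start AND rec.ind;
      IF l_n_vals != l_n_distinct THEN
        l_group_no    := l_group_no + 1;         -- duplicate detected: start a new group
        l_group_start := rec.ind;
      END IF;
    END IF;
    l_prior_part := rec.part_key;
    PIPE ROW (value_subseq_obj (rec.part_key, rec.ind, rec.val, l_group_no));
  END LOOP;
  RETURN;
END subseq_groups_sim;
/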

Performance Analysis
In SQL Pivot and Prune Queries - Keeping an Eye on Performance, in May 2011, I first applied an approach to performance testing of SQL queries whereby the queries are tested across a 2-dimensional domain, using a testing framework developed for that work. The same approach has been followed here, although the framework has been substantially improved and extended since then. In this case it may be of interest to observe performance across the following two dimensions:

Number of records per partition key

Average group size

However, as I prefer to randomise the test data, the group size is known only after querying the data set, so I will use a proxy dimension instead. The value field will be a string of random capital letters drawn from a 20-letter alphabet (A-T), with length equal to the proxy dimension value; as this length increases, so does the average group size. Generally execution time would be expected to be proportional to the number of partition keys when the other dimensions are fixed, and I will use 2 partition key values throughout.

Data Set 1
The first data set is chosen to keep the CPU times within reason for all queries, which limits the possible ranges (we’ll eliminate one query later and extend the ranges).

Output Record Counts

Depth/Width     W1     W2     W4
Records/Part    100    200    400
D1                5      6      5
D2               25     24     24
D3              100     80     89
D4              100    200    267

CPU Times (Seconds)

Query       W1      W2       W4       W/Prior W Average
SQL Pairs
  D1        2.11    8.49     33.99    4
  D2        4.88   13.66     42.15    2.9
  D3       10.3    46.71    255.64    5
  D4       10.33   78.25    595.58    7.6
Model
  D1        0.32    1.08      4.23    3.6
  D2        0.31    1.14      4.23    3.7
  D3        0.34    1.16      4.65    3.7
  D4        0.36    1.36      4.86    3.7
Hash
  D1        0.03    0.03      0.03    1
  D2        0       0.01      0.03    2
  D3        0.01    0.01      0.02    1.5
  D4        0.02    0.01      0.02    1.3

Discussion

We can see immediately that on all data points there is the same performance ranking of the three queries and the differences are extreme. On W4-D4, SQL Pairs takes 123 times as long as Model, which in turn takes 243 times as long as Hash.

SQL Pairs
The Average Ratio figure in the table above is the average ratio between successive columns across the width dimension. On D1 this is 4, meaning that the CPU time has risen by the square of the width factor increase. On D4 it is 7.6, almost the cube of the factor. It appears that the performance varies with the square of the sequence length, except when the group size reaches the sequence length, when it becomes the cube. We would not consider this query for real problems when faster alternatives exist.

Model
The Average Ratio figure on all depths is about 3.7, meaning that the CPU time has risen by almost the square of the width factor increase. This is very surprising because, considering the algorithm one would expect the query to implement, if the group size remains constant then the work done in computing each group ought to be constant too, and the total work ought to rise by the same factor as the sequence length. We’ll look at this further in our second, larger data set, where we’ll also consider an apparently similar algorithm implemented in PL/SQL.

Hash
The Average Ratio figure varies between 1 and 2, but the CPU times are really too small for the figure to be considered reliable (note that the zero figure for W1-D2 was replaced by 0.005 in order to allow the log graph to include it). We can safely say, though, that this is faster by orders of magnitude even on very small problems, and will again look at a larger data set to see whether more can be said.

Data Set 2 (Second Query Set)
The second data set is chosen to give wider ranges, after excluding the Pairs query. A second function was added to replace it, labelled ‘Sim’ below and described above.

Output Record Counts

Depth/Width     W1     W2     W4     W8     W16     W32
Records/Part    100    200    400    800    1600    3200
D1                5      5      5      5       5       5
D2               18     25     23     24      25      23
D3               50     80     89     84     119     110
D4              100    200    200    267     356     582
D5              100    200    400    533    1600    2133
D6              100    200    400    800    1600    3200

CPU Times (Seconds)

Query     W1     W2     W4      W8      W16      W32      W/Prior W Average
Hash
  D1      0.02   0.01   0.01     0.03     0.06     0.10   1.6
  D2      0.02   0.01   0.02     0.04     0.06     0.09   1.5
  D3      0.02   0.02   0.02     0.04     0.06     0.09   1.4
  D4      0.01   0.02   0.01     0.03     0.06     0.11   1.9
  D5      0.01   0.03   0.02     0.05     0.06     0.09   1.8
  D6      0.01   0.02   0.01     0.05     0.06     0.09   2.0
Model
  D1      0.28   1.01   4.05    16.28    63.34   257.46   3.9
  D2      0.27   1.06   4.00    15.97    66.27   249.99   3.9
  D3      0.29   1.14   4.24    16.28    62.91   250.50   3.9
  D4      0.33   1.24   4.74    17.66    69.00   263.34   3.8
  D5      0.33   1.26   5.10    19.17    81.59   314.39   3.9
  D6      0.33   1.28   4.99    20.55    80.80   331.38   4.0
Sim
  D1      0.25   0.37   0.74     1.70     3.14     6.18   1.9
  D2      0.28   0.59   1.21     2.42     4.73     9.28   2.0
  D3      0.33   0.67   1.43     2.79     5.86    11.28   2.0
  D4      0.34   0.77   1.58     3.15     6.97    14.71   2.1
  D5      0.36   0.73   1.69     3.61     9.64    22.58   2.3
  D6      0.36   0.73   1.71     3.79     9.43    26.45   2.4

Discussion

We can see immediately that on all data points there is the same performance ranking of the three queries, and the differences are extreme. On W32-D6, Model takes 12 times as long as Sim, which in turn takes 294 times as long as Hash, making Model about 3,678 times slower than Hash. (I have included the elapsed times, which are very close to the CPU times, in the shared Excel file above; I think the table data are being read from the buffer cache throughout.)

Model
The Average Ratio figure on all depths is about 3.9, meaning that the CPU time has risen by almost the square of the width factor increase. It seems that the internal algorithm Oracle applies for the Model query here is doing something different from what we would expect, with dire consequences for performance. For what it's worth, here is the last Execution Plan:

Hash
The Average Ratio figure varies between 1.4 and 2, but the CPU times are probably still too small for the figure to be considered reliable. We can again say, though, that this is faster than the Model solution by orders of magnitude even on very small problems, and that the CPU time does not appear to be rising faster than linearly, so the performance advantage will increase with problem size.

Sim
The Average Ratio figure varies between 1.9 and 2.4, and up to D3 is not more than 2, which indicates a pretty exact proportionality between sequence length and CPU time. This is consistent with our expectation of linearity so long as the group sizes are smaller than sequence lengths. For D6, where the group sizes are the same as the sequence lengths, we would expect to see a quadratic term (number of counts doubles, and work done in sorting/counting also doubles, if that is linear), and time in fact trebles between W16 and W32.
The results from this solution support the view that there is some internal implementation problem with Model for this example that is causing its quadratic CPU time variation. In my August 2011 revision of Forming Range-Based Break Groups with Advanced SQL, I noted significant under-performance in a query using analytic functions, which also seemed to be due to an internal implementation problem, and found a work-around that made the query perform as expected. I believe both examples illustrate well the power of this kind of dimensional performance analysis.

Model Clause vs Pipelined Functions
I googled model performance problems and the two top-ranked articles are briefly discussed in the first two subsections below.

MODEL Performance Tuning
The author states: ‘In some cases MODEL query performance can even be so poor that it renders the query unusable. One of the keys to writing efficient and scalable MODEL queries seems to be keeping session memory use to a minimum’. Our first example would seem to fall into one of those cases, Model being 3,678 times slower than the function, at 332 seconds, on a problem of only 3,200 records (times two partition keys) and rising quadratically. The article considers the effects of changing various features of a test Model problem on memory usage, on the assumption that this is one cause of poor performance.

From Pipelined Function to Model
This article is interesting in that it takes pretty much the opposite line to my own. For a different problem, the author started with a pipelined function solution, and then went to Model on performance grounds. He says: ‘Since we were returning the rows in a pipelined (streaming) fashion, the performance was fine initially. It was when the function was called constantly and then joined with other tables that we ran into trouble’. The problem he identifies seems to be principally that Oracle’s Cost Based Optimiser (CBO) cannot accurately predict the cardinality of the function, and assigns a default, on his system (as on mine) of 8,168 rows. This can cause poor execution plans when the function is joined with other tables. His initial solution was to use the CARDINALITY hint, which worked but is undocumented and inflexible. He also notes possible performance issues caused by Oracle’s translating the PL/SQL table into an SQL result set, and goes on to propose a Model solution to avoid this problem. Unfortunately, the author does not provide any results on significant data sets to demonstrate the performance differences. The following article looks specifically at the cardinality issue.

setting cardinality for pipelined and table functions
The author (Adrian Billington) considers four techniques that he labels thus:

CARDINALITY hint (9i+) undocumented

OPT_ESTIMATE hint (10g+) undocumented

DYNAMIC_SAMPLING hint (11.1.0.7+)

Extensible Optimiser (10g+)

The first two are not recommended on the grounds of being undocumented. The third option appears quite a lot simpler than the fourth and I will look at that approach in a second example problem, in my next article List Aggregation in Oracle - Comparing Three Methods.
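
For illustration, the dynamic sampling approach would amount to something like the following, applied here to the hypothetical pipelined function sketched earlier (the sampling level of 2 is arbitrary):

SELECT /*+ dynamic_sampling (g 2) */ *
  FROM TABLE (subseq_groups) g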

Discussion
The pipelined function approach is plainly much faster than the SQL solutions in this example, but one has to be aware of possible issues, such as the cardinality estimation problem mentioned above. One also needs to be aware that pure SQL statements are ‘read-consistent’ in Oracle, but this is not the case when functions that themselves execute SQL are called.
Context switches between the SQL and PL/SQL engines, as well as the work done in translating between collections and SQL record sets, are often cited as performance reasons for preferring SQL-only solutions. As we have seen though, these issues, while real, can be dwarfed by algorithmic differences.

Conclusions

For the subsequence grouping problem addressed, using a pipelined function is faster by far than the SQL-only solutions identified

Although widely asserted, the notion that any query processing will be executed more efficiently in pure SQL than in PL/SQL is a myth

The Model solution using embedded aggregate Counts is much slower than expected and its quadratic CPU variation suggests performance problems within Oracle’s internal implementation of the Model clause

Dimensional performance analysis is very powerful although its use appears to be extremely rare in the Oracle community

It is suggested that diagrammatic techniques, such as my Query Structure Diagramming, although also very rarely used, offer important advantages for query documentation and design