I noticed a question on AskTom last November concerning SQL for splitting delimited strings, Extract domain names from a column having multiple email addresses, a kind of question that arises frequently on the forums. There was some debate between reviewers Rajeshwaran Jeyabal, Stew Ashton and the AskTom guys on whether an XML-based solution performs better or worse than a more 'classic' solution based on the Substr and Instr functions and collections. AskTom's Chris Saxon noted:

For me this just highlights the importance of testing in your own environment with your own data. Just because someone produced a benchmark showing X is faster, doesn't mean it will be for you.

For me, relative performance is indeed frequently dependent on the size and 'shape' of the data used for testing. As I have my own 'dimensional' benchmarking framework, A Framework for Dimensional Benchmarking of SQL Performance, I was able to very quickly adapt Rajesh's test data to benchmark across numbers of records and numbers of delimiters, and I put the results on the thread. I then decided to take the time to expand the scope to include other solutions, and to use more general data sets, where the token lengths vary as well as the number of tokens per record.

In fact the scope expanded quite a bit, as I found more and more ways to solve the problem, and I have only now found the time to write it up. Here is a list of all the queries considered:

Queries using Connect By for row generation

MUL_QRY - Cast/Multiset to correlate Connect By

LAT_QRY - v12 Lateral correlated Connect By

UNH_QRY - Uncorrelated Connect By unhinted

RGN_QRY - Uncorrelated Connect By with leading hint

GUI_QRY - Connect By in main query using sys_guid trick

RGX_QRY - Regular expression function, Regexp_Substr

Queries not using Connect By for row generation

XML_QRY - XMLTABLE

MOD_QRY - Model clause

PLF_QRY - database pipelined function

WFN_QRY - 'WITH' PL/SQL function directly in query

RSF_QRY - Recursive Subquery Factor

RMR_QRY - Match_Recognize

Test Problem

DELIMITED_LISTS Table

CREATE TABLE delimited_lists(id INT, list_col VARCHAR2(4000))
/

Functional Test Data

The test data consist of pipe-delimited tokens ('|') in a VARCHAR2(4000) column in a table with a separate integer unique identifier. For functional testing we will add a single 'normal' record with two tokens, plus four more records designed to validate null-token edge cases as follows:

All queries returned the expected results above, except that the XML query returned 12 rows with only a single null token returned for record 5. In the performance testing, no null tokens were included, and all queries returned the same results.

Performance Test Data

Each test set consisted of 3,000 records with the list_col column containing the delimited string dependent on width (w) and depth (d) parameters, as follows:

Each record contains w tokens

Each token contains d characters from the sequence 1234567890 repeated as necessary

The output from the test queries therefore consists of 3,000*w records with a unique identifier and a token of length d. For performance testing purposes the benchmarking framework writes the results to a file in csv format, while counting only the query steps in the query timing results.

In Oracle up to version 11.2, VARCHAR2 expressions cannot be longer than 4,000 characters, so I decided to run the framework for four sets of parameters, as follows:

Depth fixed, high; width range low: d=18, w=(50,100,150,200)

Depth fixed, low; width range high: d=1, w=(450,900,1350,1800)

Width fixed, low; depth range high: w=5, d=(195,390,585,780)

Width fixed, high; depth range low: w=150, d=(6,12,18,24)

All the queries showed strong time correlation with width, while a few also showed strong correlation with depth.

Queries

All execution plans are from the data point with Width=1800, Depth=1, which has the largest number of tokens per record.

This is the 'classic' CONNECT BY solution referred to above, which appears frequently on AskTom and elsewhere, and I copied the version used by Rajesh. The somewhat convoluted casting from subquery to array and back to SQL records via Multiset allows the prior table in the from list to be referenced within the inline view, which is otherwise not permitted in versions earlier than 12.1, where the LATERAL keyword was introduced.
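
The exact query benchmarked is not reproduced here, but a minimal sketch of the Cast/Multiset shape, against the DELIMITED_LISTS table above, looks something like this (the Substr/Instr arithmetic is illustrative only):

SELECT d.id, t.column_value AS token
  FROM delimited_lists d,
       TABLE (CAST (MULTISET (
           SELECT Substr ('|' || d.list_col || '|',
                          Instr ('|' || d.list_col || '|', '|', 1, LEVEL) + 1,
                          Instr ('|' || d.list_col || '|', '|', 1, LEVEL + 1) -
                          Instr ('|' || d.list_col || '|', '|', 1, LEVEL) - 1) -- token between delimiters LEVEL and LEVEL + 1
             FROM DUAL
          CONNECT BY LEVEL <= Length (d.list_col) - Length (Replace (d.list_col, '|')) + 1 -- number of tokens
         ) AS SYS.ODCIVarchar2List)) t
/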

Despite this query being regarded as the 'classic' CONNECT BY solution to string-splitting, we will find that it is inferior in performance to a query I wrote myself across all data points considered. The new query is also simpler, but is not the most efficient of all methods, as we see later.

This query is taken from Splitting Strings: Proof!, and uses a new v12.1 feature, described with examples in LATERAL Inline Views. The feature allows you to correlate an inline view directly, without the convoluted Multiset code, and can also be used with the keywords CROSS APPLY instead of LATERAL. It is sometimes regarded as having performance advantages, but in this context we will see that avoiding the correlation altogether is best for performance.

I wrote the UNH_QRY query in an attempt to avoid the convoluted Multiset approach of the 'classic' solution. The reason for the use of arrays and Multiset seems to be that, while we need to 'generate' multiple rows for each source row, the number of rows generated has to vary by source record and so the row-generating inline view computes the number of tokens for each record in its where clause.

The use of row-generating subqueries is quite common, but in other cases one often has a fixed number of rows to generate, as in data densification scenarios for example. It occurred to me that, although we don't know the number to generate, we do have an upper bound, dependent on the maximum number of characters, and we could generate that many in a subquery, then join only as many as are required to the source record.

This approach resulted in a simpler and more straightforward query, but it turned out in its initial form to be very slow. The execution plan above shows that the row generator is driving a nested loops join within which a full scan is performed on the table. The CBO is not designed to optimise this type of algorithmic query, so I added a leading hint to reverse the join order, and this resulted in much better performance. In fact, as we see later the hinted query outperforms the other CONNECT BY queries, including the v12.1 LAT_QRY query at all data points considered.
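
As a sketch only (not the exact benchmarked query), the uncorrelated row-generator shape with the leading hint that reversed the join order looks something like this; 4001 is the theoretical upper bound on tokens for a 4,000-character string with single-character delimiters:

WITH row_gen AS (
    SELECT LEVEL rn FROM DUAL CONNECT BY LEVEL <= 4001 -- generate up to the maximum possible number of tokens
)
SELECT /*+ LEADING(d) */
       d.id,
       Substr ('|' || d.list_col || '|',
               Instr ('|' || d.list_col || '|', '|', 1, r.rn) + 1,
               Instr ('|' || d.list_col || '|', '|', 1, r.rn + 1) -
               Instr ('|' || d.list_col || '|', '|', 1, r.rn) - 1) AS token
  FROM delimited_lists d
  JOIN row_gen r
    ON r.rn <= Length (d.list_col) - Length (Replace (d.list_col, '|')) + 1 -- join only as many rows as there are tokens
/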

This query also generates rows using CONNECT BY, but differs from the others shown by integrating the row-generation code with the main rowset and avoiding a separate subquery against DUAL. This seems to be a more recent approach than the traditional Multiset solution. It uses a trick involving the system function sys_guid() to avoid the 'connect by cycle' error that you would otherwise get, as explained in this OTN thread: Reg : sys_guid().
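
For reference, a commonly posted form of the trick (using Regexp_Substr for token extraction here; the article's GUI_QRY may differ in detail, but the essential point is the PRIOR Sys_Guid() condition) runs along these lines:

SELECT id,
       RegExp_Substr (list_col, '[^|]+', 1, LEVEL) AS token
  FROM delimited_lists
CONNECT BY LEVEL <= RegExp_Count (list_col, '[^|]+') -- one level per token ('[^|]+' skips empty tokens)
       AND PRIOR id = id                             -- stay within the current record
       AND PRIOR Sys_Guid() IS NOT NULL              -- non-deterministic term prevents the cycle error
/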

Unfortunately, and despite its current popularity on OTN, it turns out to be even less efficient than the earlier approaches, by quite a margin.

This is a fairly well known approach to the problem that involves doing the string splitting within a pipelined database function that is passed the delimited string as a parameter. I wrote my own version for this article, taking care to make only one call to each of the Oracle functions Instr and Substr within a loop over the tokens.
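
The function benchmarked is not reproduced here; a minimal sketch of a pipelined splitter along the lines described (one Instr and one Substr per token; the type name is an assumption) might be:

CREATE OR REPLACE TYPE char_list_type AS TABLE OF VARCHAR2(4000)
/
CREATE OR REPLACE FUNCTION Split_List (p_list VARCHAR2, p_delim VARCHAR2 DEFAULT '|')
                           RETURN char_list_type PIPELINED IS
  l_start  PLS_INTEGER := 1;
  l_pos    PLS_INTEGER;
BEGIN
  LOOP
    l_pos := Instr (p_list, p_delim, l_start);      -- next delimiter position
    IF l_pos = 0 THEN
      PIPE ROW (Substr (p_list, l_start));          -- last token
      EXIT;
    END IF;
    PIPE ROW (Substr (p_list, l_start, l_pos - l_start));
    l_start := l_pos + 1;
  END LOOP;
  RETURN;
END Split_List;
/
SELECT d.id, t.column_value AS token
  FROM delimited_lists d, TABLE (Split_List (d.list_col)) t
/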

The results confirm that it is in fact the fastest approach over all data points considered, and CPU time increased approximately linearly with number of tokens.

Oracle introduced the ability to include a PL/SQL function definition directly in a query in version 12.1. I converted my pipelined function into a function within a query, returning an array of character strings.
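
Again as a sketch only (this version returns a SYS.ODCIVarchar2List rather than whatever array type the benchmarked query used; note that a statement with a WITH function must be terminated with a slash rather than a semicolon in SQL*Plus):

WITH
  FUNCTION split_list (p_list VARCHAR2) RETURN SYS.ODCIVarchar2List IS
    l_out    SYS.ODCIVarchar2List := SYS.ODCIVarchar2List ();
    l_start  PLS_INTEGER := 1;
    l_pos    PLS_INTEGER;
  BEGIN
    LOOP
      l_pos := Instr (p_list, '|', l_start);
      l_out.EXTEND;
      IF l_pos = 0 THEN
        l_out (l_out.COUNT) := Substr (p_list, l_start);   -- last token
        EXIT;
      END IF;
      l_out (l_out.COUNT) := Substr (p_list, l_start, l_pos - l_start);
      l_start := l_pos + 1;
    END LOOP;
    RETURN l_out;
  END;
SELECT d.id, t.column_value AS token
  FROM delimited_lists d, TABLE (split_list (d.list_col)) t
/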

As we would expect from the results of the similar pipelined function approach, this also turns out to be a very efficient solution. However, it may be surprising to many that it is significantly slower (20-30%) than using the separate database function, given the prominence that is usually assigned to context-switching.

Oracle introduced Match_Recognize in v12.1, as a mechanism for pattern matching along the lines of regular expressions for strings, but for matching patterns across records. I wrote the query for this article, converting each character in the input strings into a separate record to allow for its use.

This approach might seem somewhat convoluted, and one might expect it to be correspondingly slow. As it turns out though, for most datasets it is faster than many of the other methods, the ones with very long tokens being the exception, and CPU time increased linearly with both number of tokens and number of characters per token. It is notable that, apart from the exception mentioned, it outperformed the regular expression query.

The CPU times are listed below; the elapsed times were much the same. In each table the query columns are ordered by increasing CPU time at the last data point.

Depth fixed, high; width range low: d=18, w=(50,100,150,200)

Depth fixed, low; width range high: d=1, w=(450,900,1350,1800)

Width fixed, low; depth range high: w=5, d=(195,390,585,780)

Width fixed, high; depth range low: w=150, d=(6,12,18,24)

A Note on the Row Generation by Connect By Results

It is interesting to observe that the 'classical' mechanism for row-generation in string-splitting and similar scenarios turns out to be much slower than a simpler approach that removes the correlation of the row-generating subquery. This 'classical' mechanism has been proposed on Oracle forums over many years, while the simpler and faster approach seems to have gone undiscovered. The reason for its performance deficit is simply that running a Connect By query for every master row is inefficient. Use of the v12.1 LATERAL correlation syntax simplifies the code but does not improve performance by much.

The more recent approach to Connect By row-generation is to use the sys_guid 'trick' to embed the Connect By in the main query rather than in a correlated subquery, and this has become very popular on forums such as OTN. As we have seen, although simpler, this is even worse for performance: turning the whole query into a tree-walk is costly. It is better to isolate the tree-walk, execute it once, and then join its result set, as in RGN_QRY.

Conclusions

The database pipelined function solution (PLF_QRY) is generally the fastest across all data points

Using the v12.1 feature of a PL/SQL function embedded within the query is almost always next best, although slower by up to about a third; its being slower than a database function may surprise some

Times generally increased uniformly with numbers of tokens, usually either linearly or quadratically

Times did not seem to increase so uniformly with token size, except for XML (XML_QRY), Match_Recognize (RMR_QRY) and regular expression (RGX_QRY)

For larger numbers of tokens, three methods all showed quadratic variation and were very inefficient: Model (MOD_QRY), regular expression (RGX_QRY), and recursive subquery factors (RSF_QRY)

We have highlighted two inefficient but widespread approaches to row-generation by Connect By SQL, and pointed out a better method

These conclusions are based on the string-splitting problem considered, but no doubt would apply to other scenarios involving requirements to split rows into multiple rows based on some form of string-parsing.

Networks or hierarchies of arbitrary depth are difficult to traverse in SQL without using recursion. However, there also exist hierarchies of fixed and fairly small depths, and these can be traversed either recursively or by a sequence of joins for each of the levels. In this article I compare the performance characteristics of three traversal methods, two recursive and one non-recursive, using my own benchmarking package (A Framework for Dimensional Benchmarking of SQL Performance), on a test problem of a fixed level organization structure hierarchy, with 5 levels for performance testing and 3 levels for functional testing.

The three queries tested were:

JNS_QRY: Sequence of joins

PLF_QRY: Recursive pipelined function

RSF_QRY: Recursive subquery factors

Fixed Level Hierarchy Problem Definition

A hierarchy is assumed in which there are a number of root records, and at each level a parent can have multiple child records and a child can also have multiple parents. Each level in the hierarchy corresponds to an entity of a particular type. Each parent-child record is associated with a numerical factor, and the products of these propagate down the levels.

The problem considered is to report all root/leaf combinations with their associated products. There may of course be multiple paths between any root and leaf, and in a real world example one would likely want to aggregate. However, in order to keep it simple and focus on the traversal performance, I do not perform any aggregation.
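
As an illustration of what a sequence-of-joins traversal looks like for the 3-level functional case, assuming one link table per adjacent level pair with (parent_id, child_id, factor) columns (hypothetical names, not the article's actual schema):

SELECT l12.parent_id            AS root_id,
       l23.child_id             AS leaf_id,
       l12.factor * l23.factor  AS product  -- factors multiply down the levels
  FROM level_1_2_links l12
  JOIN level_2_3_links l23
    ON l23.parent_id = l12.child_id
/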

To simplify functional validation a 3-level hierarchy was taken, with a relatively small number of records. The functional test data were generated by the same automated approach used for performance testing. The factor was obtained as a random number between 0 and 1, and to keep it simple, duplicate pairs were permitted.

The test data were parametrised by width and depth as follows (the exact logic is a little complicated, but can be seen in the code itself):

Width corresponds to a percentage increase in the number of child entities relative to the number of parents

Depth corresponds to the proportion of the parent entity records a child is (randomly) assigned. Each child has a minimum of 1 parent (lowest depth), and a maximum of all parent entities (highest depth)

It is interesting to note that all joins in the execution plan are hash joins, and in the sequence you would expect. The first three are in the default join 'sub-order' that defines whether the joined table or the prior rowset (the default) is used to form the hash table, while the last two are in the reverse order, corresponding to the swap_join_inputs hint. I wrote a short note on that subject, A Note on Oracle Join Orders and Hints, last year, and have now written an article using the largest data point in the current problem to explore performance variation across the possible sub-orders.

For simplicity a stand-alone database function was used here. The query execution plan was obtained by the benchmarking framework and the highest data point plan listed. The query within the function was extracted and an Explain Plan performed manually, which showed the expected index range scan.

JNS_QRY is faster than RSF_QRY, which is faster than PLF_QRY at all data points

PLF_QRY's timing tracks the number of output records very closely. This is likely because the function executes a query, using an indexed search, at every node in the hierarchy.

The pure SQL methods scale better through being able to do full table scans, and avoiding multiple query executions

Deep Slice Elapsed - CPU Times

The elapsed time minus the CPU times are shown in the first graph below, followed by the disk writes. The disk writes (and reads) are computed as the maximum values across the explain plan at the given data point, and are obtained from the system view v$sql_plan_statistics_all. The benchmarking framework gathers these and other statistics automatically.
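
For example, plan-line work statistics of this kind can be read, provided plan statistics collection is enabled (e.g. via the gather_plan_statistics hint), with a query such as the following, where sql_id is a placeholder for the benchmarked statement's identifier:

SELECT id, operation, object_name, last_disk_reads, last_disk_writes
  FROM v$sql_plan_statistics_all
 WHERE sql_id = '&sql_id'      -- placeholder: sql_id of the benchmarked statement
   AND child_number = 0
 ORDER BY id
/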

The graphs show how the elapsed time minus CPU times track the disk accesses reasonably well

RSF_QRY does nearly twice as many disk writes as the other two

Wide Slice Results [w=180, d=(100, 120, 140, 160, 180)]

The performance characteristics of the three methods across the wide slice data points are pretty similar to those across the deep slice. The graphs are shown below.

Conclusions

For the example problem taken, the most efficient way to traverse fixed-level hierarchies is by a sequence of joins

Recursive methods are significantly worse, and the recursive function is especially inefficient because it performs large numbers of query executions using indexed searches, instead of full scans

I recently posted an article on Dimensional Benchmarking of Oracle v10-v12 Queries for SQL Bursting Problems. That article added an Oracle v12 SQL solution involving Match_Recognize, benchmarking it against some v10 and v11 solutions that I had posted on Scribd a few years ago. A few days before posting it I noticed an OTN thread with a problem that struck me as being of a similar type, Amalgamating groups to be beyond a given size threshold. Where in my original 'bursting' problem a group is defined by a maximum interval from its starting date, in the OTN problem a group is defined by the cumulative sum of a numeric attribute from the group starting record.

I added a comment on the thread at the time mentioning the results that I had got on the original problem, and adding a model solution for the problem raised on the new thread. I have now taken this second 'bursting'-type problem and have benchmarked both the main two solutions proposed on that thread (by other posters), as well as two versions of my own model solution, and a variant of the recursive subquery factor solution that uses a temporary table to achieve much faster performance.

Also, I noticed a question just yesterday on AskTom that is posing essentially the same problem as in my earlier article (which itself came from AskTom several years ago 🙂 ), Complex sql.

The results show that Match_Recognize, as before, is by far the most efficient solution. They also show that the faster solutions vary linearly with dataset size (within a given partition), while the slow ones vary quadratically. One interesting finding is that the solution by the Model clause can be changed from very slow, and quadratically varying, to linearly varying, and second in performance only to Match_Recognize, by using a rule ordering clause (which avoids the need for automatic rules ordering).

The problem is to determine break groups using a running aggregate based on some function of the record attributes, with a defined ordering, starting from the group starting record, and with a group's end record defined by the aggregate reaching (or exceeding) some limit. One may consider the first record reaching (or exceeding) the limit to define the first record in the next group, as in the original bursting problem, or to be the last record in the current group, as in the OTN example.

The data are partitioned by some key in general.

OTN-like Item Weights 'Bursting' Problem

The data structure used in this article is based on that of the original poster in the OTN thread, but with more generic table and column names.

I created test data with a test weight limit of 10, as follows, with groups shown at detailed level. The first two categories are taken from the OTN problem, while I added a third category to test the case where the limit is not reached.

In the Match_Recognize query proposed in the OTN thread the pattern is defined in terms of two categories, say s and t, where:

s denotes a record where the running sum < the limit

t denotes a record where the running sum >= the limit

The pattern to match can be written as (s* t?) meaning zero or more category s records, followed by zero or one category t records. This immediately suggests that any given match falls into one of the following scenarios for frequencies of (s, t):

(0, 0) - this looks like an empty set of records, but could be non-empty if null values were allowed for the weight

(1+, 0) - the case where the limit is not reached, which must be the last match if there are no null weights

(0, 1) - where the first record in a group reaches the limit by itself

(1+, 1) - where one or more records in a group are below the limit, followed by a record that reaches the limit

In the results above, we see that group 22 matches scenario 2, while groups 10, 16 and 17 match scenario 3, and the remainder match scenario 4. We take the weight to be not null so scenario 1 is not possible. This kind of 'scenario coverage' is much more important than the 'code coverage' that is often focussed on in testing, especially by object oriented programmers.

In the following sections for individual queries, the query (and other SQL) is listed first, followed by the execution plan for the largest problem (W40-D8000).

In some cases, Oracle Database may not be able to ascertain that your model is acyclic even though there is no cyclical dependency among the rules. This can happen if you have complex expressions in your cell references. Oracle Database assumes that the rules are cyclic and employs a CYCLIC algorithm that evaluates the model iteratively based on the rules and data. Iteration stops as soon as convergence is reached and the results are returned. Convergence is defined as the state in which further executions of the model will not change values of any of the cells in the model. Convergence is certain to be reached when there are no cyclical dependencies.

When we specify automatic order, the solution is obtained without error using Oracle's cyclic algorithm (operation SQL MODEL CYCLIC). Unfortunately, in this case there is a large performance impact, and we will see in the results section that execution time varies as the square of the number of records within a partition, i.e. quadratically.

In the query above, the rules order clause is omitted, thus defaulting to sequential, while avoiding the ORA-32637 error. This is achieved by specifying ORDER BY rn DESC on the left side of the second rule. The solution, via operation SQL MODEL ORDERED is much faster, and we will see in the results section that execution time now varies linearly with the number of records within a partition.

The query above is essentially the same as one of the posters proposed on the OTN thread, with a slight tweak to the pattern that does not alter its meaning, and also changing it to return one row per match. The query performs much more efficiently than any of the other queries, using the Match_Recognize clause introduced in Oracle 12.1 SQL for Pattern Matching.
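
The benchmarked query itself is not reproduced here, but a minimal sketch of this Match_Recognize approach, assuming a table items (cat, seq, weight) and the functional weight limit of 10 (names and limit are assumptions), would be along these lines:

SELECT cat, grp_num, num_rows, grp_weight
  FROM items
 MATCH_RECOGNIZE (
   PARTITION BY cat
   ORDER BY seq
   MEASURES Match_Number ()     AS grp_num,
            COUNT (*)           AS num_rows,
            FINAL SUM (weight)  AS grp_weight
   ONE ROW PER MATCH
   PATTERN (s* t?)
   DEFINE s AS SUM (weight) < 10,   -- running sum still below the limit
          t AS SUM (weight) >= 10   -- record reaching or exceeding the limit ends the group
 )
/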

This is based on the second of the recursive subquery factor queries in the OTN thread, and we can see the performance issue in the plan above. The recursive branch of the UNION ALL executes once for each record within a partition and performs a full scan on the items table each time. This results in execution time varying as the square of the number of records within a partition, as can be seen in the results section later. The performance can be much improved by using a temporary table, as in the next query.

In this solution, the initial subquery from the previous query is written to a temporary table that is indexed on the join column. This means that the join in the recursive branch of the UNION ALL is indexed and much quicker, resulting in linear variation in execution time with the number of records in a partition.

Notice that it was necessary to hint the index usage. It is possible to achieve the indexed join without a hint by including a call to gather statistics in the pre-query SQL. Unfortunately, Oracle's DBMS_Stats procedure performs a commit - which clears the data from the temporary table. Although we could get around the clearing of the table by making it a normal table and manually truncating it, it is probably better to accept this as a valid use-case for a hint - after all, the whole purpose of the temporary table is to permit index use.
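
A sketch of the shape of this solution, with the same assumed items (cat, seq, weight) table, the functional limit of 10, and a hypothetical index name for the hint:

CREATE GLOBAL TEMPORARY TABLE items_tmp (
  cat     VARCHAR2(30),
  rn      INT,
  weight  NUMBER,
  CONSTRAINT items_tmp_pk PRIMARY KEY (cat, rn)
) ON COMMIT DELETE ROWS
/
INSERT INTO items_tmp
SELECT cat, Row_Number () OVER (PARTITION BY cat ORDER BY seq), weight
  FROM items
/
WITH rsf (cat, rn, weight, running_sum, grp_start) AS (
    SELECT cat, rn, weight, weight, rn
      FROM items_tmp
     WHERE rn = 1                                        -- anchor: first record per category
     UNION ALL
    SELECT /*+ INDEX(t items_tmp_pk) */                  -- hint needed, as stats are not gathered on the temporary table
           t.cat, t.rn, t.weight,
           CASE WHEN r.running_sum >= 10 THEN t.weight
                ELSE r.running_sum + t.weight END,       -- reset the running sum after the limit was reached
           CASE WHEN r.running_sum >= 10 THEN t.rn
                ELSE r.grp_start END                      -- a new group starts at the current record
      FROM rsf r
      JOIN items_tmp t
        ON t.cat = r.cat AND t.rn = r.rn + 1
)
SELECT cat, grp_start, MAX (rn) - grp_start + 1 AS num_rows, MAX (running_sum) AS grp_weight
  FROM rsf
 GROUP BY cat, grp_start
/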

Performance Testing Results

The 'width' parameter is taken to be the number of cat values partitioning the dataset, while the 'depth' parameter is taken to be the number of records within each category. The weight is assigned a random integer between 1 and 100, and the weight limit is 5,000.

Record Counts Table

Input Record Counts

Depth    W10       W20       W40
D1000    10,000    20,000    40,000
D2000    20,000    40,000    80,000
D4000    40,000    80,000    160,000
D8000    80,000    160,000   320,000

Output Record Counts

Depth    W10      W20      W40
D1000    105      209      422
D2000    205      411      829
D4000    406      815      1,625
D8000    808      1,618    3,229

Elapsed Times Table (elapsed seconds)

MOD_QRY

         Elapsed Seconds           Depth Ratios to Prior     Width Ratios to Prior
Depth    W10      W20      W40     W10     W20     W40       W20     W40
D1000    16       47       99                                2.9     2.1
D2000    62       190      397     3.9     4.0     4.0       3.1     2.1
D4000    243      762      1,390   3.9     4.0     3.5       3.1     1.8
D8000    962      2,082    5,566   4.0     2.7     4.0       2.2     2.7
Average                            3.9     3.6     3.8       2.8     2.2

MOD_QRY_D

         Elapsed Seconds           Depth Ratios to Prior     Width Ratios to Prior
Depth    W10      W20      W40     W10     W20     W40       W20     W40
D1000    0.08     0.16     0.31                              2.0     1.9
D2000    0.16     0.30     0.61    2.0     1.9     2.0       1.9     2.0
D4000    0.30     0.59     1.19    1.9     2.0     2.0       2.0     2.0
D8000    0.59     1.20     2.42    2.0     2.0     2.0       2.0     2.0
Average                            1.9     2.0     2.0       2.0     2.0

MTH_QRY

         Elapsed Seconds            Depth Ratios to Prior      Width Ratios to Prior
Depth    W10      W20      W40      W10       W20     W40      W20       W40
D1000    0        0.016    0.016                                #DIV/0!   1.0
D2000    0.016    0.016    0.047    #DIV/0!   1.0     2.9       1.0       2.9
D4000    0.016    0.031    0.078    1.0       1.9     1.7       1.9       2.5
D8000    0.047    0.094    0.172    2.9       3.0     2.2       2.0       1.8
Average                             2.0       2.0     2.3       1.6       2.1

RSF_QRY

         Elapsed Seconds           Depth Ratios to Prior     Width Ratios to Prior
Depth    W10      W20      W40     W10     W20     W40       W20     W40
D1000    6        12       24                                2.0     2.0
D2000    23       46       94      3.8     3.8     3.9       2.0     2.0
D4000    92       185      377     4.0     4.0     4.0       2.0     2.0
D8000    369      750      1,513   4.0     4.1     4.0       2.0     2.0
Average                            3.9     4.0     4.0       2.0     2.0

RSF_TMP

         Elapsed Seconds           Depth Ratios to Prior     Width Ratios to Prior
Depth    W10      W20      W40     W10     W20     W40       W20     W40
D1000    0.09     0.19     0.36                              2.1     1.9
D2000    0.19     0.38     0.73    2.1     2.0     2.0       2.0     1.9
D4000    0.39     0.77     1.53    2.1     2.0     2.1       2.0     2.0
D8000    0.77     1.55     3.49    2.0     2.0     2.3       2.0     2.3
Average                            2.0     2.0     2.1       2.0     2.0

Slice Graphs

Performance Discussion

Variation with Width

The width parameter represents the number of categories here, and category (CAT) is the query partitioning key. We might therefore expect that the execution time would be proportional to the width when the depth parameter is fixed. The width values used were 10, 20 and 40, so we would expect times to double between W10 and W20, and again between W20 and W40.

In fact, we see from the width ratios columns in the tables that this expectation is very closely matched in the cases of MOD_QRY_D, RSF_QRY and RSF_TMP.

For MOD_QRY, the ratios are quite variable, and mostly above 2, so that the CYCLIC Model algorithm does not meet our expectation.

For MTH_QRY (Match_Recognize), the elapsed times are very small, 0.17 seconds for the largest problem (14 times faster than the next best, MOD_QRY_D), and that likely explains the variance.

Variation with Depth

The depth parameter represents the number of records for each category. The depth ratios show that two of the queries show very close to quadratic variation of time with depth, while three show very close to linear variation, and the linear queries are unsurprisingly much faster.

MOD_QRY and RSF_QRY vary quadratically with depth (number of records per partition key).

As in the earlier article, the new v12.1 feature Match_Recognize proved to be much faster than the other techniques for this problem

The solution using the Model clause with the operation SQL MODEL CYCLIC showed quadratic variation in execution times with size, but a very simple change to allow SQL MODEL ORDERED operation produced linear variation, and was second only to Match_Recognize in performance

Recursive subquery factoring had timings that increased quadratically with number of records; this was due to a combination of the number of starts of a subquery, and full scans within it

What we call the beginning is often the end
And to make an end is to make a beginning.
The end is where we start from. And every phrase
And sentence that is right (where every word is at home,
Taking its place to support the others,
The word neither diffident nor ostentatious,
An easy commerce of the old and the new,
The common word exact without vulgarity,
The formal word precise but not pedantic,
The complete consort dancing together)
Every phrase and every sentence is an end and a beginning,
Every poem an epitaph

- from Little Gidding by T.S. Eliot

A few years ago I wrote some SQL queries to assign dated records into groups defined by each record being within a fixed window of its starting record (the original Scribd document I wrote is embedded at the bottom). This is a bit harder than it sounds in pure SQL, without using PL/SQL, and I could only do it using new features from versions 10 and 11 of Oracle. With companies increasingly migrating to version 12, I thought it might be interesting to compare these queries with a query using the new 12c feature MATCH_RECOGNIZE. It turns out that the 12c query is both simpler and faster than the earlier queries. I'll describe the problem with a simple functional test data set first, then will give the SQL for each of four methods with the execution plan for a larger data set. At the end I summarise the results from the four methods across a range of problem sizes.

The problem is to determine the break groups using distance from the group start point. In other words, once a group starts, all records that start within a fixed distance from the group start are in the group, and the first record after the end of a group defines the next group start. The data are partitioned by some key in general (here person_id). The problem data structure is based on a question posed in Tom Kyte’s Oracle forum, Activities and breaks, while the test data are my own.
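
A sketch of the 12c MATCH_RECOGNIZE approach to this date-window version, assuming a table activity (person_id, start_date) and a bind variable :days for the group limit (the article actually passes the limit via a system context; names here are assumptions):

SELECT person_id, group_start, num_rows
  FROM activity
 MATCH_RECOGNIZE (
   PARTITION BY person_id
   ORDER BY start_date
   MEASURES FIRST (start_date) AS group_start,
            COUNT (*)          AS num_rows
   ONE ROW PER MATCH
   PATTERN (strt w*)
   DEFINE w AS start_date < FIRST (start_date) + :days   -- within the fixed window of the group start
 )
/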

500w records are generated for each of three persons, where w is a 'width' parameter, with start dates randomized across a century; a depth parameter, giving the group limit in days, is passed to the query via a system context.

The new 12c feature MATCH_RECOGNIZE is a very powerful technique, and was much faster than the other techniques for this problem

The results above showed that recursive subquery factoring had timings that increased quadratically with number of records; this was due to a product between the number of starts and full scans on a subquery

This kind of unscalable quadratic resource usage can often be avoided by the use of a temporary table with appropriate indexes, as demonstrated

The depth parameter had little effect on timing, but I included it for the purpose of demonstration of the benchmarking framework