Main menu

Post navigation

Dimensional Benchmarking of Bracket Parsing SQL

I noticed an interesting thread on OTN recently, Matching ( in a string. It's about using SQL to find matching bracket pairs (technically 'parentheses' but 'brackets' is shorter, and it makes no difference to the SQL). Incidentally, I found recently that nested bracket expressions are a nice way of compactly representing the structure of complex hash join execution plans, A Note on Oracle Join Orders and Hints.

I thought it would be interesting to run some of the solution queries through my benchmarking package to test performance (A Framework for Dimensional Benchmarking of SQL Performance). I decided to consider only the queries that addressed multiple records, and the form of the problem that requires returning record uid, opening and closing bracket positions, plus the substring enclosed. These were by 'mathguy' and 'Peter vd Zwan', and I made very minor tweaks for consistency. I also wrote a query myself using PL/SQL in an inline SQL function using the new (v12.1) 'WITH Function' functionality, and copied this to a version using a pipelined database function to check for any performance differences. The four queries tested were then:

CBL_QRY, mathguy: Connect By, Analytics, Regex

MRB_QRY, Peter vd Zwan: Connect By, Match_Recognize

WFB_QRY, me: With PL/SQL Function, Arrays

PFB_QRY, me: Pipelined PL/SQL Function, Arrays

Bracket Pair Definition

Consider a function (BrDiff) defined at each character position as the difference between the number of opening and closing brackets to the left of, or at, that position.

A closing bracket closes an opening bracket if it is the first closing bracket where BrDiff is 1 less than BrDiff at the opening bracket. If all brackets are in some pair, then the expression can be considered well-formed.

This can be illustrated with a diagram for the fourth functional example below.

Test Problem

BRACKET_STRINGS Table

CREATE TABLE bracket_strings (id NUMBER, str VARCHAR2(4000))
/

Functional Test Data

I took four of mathguy's test records, excluding the (deliberately) badly-formed strings, and which included some embedded returns:

Each test set consisted of 100 records with the str column containing the brackets expression dependent on width (w) and depth (d) parameters, as follows:

Each str column contains w bracket pairs

The str column begins with a 3-character record number

After the record number, the str column begins with d opening brackets with 3 characters of text, like: '(001', etc., followed by the d closing brackets, then the remaining w-d pairs in an unnested sequence, like '(001)'

When w=d the pairs are fully nested, and when d=0 there is no nesting, just a sequence of '(123)' stringss.

This choice of test data sets allows us to see if both number of brackets, and bracket nesting have any effect on performance.

The output from the test queries therefore consists of 100*w records with a record identifier and a bracketed string. For performance testing purposes the benchmarking framework writes the results to a file in csv format, while counting only the query steps in the query timing results.

All the queries showed strong time correlation with width, with smaller correlation with depth.

CBL_QRY is significantly faster than MRB_QRY, which appears to be rising quadratically

Both WFB_QRY and PFB_QRY are much faster than the non-PL/SQL queries

PFB_QRY is slightly faster than WFB_QRY. This could be regarded as too small a difference to be significant, but is consistent across the data points, despite the context switching with database function calls