Match_recognise – 2

In my previous post I presented a warning about the potential cost of sorting and the cost of failing to find a match after each pass of a long search. In a comment on that post Stew Ashton reminded me that the cost of repeatedly trying to find a match starting from “the next row down” could be less of a threat than the cost of “back-tracking” before moving to the next row down.

Taking the example from the previous posting to explain – the requirement was for customers who had executed a transaction in September but not October, and a match_recognize() clause suggested on the ODC (formerly OTN) database forum to implement this requirement was as follows:

In the blog post I raised the issue of an extreme case where there were 100,000 transactions for a single customer of which all but the last was a September transaction and the last was an October transaction. This would have two effects – first that we could have to sort 100,000 rows, including the cust_id that appeared in the “partition by” clause and the 1000-character padding column that was listed in the measures clause, leading to a huge dump to, and constant re-read of, the temporary tablespace; secondly that having failed to find a match starting from row 1 Oracle would go back to row 2 and try again, then to row 3, and so on.

The point that Stew Ashton made was that Oracle doesn’t just “go back to” row 2, it will be unwinding a stack, or reversing out a recursive descent to get there. What this means is that Oracle will fail as it reaches the October at row 100,000 and say “no more X rows, is this a Y row ? no”, backtrack to row 999,998 and say “what if I stop collecting X rows here and start looking for Y rows?”, so it reads row 999,999 as a Y row (since 9 != 10), then finds row 100,000 and fails the match. So it backtracks again to row 999,997 and says “what if I stop collecting X rows here and start looking for Y rows?”, and this time it finds identifies 999,998 and 999,999 as Y rows, then fails on row 100,000.

Remember, this is still part of the attempt to match the pattern starting at row 1 – and there are 999,996 more steps backwards still to go, and the further we go back the further we come forward again until we fail — and there are 999,998 steps we have to back-track before we start to consider a pattern starting are row 2..

To demonstrate the costs I’ve got three variants of the original query. First, the query as it was but limited to just 1,000 rows for a single customer; second a changed pattern that highlights the cost of trying to use back-tracking to match the pattern just once, starting from row 1 (the pattern doesn’t actually meet the original requirement because it would only find customers whose first transaction of the year was in September); finally a changed pattern that achieves the required result much more efficiently than the original (but still very slowly) by adding some human intelligence to the implementation.

Here’s version 1 – which took 257 CPU seconds to handle just 1,000 rows:

You’ll see that I’ve included the “debug” functions of classifier() and match_number() in the SQL above – these are particularly useful with the options “all rows per match” and “with unmatched rows” when you’re trying to figure out why your match_recognize() clause is not producing the right results, so I’ve left them there purely for reference.

Then there’s a version where I’ve made the modification suggested by Stew Ashton to demonstrate the full cost of an attempt to match only if the pattern starts on the first row of the partition. This took just 0.83 CPU seconds to complete. This may sound fairly reasonable, but if you compare that to the time it might take simply to sort and walk once through 1,000 rows you’ll realise that it’s actually pretty expensive – and it’s not surprising that when we had to do the same thing 1,000 times (on a slowly decreasing set, of course, as we work our way down the partition) the original task took 257 CPU seconds.

You’ll notice the caret “^” at the start of the pattern – this means the pattern must start at the first row of the partition (just as the “$” means the pattern has to end at the end of the partition).

Finally, thinking of a better way of using match_recognize() for this requirement we realise that we know that November comes after October, and December comes after November so (in the context of our example) the predicate “!= 10” is equivalent to “> 10”. With this code change the original query took 0.82 CPU seconds.

In this case we still have to do a lot of back tracking, but each time we backtrack one step we then check just one row forward for the match to fail (9 is not greater than 10), whereas with the original if we have backtracked 750 steps (for example) we would then have to check 750 rows before we reached the October row for the match to fail.

Bottom line: back-tracking is a massive cost if you have to take a lot of steps backwards to the previous starting row; and you need the match to fail (or succeed) as fast as possible as you start stepping forward again.

Addendum

Since Stew Ashton had highlighted the omission in the previous blog post I passed him a copy of this post before publishing it, asking him to check whether there were any errors or omissions in the way I had described the work Oracle would do back tracking in this example. He said that he couldn’t think of anything to improve the explanation (though I will still claim responsibility for any errors, omissions, or ambiguities) and then suggested another, far more efficient, way of getting the required answer by (again) re-thinking the question before writing the code. His solution looks like this:

The pattern here simply says: for the partition find the first “X”-row, but an X-row is defined as “month is september and either there are no more rows or the next row is after October”. You’ll notice that I’ve modified the “order by” clause to put nulls first – there are none in the sample data, but if there were this change to the order would ensure that for a row where “mth = 9″ the “next(mth)” could only be null if the current row were the last in the partition.

If you imagine walking through the pattern-matching process now, you start looking at rows and keep going until you reach the first September, and each time you find a September you check the next row to see if it’s past the end of partition, or a November or December; if it is you report the current row and move to the end of the partition, if it isn’t you just walk to the next row and repeat the process – you never back-track. Effectively the workload here is simply to sort then walk non-stop through the whole list – and Oracle even tells us that we are using this optimum strategy in the execution plan:

Operation 2 – the Match Recognize Sort operation – is reported as “deterministic finite auto”, which basically means the duration of the process is predictable because Oracle knows it is a straight end to end walk with no back-tracking. This is the ideal thing to see when you try to design code using match_recognize().