Follow by Email

SQL Pattern Matching Deep Dive - Part 5, SKIP TO where exactly?

So far in this series we looked at how to ensure query consistency, how correctly use predicates, managing sorting, using the built-in measures to help with optimise your code and the impact of different types of quantifiers:

In this post I am going to review what MATCH_RECOGNIZE does after a match has been found i.e. where the search begins for the next match. It might seem obvious, i.e. you start at the next record, but MATCH_RECOGNIZE provides a lot of flexibility in this specific area (as you would expect).

Basic Syntax

We use the AFTER MATCH SKIP clause to determine the precise point to resume row pattern matching after a non-empty match is found. If you don’t supply an AFTER MATCH SKIP clause then the default is AFTER MATCH SKIP PAST LAST ROW.

Of course there are quite a few options available:

AFTER MATCH SKIP TO NEXT ROW Resume pattern matching at the row after the first row of the current match.

AFTER MATCH SKIP PAST LAST ROW Resume pattern matching at the next row after the last row of the current match.

AFTER MATCH SKIP TO FIRST pattern_variable Resume pattern matching at the first row that is mapped to the pattern variable.

AFTER MATCH SKIP TO LAST pattern_variable Resume pattern matching at the last row that is mapped to the pattern variable.

AFTER MATCH SKIP TO pattern_variable The same as AFTER MATCH SKIP TO LAST pattern_variable.

Using Pattern Variables and ORA-62154

Note that you can set the restart point to be linked to a specific pattern variable which allows you to work with overlapping patterns - i.e. where you are searching for “shapes” within your data set such as “W” shaped patterns within our ticker data stream. But what happens if the pattern variable within the SKIP TO clause is not matched? Let’s look at the following example:

SELECT *FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES STRT.tstamp AS start_tstamp,LAST(UP.tstamp) as end_tstamp, MATCH_NUMBER() AS match_num, CLASSIFIER() AS var_match ALL ROWS PER MATCHAFTER MATCH SKIP TO DOWN PATTERN (STRT DOWN* UP) DEFINE DOWN AS DOWN.price < PREV(DOWN.price), UP AS UP.price > PREV(UP.price) ) MRWHERE symbol='ACME'ORDER BY MR.symbol, MR.tstamp;

here we are stating that we need at least zero or more matches of the variable DOWN to occur and once a match has been found then we will resume the search for the next pattern at the DOWN event. With this pattern it is possible that DOWN will never get matched so the AFTER MATCH SKIP TO DOWN cannot happen even though a complete match for the pattern is found. Therefore, the compiler throws an error to let you know that this code will not work:

ORA-62514: AFTER MATCH SKIP TO variable is not bounded in the match found.62514. 00000 - "AFTER MATCH SKIP TO variable is not bounded in the match found."*Cause: AFTER MATCH SKIP TO variable was not bound in the match found due to pattern operators such as |, *, ?, and so on.*Action: Modify the query and retry the operation

Therefore, you need to change the pattern to search for at least one or more instances of DOWN rather than zero or more as this will allow the DOWN event to be matched at least once and therefore it will be available for AFTER MATCH SKIP TO processing.

Skipping PAST LAST ROW [DEFAULT]

This is the default behaviour and in many circumstances this is the most obvious choice. In these situations the searching for the next pattern it makes sense to resume at the row after the last match since going back over previous rows does not make any sense and would only result in more rows than necessary being processed. For example, let’s look at the sessionization example: http://oracle-big-data.blogspot.co.uk/2014/02/sessionization-with-12c-sql-pattern.html and if you want to try the code see the tutorial on the LiveSQL site.

Looking at the source data for the sessionization example it’s clear that as we walk through the entries in the log file to check if an entry is part of the current session or not, there is no point in stepping backwards to begin searching again once a match has been found.

You can run the code for this sessionization example on LiveSQL.

Looking for shapes and controlling skipping

As I previously stated, you might think the obvious position to start searching for the next occurrence of a pattern is the next record after the last row of the current match. But what if there are overlapping patterns where the middle of an earlier match overlaps with the start of the next match? For example if we are looking for a w-shaped pattern within our ticker data set then it is quite possible to have overlapping w-shapes where the next “W” starts within the second down phase of the previous ”W”.

Fortunately MATCH_RECOGNIZE provides great flexibility in terms of being able to specify the restart point. If we look at the source data for the ACME symbol within our ticker data set then we can see that there are overlapping W-shapes (assuming we allow for the flat-top in the middle of the 2nd w-shape by using the <= and >= tests for each pattern variable!).

Let’s use this example to explore the various AFTER MATCH SKIP TO options…starting with the default behaviour:

and if we expand the output, using ALL ROWS PER MATCH, so we can see how the pattern was matched we can see that it starts on 05-Apr-11 with pattern variable STRT and ends on 14-Apr-11 with pattern variable Z.

Now let’s change the above code sample so that after the first pattern has been found we begin searching at the row after the end of the matching process for the Y variable - i.e. row 6, 10-Apr-11.

now shows us that the records for 10-Apr-11 to 14-Apr-11 were actually processed twice:

Skip to next row?

What about using the SKIP TO NEXT ROW syntax? How does that affect our results? It is important to remember that this will force MATCH_RECOGNIZE to resume pattern matching at the row after the first row of the current match. Using our ticker data we can see that this would actually increase the number of W-shapes to three!

In match 2 we have two occurrences of pattern variable x, there once the second W-shape has been matched the search process restarts on row 12, i.e. the first row of the current match, which is row 12 mapped to STRT.

Note that match two, the 2nd W-shape, starts on line 11 but we began the search for this second match on row 2, i.e. the next row after the first start variable. Similarly, the search for the third W-shape on row 12 after the second STRT variable. Given that our original data set for ACME only contained 20 rows you can see from this example how it is possible to do a lot more processing when you start to fully exploit the power of the AFTER MATCH SKIP syntax.

Just accept the default?

The AFTER MATCH SKIP clause determines the point at which we will resume searching for the next match after a non-empty match has been found. The default for the clause is AFTER MATCH SKIP PAST LAST ROW: resume pattern matching at the next row after the last row of the current match. In most examples of using MATCH_RECOGNIZE you will notice that the AFTER MATCH clause is not present and the developer blindly assumes that the AFTER MATCH SKIP PAST LAST ROWclause is applied. This obviously does not help the next developer who has to amend the code to fit new business requirements.

Therefore, my recommendation is that you should always clearly state where you want the matching process to start searching for the next match. Never assume the default will behaviour will be good enough!

Summary

We are getting near the end of this series of deep dive posts. Hopefully this post has explained the ways in which you can use the AFTER MATCH SKIP… clause to ensure that you capture all of the required patterns/shapes within your data set. It’s always a good idea to explicitly include this clause because it is very important - if you don’t want to allow for overlapping matches then clearly state this in your code by using AFTER MATCH SKIP PAST LAST ROW clause. Don’t assume the default will kick-in and that the next developer will have time to read all your detailed documentation when making the next round of changes to the code.

What’s next?

In the next post in this series I am going to review the keywords that control the output from MATCH_RECOGNIZE: ALL ROWS vs. ONE ROW. Feel free to contact me if you have an interesting use cases for SQL pattern matching or if you just want some more information. Always happy to help. My email address is keith.laker@oracle.com.

Comments

Post a Comment

Popular posts from this blog

This post covers one of the new SQL performance enhancements that we incorporated into Database 12c Release 2. All of these enhancements are completely automatic, i.e. transparent to the calling app/developer code/query. These features are enabled by default because who doesn’t want their queries running faster with zero code changes?

So in this post I am going to focus on the new In-Memory “cursor duration” temporary table feature. Let’s start by looking at cursor duration temp tables…Above image courtesy of wikimedia.org
What is a cursor duration temp table?
This is a feature that has been around for quite a long time. Cursor duration temporary tables (CDTs) are used to materialize intermediate results of a query to improve performance or to support complex multi step query execution. The following types of queries commonly use cursor duration temp tables:WITH Clause and parallel recursive WITHGrouping SetsStar TransformationFrequent Item Set CountingXLATE
What happens during the …

….and now it’s here in PDF format as well!The free big data warehousing Must-See guide for OpenWorld 2017 is now available for download in PDF format - click here, and yes it’s completely free. This comprehensive guide covers everything you need to know about this year’s Oracle OpenWorld conference so that when you arrive at Moscone Conference Center you are ready to get the most out of this amazing conference. The guide contains the following information:Page 8 - On-Demand VideosPage 17 - JustifyYour tripPage 19 - KeyPresentersPage 41 - Must See SessionsPage 90 - Useful MapsChapter 1 - Introduction to the must-see guide.Chapter 2 - A guide to the key the highlights from last year’s conference so you can relive the experience or see what you missed. Catch the most important highlights from last year's OpenWorld conference with our on demand video service which covers all the major keynote sessions. Sit back and enjoy the highlights. The second section explains why you need to atte…

MATCH_RECOGNIZE and predicates
At a recent user conference I had a question about when and how predicates are applied when using MATCH_RECOGNIZE so that’s the purpose of this blog post. Will this post cover everything you will ever need to know for this topic? Probably!
Where to start….the first thing to remember is that the table listed in the FROM clause of your SELECT statement acts as the input into the MATCH_RECOGNIZE pattern matching process and this raises the question about how and where are predicates actually applied. I briefly touched on this topic in part 1 of my deep dive series on MATCH_RECOGNIZE: SQL Pattern Matching Deep Dive - Part 1.
In that first post I looked at the position of predicates within the explain plan and their impact on sorting. In this post I am going to use the built in measures (MATCH_NUMBER and CLASSIFIER) to show the impact of applying predicates to the results that are returned.
First, if you need a quick refresher course in how to use the MATCH…

Keith Laker

Keith Laker

Disclaimer

Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. Do NOT take anything written here, unless explicitly mentioned otherwise, to be Oracle policy or reflecting Oracle's support policy.

About Me

I have been working with Oracle data warehouse technology for over 20 years working on a wide variety of data warehouse projects both as a consultant and an onsite support engineer. I am now part of the Data Warehouse Product Management Team where I am responsible for analytical SQL. I am based in the UK at our Manchester office.

A key part of my role is to work with our sales teams to brief our customers on data warehousing and analytical SQL: explaining the wide variety of new and exciting opportunities that our DW and analytical solutions can support.

I regularly deliver sales training for data warehousing and analytical SQL across all our sales regions and provide competitive intelligence support across all the major data warehouse vendors.