Subscribe to this blog

Follow by Email

MATCH_RECOGNIZE and the Optimizer

If you have already been working with the new 12c pattern matching feature you will have probably spotted some new keywords appearing in your explain plans. Essentially there are four new keywords that you need to be aware of:

MATCH RECOGNIZE

SORT

BUFFER

DETERMINISTIC FINITE AUTO

The fist three bullet points are reasonably obvious (at least I hope they are!) but just incase…. the keywords MATCH RECOGNIZE refers to the row source for evaluating the match_recognize clause . The “SORT keyword means the row source sorts the data data before running it through the state machine to find the matches.

The last keyword is the most interesting and is linked to the use of “state machine”, as mentioned in the previous sentence. Its appearance or lack of appearance affects the performance of your pattern matching query. The importance of this keyword is based on the way that pattern matching is performed. To search for a pattern containing a specific set of events we build something called a “state-machine”. At this point I will turn to Wikipedia to provide a definition of a state machine:

…a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition…

The classic example of a state machine is a traffic light which moves through a given sequence of events in a set order and always in that order: Red -> Red & Yellow -> Green -> Yellow. The traffic light model can also be viewed as at “deterministic” state machine. This implies that for every state there is exactly one transition for a given input, i.e. it is not possible to have two different transitions leading out of a particular state. With our traffic light state model it is clear that given a red light state there is only one next transition which is to green & amber.

Let’s use our normal stock ticker sample schema that tracks stock market prices for three ticker symbols. Let’s look at two very similar pattern matching queries:

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

AFTER MATCH SKIP TO LAST UP

PATTERN (STRT DOWN* UP*)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

AFTER MATCH SKIP TO LAST UP

PATTERN (STRT DOWN UP)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

Note that the key difference between the two sql statements is the PATTERN clause. The statement on the left checks for zero or more instances of two different events: 1) where the price in the current row is less then the price in the previous row and 2) where the price in the current row is more then the price in the previous row. The statement on the right checks for only once instance of each down-up pattern. This difference in the definition of the pattern results in different explain plans where the plan on the right includes the key phrase “DETERMINISTIC FINITE AUTO” .

The phrase “DETERMINISTIC FINITE AUTO” means that the state machine that we constructed is deterministic and thus when running the sorted rows through the state machine, we don’t do backtracking (I will write a separate blog post on this topic very soon as it is a key concept in pattern matching. For the moment I will simply point you to Wikipedia page on backtracking, personally I found the section headed “Description of the method” the most useful). The key benefit of building a “DETERMINISTIC FINITE AUTO” plan is that the execution is more efficient when there is no backtracking.

When we analyze the PATTERN clause and build the corresponding state machine we are able to detect deterministic finite automaton by checking the state machine. If any state has two or more outgoing transitions then we regard the state machine as non-deterministic, if any final state is followed by a non-final state, then the state machine is regarded as non-deterministic. At the moment we can only detect a few trivial cases such as PATTERN (A B C), PATTERN (A B+), PATTERN (A B*), etc.

The first example of these patterns that we can detect is shown above (see the statement on the right where we have STRTDOWNUP pattern) and the other two examples of these types of deterministic patterns are shown below:

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

PATTERN (STRT DOWN UP+)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

PATTERN (STRT DOWN UP*)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

For PATTERN (A | B) , or PATTERN (A B+ C) we just regard the state machine as non-deterministic, therefore, the explain plans only contain the keywords MATCH RECOGNIZE (SORT) as shown below:

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

PATTERN (STRT | DOWN | UP)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

SELECT *

FROM Ticker

MATCH_RECOGNIZE (

PARTITION BY symbol ORDER BY tstamp

MEASURES STRT.tstamp AS start_tstamp,

LAST(DOWN.tstamp) AS bottom_tstamp,

LAST(UP.tstamp) AS end_tstamp

ONE ROW PER MATCH

PATTERN (STRT DOWN* UP)

DEFINE

DOWN AS DOWN.price < PREV(DOWN.price),

UP AS UP.price > PREV(UP.price)

) MR

WHERE symbol='ACME'

ORDER BY MR.symbol, MR.start_tstamp;

Within the current version of 12c (12.1.2) we are not checking the mutual exclusiveness of the DEFINE predicates in detecting a deterministic state machine, therefore, the execution plan defaults to a MATCH RECOGNIZE (SORT) style plan, where we may or may have to use backtracking. Obviously, as we continue to develop the MATCH_RECOGNIZE feature will expand our ability to detect a deterministic state machine which means we will process your patter more efficiently.

In summary, if you want the most efficient execution plan then try to define your pattern in such way that we are able to create a deterministic state machine. This assumes, of course, that backtracking is not needed within each partition/data set in order to identify the required pattern (more on this in my next blog post).

Hope this information is useful. If you have any questions then feel free to contact me directly (keith.laker@oracle.com).

Comments

Post a Comment

Popular posts from this blog

Oracle RDBMS 11gR2 introduced the LISTAGG function for working with string values. It can be used to aggregate values from groups of rows and return a concatenated string where the values are typically separated by a comma or semi-colon - you can determine this yourself within the code by supplying your own separator symbol.

Based on the number of posts across various forums and blogs, it is widely used by developers. However, there is one key issue that has been highlighted by many people: when using LISTAGG on data sets that contain very large strings it is possible to create a list that is too long. This causes the following overflow error to be generated:ORA-01489: result of string concatenation is too long.
Rather annoyingly for developers and DBAs, it is very difficult to determine ahead of time if the concatenation of the values within the specified LISTAGG measure_expr will cause an ORA-01489 error. Many people have posted workarounds to resolve this problem - including mysel…

This post covers one of the new SQL performance enhancements that we incorporated into Database 12c Release 2. All of these enhancements are completely automatic, i.e. transparent to the calling app/developer code/query. These features are enabled by default because who doesn’t want their queries running faster with zero code changes?

So in this post I am going to focus on the new In-Memory “cursor duration” temporary table feature. Let’s start by looking at cursor duration temp tables…Above image courtesy of wikimedia.org
What is a cursor duration temp table?
This is a feature that has been around for quite a long time. Cursor duration temporary tables (CDTs) are used to materialize intermediate results of a query to improve performance or to support complex multi step query execution. The following types of queries commonly use cursor duration temp tables:WITH Clause and parallel recursive WITHGrouping SetsStar TransformationFrequent Item Set CountingXLATE
What happens during the …

Here it is…..THE ultimate, comprehensive review for big data warehousing for #OOW18.This review is for everyone who either missed this year’s conference or just wants to relive the amazing experience all over again (but focusing on just the best bits obviously!). So here you go, my complete view nicely packaged and available free of charge in a number of different formats…

Following on from this year’s OpenWorld I have now put together the ultimate, comprehensive review for big data warehousing content from #OOW18. This free review contains the following information:Key video highlights from the main executive keynotesOverview of the announcements for Autonomous Database - contains the links you need to learn even more about how the Autonomous Database can simplify and speed up your big data warehousing projects!Full list of Oracle Product Management and Development presenters, links to all their social media sites are included alongside each profile.All the downloadable content from t…

Keith Laker

Keith Laker

Disclaimer

Opinions expressed are entirely my own and do not reflect the position of Oracle or any other corporation. Do NOT take anything written here, unless explicitly mentioned otherwise, to be Oracle policy or reflecting Oracle's support policy.

About Me

I have been working with Oracle data warehouse technology for over 20 years working on a wide variety of data warehouse projects both as a consultant and an onsite support engineer. I am now part of the Data Warehouse Product Management Team where I am responsible for analytical SQL. I am based in the UK at our Manchester office.

A key part of my role is to work with our sales teams to brief our customers on data warehousing and analytical SQL: explaining the wide variety of new and exciting opportunities that our DW and analytical solutions can support.

I regularly deliver sales training for data warehousing and analytical SQL across all our sales regions and provide competitive intelligence support across all the major data warehouse vendors.