An Introduction to Sequential Rule Mining

In this blog post, I will discuss an interesting topic in data mining: sequential rule mining, which consists of discovering rules in sequences. This data mining task has many applications, for example, analyzing the behavior of customers in supermarkets or of users on a website.

Introduction

Before discussing this topic, let me give a little context. There has been a lot of work in the field of data mining on pattern mining. The goal of pattern mining is to discover useful, novel and/or unexpected patterns in databases. In this blog post, we will be interested in a specific type of database called a sequence database, which contains a set of sequences. For example, consider the following database:

A sequence database containing four sequences

This database contains four sequences named seq1, seq2, seq3 and seq4. For our example, consider that the symbols “a”, “b”, “c”, “d”, “e”, “f”, “g” and “h” represent items sold in a supermarket. For example, “a” could represent an “apple”, “b” some “bread”, etc.

Now, a sequence is an ordered list of sets of items. For our example, we will assume that each sequence represents what a customer has bought in our supermarket over time. For example, consider the second sequence, “seq2”. This sequence indicates that the second customer bought items “a” and “d” together, then bought item “c”, then bought “b”, and then bought “a”, “b”, “e” and “f” together.
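To make this concrete, here is a minimal Python sketch of how such a sequence could be represented (this encoding is just an illustration, not SPMF’s actual input format):

```python
# Each sequence is an ordered list of itemsets; sets model the fact
# that items bought together have no order among them.
seq2 = [
    {"a", "d"},           # bought "a" and "d" together
    {"c"},                # then bought "c"
    {"b"},                # then bought "b"
    {"a", "b", "e", "f"}  # then bought these four items together
]

# A sequence database is then simply a list of such sequences.
print(len(seq2))  # → 4 itemsets
```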

Sequences are a very common type of data structure found in many domains: bioinformatics (DNA sequences), sequences of clicks on websites, the behavior of learners in e-learning, sequences of what customers buy in retail stores, sentences in a text, etc.

Discovering sequential patterns in sequences

An important data mining problem is to design algorithms for discovering hidden patterns in sequences. There has been a lot of research on this topic in the field of data mining, and various algorithms have been proposed.

In the following, I will discuss two types of patterns that can be found. I will first discuss sequential patterns. Then, I will explain some of their limitations before discussing sequential rules.

A sequential pattern is a subsequence that appears in several sequences of a database. For example, the sequential pattern <{a},{c},{e}> appears in the first two sequences of our database. This pattern is quite interesting: it indicates that customers who bought {a} often bought {c} afterward, followed by {e}. Such a pattern is said to have a support of two sequences because it appears in two sequences of the database. Several algorithms have been proposed for finding all sequential patterns in a database, such as CM-SPADE, PrefixSpan and GSP. These algorithms take as input a sequence database and a minimum support threshold (minsup). They then output all sequential patterns having a support no less than minsup. These patterns are said to be the frequent sequential patterns.
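As a sketch, the support of a sequential pattern can be computed with a simple containment check. The two-sequence database below is a hypothetical stand-in, since the original figure is not reproduced here:

```python
def contains(sequence, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of
    `sequence`: each pattern itemset is contained in a distinct
    itemset of the sequence, in the same order."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)

def support(database, pattern):
    """Number of sequences of the database containing the pattern."""
    return sum(1 for s in database if contains(s, pattern))

# Hypothetical two-sequence database (only seq2 is taken from the text)
db = [
    [{"a"}, {"c"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
]
print(support(db, [{"a"}, {"c"}, {"e"}]))  # → 2
```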

For instance, if we run CM-SPADE on the above database with minsup = 3, we find the following frequent sequential patterns:

<{a}> with a support of 3 sequences
<{a},{e}> with a support of 3 sequences
<{a},{f}> with a support of 3 sequences
<{b},{e}> with a support of 3 sequences
<{b},{f}> with a support of 4 sequences

Sequential patterns can be quite interesting. In the example, we learn that buying item “b” is followed by buying item “e” in 3 sequences. However, sequential patterns can also be misleading: an important limitation is that they provide no assessment of the probability that a pattern will be followed. Let me explain this in more detail with the pattern <{b},{e}>. This pattern appears in 3 sequences, so it may seem likely that if someone buys “b”, he will also buy “e” afterward. But how likely? We can observe that item “b” appears in four sequences, so the probability that “e” appears after “b” is actually 3 / 4 = 75% (i.e. P(e|b) = 75%). Sequential patterns only indicate how often a pattern appears; they do not provide any indication about this probability.

Discovering sequential rules in sequences

This now leads us to the main topic of this post: sequential rule mining. Sequential rule mining has been proposed as an alternative to sequential pattern mining that takes into account the probability that a pattern will be followed. I will provide a few definitions and then we will look at a full example.

A sequential rule is a rule of the form X -> Y, where X and Y are sets of items (itemsets). A rule X -> Y is interpreted as: if the items in X occur (in any order), then they will be followed by the items in Y (in any order). For example, consider the rule {a} -> {e,f}. It means that if a customer buys item “a”, then the customer will later buy items “e” and “f”. The order among the items in {e,f} is not important: a customer may buy “e” before “f” or “f” before “e”.

To find sequential rules, two measures are generally used: the support and the confidence. The support of a rule X -> Y is the number of sequences containing the items of X followed by the items of Y. For example, the support of the rule {a} -> {e,f} is 3 sequences because {a} appears before the items of {e,f} in three sequences (seq1, seq2 and seq3).

The confidence of a rule X -> Y is the support of the rule divided by the number of sequences containing the items of X. It can be understood as the conditional probability P(Y|X). For example, the confidence of the rule {a} -> {e,f} is 1 (or 100% if written as a percentage), because every time a customer buys item “a” in the example database, he then buys “e” and “f”. Another example is the rule {a} -> {b}, which has a support of 2 sequences and a confidence of 0.66 (that is, 66%).
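These two measures can be sketched in Python as follows. The rule semantics used here is the “items of X in any order, followed by items of Y in any order” interpretation given above, and the toy database is a hypothetical illustration:

```python
def antecedent_end(sequence, X):
    """Earliest position by which all items of X have appeared,
    or None if the sequence does not contain all of X."""
    seen = set()
    for pos, itemset in enumerate(sequence):
        seen |= itemset & X
        if X <= seen:
            return pos
    return None

def rule_occurs(sequence, X, Y):
    """True if all items of X appear, followed by all items of Y."""
    pos = antecedent_end(sequence, X)
    if pos is None:
        return False
    after = set().union(*sequence[pos + 1:])  # items strictly after X
    return Y <= after

def support_and_confidence(database, X, Y):
    sup_x = sum(1 for s in database if antecedent_end(s, X) is not None)
    sup_xy = sum(1 for s in database if rule_occurs(s, X, Y))
    return sup_xy, (sup_xy / sup_x if sup_x else 0.0)

# Hypothetical toy database
db = [
    [{"a"}, {"c"}, {"e", "f"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"b"}, {"g"}],
]
print(support_and_confidence(db, {"a"}, {"e", "f"}))  # → (2, 1.0)
```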

Sequential rule mining algorithms such as RuleGrowth, ERMiner and CMRules output all sequential rules having a support and a confidence respectively no less than thresholds minsup and minconf set by the user. For example, consider again the example database and suppose that the user sets minsup = 2 and minconf = 60%. RuleGrowth then outputs all the rules meeting these thresholds.

These rules can be viewed as more interesting than sequential patterns, since they give a measure of the confidence that they will be followed. For example, it is very informative to know that a rule such as {c} -> {e,f} has a confidence of 100%.

In the past, I carried out a study with my student to compare the prediction accuracy of sequential patterns and sequential rules. In that study, we found that sequential rules can provide a much higher prediction accuracy than sequential patterns when the patterns are used for sequence prediction. The reason is that sequential rules consider the probability (confidence), while sequential patterns do not.

Extensions of the task of sequential rule mining

In the previous paragraphs, I have introduced the topic of sequential rule mining. But note that there also exist several extensions of this problem, proposed to address specific needs. I will provide a brief overview of a few of them.

Discovering the top-k sequential rules. The idea is to discover the k most frequent rules in a dataset having a confidence no less than minconf. For example, a user may specify that he wants to find the top 1000 rules having a confidence of at least 75%. Some algorithms for this task are TopSeqRules and TNS.

Discovering sequential rules with a window-size constraint. This extension lets the user find rules of the form X -> Y where X and Y must be close to each other with respect to time. For example, a user may want to find rules appearing within three consecutive itemsets of a sequence. This is interesting, for example, for analyzing sequences of web clicks. An algorithm for this task is TRuleGrowth.
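As an illustration of the window constraint (a sketch of the idea only, not the actual TRuleGrowth algorithm), one can check whether a rule occurs within a window of consecutive itemsets:

```python
def occurs_within_window(sequence, X, Y, window):
    """True if the items of X, followed by the items of Y, all occur
    within `window` consecutive itemsets of the sequence."""
    for start in range(len(sequence)):
        chunk = sequence[start:start + window]
        # try every split point: X before it, Y strictly after it
        for split in range(1, len(chunk)):
            if X <= set().union(*chunk[:split]) and \
               Y <= set().union(*chunk[split:]):
                return True
    return False

# "a" and {e,f} are four itemsets apart in this hypothetical sequence
seq = [{"a"}, {"x"}, {"x"}, {"e", "f"}]
print(occurs_within_window(seq, {"a"}, {"e", "f"}, 3))  # → False
print(occurs_within_window(seq, {"a"}, {"e", "f"}, 4))  # → True
```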

Discovering high-utility sequential rules. Another extension is to discover rules in sequences where items may be annotated with quantities and each item may have a unit profit. For example, we may have a sequence where a customer bought three breads, then two apples and two bottles of milk, and these items may have unit profits of $1, $2 and $1.50, respectively. The goal of high-utility sequential rule mining is to find rules that generate a high profit and have a high confidence (high-utility rules). An algorithm for this task is HUSRM.
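For the purchase above, the profit generated can be computed directly. The pairing of item names, quantities and unit profits follows the example in the text:

```python
# quantities bought and unit profits, as in the example above
quantities  = {"bread": 3, "apple": 2, "milk": 2}
unit_profit = {"bread": 1.00, "apple": 2.00, "milk": 1.50}

# utility (profit) of the purchase: 3*$1 + 2*$2 + 2*$1.50 = $10
utility = sum(qty * unit_profit[item] for item, qty in quantities.items())
print(utility)  # → 10.0
```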

Open-source implementations and datasets

Several algorithms for sequential rule mining and sequential pattern mining have been proposed. Java implementations of the state-of-the-art algorithms are currently offered in my open-source data mining library named SPMF.

It offers several state-of-the-art algorithms for sequential rule mining, such as ERMiner (2014), TNS (2013), RuleGrowth (2011), TopSeqRules (2011) and CMRules (2010). Besides, SPMF offers several algorithms for sequential pattern mining, such as CM-SPADE (2014), VMSP (2014), LAPIN (2005) and PrefixSpan (2004). To our knowledge, ERMiner is the fastest sequential rule mining algorithm, but RuleGrowth is still quite fast and consumes less memory. You can try the above algorithms on the SPMF website, where you will find instructions on how to run them, as well as some datasets on the dataset page.

Applications of sequential rule mining

Some examples of applications of sequential rule mining are e-learning, manufacturing simulation, quality control, web page prefetching, anti-pattern detection in service-based systems, embedded systems, alarm sequence analysis, and restaurant recommendation. Several papers describe such applications in detail.

In this blog post, I have given an overview of the tasks of sequential rule mining and sequential pattern mining, which aim at discovering patterns in sequences. I hope that you enjoyed reading it 😉 For researchers, there are many possibilities for research on this topic.

By the way, if you want to know more about the topic of sequential pattern mining, I wrote an easy-to-read survey on this topic: A Survey of Sequential Pattern Mining. It gives a good introduction and overview.

Hello, glad you like the post. Sometimes the terms may have different meanings, but for me “sequence mining” is more general: it means discovering patterns in sequences, without saying what kind of patterns. In a sequence, we could find many kinds of patterns, such as “sequential patterns”, “sequential rules”, “periodic patterns”, etc. So sequential rule mining is one task that you could call sequence mining, but there are also other tasks. Besides, some people will discover patterns in a single sequence, while others will find patterns that are common to multiple sequences. There are in fact many variations of these tasks. For example, if you do sequential rule mining in a single sequence, it is usually called episode rule mining instead of sequential rule mining, but you could still call it sequence mining.

Thanks for the great post. I have one question: often a user will be interested in both the “frequent sequential patterns” and the “sequential rules”. Among “the sequential rule mining algorithms such as RuleGrowth, ERMiner and CMRules”, is there an algorithm that can generate “sequential rules” from “frequent sequential patterns”, just as we may extract association rules from frequent itemsets? Thanks a lot.

Hello,
Thanks. Glad you like the post. There are some algorithms, such as RuleGen by Zaki (2001), that extract sequential patterns and use them to generate sequential rules. They find rules of the form X -> Y where X and Y are sequential patterns. But in my opinion, and from my experience, these rules are not very good in practice, as they are too precise and thus not noise-tolerant. For example, a rule A,b,c -> B,D,E,F would be seen as different from A,b,c -> B,D,F,E or ACB -> BDEF or any variation, while in practice they are probably the same rule. This creates a lot of problems for prediction, because these rules are treated as different, and thus their confidence and support may not be calculated correctly. In my TKDE paper and the related ADMA paper about RuleGrowth, I have shown that if we want to do prediction, it is better to use rules of the form X -> Y where X is unordered and Y is unordered, but X must appear before Y. This kind of rule is more general and can work better in practice for prediction. At least, those are the results I obtained for webpage prefetching with some real data.

So in my TKDE paper about RuleGrowth, and in the papers about ERMiner and CMRules, I used a definition that is not based on sequential patterns: these algorithms find the rules directly. For the type of rules found by these algorithms, it is more efficient to find the rules directly than to derive them from sequential patterns.

By the way, sequential rule mining is a topic that has not been explored a lot. There are many possibilities for research. If you want more details, you can e-mail me directly and I can give you some suggestions.

Thanks for a great article. This is exactly what I was looking for. As you mentioned in your paper, there is a lack of open-source implementations for sequence mining. I am looking for a Python-based implementation of sequential rule mining to cluster similar behaviors and identify anomalies when a new sequence is detected.
Could you please let me know if you have any idea on the same?

Hello,
You are welcome. Glad you like the article. Looks like an interesting topic to use sequential rules for clustering!

I personally do not program in Python, so I am not familiar with the libraries available in that language, but I have not heard of such libraries either. I think the easiest would be to call the SPMF library from your Python code. The SPMF library can be used from the command line, so it can easily be called from a Python program.
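A sketch of how this could look from Python. The `run <algorithm> <input> <output> <params…>` command-line shape follows SPMF’s documented usage, but the exact parameter list differs per algorithm, so please check the SPMF documentation before relying on it:

```python
import subprocess

def spmf_command(algorithm, input_file, output_file, *params, jar="spmf.jar"):
    """Build the command line to run an SPMF algorithm."""
    return ["java", "-jar", jar, "run", algorithm,
            input_file, output_file, *map(str, params)]

# Hypothetical invocation of RuleGrowth with minsup = 50% and minconf = 60%
cmd = spmf_command("RuleGrowth", "input.txt", "output.txt", "50%", "60%")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once spmf.jar is downloaded
```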

By the way, I remember that there is a researcher in Canada who implemented my ERMiner algorithm for sequential rule mining in Python. Send me an e-mail at philfv8 AT yahoo.com and I will give you his e-mail. Maybe he can share the code with you.

So after reading your post, I realize that the sequential rule mining task is different from what my intuition led me to believe. It seems like you lose some of the sequential nature inherent in the data with sequential rules.

For example, suppose you have a dataset where the sequential rule {a} -> {x,y} has 100% support (and hence, 100% confidence). This rule can be realized in the dataset in one of three ways:
S1 =
S2 =
S3 =
It could be the case that
sup(S1)=100%, conf(S1)=100%
sup(S2)=0%, conf(S2)=0%
sup(S3)=0%, conf(S3)=0%
and so we conclude that x must occur before y. On the other hand, if it is the case that
sup(S1)=33%, conf(S1)=33%
sup(S2)=33%, conf(S2)=33%
sup(S3)=33%, conf(S3)=33%
then the order of x and y appearing after a does not matter.

Does the task of mining patterns of the form X -> Y, where X and Y are sequences, have a name?

Something appears to be missing in your comment: I cannot read the S1, S2 and S3 of your example. But I think I understand your point, so I will comment on it below.

Generally, a rule is of the form X -> Y, where something called X must appear before something called Y. X is the antecedent and Y is the consequent.
Now, to be more specific, there are two main types of sequential rules in the literature:
– There is the one that I call “standard sequential rules” (SSR), where X is a sequence and Y is a sequence. For example, a rule a,b,c -> d,e means that a is followed by b, which is followed by c, and this implies that they will be followed by d and then e.
– There is another type of rule that I call “partially ordered sequential rules” (POSR). In these rules, X is an unordered set of items and Y is also an unordered set of items, but the items of X must appear before those of Y. Thus, a rule a,b,c -> d,e here means that a, b and c can appear in any order, but are then followed by d and e, in any order.

In my ADMA 2012 paper, I did some experiments to compare POSR and SSR on the task of webpage prediction, to see which one is better.

It was found in that study that POSR were more accurate than SSR; the difference was sometimes more than 20% accuracy in favor of POSR. The reason is that SSR are too specific: a very small difference in the order of items implies that several rules will be seen as different, and their support or confidence may thus be calculated incorrectly. For example, the rules:
a,b,c -> d,e
a,c,b -> d,e
b,c,a -> d,e
b,a,c -> d,e
a,b,c -> e,d
a,c,b -> e,d
b,c,a -> e,d
b,a,c -> e,d
…
and so on would all be seen as different if we use the definition of SSR.
But in real life, these rules may actually represent the same thing and only differ because of noise or small variations. Considering them as different will greatly decrease the support of each rule, and may result in dozens of rules having a low support, with confidence values that are largely different although they may represent the same thing.
If instead we use POSR, we remove the constraint on the order and get a single, more general rule instead of a dozen:
a,b,c -> d,e
This rule replaces all the rules above, and thus also reduces the number of rules found. Actually, in pattern mining, a recurring problem is that there can be millions of patterns. So, in my opinion, it is good to generalize all these specific rules into one rule: this can greatly reduce the number of patterns to analyze, and the resulting rule is also more noise-tolerant.

I have also argued this in my TKDE paper about POSR. You can read the first two pages of the introduction, which argue that POSR are more appropriate than SSR.

Also, to my knowledge, POSR have more applications: they have been used in several papers, while I have not seen much about applications of SSR.

But of course, SSR could also be more appropriate than POSR for some domains, including yours! There is never one approach that is always best. And all of these definitions could be modified for specific applications if other constraints need to be handled.

Hi, glad you like the post. The main idea of prediction using sequential rules is not very complicated.
I will explain it with an example. Let’s say that we have sequences of webpages visited by some users on a website.
Having these training sequences, we can extract some sequential rules from them. We could find a rule such as:
{a} -> {b} support: 50% confidence: 90%
This means that when a user visits webpage {a}, 90% of the time he will visit webpage {b} afterward.
Then, let’s say that we have a new user who visits the following webpages in this order: e, c, a. We can try to match our rules against that sequence to predict what the user will visit next. To do so, we compare each rule with the sequence to see which rules can be used to make a prediction. For example, we find that the rule {a} -> {b} matches this sequence because its left side, “a”, appears in e, c, a. Thus, we can predict that the right side of the rule, the webpage {b}, will be the next page the user visits.
If several rules match a sequence, we can use the support and confidence to decide which prediction is more likely, or we can add other criteria. This is the basic idea: we have some training sequences, we extract some rules, and then we match the rules against new sequences to predict what will happen next.
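The matching step described above can be sketched as follows (the rules, and the tie-breaking by confidence alone, are hypothetical illustrations):

```python
def predict_next(rules, visited):
    """Return the consequent of the matching rule with the highest
    confidence. `rules` maps (antecedent, consequent) pairs of
    frozensets to a (support, confidence) tuple; `visited` is the
    list of pages seen so far."""
    best = None
    for (X, Y), (sup, conf) in rules.items():
        if X <= set(visited) and (best is None or conf > best[1]):
            best = (Y, conf)
    return best[0] if best else None

# Hypothetical rules extracted from training sequences
rules = {
    (frozenset({"a"}), frozenset({"b"})): (0.50, 0.90),
    (frozenset({"c"}), frozenset({"d"})): (0.40, 0.70),
}
# A new user visited e, c, a; both rules match, and {a} -> {b} has
# the higher confidence, so page "b" is predicted next.
print(predict_next(rules, ["e", "c", "a"]))  # → frozenset({'b'})
```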

Hello,
Yes, there exist some, like RuleGen, which is implemented in SPMF, and others which I have not implemented. But an issue with this kind of rules is that many rules that are only slightly different will be viewed as different. This decreases the support of all the rules, and some very similar rules may end up with very different support and confidence values. But maybe that is what you need for your application.