Abstract

Owing to the progress of computer and network environments, it has become easy to collect data with time information, such as daily business reports, weblog data, and physiological information. In this context, methods for analyzing data with time information have been studied. This chapter focuses on a method for discovering sequential patterns from discrete sequential data. The methods proposed by Pei et al. (2001), Srikant & Agrawal (1996), and Zaki (2001) efficiently discover frequent patterns as characteristic patterns. However, the discovered patterns do not always match the interests of analysts, because frequent patterns tend to be common and therefore do not provide new knowledge to the analysts. The same problem has been pointed out for the discovery of association rules. Blanchard et al. (2005), Brin et al. (1997), Silberschatz et al. (1996), and Suzuki et al. (2005) propose other criteria for discovering other kinds of characteristic patterns. The patterns discovered by these criteria are not always frequent, but they are characteristic from other viewpoints. These criteria may also be applicable to the discovery of sequential patterns. However, because they do not satisfy the Apriori property, it is difficult for methods based on them to discover patterns efficiently. On the other hand, methods that use the background knowledge of analysts have been proposed in order to discover sequential patterns that correspond to the interests of analysts (Garofalakis et al., 1999; Pei et al., 2002; Sakurai et al., 2008b; Yen, 2005).

Background

This chapter explains basic terminology related to the discovery of sequential patterns. Sequential data consists of rows (sequences) of item sets, and a sequential pattern is a characteristic subsequence extracted from the sequential data. Here, an item is an object, an action, or an evaluation of it in the analysis target. For example, “beer”, “diaper”, “milk”, and “snack” are items in the retail business. Each item set contains items that occur at the same time, and no item set contains the same item more than once. Formally, a sequential pattern is described as $s = (I_1, I_2, \ldots, I_n)$, where each $I_j$ is an item set and $n$ is the number of item sets included in the sequential pattern. The number $n$ is called the length of the pattern, and the pattern is called an $n$-th sequential pattern. Each $I_j$ is described as $I_j = \{i_{j1}, i_{j2}, \ldots, i_{jm_j}\}$, where each item $i_{jk}$ ($1 \le k \le m_j$) satisfies $i_{jk} \neq i_{jl}$ for $k \neq l$, and $m_j$ is the number of items included in $I_j$.

For example, ({“beer”, “diaper”}, {“beer”, “milk”, “snack”}, {“diaper”, “snack”}) is a sequential pattern with $n = 3$ in the retail business. It is a third sequential pattern composed of three item sets: {“beer”, “diaper”}, {“beer”, “milk”, “snack”}, and {“diaper”, “snack”}. The pattern shows that a person buys “beer” and “diaper” on the first day, “beer”, “milk”, and “snack” on the second day, and “diaper” and “snack” on the third day. The sequential pattern is depicted in Figure 1. In this figure, each circle represents an item, the groups of items separated by arrows represent item sets, and an item set on the left occurs before an item set on the right.
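As an illustration only, the definition above can be mirrored in Python by storing each item set as a `frozenset` and a pattern as a tuple of item sets, so that the order between item sets is kept while the order inside an item set is ignored. The helpers `make_pattern` and `length` are hypothetical names introduced for this sketch, not part of any method cited in this chapter.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical type aliases: an item set is an immutable set of item names,
# and a sequential pattern is an ordered tuple of item sets.
ItemSet = FrozenSet[str]
SequentialPattern = Tuple[ItemSet, ...]


def make_pattern(*item_sets: Iterable[str]) -> SequentialPattern:
    """Build a pattern, rejecting any item set that lists the same item twice."""
    pattern = []
    for items in item_sets:
        item_list = list(items)
        if len(set(item_list)) != len(item_list):
            raise ValueError(f"duplicate item in item set {item_list}")
        pattern.append(frozenset(item_list))
    return tuple(pattern)


def length(pattern: SequentialPattern) -> int:
    """The length of a pattern is the number of item sets it contains."""
    return len(pattern)


# The retail example from the text.
retail = make_pattern(
    ["beer", "diaper"],
    ["beer", "milk", "snack"],
    ["diaper", "snack"],
)
print(length(retail))            # 3, so this is a third sequential pattern
print([len(s) for s in retail])  # [2, 3, 2] items per item set
```

Raising an error for a repeated item enforces the rule that an item set never contains the same item twice; a `frozenset` alone would silently merge the duplicates instead.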

Key Terms in this Chapter

Support: Support is an evaluation criterion of sequential patterns that measures how frequently a pattern occurs in the sequential data. The criterion satisfies the Apriori property.

Confidence: Confidence is an evaluation criterion of sequential patterns. It evaluates the ratio at which a sequential pattern occurs given that its subpattern has occurred. The criterion does not satisfy the Apriori property.

Inclusion: Inclusion is a relationship between two sequences. When every item set of one sequence is included in an item set of another sequence, and the matched item sets appear in the same order, the former sequence is included in the latter.

Item: An item is the smallest element included in the rows of sequential data. An item is also a special case of a sequential pattern: a pattern consisting of a single item set that contains only that item.

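Apriori Property: An evaluation criterion satisfies the Apriori property when the value it assigns to any sequential pattern is smaller than or equal to the values it assigns to that pattern's subpatterns. The property allows a discovery method to discard every extension of a pattern whose value already falls below a threshold.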

Time Constraint: A time constraint is a constraint defined between time stamps of items.

Sequential Interestingness: Sequential interestingness is an evaluation criterion of sequential patterns. It evaluates the ratio at which a sequential pattern occurs given its subpatterns with the minimum frequency. Unlike confidence, the criterion satisfies the Apriori property.

Sequential Pattern: A sequential pattern is a characteristic subsequence extracted from sequential data.
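The following Python sketch makes the inclusion, support, and confidence terms concrete and shows why only support satisfies the Apriori property. The glossary does not fix exact formulas, so the code assumes that support is the fraction of data sequences that include a pattern and that confidence divides a pattern's support by the support of its prefix (the pattern without its last item set). The helper names and the four-sequence toy database are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

ItemSet = FrozenSet[str]
Pattern = Tuple[ItemSet, ...]


def seq(*item_sets: str) -> Pattern:
    """Hypothetical shorthand: each argument is a string of one-letter items."""
    return tuple(frozenset(s) for s in item_sets)


def includes(sequence: Pattern, pattern: Pattern) -> bool:
    """True if each item set of `pattern` is a subset of an item set of
    `sequence`, with the matches at strictly increasing positions."""
    pos = 0
    for item_set in pattern:
        # Greedily take the earliest remaining item set that contains item_set.
        while pos < len(sequence) and not item_set <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next item set must be matched strictly later
    return True


def support(pattern: Pattern, database: Sequence[Pattern]) -> float:
    """Assumed definition: fraction of data sequences that include the pattern."""
    return sum(includes(s, pattern) for s in database) / len(database)


def confidence(pattern: Pattern, database: Sequence[Pattern]) -> float:
    """Assumed definition: support of the pattern over support of its prefix."""
    prefix_support = support(pattern[:-1], database)
    return support(pattern, database) / prefix_support if prefix_support else 0.0


db = [seq("a", "b", "c"), seq("a", "b", "c"), seq("a", "d"), seq("a", "d")]
a, ab, abc = seq("a"), seq("a", "b"), seq("a", "b", "c")

# Support never increases as a pattern is extended (Apriori property holds).
print(support(a, db), support(ab, db), support(abc, db))  # 1.0 0.5 0.5
# Confidence can increase: the longer pattern scores higher than its subpattern.
print(confidence(ab, db), confidence(abc, db))  # 0.5 1.0
```

Because support cannot grow under extension, a level-wise miner may drop every extension of a pattern below the minimum support. The confidence values show why the same pruning is unsafe for confidence: ({a}, {b}) scores 0.5, yet its extension ({a}, {b}, {c}) scores 1.0, so discarding the shorter pattern would lose the longer one.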