Objectives

The objectives of this section are:
to explain the problem of frequent itemset generation
to introduce the main strategies for solving this problem
to explain the Apriori Principle
to explain the Apriori Algorithm and it's various parts

Outcomes

By the time you have completed this section you will be able to:list the main strategies for solving the frequent itemset generation problem
describe the Apriori Principle
explain the major steps of the Apriori Algorithm
list the requirements for efficient Candidate Generation
distinguish between the methods for efficient Candidate Generation

Frequent Itemset Generation

In order to generate a frequent itemset list one must avoid using the brute force approach described in the previous section because it can be very expensive to search through the whole data set to find the support count of each itemset. Some of the strategies used to address this problem are :-

Reduce the number of candidates: use pruning techniques such as the Apriori principle to eliminate some of the candidate itemsets without counting their support values

Reduce the number of transactions: by combining transactions together we can reduce the total number of transactions

Reduce the number of comparisons: use efficient data structures to store the candidates thereby eliminating the need to match every candidate against every transaction.

The Apriori Principle

States that if an itemset is frequent, then all of its subsets must also be frequent. This principle holds true because of the anti-monotone property of support.

This means that when f is anti-monotone (or downward closed) and X is a subset of Y, then the support of X, s(X) must not exceed the support of Y, s(Y).
The converse is also true for when f is monotone (or upward closed), this means that when X is a subset of Y, then the support of Y, s(Y) must not exceed the support of X, s(X).