Association Mining: What products to recommend to your customers based on historic buying patterns?

"Association Rule Mining a.k.a Market Basket Analysis extracts underlying patterns and relationships that are otherwise not so apparent. The co-occurrences of data items can reveal inherent dependencies and establish rules of specific strength, often useful as a recommendation mechanism. Here is how you can quickly implement this.."

Measures of Association Rules

The following measures are used to evaluate the strength of an association. Suppose you are interested in the association between two events A and B:

Support = Number of Rows having both A AND B / Total Number of Rows

Confidence = Number of Rows having both A AND B / Number of Rows with A

Expected Confidence = Number of rows with B / Total Number of Rows

Lift = Confidence / Expected Confidence.
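
To make the four formulas concrete, here is a minimal sketch in base R that computes them by hand. The five baskets are made-up toy data (not taken from any real data set), with A = "bread" and B = "milk"; the helper name `has` is also just an illustration.

```r
# Hypothetical toy data: each list element is one transaction (one row)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "milk", "butter"),
  c("butter")
)

# TRUE/FALSE per transaction: does it contain the given item?
has <- function(item) vapply(baskets, function(b) item %in% b, logical(1))

a <- has("bread")   # event A
b <- has("milk")    # event B
n <- length(baskets)

support             <- sum(a & b) / n        # rows with both A and B / all rows
confidence          <- sum(a & b) / sum(a)   # rows with both A and B / rows with A
expected_confidence <- sum(b) / n            # rows with B / all rows
lift                <- confidence / expected_confidence

print(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift))
```

Here two of the five baskets contain both items, three contain bread and three contain milk, so support is 2/5, confidence is 2/3, expected confidence is 3/5, and lift is 10/9 (about 1.11).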

Lift is the factor by which the co-occurrence of A and B exceeds the rate expected if A and B were unrelated. In other words, the higher the lift is above 1, the more likely B is to occur together with A.

# Load the libraries
library(arules)
library(arulesViz)
library(datasets)
data(Groceries) # Load the data set

The ‘Groceries’ dataset is already of class ‘transactions’. Because the ‘arules’ package is designed to work with the ‘transactions’ class, you should convert your own data frame to this class before mining it. Here is how you can convert it:

transDat <- as(myDataFrame, "transactions") # convert to 'transactions' class
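
As a small illustration of the conversion, the sketch below builds a tiny data frame and coerces it. The column names and values are invented for the example. The columns are declared as factors explicitly, because the coercion expects factor (or logical) columns; each column/value pair then becomes one item, such as `weather=sunny`.

```r
library(arules)

# Hypothetical data frame; every column is a factor
myDataFrame <- data.frame(
  weather  = factor(c("sunny", "rainy", "sunny")),
  activity = factor(c("walk", "read", "walk"))
)

# Coerce to the 'transactions' class used by arules
transDat <- as(myDataFrame, "transactions")

inspect(transDat)    # one transaction per data frame row
itemLabels(transDat) # items are named "column=value"
```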

Some Groundwork: Methods of ‘Transactions’ class dataset

inspect(transDat) # view the observations
length(transDat) # get number of observations
size(transDat) # number of items in each observation
LIST(transDat) # convert 'transactions' to a list, note the LIST in CAPS

How to Find Rules Related to Given Item/s?

This is the core of market basket analysis and is what makes it useful for recommending new items to your users. It is done by setting the ‘appearance’ parameter of the apriori() function. For example,

Find what factors influenced an event ‘X’

Suppose you want to find out what customers had purchased before buying ‘whole milk’. Rules with ‘whole milk’ on the right-hand side show the patterns that led to its purchase.
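
One way to do this is sketched below: restrict the right-hand side of the rules to ‘whole milk’ through the ‘appearance’ parameter and let every other item appear on the left-hand side. The support and confidence thresholds are assumptions chosen for illustration, not recommended values, so tune them for your own data.

```r
library(arules)
data(Groceries)

# Only keep rules of the form {...} => {whole milk}
rules <- apriori(
  Groceries,
  parameter  = list(supp = 0.001, conf = 0.08),          # assumed thresholds
  appearance = list(default = "lhs", rhs = "whole milk"),
  control    = list(verbose = FALSE)
)

# Strongest rules first
rules_conf <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(head(rules_conf))
```

To ask the opposite question (what do customers buy after ‘whole milk’?), the same call can be used with `lhs = "whole milk"` and `default = "rhs"`.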

Making Rules For Continuous Data

If you try to make rules on continuous variables, each value is treated as a distinct item, causing an undesirable explosion of rules. So convert the continuous variables to factors, which is easily done with the discretize() function:

# method can be "interval", "frequency", "cluster" or "fixed"
# (older arules versions call the 'breaks' argument 'categories')
discretize(x, method = "cluster", breaks = 3)
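
The sketch below shows the effect on a simulated numeric vector. The `age` data and the bin labels are invented for the example, and it assumes a recent arules version in which the number of bins is passed as `breaks`. It uses the "frequency" method, which puts roughly the same number of values into each bin.

```r
library(arules)

set.seed(42)
# Hypothetical continuous variable: 150 simulated ages
age <- c(rnorm(50, 25, 3), rnorm(50, 45, 3), rnorm(50, 65, 3))

# Cut into 3 equal-frequency bins and name them
age_bins <- discretize(age, method = "frequency", breaks = 3,
                       labels = c("young", "middle", "senior"))

table(age_bins) # how many values fall into each bin
```

The result is a factor, so it can go into a data frame that is later coerced to the ‘transactions’ class.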