Association Rule Learning and the Apriori Algorithm

Association Rule Learning (also called Association Rule Mining) is a common technique for finding associations among many variables. It is often used by grocery stores, retailers, and anyone else with a large transactional database. It's the same way that Target knows you're pregnant, or that Amazon.com knows what else you want to buy when you're buying an item. The same idea extends to Pandora.com knowing what song you want to listen to next. All of these incorporate, at some level, data mining concepts and association rule algorithms.

Michael Hahsler et al. have authored and maintain two very useful R packages relating to association rule mining: the arules package and the arulesViz package. Furthermore, Hahsler has provided two very good example articles detailing how to use these packages: Introduction to arules and Visualizing Association Rules.

Association Rule Learning is often used to analyze the "market basket" for retailers. Traditionally, this simply looks at whether or not a person has purchased each item, so the data can be seen as a binary matrix with one row per transaction and one column per item.
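As a minimal sketch of that binary matrix, the base R snippet below builds one from four made-up baskets. The item names and baskets are purely hypothetical and are not taken from any real dataset.

```r
# Four hypothetical shopping baskets (illustrative data only)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "eggs"),
  c("milk", "eggs"),
  c("bread", "milk", "butter")
)

# All distinct items, in alphabetical order
items <- sort(unique(unlist(baskets)))

# One row per basket, one column per item: 1 = purchased, 0 = not purchased
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items
rownames(incidence) <- paste0("t", seq_along(baskets))

print(incidence)
```

Each row is one transaction, and the column sums give the number of baskets that contain each item.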

Association rule mining in R uses the arules package. The arulesViz package adds features for graphing and plotting the rules.

library("arules");
library("arulesViz");

For testing purposes there is a convenient way to generate random data in which patterns can be mined. The random data is generated in such a way that the items are correlated.
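One way to generate such data, sketched here under the assumption that the arules functions random.patterns and random.transactions are what is meant, is shown below. The sizes (1000 items, 1000 transactions) and the seed are arbitrary choices for illustration.

```r
library("arules")

set.seed(42)  # arbitrary seed so the simulated data is reproducible

# Generate a pool of random, correlated item patterns
patterns <- random.patterns(nItems = 1000)

# Simulate transactions built from those patterns (method "agrawal")
trans <- random.transactions(nItems = 1000, nTrans = 1000,
                             method = "agrawal", patterns = patterns)

summary(trans)
```

The resulting object is an arules transactions object, so it can be passed to the same mining functions as real data.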

However, a transaction dataset will usually be available using the approach described in "Data Frames and Transactions". The rules can then be created by calling the apriori function on the transaction dataset.
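The call to apriori might look like the sketch below. It uses the Groceries dataset that ships with arules as a stand-in for your own transactions, and the support and confidence thresholds are illustrative assumptions that should be tuned to the data.

```r
library("arules")

# Groceries is an example transaction dataset bundled with arules;
# substitute your own transactions object here
data("Groceries")

# Mine rules that meet minimum support and confidence thresholds
# (0.01 and 0.5 are illustrative values, not recommendations)
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

summary(rules)
```

Lowering either threshold generally produces more rules, so it can help to start strict and relax gradually.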

Once the rules have been created, a researcher can review them and filter them down to a manageable subset. This can be done in a variety of ways, both by graphing the rules and by simply inspecting them.
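A rough sketch of inspecting and filtering is shown below. It again assumes the bundled Groceries data and illustrative thresholds; the cutoff of 0.55 and the choice of "whole milk" as the item of interest are arbitrary.

```r
library("arules")

data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# Look at the five rules with the highest confidence
top_rules <- head(sort(rules, by = "confidence"), n = 5)
inspect(top_rules)

# Keep only rules above a (hypothetical) confidence cutoff
strong_rules <- rules[quality(rules)$confidence > 0.55]

# Keep only rules whose consequent contains a given item
milk_rules <- subset(rules, subset = rhs %in% "whole milk")
```

Both indexing on quality(rules) and subset return a smaller rules object, so the filters can be chained.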

We can now subset the rules to get a clearer visual. In these graphs we can see the two parts of an association rule: the antecedent (IF) and the consequent (THEN). These patterns are found by identifying frequent itemsets in the data, and they are characterized by their support and confidence. The support indicates how frequently the items appear together in the dataset, as a fraction of all transactions. The confidence indicates how often the IF/THEN statement holds: of the transactions that contain the antecedent, it is the fraction that also contain the consequent. These IF/THEN statements can be visualized by the following graph:

Association Rules with Consequent and Antecedent.
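To make support and confidence concrete, here is a small base R sketch that computes both by hand for the rule IF bread THEN butter. The five baskets are made up for illustration.

```r
# Five hypothetical baskets as a logical matrix (rows = transactions)
baskets <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    1, 0, 0,
    0, 1, 1,
    1, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("bread", "butter", "milk"))
) == 1

has_antecedent <- baskets[, "bread"]                       # IF part
has_both       <- baskets[, "bread"] & baskets[, "butter"] # IF and THEN

# Support: share of all transactions containing bread and butter together
support <- mean(has_both)                        # 3 of 5 baskets = 0.6

# Confidence: share of bread baskets that also contain butter
confidence <- sum(has_both) / sum(has_antecedent) # 3 of 4 baskets = 0.75

print(c(support = support, confidence = confidence))
```

These are the same quantities that apriori reports in the support and confidence columns of each rule.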

The plot function in arulesViz offers many different ways to look at the rules and can even produce 3-D graphs.
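A few of those views are sketched below, assuming the bundled Groceries data and illustrative thresholds. The options for 3-D rendering differ between arulesViz versions, so they are left out here; check the package documentation for your installed version.

```r
library("arules")
library("arulesViz")

data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# A small subset keeps the graph-based views readable
subrules <- head(sort(rules, by = "confidence"), n = 10)

# Default view: scatter plot of the quality measures for every rule
plot(rules)

# Graph view: items and rules as nodes, showing antecedent -> consequent
plot(subrules, method = "graph")

# Grouped matrix view: antecedent groups against consequents
plot(subrules, method = "grouped")
```

The graph view in particular gets cluttered quickly, which is why the sketch restricts it to ten rules.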

How can we automatically use the rules once we have them?
For instance, I have my data and I have extracted the rules, and one of my customers has bought something. Is there a classic algorithm for automatically suggesting other things for him to buy?