Monthly Archives: October 2012

The rules of this puzzle are simple. Cells can be in one of three states:

An UNSHADED (white) cell means that you have not considered this cell yet.
A DARK GREY SHADED cell means that the sum of the dark grey shaded cells in that row and column must equal the number in that cell.
A LIGHT GREY SHADED cell means that the sum of the dark grey shaded cells in all the connected cells must equal the number in that cell.

Many real world problems can be formulated as Linear Programming problems. There are often many different ways to formulate a single problem. Some of these alternative formulations can easily be proven to be equal via simple algebra and arithmetic. For example, one person may see a problem as maximizing profit while another may see the same problem as minimizing losses. The relationship between the two alternative formulations can then be shown to be that one simply has the negative objective function value of the other.

Sometimes, though, this relationship between alternative formulations is not as easy to detect. Two alternative formulations that arise regularly in linear programming problems are a primal problem and a dual problem. The dual of a linear programming problem is the problem of finding the best bound on the objective function in terms of the constraints. This dual is formulated so that every feasible solution to the dual is a bound on the primal objective function. The Weak Duality Theorem for Linear Programming says that the optimal solution for a minimization problem is always an upper bound for its dual (which is a corresponding maximization problem). Correspondingly, the optimal solution for a maximization problem is always a lower bound for its dual. This leads to the Strong Duality Theorem for Linear Programming which says that if we can find feasible solutions to the primal and dual problems with matching objective function values, then both of these solutions must be optimal for their respective problems.

This shows an importance of duality. What my script provides is a means of formulating the dual of a given problem to help understand the concept.

When we are given a large set of transactions, we are often interested in discovering patterns inside these transactions. The Apriori algorithm provides a means for formulating what are known as “association rules” for the set of transactions. An association rule is an observation from the database between different items inside a transaction. For example, the statement “If a customer buys chips they are 60% more likely to also purchase dip” could be an association rule based on data from supermarket purchases. The number 60% is called the confidence we have in the rule and we are generally interested in rules with higher confidence.

The Apriori algorithm takes as input a transaction database and a “threshold”. The initial pass through the database performs a count on each single item in the database and checks how many transactions contain each item. The algorithm proceeds by the Apriori property which states that “any subset of a frequent itemset must also be frequent”. What this means is that when checking the subsets of length 2 (and greater), we can ignore those subsets that contain an element that does not meet the minimum support, as it cannot be a part of a frequent itemset.

So lets look at an example. Suppose our list of transactions for the items Chips, Dip, Soda, Napkins and Paper Plates are as follows:

And lets suppose that we’re interested in finding the collections that occur more than a quarter (25%) of the time.

What we see from simply summing the columns and dividing by the number of columns is that 60% of the people purchased chips, 45% purchased dip, 50% purchased soda, 45% purchased napkins, and 40% purchased paper plates. Since these are all above our minimum threshold, they are all possible as elements of future collections.

When looking at larger collections, we see that:
– 35% of the people purchased both chips and dip
– 35% of the people purchased both chips and soda
– 25% of the people purchased both chips and napkins
– 35% of the people purchased both dip and soda
– 25% of the people purchased both soda and paper plates
– no other size two collections are above the minimum threshold.

You’ll note that since only 20% of the people purchased chips and paper plates, it is not above the minimum threshold and so we can ignore it in future collections. From the set of size 2 itemsets, we can derive our list of size three itemsets.

In particular, we see that:
– 25% of the people purchased all three of chips, dip, and soda.
– no other size three collections are above the minimum threshold.

The hierarchy of the itemsets is displayed in the image above.

To play around with this algorithm and understand more of its properties, visit Apriori algorithm.