Apriori algorithm simplified with an example

Introduction
Finding frequent item-sets is one of the most investigated fields of data mining. The Apriori algorithm is the most established algorithm for Frequent Item-sets Mining (FIM). Definition of Frequent Item-sets

A set of items that appears in many baskets is said to be “frequent.” To be formal, we assume there is a number s, called the support threshold. IfI is a set of items, the support for I is the number of baskets for which I is a subset. We sayI is frequent if its support is sor more.

Example:
Lets consider a simple example. Consider the transactions for the following items

 Transaction ID Items purchased T1 {Apple, Mango,Pears} T2 {Mango, Pears, Cabbage, Carrots} T3 {Pears, Carrots,Mango} T4 {Carrots, Mango}

Next consider the rule that item/itemset is frequently purchased if it occurs at least 50% of the times. So here it should be bought at least 2 times.

For simplicity, lets abbreviate the items as follows;
Apple-A
Mango-M
Pears-P
Cabbage-Ca
Carrots-Cr

So the table now becomes

Table 1
 Transaction ID Items purchased T1 {A,M,P} T2 {M,P,Ca,Cr} T3 {P,Cr,M} T4 {Ca,M}

Step 1: Count the number of transactions in which each item occurs
 Item Number of Transactions A 1 M 3 P 3 Ca 2 Cr 2

Step 2: Now remove all the items that are purchased less than 2 times. So the new table becomes

 Item Number of Transactions M 3 P 3 Ca 2 Cr 2
Step 3: Start making pairs of the items from step 2 with each other
 Items MP MCa MCr PCa PCr CaCr
Note: Itemset PM,CaP, CrP are the same as MP, PCa,PCr so they are not included in step 3.

Step 4: Now we count how many times each pair as shown in Step 3 occurs in Table 1.

 Items Number of transactions MP 2 MCa 2 MCr 2 PCa 1 PCr 1 CaCr 1
Step 5: Look at the question- it states that consider the itemset that is purchased at least 2 times or 50% of the times. Applying this rule on Step 4 will reduce the table to the following;
 Items Number of transactions MP 2 MCa 2 MCr 2
So this table shows that the following items MP (Mango and Pears), MCa (Mango and Cabbage) and MCr (Mango and Carrots) are purchased together at least 50% of the times.
In the next post, i will be providing a Java based implementation of Apriori Algorithm