Apriori algorithm simplified with an example

Introduction
Finding frequent item-sets is one of the most investigated fields of data mining. The Apriori algorithm is the most established algorithm for Frequent Item-sets Mining (FIM).

market-basket analysis

 Definition of Frequent Item-sets

A set of items that appears in many baskets is said to be “frequent.” To be formal, we assume there is a number s, called the support threshold. IfI is a set of items, the support for I is the number of baskets for which I is a subset. We sayI is frequent if its support is sor more.

Example:
Lets consider a simple example. Consider the transactions for the following items

Transaction ID
Items purchased
T1
{Apple, Mango,Pears}
T2
{Mango, Pears, Cabbage, Carrots}
T3
{Pears, Carrots,Mango}
T4
{Carrots, Mango}

Next consider the rule that item/itemset is frequently purchased if it occurs at least 50% of the times. So here it should be bought at least 2 times.

For simplicity, lets abbreviate the items as follows;
Apple-A
Mango-M
Pears-P
Cabbage-Ca
Carrots-Cr

So the table now becomes

Table 1
Transaction ID
Items purchased
T1
{A,M,P}
T2
{M,P,Ca,Cr}
T3
{P,Cr,M}
T4
{Ca,M}
 
Step 1: Count the number of transactions in which each item occurs
Item
Number of Transactions
A
1
M
3
P
3
Ca
2
Cr
2

Step 2: Now remove all the items that are purchased less than 2 times. So the new table becomes

Item
Number of Transactions
M
3
P
3
Ca
2
Cr
2
Step 3: Start making pairs of the items from step 2 with each other
Items
MP
MCa
MCr
PCa
PCr
CaCr
Note: Itemset PM,CaP, CrP are the same as MP, PCa,PCr so they are not included in step 3.

Step 4: Now we count how many times each pair as shown in Step 3 occurs in Table 1.

Items
Number of transactions
MP
2
MCa
2
MCr
2
PCa
1
PCr
1
CaCr
1
Step 5: Look at the question- it states that consider the itemset that is purchased at least 2 times or 50% of the times. Applying this rule on Step 4 will reduce the table to the following;
Items
Number of transactions
MP
2
MCa
2
MCr
2
So this table shows that the following items MP (Mango and Pears), MCa (Mango and Cabbage) and MCr (Mango and Carrots) are purchased together at least 50% of the times.
In the next post, i will be providing a Java based implementation of Apriori Algorithm
Advertisements