Question

I'm trying to understand the fundamentals of the Apriori (Basket) Algorithm for use in data mining.

It's best I explain the complication I'm having with an example:

Here is a transactional dataset:

t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes

The minsup for the above is 0.5 or 50%.

Taking from the above, my number of transactions is clearly 7, meaning that for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up). As such, these were my frequent 1-itemsets:

F1:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4

I then created my candidates for the second refinement (C2) and narrowed it down to:

F2:

{Milk, Beer} = 4
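
The C2 step above can be checked with a short Python sketch. The variable names are only illustrative, and it assumes the minimum count of 4 worked out earlier: it pairs up the F1 items, counts how many transactions contain each pair, and keeps the pairs that reach the minimum count.

```python
from itertools import combinations

# The seven transactions from the example
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # 0.5 * 7 = 3.5, rounded up

# Frequent single items (F1) from the previous step
f1 = ["Beer", "Cheese", "Chicken", "Milk"]

# C2: every pair that can be built from the frequent single items
c2 = [frozenset(pair) for pair in combinations(f1, 2)]

# Count the transactions that contain each candidate pair
pair_counts = {pair: sum(pair <= t for t in transactions) for pair in c2}

# F2: the candidate pairs that reach the minimum count
f2 = {pair: n for pair, n in pair_counts.items() if n >= min_count}

for pair, n in f2.items():
    print(sorted(pair), n)  # prints: ['Beer', 'Milk'] 4
```

Of the six candidate pairs, only {Milk, Beer} reaches a count of 4; the next closest, {Chicken, Beer} and {Chicken, Cheese}, each appear in 3 transactions.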

This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the members of F1 aren't really "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and to calculate their "confidence" figures. I get this:

Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence

It seems superfluous to put F1's itemsets in here, as they would all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the F1 itemsets are indeed "frequent".

Solution

Itemsets of size 1 are considered frequent if their support meets the minimum support. But you also have to consider the minimum itemset size you are asked for. If the minimum size in your example is 2, then F1 will not be included in the result. But if the minimum size is 1, then you have to include it.

You can take a look here and here for more ideas and examples.

Hope that I helped.

OTHER TIPS

If the minimum support threshold (minsup) is 4 / 7, then you should include single items in the set of frequent itemsets if they appear in no less than 4 transactions out of 7. So in your example, you should include them:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
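
As a rough check, here is a minimal Python sketch of that first pass. The names `minsup`, `min_count` and `f1` are just illustrative choices, and it assumes the relative threshold is turned into a count by rounding up. Each single item is stored as a one-element set, which is exactly what a frequent 1-itemset is.

```python
import math
from collections import Counter

# The seven transactions from the question
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

minsup = 0.5
# Smallest whole number of transactions that satisfies minsup: ceil(3.5) = 4
min_count = math.ceil(minsup * len(transactions))

# Count how many transactions contain each item
item_counts = Counter(item for t in transactions for item in t)

# F1: one-element itemsets whose count reaches the threshold
f1 = {frozenset([item]): n for item, n in item_counts.items() if n >= min_count}

for itemset, n in sorted(f1.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
```

Clothes (3) and Boots (1) fall below the threshold, so only the four items listed above end up in F1.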

Association rules have the form X ==> Y, where X and Y are disjoint itemsets, and it is generally assumed that neither X nor Y is empty (this is what Apriori assumes). Therefore, you need an itemset with at least two items to generate an association rule, which is why the itemsets in F1 are frequent but produce no rules.
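
To make this concrete, here is a small Python sketch. The helper names `support_count` and `rules_from` are hypothetical and not part of any library. It splits a frequent itemset into every non-empty, disjoint pair X, Y and computes the confidence as count(X ∪ Y) / count(X).

```python
from itertools import combinations

# The seven transactions from the question
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions)


def rules_from(itemset):
    """Yield (X, Y, confidence) for each split of itemset into non-empty X and Y."""
    itemset = frozenset(itemset)
    # X takes 1 .. len-1 items, so a one-item itemset yields nothing
    for size in range(1, len(itemset)):
        for x in combinations(sorted(itemset), size):
            x = frozenset(x)
            y = itemset - x
            yield x, y, support_count(itemset) / support_count(x)


rules = list(rules_from({"Milk", "Beer"}))
single = list(rules_from({"Milk"}))

for x, y, conf in rules:
    print(sorted(x), "==>", sorted(y), f"{conf:.0%}")
print(single)  # prints: []
```

For {Milk, Beer} this gives Beer ==> Milk at 80% and Milk ==> Beer at 100%, matching the figures in the question, while the single-item set {Milk} yields no rules at all.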

Licensed under: CC-BY-SA with attribution