Question

I'm working with the Apriori algorithm for a while and I'm asking me about a step in candidate generation of frequent itemsets.

If I want to join two frequent 3-itemsets to a (candidate) 4-itemsets, there must be 2 items of the joining itemsets the same and the other one different.

For example I can join

{Married: Yes, Age:20, Cars:1} and {Married: Yes, Age:20, Unemployed: No}

to

{Married: Yes, Age:20, Cars:1, Unemployed: No}

But sometimes I read about this step in Apriori algorithm:

I can join two freq. itemstes from L_{k-1}, when there are lexicographically ordered first k-2 items are the same and and the last ones are different.

But when I would order my itemsets from above lexicographical, the first k-2 item wouldn't be the same and so I might not join them?!?

{Age:20, Cars:1, Married: Yes} and {Age:20, Married: Yes Unemployed: No}

I hope that I could explain my problem clearly to you!

Thanks for your help!!

Was it helpful?

Solution

Yes, you should not join them.

Let's take an example.

Let's say that at LEVEL 3, you have the frequent itemsets:

{ A, B, C} { A, B, D} { A C, D} { B, C, D} { B, F, G

Now let's say that you want to generate candidate itemsets of size 4.

Obviously, you just want to combine itemsets that have 1 different item. Otherwise the result may include itemsets with a size larger than 4. For example, if you could combine BCD and BFG the result would be BCDFG an itemset of size 5, which we don't want. So that is the reason why we only combine itemsets having a single item that is different.

Now, let me explain why we only combine itemsets having the first k-1 items that are identical. The reason is that we don't want to generate the same candidate twice.

For example, if we could combine BCD and ACD we would get ABCD . If we combine also ABC and ABD we would also get ABCD. This is not good because we would generate the same candidate twice! We don't want that! Thus, by ordering itemsets according to the lexicographical order and only combining if the first k-1 items are the same, we will avoid this problem. We would only combine ABC and ABD but we would not combine BCD and ACD. You can get the proof that it works in the Apriori paper.

Hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top