Question

Problem

I cannot work out how to convert a list into a transactions object for further processing by the apriori algorithm. I have a synthetic example that works and a real one (well, a subset of the Foodmart database) that does not; to me they look structurally identical. Please help me convert the list into a transactions object.

System setup

> version
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          0.2                         
year           2013                        
month          09                          
day            25                          
svn rev        63987                       
language       R                           
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing        

Code to replicate

Code that works

> library(arules)

> a_list <- list(
    c("a","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

> a_trans <- as(a_list,"transactions")

> summary(a_trans)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
5 columns (items) and a density of 0.5333333 
... and so on ...
2      b
3      c

> a_rules <- apriori(a_trans)

parameter specification:
confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
... and so on ...
writing ... [17 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Code that does not work

> b_list <- list(
    c("PigTail Frozen Pepperoni Pizza","Bird Call Childrens Cold Remedy","Steady Silky Smooth Hair Conditioner","CDR Regular Coffee"),
    c("Horatio Graham Crackers","Excellent Apple Drink","Blue Medal Small Eggs","Cormorant Copper Cleaner","High Quality Copper Cleaner","Fast Apple Fruit Roll"),
    c("Toucan Canned Mixed Fruit","Landslide Salt","Gorilla Sour Cream","Hermanos Firm Tofu"),
    c("Swell Canned Mixed Fruit","Washington Diet Soda","Super Apple Jam","Plato Strawberry Preserves","Steady Whitening Toothpast","Steady Whitening Toothpast","Better Beef Soup","Hermanos Squash","Carrington Frozen Cheese Pizza","Fort West Fondue Mix","Best Choice Mini Donuts","Cormorant Copper Pot Scrubber","Ebony Cantelope","Denny D-Size Batteries","Akron Eyeglass Screwdriver"),
    c("Big Time Ice Cream Sandwich","Musial Mints","Portsmouth Imported Beer","CDR Vegetable Oil","Just Right Rice Soup","Carrington Frozen Peas","High Quality 100 Watt Lightbulb","Fort West Dried Dates"),
    c("Consolidated Tartar Control Toothpaste","Plato Tomato Sauce","Quick Seasoned Hamburger")
)

> b_trans <- as(b_list,"transactions")
Error in asMethod(object) : 
    can not coerce list with transactions with duplicated items

> summary(b_trans)
Error in summary(b_trans) : 
   error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'b_trans' not found

Funny thing

> duplicated(a_list)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

> duplicated(b_list)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

Any ideas why this fabulous (WTF) thing happens?

Solution

joran and DWin mentioned:

  • Elements of character vectors in a_list are unique.
  • There is a duplication in one of the vectors of b_list.
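
This also explains the "funny thing" in the question: duplicated() applied to a list compares whole list elements with each other, so it flags the repeated c("c","e") in a_list but says nothing about an item repeated inside a single vector. A minimal sketch of a per-vector check in base R, using a made-up toy_list (not data from the question):

```r
# Hypothetical toy list: the second vector repeats "x",
# and the third vector is a copy of the first.
toy_list <- list(
    c("a", "b", "c"),
    c("x", "y", "x"),
    c("a", "b", "c")
)

# duplicated() on a list compares whole elements with each other,
# so only the repeated third vector is flagged.
duplicated(toy_list)

# anyDuplicated() on each vector returns the index of the first
# repeated item inside that vector, or 0 if there is none.
dup_index <- vapply(toy_list, anyDuplicated, integer(1))
dup_index

# Positions of the vectors that contain a repeated item.
which(dup_index > 0)
```

Running the same vapply() call on b_list should point at the vector that needs cleaning.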

Here is how that looks. If I add a second "b" to the first vector of a_list and call the result a_list2

> a_list2 <- list(
    c("a","b","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

then the attempt to coerce the data fails with the same error as for b_list:

> a_trans2 <- as(a_list2,"transactions")
Error in asMethod(object) : 
    can not coerce list with transactions with duplicated items

It turns out that b_list has "Steady Whitening Toothpast" listed twice in the fourth vector. Removing this duplicate by hand (giving b_list2) solved the issue.

> b_trans2 <- as(b_list2,"transactions")
> summary(b_trans2)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
... and so on ...
2    Best Choice Mini Donuts
3           Better Beef Soup
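
Removing duplicates by hand does not scale, so a possible alternative is to apply unique() to every vector before the coercion. The sketch below uses a short made-up raw_list in place of b_list; the final as() call is left commented out because it needs the arules package.

```r
# Hypothetical list standing in for b_list;
# the second vector repeats one item.
raw_list <- list(
    c("Landslide Salt", "Gorilla Sour Cream"),
    c("Super Apple Jam", "Better Beef Soup", "Super Apple Jam")
)

# unique() keeps the first occurrence of each item and preserves order.
clean_list <- lapply(raw_list, unique)

# Check that no vector contains a repeated item any more.
any(vapply(clean_list, anyDuplicated, integer(1)) > 0)

# With arules loaded, the coercion from the question should then work:
# clean_trans <- as(clean_list, "transactions")
```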

As for processing the real data, the following code runs without errors.

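# Group product names by transaction ID.
aggrData <- split(selData$product_name, selData$transaction_id)

# Drop items repeated within a transaction.
# seq_along() is safe even if aggrData is empty, unlike 1:length().
listData <- vector("list", length(aggrData))
for (i in seq_along(aggrData)) {
    listData[[i]] <- unique(as.character(aggrData[[i]]))
}

trnsData <- as(listData, "transactions")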

However, neither the following call nor attempts with other parameters produce any rules.

> rules <- apriori(trnsData)

parameter specification:
... and so on ...
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Yet this is a totally different story.
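
One thing that may be worth checking for the zero-rule result is how often each item occurs at all. As far as I know, apriori() defaults to a minimum support of 0.1 and a minimum confidence of 0.8, and with thousands of distinct product names few items are likely to appear in 10% of the transactions. The sketch below computes per-item support in base R on the synthetic a_list from the question; min_support is a hypothetical threshold chosen for illustration, and the parameter values in the commented apriori() call are guesses, not tuned settings.

```r
# The synthetic list from the question.
a_list <- list(
    c("a", "b", "c"),
    c("a", "b"),
    c("a", "b", "d"),
    c("c", "e"),
    c("c", "e"),
    c("a", "b", "d", "e")
)

# Fraction of transactions that contain each item
# (assumes duplicates within a transaction were already removed).
item_support <- table(unlist(a_list)) / length(a_list)
sort(item_support, decreasing = TRUE)

# Items at or above a candidate support threshold.
min_support <- 0.5  # hypothetical threshold
frequent_items <- names(item_support)[as.vector(item_support) >= min_support]
frequent_items

# With arules loaded, lower thresholds can be passed like this:
# rules <- apriori(trnsData, parameter = list(support = 0.01, confidence = 0.5))
```

If almost no item in listData reaches the support threshold in use, an empty rule set is the expected outcome rather than a sign of a conversion problem.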

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow