Question

I am trying to find frequent itemsets using Apriori in R. I have a set of text files that contain only names. For example:

**name1.txt**
Bob
Alice

**name2.txt**
Alice
Don

**name3.txt**
Bob
Alice
Ben

With a minimum support (min_sup) of 2, the frequent pattern would be {Bob, Alice}. I would like to get this result in R.

I know how to import a single text file and use the eclat algorithm to find the frequent itemsets for that file.

fsets <- eclat(Adult, parameter = list(supp = 0.5))

My question is: how do I import multiple files from a folder and use them with eclat?

Thank you in advance!


Solution

Import the files as a list, with one character vector of names per file.

files <- lapply(seq_len(3), function(x) readLines(paste0("name",x,".txt")))
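The call above assumes exactly three files named name1.txt to name3.txt. If the folder holds an arbitrary number of files, something along these lines may be closer to what the question asks for. It is only a sketch: `dir_path` is a hypothetical folder (a temporary directory filled with the example data so the snippet runs on its own), and trimming whitespace and dropping blank or repeated lines is an assumption about how the files should be cleaned.

```r
# Hypothetical folder: a temporary directory stands in for the real one.
dir_path <- file.path(tempdir(), "names_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(c("Bob", "Alice"), file.path(dir_path, "name1.txt"))
writeLines(c("Alice", "Don"), file.path(dir_path, "name2.txt"))
writeLines(c("Bob", "Alice", "Ben"), file.path(dir_path, "name3.txt"))

# Collect every .txt file in the folder, whatever it is called.
paths <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)

# One character vector of names per file; drop blank lines and repeats.
files <- lapply(paths, function(p) {
  x <- trimws(readLines(p))
  unique(x[nzchar(x)])
})
names(files) <- basename(paths)
```

For real data, point `dir_path` at the actual folder and delete the `dir.create` and `writeLines` lines.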

Count how often each name occurs across the files.

counts <- Reduce(function(cnts, lst) {
  # Tabulate the names in this file, then fold them into the running totals.
  tmp <- table(as.character(unlist(lst)))
  for (i in names(tmp)) {
    cnts[[i]] <- if (i %in% names(cnts)) cnts[[i]] + tmp[[i]] else tmp[[i]]
  }
  cnts
}, files, list())
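The same totals can probably be had more compactly with `table` from base R. The sketch below counts each name at most once per file, so the number is the support in the usual sense (how many files contain the name). That matches the loop above whenever no file repeats a name. The inline `files` list is a stand-in for the result of `readLines`.

```r
# Stand-in for the list produced by readLines(), one vector per file.
files <- list(c("Bob", "Alice"), c("Alice", "Don"), c("Bob", "Alice", "Ben"))

# De-duplicate within each file, pool the names, and tabulate them.
support <- table(unlist(lapply(files, unique)))

support[["Alice"]]  # 3: Alice appears in all three files
support[["Bob"]]    # 2
```

Comparing `support` against the threshold then picks out the frequent names.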

Keep the names whose count reaches the minimum support.

min_sup <- 2
most_frequent <- names(counts)[as.integer(unlist(counts)) >= min_sup]
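Note that these counts cover individual names only. If itemsets such as {Alice, Bob} are wanted, the list of per-file vectors can be passed to `eclat` itself. What follows is a sketch that assumes the arules package (the one providing `eclat` and the `Adult` data set from the question) is installed. The inline `files` list stands in for the `readLines` result, and converting min_sup to a relative support by dividing by the number of files is an assumption about the intended threshold.

```r
library(arules)  # assumed to be installed; provides eclat() and inspect()

# Stand-in for the list produced by readLines(), one vector per file.
files <- list(c("Bob", "Alice"), c("Alice", "Don"), c("Bob", "Alice", "Ben"))

# Each file becomes one transaction; names must be unique within a transaction.
trans <- as(lapply(files, unique), "transactions")

# eclat() takes a relative support, so convert the absolute min_sup of 2.
min_sup <- 2
fsets <- eclat(trans,
               parameter = list(supp = min_sup / length(files)),
               control = list(verbose = FALSE))

inspect(fsets)  # lists each frequent itemset with its support
```

Here every file is treated as one transaction, so a name's support is the fraction of files that contain it.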
Licensed under: CC-BY-SA with attribution