Question

I have a list of vectors such as:

>list

[[1]]

[1] "a" "m" "l" "s" "t" "o"

[[2]]

[1] "a" "y" "o" "t" "e"

[[3]]

[1] "n" "a" "s" "i" "d"

I want to find the matches between each vector and all the others (i.e. between the 1st and the other two, the 2nd and the other two, and so on) and keep the pair with the highest number of matches. I could do it with a "for" loop, intersecting pair by pair. For example:

for (i in 2:3) { intersect(list[[1]],list[[i]]) }

and then save the output into a vector or some other structure. However, this seems very inefficient to me (given that I have thousands of vectors rather than 3), and I am wondering whether R has a built-in function that does this in a clever way.

So the question would be:

Is there a way to look for matches of one vector to a list of vectors without the explicit use of a "for" loop?


Solution

I don't believe there is a built-in function for this. The best you could try is something like:

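lsts <- lapply(1:5, function(x) sample(letters, 10)) # make some data (dumped below)
combs <- combn(length(lsts), 2) # every pair of indices, one pair per column
# size of the intersection for each pair
n_common <- apply(combs, 2,
  function(ix) length(intersect(lsts[[ix[1]]], lsts[[ix[2]]])))
maxcomb <- which.max(n_common) # first pair with the most matches
lsts <- lsts[combs[, maxcomb]]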
# [[1]]
#  [1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"

# [[2]]
#  [1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"

A dump of the original list (before it was overwritten with the best pair):

[[1]]
 [1] "z" "r" "j" "h" "e" "m" "w" "u" "q" "f"

[[2]]
 [1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"

[[3]]
 [1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"

[[4]]
 [1] "c" "o" "t" "j" "d" "g" "u" "k" "w" "h"

[[5]]
 [1] "f" "g" "q" "y" "d" "e" "n" "s" "w" "i"
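
For the narrower case in the question, comparing one vector against all the others without an explicit "for" loop, a minimal sketch with vapply might look like the following. The names vecs and count_matches are made up for illustration, and the data are the three vectors from the question.

```r
# Hypothetical example data: the three vectors from the question
vecs <- list(
  c("a", "m", "l", "s", "t", "o"),
  c("a", "y", "o", "t", "e"),
  c("n", "a", "s", "i", "d")
)

# Number of elements that vector i shares with every vector in the list
# (count_matches is an illustrative helper, not a built-in)
count_matches <- function(i, vecs) {
  counts <- vapply(vecs, function(v) length(intersect(vecs[[i]], v)), integer(1))
  counts[i] <- NA_integer_  # ignore the comparison of a vector with itself
  counts
}

first_vs_rest <- count_matches(1, vecs)   # matches of vector 1 with each vector
best_partner <- which.max(first_vs_rest)  # which.max skips the NA entry
```

This still loops internally, so it is tidier rather than fundamentally faster than the "for" version; repeating it for every i also compares each pair twice, which the combn approach above avoids.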

OTHER TIPS

datal <- list (a=c(2,2,1,2),
           b=c(2,2,2,4,3),
           c=c(1,2,3,4))

# all possible combinations
combs <- combn(length(datal), 2)
# split into list
combs <- split(combs, rep(1:ncol(combs), each = nrow(combs)))

# calculate length of intersection for every combination
intersections_length <- sapply(combs, function(y) {
  length(intersect(datal[[y[1]]], datal[[y[2]]]))
})

# Which pairs have the largest intersection (ties are all returned)
combs[which(intersections_length == max(intersections_length))]
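
With thousands of vectors, calling intersect once per pair may become slow. One possible alternative, which is a different technique from the one above and is offered only as a sketch, is to build a value-by-vector membership matrix and get all pairwise intersection sizes in one crossprod call. The names uniq, membership, shared and best_pairs are illustrative, and the sketch assumes the dense matrix fits in memory.

```r
datal <- list(a = c(2, 2, 1, 2),
              b = c(2, 2, 2, 4, 3),
              c = c(1, 2, 3, 4))

# Drop duplicates so each value counts once per vector, as intersect() does
uniq <- lapply(datal, unique)

# Membership matrix: one row per distinct value, one column per vector,
# with 1 where the value occurs in that vector and 0 otherwise
membership <- unclass(table(
  value  = unlist(uniq, use.names = FALSE),
  vector = rep(seq_along(uniq), lengths(uniq))
))

# Entry [i, j] is the number of values shared by vectors i and j
shared <- crossprod(membership)
shared[lower.tri(shared, diag = TRUE)] <- NA  # keep each pair only once

# Row and column indices of the pair(s) with the largest intersection
best_pairs <- which(shared == max(shared, na.rm = TRUE), arr.ind = TRUE)
```

For this data the sketch picks vectors 2 and 3 (b and c), which share the values 2, 3 and 4. The matrix has one row per distinct value and one column per vector, so whether this beats the pairwise approach depends on how many distinct values the data contain.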
Licensed under: CC-BY-SA with attribution