سؤال

I have a list that looks like this one:

$`264`
[1] "CHAMP1" "MAP1S"  "PRRC1"  "TUT1"   "CDK12" 

$`265`
[1] "TUT1"   "PRRC1"  "CHAMP1" "MAP1S"

$`266`
[1] "REPS1"  "CHAMP1" "PRRC1"  "TUT1"   "MAP1S" 

$`267`
[1] "G3BP1"  "TUT1"   "PRRC1"  "CHAMP1" "MAP1S" 

$`268`
[1] "TUT1"   "CHAMP1" "PRRC1"  "MAP1S"  

$`269`
[1] "DDB1"   "CHAMP1" "TUT1"   "PRRC1"  "MAP1S"

Is there any package or function to calculate the similarity among the different list components?

Many thanks

هل كانت مفيدة؟

المحلول

I'm not aware of any packages, but this implements your own metric (from your comment):

siml  <- function(x,y) {
  length(intersect(lst[[x]],lst[[y]]))/length(union(lst[[x]],lst[[y]]))
}
z      <- expand.grid(x=1:length(lst),y=1:length(lst))
result <- mapply(siml,z$x,z$y)
dim(result) <- c(length(lst),length(lst))
result
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000

This is a (slightly) more efficient way to do the same thing:

result <- sapply(lst,function(x) 
            sapply(lst,function(y,x)length(intersect(x,y))/length(union(x,y)),x))
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top