I have a list that looks like this one:

$`264`
[1] "CHAMP1" "MAP1S"  "PRRC1"  "TUT1"   "CDK12" 

$`265`
[1] "TUT1"   "PRRC1"  "CHAMP1" "MAP1S"

$`266`
[1] "REPS1"  "CHAMP1" "PRRC1"  "TUT1"   "MAP1S" 

$`267`
[1] "G3BP1"  "TUT1"   "PRRC1"  "CHAMP1" "MAP1S" 

$`268`
[1] "TUT1"   "CHAMP1" "PRRC1"  "MAP1S"  

$`269`
[1] "DDB1"   "CHAMP1" "TUT1"   "PRRC1"  "MAP1S"

Is there a package or function that calculates the similarity between the components of this list?

Many thanks

Solution

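I'm not aware of a package for this, but the following implements the metric from your comment: the number of elements two components share divided by the number of distinct elements in either of them (the Jaccard index).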

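lst <- list(`264` = c("CHAMP1", "MAP1S", "PRRC1", "TUT1", "CDK12"),  # data from the question
            `265` = c("TUT1", "PRRC1", "CHAMP1", "MAP1S"),
            `266` = c("REPS1", "CHAMP1", "PRRC1", "TUT1", "MAP1S"),
            `267` = c("G3BP1", "TUT1", "PRRC1", "CHAMP1", "MAP1S"),
            `268` = c("TUT1", "CHAMP1", "PRRC1", "MAP1S"),
            `269` = c("DDB1", "CHAMP1", "TUT1", "PRRC1", "MAP1S"))
# Similarity of components x and y: shared elements / distinct elements in either
siml <- function(x, y) {
  length(intersect(lst[[x]], lst[[y]])) / length(union(lst[[x]], lst[[y]]))
}
z <- expand.grid(x = seq_along(lst), y = seq_along(lst))  # every (x, y) index pair
result <- mapply(siml, z$x, z$y)
dim(result) <- c(length(lst), length(lst))  # reshape into an n x n matrix
round(result, 3)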
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000
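Because the measure is symmetric and every component scores 1 against itself, the expand.grid() version evaluates each off-diagonal pair twice and the diagonal as well. If a table of pairs is more convenient than a matrix, a sketch along these lines evaluates each unordered pair only once using combn(). The helper name pairwise_jaccard is hypothetical (not from any package), and it assumes the list has at least two components.

```r
# Hypothetical helper (not from any package): Jaccard index for each unordered
# pair of list components, returned as a long data frame; assumes length(lst) >= 2
pairwise_jaccard <- function(lst) {
  pairs <- combn(seq_along(lst), 2)  # 2 x choose(n, 2) matrix of index pairs
  sim <- apply(pairs, 2, function(p) {
    a <- lst[[p[1]]]
    b <- lst[[p[2]]]
    length(intersect(a, b)) / length(union(a, b))
  })
  # Label pairs by list names when present, otherwise by position
  ids <- if (is.null(names(lst))) seq_along(lst) else names(lst)
  data.frame(x = ids[pairs[1, ]], y = ids[pairs[2, ]], jaccard = sim)
}
```

With lst from the question, subset(pairwise_jaccard(lst), jaccard >= 0.8) keeps only the closely matching pairs. This needs choose(n, 2) evaluations instead of n^2.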

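A (slightly) more efficient way to build the same matrix is to nest two sapply() calls over the list itself; this version also keeps the list names as row and column names: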

# For each component x, compare it with every component y
result <- sapply(lst, function(x)
            sapply(lst, function(y) length(intersect(x, y)) / length(union(x, y))))
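Both versions still call intersect() and union() once per pair. For a list with many components, a vectorized alternative may scale better: encode the list as a 0/1 incidence matrix with one column per component, get every pairwise intersection size from a single crossprod(), and derive the union sizes from |A u B| = |A| + |B| - |A n B|. The sketch below assumes each component is a character vector; the helper name jaccard_matrix is hypothetical. Repeated elements within a component count once, matching how intersect() and union() discard duplicates.

```r
# Hypothetical helper (not from any package): Jaccard similarity of every pair of
# list components, computed from a 0/1 incidence matrix instead of pairwise calls
jaccard_matrix <- function(lst) {
  elems <- unique(unlist(lst))
  # m[i, k] is 1 if element i occurs in component k (duplicates count once)
  m <- vapply(lst, function(s) as.integer(elems %in% s), integer(length(elems)))
  inter  <- crossprod(m)                       # |A n B| for every pair of components
  sizes  <- colSums(m)                         # |A| for each component
  unions <- outer(sizes, sizes, "+") - inter   # |A u B| = |A| + |B| - |A n B|
  inter / unions
}

# Small check: "y" is the only shared element out of three distinct ones,
# so the off-diagonal entries are 1/3
jaccard_matrix(list(a = c("x", "y"), b = c("y", "z")))
```

The result carries the list names as row and column names, like the nested sapply() version.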
Licensed under: CC-BY-SA with attribution