similarity index in a list of character vectors

https://stackoverflow.com/questions/23188012

r
similarity

06-07-2023
|

문제

I have a list that looks like this one:

$`264`
[1] "CHAMP1" "MAP1S"  "PRRC1"  "TUT1"   "CDK12" 

$`265`
[1] "TUT1"   "PRRC1"  "CHAMP1" "MAP1S"

$`266`
[1] "REPS1"  "CHAMP1" "PRRC1"  "TUT1"   "MAP1S" 

$`267`
[1] "G3BP1"  "TUT1"   "PRRC1"  "CHAMP1" "MAP1S" 

$`268`
[1] "TUT1"   "CHAMP1" "PRRC1"  "MAP1S"  

$`269`
[1] "DDB1"   "CHAMP1" "TUT1"   "PRRC1"  "MAP1S"

Is there any package or function to calculate the similarity among the different list components?

Many thanks

해결책

I'm not aware of any packages, but this implements your own metric (from your comment):

siml  <- function(x,y) {
  length(intersect(lst[[x]],lst[[y]]))/length(union(lst[[x]],lst[[y]]))
}
z      <- expand.grid(x=1:length(lst),y=1:length(lst))
result <- mapply(siml,z$x,z$y)
dim(result) <- c(length(lst),length(lst))
result
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000

This is a (slightly) more efficient way to do the same thing:

result <- sapply(lst,function(x) 
            sapply(lst,function(y,x)length(intersect(x,y))/length(union(x,y)),x))

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow