Looping through rows and recording matching pairs

https://stackoverflow.com/questions/21083701

27-09-2022

Question

I'm trying to compile an edge list to be used for a social network graph based on a table that looks something like this:

CompanyID  ProjectID  Year
   A         1        2010
   B         3        2011
   C         1        2010
   D         5        2012
   E         1        2010

The idea is to list every pair of vertices (companies) that have worked on the same project. So, given the data above, I'd have

CompanyA    CompanyB
   A           C
   A           E
   C           E

Any help would be appreciated. Thank you ahead of time!

Answer

Call your data frame x:

x <- read.table(header=TRUE, text='CompanyID  ProjectID  Year
A         1        2010
B         3        2011
C         1        2010
D         5        2012
E         1        2010')

Keep only the rows whose ProjectID occurs more than once, since a project with a single company cannot produce a pair:

(mx <- x[ave(seq(nrow(x)), x$ProjectID, FUN=length) > 1,])
##   CompanyID ProjectID Year
## 1         A         1 2010
## 3         C         1 2010
## 5         E         1 2010
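
To make the filtering step less opaque, here is a minimal sketch of what the ave() call computes. It rebuilds the example table with data.frame() so it runs on its own; the names n_per_row and keep are illustrative and not part of the original answer.

```r
# Rebuild the example table so this snippet is self-contained.
x <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E"),
  ProjectID = c(1, 3, 1, 5, 1),
  Year      = c(2010, 2011, 2010, 2012, 2010),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each ProjectID group and returns one value per row,
# so every row receives the size of the group it belongs to.
n_per_row <- ave(seq_len(nrow(x)), x$ProjectID, FUN = length)

# Rows in groups of size 1 cannot form a pair, so they are dropped.
keep <- n_per_row > 1
mx <- x[keep, ]
```

Here n_per_row holds the group size for each row, and mx keeps only the rows for companies A, C and E.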

Now, for each project, build every pair of companies with combn and stack the per-project results with rbind:

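do.call(rbind,
        by(mx, mx$ProjectID,
           FUN=function(d)
             # combn() returns one pair per column; t() turns each pair into a row.
             # as.character() makes this work whether CompanyID is a factor or a
             # character vector (the read.table default since R 4.0).
             t(combn(as.character(d$CompanyID), 2))
        )
)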
##      [,1] [,2]
## [1,] "A"  "C" 
## [2,] "A"  "E" 
## [3,] "C"  "E"

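With your example data, only one ProjectID is shared, so you get the same pairs without the do.call(rbind, ...) wrapper. The wrapper is needed once several ProjectIDs are shared, because by() then returns one matrix of pairs per project and these have to be stacked into a single edge list.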
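
As a sketch of the multi-project case, the example below uses hypothetical data (an extra company F is added to project 3, which is not in the original question) and swaps by() for split() plus lapply(), which returns a plain list that is easy to inspect. The names x2, mx2, pairs_by_project and edges are illustrative.

```r
# Hypothetical data: projects 1 and 3 are each shared by several companies.
x2 <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E", "F"),
  ProjectID = c(1, 3, 1, 5, 1, 3),
  Year      = c(2010, 2011, 2010, 2012, 2010, 2011),
  stringsAsFactors = FALSE
)

# Keep rows whose ProjectID occurs more than once.
mx2 <- x2[ave(seq_len(nrow(x2)), x2$ProjectID, FUN = length) > 1, ]

# One matrix of pairs per project: split() yields a list of data frames,
# and combn() enumerates every unordered pair of companies within each.
pairs_by_project <- lapply(
  split(mx2, mx2$ProjectID),
  function(d) t(combn(as.character(d$CompanyID), 2))
)

# Stack the per-project matrices into a single two-column edge list.
edges <- do.call(rbind, pairs_by_project)
colnames(edges) <- c("CompanyA", "CompanyB")
edges
```

Project 1 contributes the pairs A-C, A-E and C-E, and project 3 contributes B-F, so edges has four rows. Without the rbind step, the result would stay a list with one matrix per project.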

License: CC-BY-SA with attribution

Not affiliated with StackOverflow