Looping through rows and recording matching pairs

https://stackoverflow.com/questions/21083701

27-09-2022

Question

I'm trying to compile an edge list to be used for a social network graph based on a table that looks something like this:

CompanyID  ProjectID  Year
   A         1        2010
   B         3        2011
   C         1        2010
   D         5        2012
   E         1        2010

The idea is to list every pair of vertices (companies) that have worked on the same project. So, given the data above, I'd have

CompanyA    CompanyB
   A           C
   A           E
   C           E

Any help would be appreciated. Thank you ahead of time!

Solution

Call your data frame x:

x <- read.table(header=TRUE, text='CompanyID  ProjectID  Year
A         1        2010
B         3        2011
C         1        2010
D         5        2012
E         1        2010')

Keep only the rows whose ProjectID is shared by more than one company:

(mx <- x[ave(seq(nrow(x)), x$ProjectID, FUN=length) > 1,])
##   CompanyID ProjectID Year
## 1         A         1 2010
## 3         C         1 2010
## 5         E         1 2010
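To see why that filter works, it may help to look at what ave() returns on its own. The sketch below rebuilds the example table by hand (the names x_demo and group_size are illustrative, not from the original answer) and shows that each row is tagged with the number of rows sharing its ProjectID:

```r
# Illustrative copy of the example data (x_demo is a made-up name)
x_demo <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E"),
  ProjectID = c(1, 3, 1, 5, 1),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share that row's ProjectID
group_size <- ave(seq_len(nrow(x_demo)), x_demo$ProjectID, FUN = length)
group_size
# 3 1 3 1 3

# Rows with a count above 1 belong to a project with at least two companies
x_demo[group_size > 1, "CompanyID"]
```

Comparing the counts against 1 gives the logical index used to build mx.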

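Now, for each remaining project, use combn to generate every pair of companies and stack the per-project results into one matrix: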

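# For each shared project, build every pair of companies, then stack the results.
# as.character() works whether CompanyID is a factor or a character vector
# (read.table no longer creates factors by default as of R 4.0).
do.call(rbind,
        by(mx, mx$ProjectID,
           FUN=function(g)
             t(combn(as.character(g$CompanyID), 2))
        )
)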
##      [,1] [,2]
## [1,] "A"  "C" 
## [2,] "A"  "E" 
## [3,] "C"  "E"

With your example data only one project is shared, so you would get the same pairs without wrapping the call in do.call(rbind, ...). The wrapper is needed as soon as several ProjectIDs are shared, because by() returns one matrix per project.
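As a rough illustration of the multi-project case, here is a sketch with a second shared project. The extra company F and the names x2, shared, and edges are made up for this example, and it takes pairs with as.character() rather than going through factor levels:

```r
# Hypothetical data: company F is added so that project 3 is shared too
x2 <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E", "F"),
  ProjectID = c(1, 3, 1, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Keep only rows whose ProjectID occurs more than once
shared <- x2[ave(seq_len(nrow(x2)), x2$ProjectID, FUN = length) > 1, ]

# One matrix of pairs per project, then stacked into a single edge list
pairs_by_project <- by(shared, shared$ProjectID,
                       function(g) t(combn(as.character(g$CompanyID), 2)))
edges <- do.call(rbind, pairs_by_project)
colnames(edges) <- c("CompanyA", "CompanyB")
edges
```

Here project 1 contributes three pairs and project 3 contributes one, so edges should have four rows. If the graph is built with igraph, a two-column matrix like this is the usual input shape for an edge list.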

Licensed under: CC-BY-SA with attribution
