Question

I'm trying to compile an edge list to be used for a social network graph based on a table that looks something like this:

CompanyID  ProjectID  Year
   A         1        2010
   B         3        2011
   C         1        2010
   D         5        2012
   E         1        2010

The idea is to have a list of vertices (companies) that have worked on the same project. So, given the data above, I'd have

CompanyA    CompanyB
   A           C
   A           E
   C           E

Any help would be appreciated. Thank you ahead of time!


Solution

Call your data frame x:

x <- read.table(header=TRUE, text='CompanyID  ProjectID  Year
A         1        2010
B         3        2011
C         1        2010
D         5        2012
E         1        2010')

Keep only the rows whose ProjectID appears more than once:

(mx <- x[ave(seq(nrow(x)), x$ProjectID, FUN=length) > 1,])
##   CompanyID ProjectID Year
## 1         A         1 2010
## 3         C         1 2010
## 5         E         1 2010
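To see why this filter works, it may help to look at what ave() returns on its own: with FUN=length it gives, for every row, the size of that row's ProjectID group. A minimal sketch, rebuilding the example data with data.frame (the name group_size is only for illustration):

```r
# Same example data as above, built directly as a data frame.
x <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E"),
  ProjectID = c(1, 3, 1, 5, 1),
  Year      = c(2010, 2011, 2010, 2012, 2010),
  stringsAsFactors = FALSE
)

# For each row, count how many rows share its ProjectID.
group_size <- ave(seq_len(nrow(x)), x$ProjectID, FUN = length)

# Rows whose project has at least two companies are the ones worth pairing.
mx <- x[group_size > 1, ]
```

Here group_size holds one count per row of x, so comparing it with 1 yields the logical index used to subset the data frame.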

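Now for the magic: build every pair of companies within each project with combn, and stack the per-project results with rbind: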

do.call(rbind, 
        by(mx, mx$ProjectID,
           # as.character() works whether CompanyID is a factor or a character
           # column (read.table no longer creates factors by default in R >= 4.0)
           FUN=function(g) t(combn(as.character(g$CompanyID), 2))
        )
)
##      [,1] [,2]
## [1,] "A"  "C" 
## [2,] "A"  "E" 
## [3,] "C"  "E" 

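With your example data only one project has more than one company, so you get the same pairs without wrapping the call in do.call(rbind, ...). The wrapper is needed once several ProjectIDs are shared, because by() then returns one matrix of pairs per project and they have to be stacked into a single edge list.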
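As a rough illustration of the multi-project case, here is a sketch on hypothetical extended data (x2, with an extra company F on project 3, is not from the question). It uses split() and lapply() instead of by(), which is a swapped-in but equivalent way to group, and names the columns CompanyA and CompanyB as in the desired output:

```r
# Hypothetical data: projects 1 and 3 are each shared by several companies.
x2 <- data.frame(
  CompanyID = c("A", "B", "C", "D", "E", "F"),
  ProjectID = c(1, 3, 1, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Keep only projects that have at least two companies.
mx2 <- x2[ave(seq_len(nrow(x2)), x2$ProjectID, FUN = length) > 1, ]

# One matrix of company pairs per project.
pairs <- lapply(split(mx2$CompanyID, mx2$ProjectID),
                function(ids) t(combn(ids, 2)))

# Stack the per-project matrices into a single edge list.
edges <- as.data.frame(do.call(rbind, pairs), stringsAsFactors = FALSE)
names(edges) <- c("CompanyA", "CompanyB")
edges
```

Without the do.call(rbind, ...) step, pairs would remain a list with one matrix per project rather than one table of edges.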

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow