Count how many vertices in a vertex's neighbourhood have an attribute in igraph for R

StackOverflow https://stackoverflow.com/questions/23155365

  •  05-07-2023
  •  | 
  •  

I have a large graph (several, actually) in igraph—on the order of 100,000 vertices—and each vertex has an attribute which is either true or false. For each vertex, I would like to count how many of the vertices directly connected to it have the attribute. My current solution is the following function, which takes as its argument a graph.

attrcount <- function(g) {
  nb <- neighborhood(g,order=1)
  return(sapply(nb,function(x) {sum(V(g)$attr[x]}))
}

This returns a vector of counts which is off by 1 for vertices which have the attribute, but I can adjust this easily.

The problem is that this runs incredibly slowly, and it seems like there should be a fast way to do this, since, for instance, computing the degree of each vertex is practically instantaneous with degree(g).

Am I doing this a stupid way?

As an example, suppose this was our graph.

set.seed(42)
g <- erdos.renyi.game(169081, 178058, type="gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))
有帮助吗?

解决方案

Use get.adjlist to query all adjacent vertices, and then sapply (or tapply might be even faster) on this list to get the counts. It is also worth storing the attribute in a vector, because then you don't need to extract it all the time.

With sapply

system.time({
  al <- get.adjlist(g)
  att <- V(g)$att
  res <- sapply(al, function(x) sum(att[x]))
})
#   user  system elapsed 
#  0.571   0.005   0.576 

With tapply

system.time({
  al <- get.adjlist(g)
  alv <- unlist(al)
  alf <- factor(rep(seq_along(al), sapply(al, length)),
                levels=seq_along(al))
  att <- V(g)$att
  res2 <- tapply(att[alv], alf, sum)
  res2[is.na(res2)] <- 0
})
#   user  system elapsed 
#  1.121   0.020   1.144 

all(res == res2)
# TRUE

Somewhat a surprise to me, but the tapply solution is actually slower.

If this is still not enough, then I guess you can still make it faster by writing it in C/C++.

其他提示

For faster computation, use get.adjacency to pull the adjacency matrix, then multiply the matrix by the attribute vector using %*%:

library(igraph)
set.seed(42)
g <- erdos.renyi.game(1000, 1000, type = "gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))

system.time({
  ma   <- get.adjacency(g)
  att  <- V(g)$att
  res1 <- as.numeric(ma %*% att)
})
#  user  system elapsed 
# 0.003   0.000   0.003 

Compared to using get.adjlist and sapply:

system.time({
  al   <- get.adjlist(g)
  att  <- V(g)$att
  res2 <- sapply(al, function(x) sum(att[x]))
})
#   user  system elapsed 
#  9.733   0.243  10.107

After modifying the class of res1, the results vector is identical:

res1 <- as.numeric(res1)
identical(res1, res2)
# [1] TRUE
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top