Question

I'd like to find all node's neighbors at distance two for a not so small network (5000 nodes and 25k undirected edges). Right now I am using:

ArrayList<Node> twoDistNei = new ArrayList<Node>();
Collection<Node> myThreads = g.getNeighbors(u);
for(Node t:myThreads){
    Collection<Node> neighbors = g.getNeighbors(t);
    for(Node uu:neighbors){
       if(!twoDistNei.contains(uu)){
           twoDistNei.add(uu);
        }
    }
}

but it is really slow and I am wondering if there are more efficient and fast way to accomplish this.

EDIT: I managed to use KNeighborhoodFilter as mentioned, and this is what I came out with:

KNeighborhoodFilter filter = new KNeighborhoodFilter(u, 2,KNeighborhoodFilter.EdgeType.IN_OUT);
Graph<Node, Edge> transform = filter.transform(zpa);
Collection<Node> vertices = transform.getVertices();
Set<Node> twoDistThreads = new HashSet<Node>();
for (Node v : vertices) {
    if(v.getColor().equals("blue")){
       twoDistThreads.add((Thread)v);         
    }
    System.out.println("thread " + v.getName() + " has color " + v.getColor());
} 

Now I see that the filter only allows to transform() the original network and induce a subgraph with all the selected nodes (plus the nodes linked to the selected nodes... but why?). This imply that I have to filter the new collection of nodes to catch only the 2-dist vertices I am interested in - I have a bipartite graph where a set of nodes is "blue" and the other is "red". Am I doing things right here, @Joshua? Thank you for your help!

Best regards, Simone

Was it helpful?

OTHER TIPS

Your method of just getting all neighbors in a double loop is just fine. You are and should be relying on Jung to get all neighbors for you. If you are not satisfied with Jung's method of getting all neighbors you can either 1) improve Jung or 2) use something else, but I doubt that is actually the problem:

You are using a very slow method of making sure you don't store the same neighbor twice using an ArrayList:

  if(!twoDistNei.contains(uu)){
       twoDistNei.add(uu);
    }

What ArrayList.contains does is simply loop over all items to check if it has that a particular item, so the bigger your twoDistNei array grows the slower the contains method becomes. This is called a linear time algorithm, so in this case linear to the amount of items in the ArrayList. There are constant time algorithms that achieve this same effect, but always take the same time no matter how big your array is.

I would advise you to use a HashSet instead of an ArrayList, like this:

HashSet<Node> twoDistNei = new HashSet<Node>();
Collection<Node> myThreads = g.getNeighbors(u);
for(Node t:myThreads){
    twoDistNei.addAll( g.getNeighbors(t) );
}

A defining property of the HashSet is that it does not store duplicate entries, so you can simply use addAll and everything will be taken care of. Furthermore the HashSet builds an internal array using indices based on the hashCode, thus on average achieving constant time performance.

Hope this helps! :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top