Question

I'm testing some similarity metrics in a graph. I'm using JUNG API to handle the graphs. I've managed to compute basic metrics like common neighbors and prefferential attachment. Now, I want to calculate katz metric that is as follow: katz(v1,v2) = B.paths_1(v1,v2) + B^2.paths_2(v1,v2) + ...+ B^n.paths_n(v1,v2) where paths_n(v1,v2) is the number of paths of length "n" between v1 and v2; and B is a scalar. I'm restricting n to 4, so the final katz matrix could be calculated easily through: B.A + (B.A)^2 + ... + (B.A)^4 where A is the adjacency matrix of the graph. The thing is that the graphs I'm working with are very huge and I'm not able to store the whole katz matrix in memory. Besides, I won't need all the scores because I only test few pairs of nodes. I can't find an efficient way to calculate individual scores without having to performe a walk on the graph though. Any ideas?

Was it helpful?

Solution

To calculate individual score ketz(v1,v2), you just need to consider the adjacency sub-matrix, which contains only the vertices which are within a distance less than 4 from either v1 or v2.

You can locate those vertices using breath-first-search from v1 and from v2.

But you can actually do much better if you directly count the #paths while doing a BFS starting at v1. You only need to remember the distance from v1 and at each vertex check whether you have arrived at v2. If so increment the appropriate counter.

Something like that (pseudo code):

Queue q = new Queue();
q.enqueue((v1, 0));
int[] counts = new int[] { 0,0,0,0,0 };

while (!q.empty()) {
    (v, dist) = q.dequeue();
    for(Vertex w : v.Neighbors()) {
        if(dist < 3)
            q.enqueue((w, dist+1));

        if(dist < 4 && w == v2)
            counts[dist+1]++;
    }
}

So after you run this, you'll have counts[n] = paths_n(v1,v2) for n =1,2,3,4

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top