Calculating pageranks for a sparse directed graph with high percentage of deadlinks

https://stackoverflow.com/questions/3669226

01-10-2019
|

Question

I am a graduate student in computer science at Indiana University, Bloomington. For one of my research projects, i am working on calculating pageranks for a directed graph which is very sparse and has a high percentage of deadlinks.

By deadlinks I mean nodes that have outdegree zero. Sometimes, in a graph with a lot of deadlinks, spider traps may occur. Anyways, the problem I am interested in is finding out pageranks in this scenario.

And I am using JUNG (Java Universal Graph Network) for calculating the pageranks.

When I use the normal procedure,

Graph<String, String> jungGraph = new DirectedSparseGraph<String, String>();
PageRank<String, String> pagerank = new PageRank<String,String>(jungGraph, 0.2);
pagerank.setMaxIterations(20);
pagerank.setTolerance(0.000001);
pagerank.evaluate();

I get more or less the same pagerank values for all the nodes, when i clearly know that shouldn't be the case. As some nodes in the graph have a large number of outgoing nodes and are strongly interconnected.

What is the suggested approach in this case. I know there is this class PageRankWithPriors. Should I first extract the network with no deadlinks, calculate pageranks for them, and then propagate their rank to the deadlinks until they converge.? In the latter case, all the nodes in the reduced network (outdegree != 0 ) will have their priors set, whereas the deadlinks wont.

Am I missing anything here?

Solution

I don't think PageRankWithPriors is what you want.

Which version of PageRank are you using? The class edu.uci.ics.jung.algorithms.importance.PageRank or edu.uci.ics.jung.algorithms.scoring.PageRank? The former has been deprecated in favor of the latter in Jung 2.0 Beta.

They seem to treat outdegree 0 nodes differently, which may be your issue. The former's spec says:

probability of going from node u to node v is equal to (1-alpha)[1/outdegree(u)] + alpha(1/|V|)

If u has no out-edges in the original graph then 0 is used instead of 1/outdegree(v).

That seems wrong, as it leads to a loss of probability (the total probability of leaving u by some method should equal 1, and it doesn't). The latter does it differently:

If a vertex has no outgoing edges, then the probability of taking a random jump from that vertex is (by default) effectively 1

That should conserve probability which is what you want.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow