Question

I am running PageRank on a weighted DiGraph with 61,634 nodes and 28,378 edges.

  • pagerank(G) throws a ZeroDivisionError

  • pagerank_numpy(G) throws a ValueError: array too big

  • pagerank_scipy(G) gives me the page ranks, though

I can understand that the pagerank_numpy error is due to memory limitations, but why does pagerank fail? I tried adding an infinitesimal value to edges with zero weights, but the same issue persists. Some pointers would be nice.

Link to my GraphML file - https://mega.co.nz/#!xlYzEDAI!Lyh5pD-NJL61JPfkrNyJrEm0NnFc586A0MUD8OMYAO0

NetworkX version: 1.8.1, Python: 2.7

Solution 2

pagerank fails because it performs its computation using stochastic_graph -- unlike pagerank_numpy or pagerank_scipy. From the docs, stochastic_graph requires:

A NetworkX graph, must have valid edge weights

This "valid edge weights" point (which is not explained at all, which I think is a mistake) is the source of your problem.

For a directed graph, stochastic_graph uses the out_degree of each node to normalize the edges. Again from the docs:

The [out] degree is the sum of the edge weights adjacent to the node.

So when you have edges with zero or negative weight, the normalization breaks with a ZeroDivisionError. Negative weights are an issue because they can cancel out positive weights and thus give a node a weighted out-degree of zero. For example, in your graph, node '2123271' has two edges whose weights sum to 0:

>>> G.edges('2123271', data=True)
[('2123271', '1712899', {'weight': -1L}),
 ('2123271', '890839', {'weight': 1L})]
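
Here is a minimal sketch of the same failure on a hypothetical two-edge graph (made up for illustration, not your data): the outgoing weights of 'a' cancel, the weighted out-degree is zero, and the normalization divides by zero:

>>> import networkx as nx
>>> G2 = nx.DiGraph()
>>> G2.add_edge('a', 'b', weight=-1)
>>> G2.add_edge('a', 'c', weight=1)
>>> G2.out_degree('a', weight='weight')  # weighted out-degree is zero
0
>>> nx.pagerank(G2)  # raises ZeroDivisionError in NetworkX 1.8.1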

Replacing each zero or negative edge weight in your graph with a small positive weight allows pagerank to run:

In [1]: import networkx as nx
In [2]: G = nx.read_graphml("your_graph.graphml")
In [3]: defaultEdgeWeight = 0.01
In [4]: for u, v, d in G.edges(data=True):  # d is the edge-attribute dict
            if d['weight'] <= 0:
                G[u][v]['weight'] = defaultEdgeWeight
In [5]: P = nx.pagerank(G)

Of course, pagerank didn't converge after 102 iterations, but that's another issue.
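
If you hit that convergence error, one common knob (sketched here with illustrative values, not a guaranteed fix) is to raise max_iter and/or loosen the tolerance tol; both are standard pagerank keyword arguments:

In [6]: P = nx.pagerank(G, max_iter=500, tol=1e-06)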

OTHER TIPS

@mtitan8's answer is good, but there is a little more to the story.

Since the time of your original question, the NetworkX code has been modified so that pagerank(), pagerank_numpy(), and pagerank_scipy() all give the same answer when there are zero or negative weights (https://github.com/networkx/networkx/pull/1001).

Even so, the results those functions produce when you have negative weights are probably not what you want (if the computation works at all). When the algorithms now build the "Google matrix" from the input matrix (the weighted adjacency matrix of the graph), each row is divided by its row sum, unless that sum is zero, in which case the entire row is set to zero. But nothing stops that row sum from being negative, as the sketch below shows.
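
Here is an illustrative numpy sketch of that row normalization (not NetworkX's actual code), showing how a negative row sum leaves negative entries behind:

import numpy as np

# Toy weighted adjacency matrix (made up for illustration).
A = np.array([[0.0,  2.0, 1.0],   # row sum  3: normalizes fine
              [0.0,  0.0, 0.0],   # row sum  0: whole row set to zero
              [1.0, -2.0, 0.0]])  # row sum -1: negative!

row_sums = A.sum(axis=1)
with np.errstate(divide='ignore', invalid='ignore'):
    S = np.where(row_sums[:, None] == 0, 0.0, A / row_sums[:, None])

# The last row was divided by a negative sum, so S still contains
# negative entries, which is the problem discussed next.
print(S)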

If the resulting matrix still ends up with negative entries, then the Perron-Frobenius theorem (http://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem) doesn't apply, and you are not guaranteed a unique largest eigenvalue with a positive-valued eigenvector.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow