Question

Pagerank works on the nodegraph of a series of pages and the directed edges formed by their respective inward and outward links. Thus the rank of a particular page is broadly a locally-induced effect in the nodegraph.

SVD, on the other hand, works on a whole matrix of values, and has no directionality - a link between site A and site B would only register as a 1 on the correct matrix element. It is a global system, so ranking is a global effect.

Given the extreme sparseness of web-derived matrices, I would expect SVD to be a bad performer here, as it requires a complete dataset, and has significant memory requirements.

Is that true? Does Pagerank outdo SVD largely because it is a nodegraph-based algorithm? How can Pagerank infer semantic relevance from a page beyond the number of times a word is mentioned? Or would that be a second step, performed after Pagerank has ranked the pages?

Was it helpful?

Solution

There are two issues here: which measure is easy to compute, and which yields the information that we're looking for? I don't know the answer to either question, but I can perhaps give a partial answer.

Firstly, the relevance. Both quantities are centrality measures, to use a term from network theory. PageRank computes (a variant of) eigenvector centrality, while the SVD apparently leads to the Hyperlink-Induced Topics Search (HITS) algorithm. I got this from this handout from Peter Dodds (University of Vermont). They measure different things, but it's not clear to me which one is the most relevant for measuring the importance of a webpage.

Secondly, the computational cost. Mathematically speaking, the PageRank is the dominant eigenvector of the (modified) adjacency matrix - as explained on the Wikipedia page - while HITS gives the dominant singular vector of the adjacency matrix. Both are defined by the global network of webpages and links between them, and both can be computed by only considering the node graph locally. So at first sight, I think that the computational cost is roughly equal.

In conclusion, I don't know why PageRank is better than SVD; it's not even clear to me that it is better than SVD.

OTHER TIPS

Note that PageRank uses a teleported random walk matrix. Teleportation is important to avoid (low-degree) localized eigenvectors of the random walk matrix. I think PageRank is better than HITS, because the random walk matrix, which is a degree-normalized adjacency matrix, depresses the effects of large degree nodes and loops, as opposed to HITS where large-degree nodes could make localized vectors.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top