Question

I am learning about the PageRank algorithm, so apologies for the newbie questions. I understand that the PR value of each page is calculated by summing the contributions from its incoming links.

What bothers me is the statement on Wikipedia that "the PageRank values sum to one".

In the example shown on Wikipedia, if every page has an outbound link, then the probabilities over all pages should sum to one. However, if a page does not have any outbound links, such as page A in the example, then the sum should not be 1, right?

So does the PageRank algorithm have to assume that every page has at least one outbound link? Could someone elaborate on how PageRank deals with pages that have no incoming or outbound links? How do the formulas change accordingly? Thanks.

Solution

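As PageRank is described in the original article and in the Wikipedia article, it is indeed not defined when out-degree(v) = 0 for some v. The transition probability from v to u is P(v,u) = d/n + (1-d)*A(v,u)/out-degree(v), where A(v,u) is 1 if v links to u and 0 otherwise, so for such a v you get P(v,u) = d/n + (1-d)*0/0, which is undefined. (Here d is the probability of a random jump and n is the number of pages; Wikipedia instead uses d for the damping factor, i.e. the probability of following a link.)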
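To make the 0/0 concrete, here is a minimal Python sketch of that transition probability. The function name `transition_prob`, the three-page graph and the value d = 0.15 are made up for illustration, and d is the random-jump probability as in this answer.

```python
def transition_prob(graph, v, u, d=0.15):
    """Probability of moving from page v to page u.

    graph maps each page to the list of pages it links to.
    d is the random-jump probability (this answer's convention).
    """
    n = len(graph)
    out_links = graph[v]
    follows = out_links.count(u)  # 1 if v links to u, else 0
    return d / n + (1 - d) * follows / len(out_links)


# Hypothetical three-page graph; page "A" has no outgoing links.
graph = {"A": [], "B": ["A", "C"], "C": ["A"]}

# For a page with outgoing links, the probabilities over all targets sum to 1.
row_b = sum(transition_prob(graph, "B", u) for u in graph)

# For the dangling page "A" the formula divides by zero.
try:
    transition_prob(graph, "A", "B")
    dangling_defined = True
except ZeroDivisionError:
    dangling_defined = False
```

The row for B is a proper probability distribution, while the row for A cannot be computed at all, which is exactly the gap the fixes for dangling nodes are meant to close.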

A node with no outgoing edges is called a dangling node, and there are basically three common ways to take care of such nodes:

  1. Eliminate such nodes from the graph (and repeat the process iteratively until there are no dangling nodes).
  2. Consider those pages to link back to the pages that linked to them (i.e. for each edge (u,v), if out-degree(v) = 0, regard (v,u) as an edge).
  3. Link the dangling node to all pages (usually including itself), which effectively makes the probability of a random jump from this node 1.
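As an illustration of the third option, here is a rough power-iteration sketch in Python. It is only one possible way to write it: the function name `pagerank`, the example graph, d = 0.15 (random-jump probability) and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(graph, d=0.15, iterations=100):
    """Power iteration where dangling pages are treated as linking to every page.

    graph maps each page to the list of pages it links to.
    d is the random-jump probability (this answer's convention).
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank currently sitting on pages without outgoing links.
        dangling = sum(rank[p] for p in pages if not graph[p])
        # Every page gets the random-jump share plus an equal share
        # of the rank held by the dangling pages.
        new_rank = {p: d / n + (1 - d) * dangling / n for p in pages}
        # Pages with links split the rest of their rank among their targets.
        for p in pages:
            for target in graph[p]:
                new_rank[target] += (1 - d) * rank[p] / len(graph[p])
        rank = new_rank
    return rank


# Hypothetical graph in which page "A" is dangling.
graph = {"A": [], "B": ["A", "C"], "C": ["A"]}
rank = pagerank(graph)
total = sum(rank.values())
```

With this handling the rank held by A is redistributed instead of being lost, so `total` stays at 1 (up to floating-point error) in every iteration.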

As for a page with no incoming links: that is not an issue, because everything is perfectly defined. Such a node has a PageRank of exactly d/n, because you can only reach it by a random jump from some node, and that is the probability of landing on it.
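A quick way to check the d/n claim is a small Python sketch like the one below. It assumes every page has at least one outgoing link, so no dangling-node handling is needed; the function name, the graph and d = 0.15 (random-jump probability) are invented for the example.

```python
def pagerank_no_dangling(graph, d=0.15, iterations=50):
    """Plain power iteration; assumes every page has at least one outgoing link."""
    n = len(graph)
    rank = {p: 1.0 / n for p in graph}
    for _ in range(iterations):
        # Random-jump share that every page receives.
        new_rank = {p: d / n for p in graph}
        # Each page splits the rest of its rank among the pages it links to.
        for page, targets in graph.items():
            for t in targets:
                new_rank[t] += (1 - d) * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: nothing links to "A", but "A" links to "B".
graph = {"A": ["B"], "B": ["C"], "C": ["B"]}
rank = pagerank_no_dangling(graph)
rank_a = rank["A"]
expected = 0.15 / len(graph)  # d/n
```

Since no link ever adds to A's value, `rank_a` is just the random-jump share d/n, while the ranks of all pages still sum to 1.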

Hope that answered your question!

OTHER TIPS

The PageRank algorithm ranks a page based on the incoming links to that page. The outbound links from that page help determine the PageRank of the other pages to which it links. The computation is iterated until the PageRank values stabilize.

In each iteration, page A's PageRank receives a contribution from every page that links to it. The contribution from a page B that links to page A is B's PageRank divided by the total number of outgoing links on page B.

Therefore, having no outbound links does not directly change how page A's own PageRank is computed. The direct effect is only that page A does not add value to the PageRank of any other page (although, unless such pages are handled specially, the rank they hold is lost in each iteration and the values no longer sum to one). By contrast, if there are no incoming links to page B, it will have the baseline (very low) PageRank, because it never gets added value from incoming links.
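Here is a small Python sketch of the update described above, applied literally without any special treatment of pages that have no outbound links. The function name `naive_step`, the graph, and the choice of d = 0.15 as the random-jump probability (so the baseline is d/n) are assumptions for illustration only.

```python
def naive_step(graph, rank, d=0.15):
    """One iteration: baseline plus contributions from incoming links."""
    n = len(graph)
    new_rank = {p: d / n for p in graph}  # baseline value for every page
    for page, targets in graph.items():
        for t in targets:
            # page passes its rank on, split evenly over its outgoing links
            new_rank[t] += (1 - d) * rank[page] / len(targets)
    return new_rank


# Hypothetical graph: "A" has no outbound links, "C" has no incoming links.
graph = {"A": [], "B": ["A"], "C": ["A", "B"]}
rank = {p: 1 / len(graph) for p in graph}
for _ in range(50):
    rank = naive_step(graph, rank)
total = sum(rank.values())
```

In this run A still ends up with the highest value and C stays at the baseline, but `total` comes out below 1, because whatever rank reaches A is never passed on to any other page.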

Licensed under: CC-BY-SA with attribution