Question

I am trying to understand the concepts behind Google PageRank and to implement a similar (though rudimentary) version in Python. I have spent the last few hours familiarizing myself with the algorithm; however, it's still not all that clear to me.

I've located a particularly interesting website that outlines the implementation of PageRank in Python. However, I can't quite seem to understand the purpose of all of the functions shown on this page. Could anyone clarify what exactly the functions are doing, particularly pageRankeGenerator?

Solution

I'll try to give a simple explanation (definition) of the PageRank algorithm from my personal notes.

Let us say that pages T1, T2, ... Tn are pointing to page A, then

PR(A) = (1-d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))

where

  • PR(Ti) is the PageRank of Ti
  • C(Ti) is the number of outgoing links from page Ti
  • d is the damping factor in the range 0 < d < 1, usually set to 0.85

Every PR(x) can start at the value 1. We then adjust the page ranks by recomputing the formula for every page over about 10-20 rounds, where each round uses the values produced by the previous round.
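To make the iteration concrete, here is a minimal Python sketch of the formula as I read it from my notes. The function name pagerank, the links dictionary (page mapped to the list of pages it links to) and the rounds parameter are my own choices for illustration, not the functions from the website in the question. It assumes every page has at least one outgoing link and that every link target is itself a key in links.

```python
def pagerank(links, d=0.85, rounds=20):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for every page.

    `links` maps each page to the list of pages it links to (hypothetical
    input format). Assumes every page has at least one outgoing link.
    """
    ranks = {page: 1.0 for page in links}  # every PR(x) starts at 1
    for _ in range(rounds):
        new_ranks = {}
        for page in links:
            # Sum PR(T) / C(T) over every page T that links to `page`.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks  # the next round reads only this round's results
    return ranks
```

Scanning all pages to find the incoming links is slow for large graphs, but it keeps the code close to the formula. A real implementation would more likely precompute the incoming links once.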

Example for pages A, B, C, where A links to B, B links to A and C, and C links to A (so C(A) = 1, C(B) = 2, C(C) = 1):

   A <--> B
   ^     /
    \   v
      C

Round 1
A = 0.15 + 0.85 (1/2 + 1/1) = 1.425
B = 0.15 + 0.85 (1/1) = 1
C = 0.15 + 0.85 (1/2) = 0.575

round's sum = 3

Round 2
A = 0.15 + 0.85 (1/2 + 0.575) = 1.06375
B = 0.15 + 0.85 (1.425) = 1.36125
C = 0.15 + 0.85 (1/2) = 0.575

round's sum = 3

Round 3
A = 0.15 + 0.85 (1.36125/2 + 0.575) = 1.217
B = 0.15 + 0.85 (1.06375) = 1.054
C = 0.15 + 0.85 (1.36125/2) = 0.729

round's sum = 3

...
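If you want to check the rounds yourself, the following short script is one way to do it. It is only a sketch: the links dictionary is my encoding of the A, B, C graph, and history is a helper list I added to keep each round's values. The sum stays at 3 because every page here has outgoing links, so each page hands on its whole rank and the total satisfies total = 3 * (1 - d) + d * total.

```python
# Example graph: A <-> B, B -> C, C -> A (page -> pages it links to).
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
d = 0.85  # damping factor

ranks = {page: 1.0 for page in links}  # start value 1 for every page
history = []  # ranks after each round, kept for inspection

for round_no in range(1, 4):
    # The comprehension is fully evaluated with the old `ranks`
    # before the name is rebound, so each round uses the previous values.
    ranks = {
        page: (1 - d) + d * sum(
            ranks[src] / len(out)
            for src, out in links.items()
            if page in out
        )
        for page in links
    }
    history.append(ranks)
    rounded = {page: round(value, 5) for page, value in ranks.items()}
    print("Round", round_no, rounded, "sum =", round(sum(ranks.values()), 5))
```

Raising the upper bound of range lets you watch the values settle. The round-to-round changes shrink quickly, which is why 10-20 rounds are usually enough for a small graph like this.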

Licensed under: CC-BY-SA with attribution