Question

I am trying to compute the internal PageRank of Wikipedia using MapReduce. I implemented my PageRank algorithm on a small subset of Wikipedia pages, 6349 pages in total. I used this formula to calculate the PageRank (d = 0.85).

[Image: PageRank formula]

I wanted to verify whether the sum of all the PageRank values is equal to the total number of pages (6349).

What I found so far:

1. The total PageRank of all 6349 pages is 1001.26044.

2. According to Wikipedia, if I use the above formula then each PageRank is multiplied by N and the sum becomes N. I multiplied each PageRank by N (6349) and calculated the sum; I got 6356789.5.

Is there a reason why the sum of the PageRank values is not equal to the total number of pages? Should I use the second formula to verify?

[Image: second PageRank formula]

Note: I ran my MapReduce code for 10 iterations to get a good approximation.

Solution

I suspect you are running too few iterations. Why 10? Why not 100, or 100000? You should compute the average or maximum change in rank between the last two iterations and use it to estimate the remaining error.
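As a rough illustration of that check, here is a minimal single-machine sketch (not MapReduce) that keeps iterating until the largest per-page change falls below a tolerance, instead of stopping after a fixed count. The function name `pagerank`, the `links` input format, the `tol` value and the toy graph are all assumptions made for this example. It uses the variant of the formula whose ranks sum to 1, and it assumes every link target is itself a key of `links`.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PageRank until the largest per-page change drops below tol.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format; every target must also be a key).
    Returns (ranks, iterations_used, last_max_change).
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    max_change = float("inf")
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Every page starts with the "random jump" share (1 - d) / N.
        new_ranks = {p: (1.0 - d) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # a page without out-links passes nothing on
            share = d * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        # Largest change between the last two iterations.
        max_change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if max_change < tol:
            break
    return ranks, iteration, max_change


# Toy graph in which every page has at least one out-link.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations, max_change = pagerank(toy_links)
print(iterations, max_change, sum(ranks.values()))
```

Printing the iteration count and the last `max_change` shows directly whether a fixed number of iterations such as 10 would have been enough for a given tolerance.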

Also, PR is a probability, so the sum over all pages should be 1. The statement that the "sum of all the pagerank is equal to the total number of pages" is wrong.

As for the other formula, it belongs to a different model and defines a different PR. Of course, you can use it too, or both. But you can't use it to check the first one.
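To make the difference between the two models concrete, here is a short derivation. The formulas in the question were images, so this assumes they are the two variants listed on Wikipedia; the symbol L(q) for the number of out-links of page q is notation introduced here. The result only holds at convergence and only if every page has at least one out-link to a page inside the set; otherwise rank leaks out and the sum comes out smaller.

```latex
% Variant whose ranks sum to N:
PR(p) = (1 - d) + d \sum_{q \to p} \frac{PR(q)}{L(q)}

% Sum over all N pages. Each page q hands out L(q) shares of PR(q)/L(q),
% so the double sum collapses to \sum_q PR(q):
\sum_{p} PR(p) = N(1 - d) + d \sum_{q} PR(q)
\quad\Longrightarrow\quad \sum_{p} PR(p) = N

% Variant whose ranks sum to 1 (a probability distribution):
PR(p) = \frac{1 - d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}
\quad\Longrightarrow\quad \sum_{p} PR(p) = (1 - d) + d \sum_{q} PR(q) = 1
```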

Other tips

It depends on which base you choose (the default is 1). After each iteration you have to calculate

delta = (base - sum_of_ranks) / N

Then add delta to each rank, so that the ranks sum to base again. Only this way will you keep your ranks alive until the last iteration.
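A minimal sketch of that correction step, assuming the ranks are held in a plain dictionary (the function name `renormalize` and the sample values are made up for illustration). With delta defined as above, adding delta to every rank brings the total back to base, because sum + N * delta = base.

```python
def renormalize(ranks, base=1.0):
    """Spread the missing rank mass evenly so the ranks sum to base again."""
    n = len(ranks)
    delta = (base - sum(ranks.values())) / n
    # delta is positive when rank has leaked, so each page gets a bit back.
    return {page: rank + delta for page, rank in ranks.items()}


# Hypothetical ranks after an iteration in which some mass was lost.
leaked = {"A": 0.2, "B": 0.3, "C": 0.1}
fixed = renormalize(leaked)
print(sum(leaked.values()), sum(fixed.values()))
```

In a MapReduce job this would need the global sum of ranks from the previous pass, for example from a counter or a small extra job, before the correction can be applied.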

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow