I am confused about how the PageRank algorithm works in the MapReduce model.
My main confusion is that after phase II, the value for each key URL holds its inlinks (not its outlinks), so how can the next iteration work?
See my example below:
Input file (txt):
A->B
A->C
B->A
C->B
WORKER1 WORKER2
LOAD
A->B B->A
A->C C->B
MAP
(A,B) (B,A)
(A,C) (C,B)
SHUFFLE AND DISTRIBUTE
(A,[B,C]) (B,[A])
(C,[B])
REDUCE
(A,(PR(A),[B,C],2)) (B,(PR(B),[A],1))
(C,(PR(C),[B],1))
MAP(PHASE2)
(B,(PR(A)/2,2)) (A,(PR(B)/1,1))
(C,(PR(A)/2,2)) (B,(PR(C)/1,1))
SHUFFLE AND DISTRIBUTE
(A,[PR(B)/1]) (B,[PR(A)/2,PR(C)/1])
(C,[PR(A)/2])
REDUCE
(A,(NEWPR(A),[B],2)) (B,(NEWPR(B),[A,C],1))
(C,(NEWPR(C),[A],1))
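To make the trace concrete, here is a minimal single-process Python sketch of the steps above. It is not real MapReduce code: the function names (shuffle, phase1, phase2_map, phase2_reduce) are hypothetical, the initial rank of 1.0 is an assumption, and the damping factor is left out. Each contribution is also tagged with its sender, which my trace does not do, only so that the grouped lists are visible.

```python
from collections import defaultdict

# Edge list from the example: ("A", "B") means page A links to page B.
EDGES = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "B")]


def shuffle(pairs):
    """Group (key, value) pairs by key, like the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def phase1(edges, initial_rank):
    """Build url -> (rank, outlinks) records from the raw edges."""
    # The map step emits (source, target); the shuffle groups by source.
    adjacency = shuffle(edges)
    return {url: (initial_rank, outs) for url, outs in adjacency.items()}


def phase2_map(records):
    """Emit one rank share per outlink, keyed by the link target."""
    for url, (rank, outlinks) in records.items():
        for target in outlinks:
            # The sender is attached for illustration only (assumption).
            yield target, (url, rank / len(outlinks))


def phase2_reduce(grouped):
    """Sum the shares per page and list which pages sent them."""
    result = {}
    for url, contributions in grouped.items():
        senders = [sender for sender, _ in contributions]
        new_rank = sum(share for _, share in contributions)
        result[url] = (new_rank, senders)
    return result


records = phase1(EDGES, initial_rank=1.0)
after = phase2_reduce(shuffle(phase2_map(records)))
print(records)  # each value holds the page's outlinks
print(after)    # each value holds the pages that link to the key
```

Running it gives the same shape as my final REDUCE step: the list attached to each key contains the pages that link to it, not the pages it links to.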
At this point I have lost the outlink information. Where is my mistake?