Question

I have a weighted directed graph (it's sparse, 35,000 nodes and 19 million edges) and would like to calculate similarity scores for pairs of nodes. SimRank would be ideal for this purpose, except that it applies to unweighted graphs.

It's easy to adapt the matrix representation of SimRank to use weighted edges:

$\mathbf{S} = \max\{C \cdot (\mathbf{A}^T \cdot \mathbf{S} \cdot \mathbf{A}),\mathbf{I}\}$

Just replace the entries in the adjacency matrix $\mathbf{A}$ with the edge weights. But I don't see how to derive the "iteration to a fixed-point" method from this formula.

$s_{k+1}(a, b) = \frac{C}{\left|I(a)\right|\left|I(b)\right|} \cdot \sum_{i \in I(a)} \sum_{j \in I(b)} s_{k}(i, j)$

Is the following method correct?

$s_{k+1}(a, b) = C \cdot \sum_{i \in I(a)} \sum_{j \in I(b)} s_{k}(i, j) \cdot w(i, a) \cdot w(j, b)$

My reasoning is that the original method is equivalent to this:

$s_{k+1}(a, b) = C \cdot \sum_{i \in I(a)} \sum_{j \in I(b)} s_{k}(i, j) \cdot \frac{1}{\left|I(a)\right|} \cdot \frac{1}{\left|I(b)\right|}$

so I replaced the trivially equal weights with the actual ones I have. Confirmation would be nice, though.

If there is a superior method to SimRank, I would like to hear about it. (I'm concerned with improvements to accuracy, not performance; this is a small data set.)

Edit: A different similarity metric that's nearly as accurate but performs a lot better would actually be useful too. My initial tests of SimRank are very slow, and the potential optimizations are not simple to code.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top