Question

What is the fastest way in numpy to calculate the number of jumps that dijkstra's algorithm uses? I have a 10000x10000 element connectivity matrix and use scipy.sparse.csgraph.dijkstra to calculate the filled distance matrix and a predecessor matrix. My naive solution is as follows:

import numpy as np
from numpy.random import rand
from scipy.sparse.csgraph import dijkstra

def dijkway(dijkpredmat, i, j):
    """calculate the path between two nodes in a dijkstra matrix"""
    wayarr = []
    while (i != j) & (j >= 0):
        wayarr.append(j)
        j = dijkpredmat[i,j]
    return np.array(wayarr)

def jumpvec(pmat,node):
    """calculate number of jumps from one node to all others"""
    jumps = np.zeros(len(pmat))
    jumps[node] = -999
    while 1:
        try:
            rvec = np.nonzero(jumps==0)[0]
            r = rvec.min()
            dway = dijkway(pmat, node, r)
            jumps[dway] = np.arange(len(dway),0,-1)
        except ValueError:
            break
    return jumps

#Create a matrix
mat = (rand(500,500)*20)
mat[(rand(50000)*500).astype(int), (rand(50000)*500).astype(int)] = np.nan
dmat,pmat = dijkstra(mat,return_predecessors=True)

timeit jumpvec(pmat,300)
In [25]: 10 loops, best of 3: 51.5 ms per loop

~50msek/node is OK but expanding the distance matrix to 10000 nodes increases the time to ~2sek/node. jumpvec has to be executed 10000 times then as well...

Was it helpful?

Solution 2

Following code can speedup 4x on my PC, it's faster because:

  • use ndarray.item() to get values from array.
  • use set object to save unprocessed index.
  • don't create numpy.arange() in the while loop.

Python code:

def dijkway2(dijkpredmat, i, j):
    wayarr = []
    while (i != j) & (j >= 0):
        wayarr.append(j)
        j = dijkpredmat.item(i,j)
    return wayarr

def jumpvec2(pmat,node):
    jumps = np.zeros(len(pmat))
    jumps[node] = -999
    todo = set()
    for i in range(len(pmat)):
        if i != node:
            todo.add(i)

    indexs = np.arange(len(pmat), 0, -1)
    while todo:
        r = todo.pop()
        dway = dijkway2(pmat, node, r)
        jumps[dway] = indexs[-len(dway):]
        todo -= set(dway)
    return jumps

To speedup even more, you can use cython:

import numpy as np
cimport numpy as np
import cython

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef dijkway3(int[:, ::1] m, int i, int j):
    cdef list wayarr = []
    while (i != j) & (j >= 0):
        wayarr.append(j)
        j = m[i,j]
    return wayarr

@cython.wraparound(False)
@cython.boundscheck(False)
def jumpvec3(int[:, ::1] pmat, int node):
    cdef np.ndarray jumps
    cdef int[::1] jumps_buf
    cdef int i, j, r, n
    cdef list dway
    jumps = np.zeros(len(pmat), int)
    jumps_buf = jumps
    jumps[node] = -999

    for i in range(len(jumps)):
        if jumps_buf[i] != 0:
            continue
        r = i
        dway = dijkway3(pmat, node, r)
        n = len(dway)
        for j in range(n):
            jumps_buf[<int>dway[j]] = n - j
    return jumps

Here is my test, the cython version is 80x faster:

%timeit jumpvec3(pmat,1)
%timeit jumpvec2(pmat, 1)
%timeit jumpvec(pmat, 1)

output:

1000 loops, best of 3: 138 µs per loop
100 loops, best of 3: 2.81 ms per loop
100 loops, best of 3: 10.8 ms per loop

OTHER TIPS

Here's an (asymptotically optimal) O(n) algorithm.

Create a set of unvisited vertices, initially all but the source vertex. Initialize to 0 the jump vector entry for the source to itself. While the set is not empty, pop an element v. Using the predecessor matrix, collect v and each successive ancestor in a list until you reach one already visited. Iterate through the list in reverse order, setting the jump vector entry of each node w to the entry for its parent plus 1, then removing w from the set.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top