Question

I am trying out Numba in speeding up a function that computes a minimum conditional probability of joint occurrence.

    import numpy as np
    from numba import double
    from numba.decorators import jit, autojit

    X = np.random.random((100,2))

    def cooccurance_probability(X):
        P = X.shape[1]      
        CS = np.sum(X, axis=0)                  #Column Sums
        D = np.empty((P, P), dtype=np.float64)  #Return Matrix (np.float is removed in modern NumPy)
        for i in range(P):
            for j in range(P):
                D[i, j] = (X[:,i] * X[:,j]).sum() / max(CS[i], CS[j])
        return D 

    cooccurance_probability_numba = autojit(cooccurance_probability)

However, I am finding that the performance of cooccurance_probability and cooccurance_probability_numba is pretty much the same.

%timeit cooccurance_probability(X)
1 loops, best of 3: 302 ms per loop

%timeit cooccurance_probability_numba(X)
1 loops, best of 3: 307 ms per loop

Why is this? Could it be due to the NumPy element-by-element operations?

I am following as an example: http://nbviewer.ipython.org/github/ellisonbg/talk-sicm2-2013/blob/master/NumbaCython.ipynb

[Note: I could halve the execution time by exploiting the symmetric nature of the problem - but that isn't my main concern]

Solution

My guess would be that you're hitting the object layer instead of generating native code due to the calls to sum, which means that Numba isn't going to speed things up significantly. It just doesn't know how to optimize/translate sum (at this point). Additionally it's usually better to unroll vectorized operations into explicit loops with Numba. Notice that the ipynb that you link to only calls out to np.sqrt which I believe does get translated to machine code, and it operates on elements, not slices. I would try to expand out the sum in the inner loop as an explicit additional loop over elements, rather than taking slices and using the sum method.
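To illustrate the rewrite being suggested, here is the inner computation in both forms, shown as plain Python/NumPy (the function names are illustrative, not from the original code):

```python
import numpy as np

X = np.random.random((100, 2))

def pairwise_sum_sliced(X, i, j):
    # Slice-based form: the call to ndarray.sum() is what old Numba
    # could not translate, forcing a fallback to the object layer.
    return (X[:, i] * X[:, j]).sum()

def pairwise_sum_unrolled(X, i, j):
    # Explicit element loop: plain scalar operations that Numba
    # can compile straight to machine code.
    total = 0.0
    for c in range(X.shape[0]):
        total += X[c, i] * X[c, j]
    return total
```

Both compute the same quantity; only the second shape is friendly to Numba of that era.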

My experience is that Numba can work wonders sometimes, but it doesn't speed up arbitrary Python code. You need to get a sense of its limitations and of what it can optimize effectively. Also note that v0.11 behaves a bit differently in this regard than 0.12 and 0.13, due to the major refactoring Numba went through between those versions.
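One way to surface this kind of silent fallback: later Numba releases let you request nopython mode, which turns an object-layer fallback into a hard compilation error instead of slow execution. A minimal sketch, assuming a modern Numba install (the code degrades to plain Python if Numba is absent):

```python
import numpy as np

try:
    from numba import njit  # modern Numba: njit == jit(nopython=True)
except ImportError:
    njit = lambda f: f  # no Numba available: run as plain Python

@njit
def dot_column_pair(X, i, j):
    # An element-wise loop that nopython mode can compile; if it
    # could not, @njit would raise a TypingError at the first call
    # rather than silently running in object mode.
    total = 0.0
    for c in range(X.shape[0]):
        total += X[c, i] * X[c, j]
    return total

X = np.random.random((100, 2))
result = dot_column_pair(X, 0, 1)
```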

OTHER TIPS

Below is a solution using Josh's advice, which is spot on. It appears, however, that max() works fine in the implementation below. It would be great if there were a list of "safe" Python / NumPy functions.

[Note: I reduced the dimensionality of the original matrix to 100 x 200]

    import numpy as np
    from numba import double
    from numba.decorators import jit, autojit

    X = np.random.random((100,200))

    def cooccurance_probability_explicit(X):
        C = X.shape[0]
        P = X.shape[1]
        # - Column Sums - #
        CS = np.zeros((P,), dtype=np.float64)
        for p in range(P):
            for c in range(C):
                CS[p] += X[c,p]
        D = np.empty((P, P), dtype=np.float64)  #Return Matrix
        for i in range(P):
            for j in range(P):
                # - Compute Elemental Pairwise Sums over each Product Vector - #
                pws = 0.0
                for c in range(C):
                    pws += (X[c,i] * X[c,j])
                D[i,j] = pws / max(CS[i], CS[j])
        return D

    cooccurance_probability_explicit_numba = autojit(cooccurance_probability_explicit)

%timeit results:

%timeit cooccurance_probability(X)
10 loops, best of 3: 83 ms per loop


%timeit cooccurance_probability_explicit(X)
1 loops, best of 3: 2.55 s per loop

%timeit cooccurance_probability_explicit_numba(X)
100 loops, best of 3: 7.72 ms per loop

The interesting thing about the results is that the explicitly written version is very slow when executed by plain Python, due to heavy interpreter and type-checking overhead. But passed through Numba it works its magic: the Numba version is roughly 11 times faster than the vectorized NumPy solution.
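Besides speed, it is worth confirming that the unrolled version matches the original vectorized one numerically; a quick check with np.allclose (self-contained sketch, with the two function bodies reproduced from above and the matrix shrunk just to keep the check fast):

```python
import numpy as np

def cooccurance_probability(X):
    # Original vectorized version from the question.
    P = X.shape[1]
    CS = np.sum(X, axis=0)
    D = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            D[i, j] = (X[:, i] * X[:, j]).sum() / max(CS[i], CS[j])
    return D

def cooccurance_probability_explicit(X):
    # Fully unrolled version from the answer.
    C, P = X.shape
    CS = np.zeros(P)
    for p in range(P):
        for c in range(C):
            CS[p] += X[c, p]
    D = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            pws = 0.0
            for c in range(C):
                pws += X[c, i] * X[c, j]
            D[i, j] = pws / max(CS[i], CS[j])
    return D

X = np.random.random((50, 10))
assert np.allclose(cooccurance_probability(X),
                   cooccurance_probability_explicit(X))
```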


Update: Added a Cython Function for Comparison (thanks to moarningsun: Cython function with variable sized matrix input)

    %load_ext cythonmagic
    %%cython
    import numpy as np
    cimport numpy as np

    def cooccurance_probability_cy(double[:,:] X):
        cdef int C, P, i, j, c   # declare the loop variable c as a C int
        C = X.shape[0]
        P = X.shape[1]
        cdef double pws
        cdef double [:] CS = np.sum(X, axis=0)
        cdef double [:,:] D = np.empty((P,P), dtype=np.float64)

        for i in range(P):
            for j in range(P):
                pws = 0.0
                for c in range(C):
                    pws += (X[c, i] * X[c, j])
                D[i,j] = pws / max(CS[i], CS[j])
        return D

%timeit results:

%timeit cooccurance_probability_cy(X)
100 loops, best of 3: 12 ms per loop
Licensed under: CC-BY-SA with attribution