Question

I am trying out Numba in speeding up a function that computes a minimum conditional probability of joint occurrence.

    import numpy as np
    from numba import double
    from numba.decorators import jit, autojit

    X = np.random.random((100,2))

    def cooccurance_probability(X):
        P = X.shape[1]      
        CS = np.sum(X, axis=0)                  #Column Sums
        D = np.empty((P, P), dtype=np.float64)  #Return Matrix (np.float is removed in modern NumPy)
        for i in range(P):
            for j in range(P):
                D[i, j] = (X[:,i] * X[:,j]).sum() / max(CS[i], CS[j])
        return D 

    cooccurance_probability_numba = autojit(cooccurance_probability)

However, I am finding that the performance of cooccurance_probability and cooccurance_probability_numba is pretty much the same.

%timeit cooccurance_probability(X)
1 loops, best of 3: 302 ms per loop

%timeit cooccurance_probability_numba(X)
1 loops, best of 3: 307 ms per loop

Why is this? Could it be due to the NumPy element-by-element operations?

I am following as an example: http://nbviewer.ipython.org/github/ellisonbg/talk-sicm2-2013/blob/master/NumbaCython.ipynb

[Note: I could halve the execution time by exploiting the symmetric nature of the problem - but that isn't my main concern]

Solution

My guess would be that you're hitting the object layer instead of generating native code due to the calls to sum, which means that Numba isn't going to speed things up significantly. It just doesn't know how to optimize/translate sum (at this point). Additionally it's usually better to unroll vectorized operations into explicit loops with Numba. Notice that the ipynb that you link to only calls out to np.sqrt which I believe does get translated to machine code, and it operates on elements, not slices. I would try to expand out the sum in the inner loop as an explicit additional loop over elements, rather than taking slices and using the sum method.
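To illustrate the rewrite being suggested, here is the inner computation in both forms, shown as plain Python/NumPy (the function names are illustrative, not from the original code):

```python
import numpy as np

X = np.random.random((100, 2))

def pairwise_sum_sliced(X, i, j):
    # Slice-based form: the call to ndarray.sum() is what old Numba
    # could not translate, forcing a fallback to the object layer.
    return (X[:, i] * X[:, j]).sum()

def pairwise_sum_unrolled(X, i, j):
    # Explicit element loop: plain scalar operations that Numba
    # can compile straight to machine code.
    total = 0.0
    for c in range(X.shape[0]):
        total += X[c, i] * X[c, j]
    return total
```

Both compute the same quantity; only the second shape is friendly to Numba of that era.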

My experience is that Numba can work wonders sometimes, but it doesn't speed up arbitrary Python code. You need to get a sense of its limitations and of what it can optimize effectively. Also note that v0.11 behaves a bit differently in this regard than 0.12 and 0.13, due to the major refactoring Numba went through between those versions.
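One way to surface this kind of silent fallback: later Numba releases let you request nopython mode, which turns an object-layer fallback into a hard compilation error instead of slow execution. A minimal sketch, assuming a modern Numba install (the code degrades to plain Python if Numba is absent):

```python
import numpy as np

try:
    from numba import njit  # modern Numba: njit == jit(nopython=True)
except ImportError:
    njit = lambda f: f  # no Numba available: run as plain Python

@njit
def dot_column_pair(X, i, j):
    # An element-wise loop that nopython mode can compile; if it
    # could not, @njit would raise a TypingError at the first call
    # rather than silently running in object mode.
    total = 0.0
    for c in range(X.shape[0]):
        total += X[c, i] * X[c, j]
    return total

X = np.random.random((100, 2))
result = dot_column_pair(X, 0, 1)
```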

OTHER TIPS

Below is a solution using Josh's advice, which is spot on. It appears, however, that max() works fine in the implementation below. It would be great if there were a list of "safe" Python / NumPy functions.

[Note: I reduced the dimensionality of the original matrix to 100 x 200]

    import numpy as np
    from numba import double
    from numba.decorators import jit, autojit

    X = np.random.random((100,200))

    def cooccurance_probability_explicit(X):
        C = X.shape[0]
        P = X.shape[1]
        # - Column Sums - #
        CS = np.zeros((P,), dtype=np.float64)
        for p in range(P):
            for c in range(C):
                CS[p] += X[c,p]
        D = np.empty((P, P), dtype=np.float64)  #Return Matrix
        for i in range(P):
            for j in range(P):
                # - Compute Elemental Pairwise Sums over each Product Vector - #
                pws = 0.0
                for c in range(C):
                    pws += (X[c,i] * X[c,j])
                D[i,j] = pws / max(CS[i], CS[j])
        return D

    cooccurance_probability_explicit_numba = autojit(cooccurance_probability_explicit)

%timeit results:

%timeit cooccurance_probability(X)
10 loops, best of 3: 83 ms per loop


%timeit cooccurance_probability_explicit(X)
1 loops, best of 3: 2.55 s per loop

%timeit cooccurance_probability_explicit_numba(X)
100 loops, best of 3: 7.72 ms per loop

The interesting thing about the results is that the explicitly written version is very slow when executed by plain Python, due to heavy interpreter and type-checking overhead. But passed through Numba it works its magic: the Numba version is roughly 11 times faster than the vectorized NumPy solution.
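Besides speed, it is worth confirming that the unrolled version matches the original vectorized one numerically; a quick check with np.allclose (self-contained sketch, with the two function bodies reproduced from above and the matrix shrunk just to keep the check fast):

```python
import numpy as np

def cooccurance_probability(X):
    # Original vectorized version from the question.
    P = X.shape[1]
    CS = np.sum(X, axis=0)
    D = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            D[i, j] = (X[:, i] * X[:, j]).sum() / max(CS[i], CS[j])
    return D

def cooccurance_probability_explicit(X):
    # Fully unrolled version from the answer.
    C, P = X.shape
    CS = np.zeros(P)
    for p in range(P):
        for c in range(C):
            CS[p] += X[c, p]
    D = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            pws = 0.0
            for c in range(C):
                pws += X[c, i] * X[c, j]
            D[i, j] = pws / max(CS[i], CS[j])
    return D

X = np.random.random((50, 10))
assert np.allclose(cooccurance_probability(X),
                   cooccurance_probability_explicit(X))
```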


Update: Added a Cython Function for Comparison (thanks to moarningsun: Cython function with variable sized matrix input)

    %load_ext cythonmagic
    %%cython
    import numpy as np
    cimport numpy as np

    def cooccurance_probability_cy(double[:,:] X):
        cdef int C, P, i, j, c   # declare the loop variable c as a C int
        C = X.shape[0]
        P = X.shape[1]
        cdef double pws
        cdef double [:] CS = np.sum(X, axis=0)
        cdef double [:,:] D = np.empty((P,P), dtype=np.float64)

        for i in range(P):
            for j in range(P):
                pws = 0.0
                for c in range(C):
                    pws += (X[c, i] * X[c, j])
                D[i,j] = pws / max(CS[i], CS[j])
        return D

%timeit results:

%timeit cooccurance_probability_cy(X)
100 loops, best of 3: 12 ms per loop
Licensed under: CC-BY-SA with attribution