The problem is that you are indexing the 2-D array as c[j][k] when you should write c[j,k]. Otherwise Cython uses an intermediate buffer for buf = c[j], from which it then takes buf[k], and that causes the slow-down. You should use this proper indexing together with the cdef declarations specified by @XavierCombelle.
You can check that this intermediate buffer is causing the slow-down by doing:
cdef np.ndarray[DTYPE_t, ndim=1] buf
and then, inside the loop:
buf = c[j]
buf[k] = val_a + val_b
With this declared buffer, the speed should be the same as (or close to) that of:
c[j,k] = val_a + val_b
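
The difference between the two indexing forms can also be seen in plain NumPy, outside Cython. The sketch below is only an illustration under that assumption: the array shape and the values are made up, and it makes no claim about Cython timings. It shows that c[j][k] first builds an intermediate 1-D row object and then indexes into it, while c[j,k] addresses the element in a single step.

```python
import numpy as np

# Hypothetical small array, just for illustration.
c = np.zeros((3, 4), dtype=np.float64)
j, k = 1, 2

# Chained indexing: c[j] first creates a 1-D array object for row j ...
row = c[j]
assert isinstance(row, np.ndarray) and row.ndim == 1
assert np.shares_memory(row, c)  # it is a view onto c, not a copy

# ... and only then is element k written through that intermediate view.
row[k] = 5.0

# Tuple indexing: a single indexing operation, no intermediate row object.
c[j, k] += 1.0

print(c[j, k])  # both forms touched the same element
```

Both forms reach the same memory, so the result is identical. What differs is the extra object created per access, which is what an untyped c[j] costs inside a tight Cython loop.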