Question

I am just starting to learn Cython, so please excuse my ignorance. Can Cython improve on NumPy for simply adding two arrays together? My very bad attempt at adding two arrays a + b to give a new array c is:

import numpy as np
cimport numpy as np

DTYPE = np.int
ctypedef np.int_t DTYPE_t

def add_arrays(np.ndarray[DTYPE_t, ndim=2] a, np.ndarray[DTYPE_t, ndim=2] b, np.ndarray[DTYPE_t, ndim=2] c):
    cdef int x = a.shape[0]
    cdef int y = a.shape[1]
    cdef int val_a
    cdef int val_b
    for j in range(x):
        for k in range(y):
            val_a = a[j][k]
            val_b = b[j][k]
            c[j][k] = val_a + val_b    
    return c

However, this version is 700 times slower than plain NumPy addition when these arrays are passed:

n = 1000 
a = np.ones((n, n), dtype=np.int)
b = np.ones((n, n), dtype=np.int)
c = np.zeros((n, n), dtype=np.int)

I am obviously missing something very big.
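
For reference, a timing harness along the following lines reproduces this kind of comparison. It is only a sketch: the compiled module name add_arrays_module is a placeholder, and np.int_ is used because the bare np.int alias from the snippets above has since been removed from NumPy.

import timeit
import numpy as np
import add_arrays_module   # placeholder name for the compiled Cython module above

n = 1000
a = np.ones((n, n), dtype=np.int_)   # np.int in the original; that alias no longer exists
b = np.ones((n, n), dtype=np.int_)
c = np.zeros((n, n), dtype=np.int_)

numpy_time = timeit.timeit(lambda: a + b, number=100)
cython_time = timeit.timeit(lambda: add_arrays_module.add_arrays(a, b, c), number=100)
print(cython_time / numpy_time)   # how many times slower the Cython version is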


Solution

The problem is that you are indexing the 2-D array as c[j][k] when you should use c[j, k]. Otherwise Cython creates an intermediate buffer for buf = c[j] and then takes buf[k] from it, which causes the slow-down. Use this proper indexing together with the cdef declarations specified by @XavierCombelle.
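
A minimal corrected sketch of the whole function, combining the c[j, k] indexing with typed loop counters (same type declarations as in the question; the name add_arrays_fixed is only for illustration):

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_t

def add_arrays_fixed(np.ndarray[DTYPE_t, ndim=2] a,
                     np.ndarray[DTYPE_t, ndim=2] b,
                     np.ndarray[DTYPE_t, ndim=2] c):
    cdef int x = a.shape[0]
    cdef int y = a.shape[1]
    cdef int j, k                         # typed loop counters, see the tip below
    for j in range(x):
        for k in range(y):
            c[j, k] = a[j, k] + b[j, k]   # single-step indexing, no intermediate row buffer
    return c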

You can check that this intermediate buffer is causing the slow-down by doing:

cdef np.ndarray[DTYPE_t, ndim=1] buf

and then, inside the loop:

buf = c[j]
buf[k] = val_a + val_b

This declared buffer should give the same speed as (or close to):

c[j,k] = val_a + val_b
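
Put together, the check described above might look like the following sketch (the name add_arrays_buf is mine). Timing it against the c[j, k] version shows how much of the slow-down came from the untyped intermediate buffer:

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_t

def add_arrays_buf(np.ndarray[DTYPE_t, ndim=2] a,
                   np.ndarray[DTYPE_t, ndim=2] b,
                   np.ndarray[DTYPE_t, ndim=2] c):
    cdef int x = a.shape[0]
    cdef int y = a.shape[1]
    cdef int j, k
    cdef int val_a, val_b
    cdef np.ndarray[DTYPE_t, ndim=1] buf   # the intermediate row, now explicitly typed
    for j in range(x):
        for k in range(y):
            val_a = a[j, k]
            val_b = b[j, k]
            buf = c[j]              # the same intermediate that c[j][k] creates implicitly, but typed
            buf[k] = val_a + val_b
    return c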

OTHER TIPS

I think you are missing

cdef int j
cdef int k

so your loop variables are Python objects, not C ones.
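
One way to see which variables are still handled as Python objects is Cython's annotation output. A minimal build script along these lines (the file name add_arrays.pyx is a placeholder) writes an annotated HTML report in which lines that go through the Python C-API are highlighted:

# setup.py -- build with:  python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    # annotate=True writes an add_arrays.html report next to the generated C file
    ext_modules=cythonize("add_arrays.pyx", annotate=True),
    include_dirs=[numpy.get_include()],
)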

Here are two examples:

The "numpy way"

%%timeit
table1 = np.ones((10,10))
table2 = np.ones((10,10))
result = np.zeros((10,10))
table1 + table2 

100000 loops, best of 3: 14.5 µs per loop

The looping over indices way

%%timeit
def add_arrays(ar1, ar2):
    for j in range(len(ar1)):
        for k in range(len(ar2)):
            val_a = ar1[j][k]
            val_b = ar2[j][k]
            result[j][k] = val_a + val_b    
    return result

add_arrays(table1, table2)

1000 loops, best of 3: 307 µs per loop

The same computation, with NumPy about 20 times faster.

I am aware that this does not completely answer your question, but it may give you a better baseline for your comparisons.

[edit] For 1000x1000 tables the difference is more pronounced; I suspect this is because the overhead of building the tables is amortized over a much larger computation.

former code: 100 loops, best of 3: 13.1 ms per loop
latter code: 1 loops, best of 3: 2.78 s per loop

That is roughly a factor of 200.
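
To separate the cost of building the tables from the cost of the addition itself, the construction can be moved into the timing setup; a sketch with the timeit module for the 1000x1000 case:

import timeit

setup = """
import numpy as np
table1 = np.ones((1000, 1000))
table2 = np.ones((1000, 1000))
"""

# Only the addition is timed; the tables are built once in the setup string.
t = timeit.timeit("table1 + table2", setup=setup, number=100)
print(t / 100)   # seconds per addition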

Licensed under: CC-BY-SA with attribution