Question

I have two numpy boolean arrays (a and b). I need to find how many of their elements are equal. Currently, I do len(a) - (a ^ b).sum(), but the xor operation creates an entirely new numpy array, as I understand. How do I efficiently implement this desired behavior without creating the unnecessary temporary array?

I've tried using numexpr, but I can't quite get it to work right. It doesn't support the notion that True is 1 and False is 0, so I have to use ne.evaluate("sum(where(a==b, 1, 0))"), which takes about twice as long.

Edit: I forgot to mention that one of these arrays is actually a view into another array of a different size, and both arrays should be considered immutable. Both arrays are 2-dimensional and tend to be somewhere around 25x40 in size.

Yes, this is the bottleneck of my program and is worth optimizing.

Was it helpful?

Solution

On my machine this is faster:

(a == b).sum()

If you don't want to use any extra storage, than I would suggest using numba. I'm not too familiar with it, but this seems to work well. I ran into some trouble getting Cython to take a boolean NumPy array.

from numba import autojit
def pysumeq(a, b):
    tot = 0
    for i in xrange(a.shape[0]):
        for j in xrange(a.shape[1]):
            if a[i,j] == b[i,j]:
                tot += 1
    return tot
# make numba version
nbsumeq = autojit(pysumeq)
A = (rand(10,10)<.5)
B = (rand(10,10)<.5)
# do a simple dry run to get it to compile
# for this specific use case
nbsumeq(A, B)

If you don't have numba, I would suggest using the answer by @user2357112

Edit: Just got a Cython version working, here's the .pyx file. I'd go with this.

from numpy cimport ndarray as ar
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cysumeq(ar[np.uint8_t,ndim=2,cast=True] a, ar[np.uint8_t,ndim=2,cast=True] b):
    cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
    for i in xrange(h):
        for j in xrange(w):
            if a[i,j] == b[i,j]:
                tot += 1
    return tot

OTHER TIPS

To start with you can skip then A*B step:

>>> a
array([ True, False,  True, False,  True], dtype=bool)
>>> b
array([False,  True,  True, False,  True], dtype=bool)
>>> np.sum(~(a^b))
3

If you do not mind destroying array a or b, I am not sure you will get faster then this:

>>> a^=b   #In place xor operator
>>> np.sum(~a)
3

If the problem is allocation and deallocation, maintain a single output array and tell numpy to put the results there every time:

out = np.empty_like(a) # Allocate this outside a loop and use it every iteration
num_eq = np.equal(a, b, out).sum()

This'll only work if the inputs are always the same dimensions, though. You may be able to make one big array and slice out a part that's the size you need for each call if the inputs have varying sizes, but I'm not sure how much that slows you down.

Improving upon IanH's answer, it's also possible to get access to the underlying C array in a numpy array from within Cython, by supplying mode="c" to ndarray.

from numpy cimport ndarray as ar
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int cy_sum_eq(ar[np.uint8_t,ndim=2,cast=True,mode="c"] a, ar[np.uint8_t,ndim=2,cast=True,mode="c"] b):
    cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
    cdef np.uint8_t* adata = &a[0, 0]
    cdef np.uint8_t* bdata = &b[0, 0]
    for i in xrange(h):
        for j in xrange(w):
            if adata[j] == bdata[j]:
                tot += 1
        adata += w
        bdata += w
    return tot

This is about 40% faster on my machine than IanH's Cython version, and I've found that rearranging the loop contents doesn't seem to make much of a difference at this point probably due to compiler optimizations. At this point, one could potentially link to a C function optimized with SSE and such to perform this operation and pass adata and bdata as uint8_t*s

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top