Question

Sorry for so many questions. I am running Mac OS X 10.6 on an Intel Core 2 Duo. I am running some benchmarks for my research, and I have run into another thing that baffles me.

If I run

python -mtimeit -s 'import numpy as np; a = np.random.randn(1000,1000)' 'np.dot(a,a)'

I get the following output: 10 loops, best of 3: 142 msec per loop

However, if I run

python -mtimeit -s 'import numpy as np; a = np.random.randint(10,size=1000000).reshape(1000,1000)' 'np.dot(a,a)'

I get the following output: 10 loops, best of 3: 7.57 sec per loop

Then I ran

python -mtimeit -s 'import numpy as np; a = np.random.randn(1000,1000)' 'a*a'

and then

python -mtimeit -s 'import numpy as np; a = np.random.randint(10,size=1000000).reshape(1000,1000)' 'a*a'

Both ran at about 7.6 msec per loop, so it is not the multiplication. Addition had similar speeds as well, so neither of these should be affecting the dot product, right? So why is it over 50 times slower to calculate the dot product using ints than using floats?
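(For what it's worth, the same comparison can be reproduced in a single script instead of from the shell. This is just a sketch; the absolute numbers will of course vary by machine:)

import timeit
import numpy as np

a_float = np.random.randn(1000, 1000)
a_int = np.random.randint(10, size=1000000).reshape(1000, 1000)

# Time 10 runs of each dot product.
t_float = timeit.timeit(lambda: np.dot(a_float, a_float), number=10)
t_int = timeit.timeit(lambda: np.dot(a_int, a_int), number=10)

print('float64 dot: %.2f s for 10 runs' % t_float)
print('int dot:     %.2f s for 10 runs' % t_int)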

Solution

Very interesting. I was curious to see how it was implemented, so I did:

>>> import inspect
>>> import numpy as np
>>> inspect.getmodule(np.dot)
<module 'numpy.core._dotblas' from '/Library/Python/2.6/site-packages/numpy-1.6.1-py2.6-macosx-10.6-universal.egg/numpy/core/_dotblas.so'>
>>> 

So it looks like it's using the BLAS library.

So I checked its documentation:

>>> help(np.core._dotblas)

from which I found this:

When Numpy is built with an accelerated BLAS like ATLAS, these functions are replaced to make use of the faster implementations. The faster implementations only affect float32, float64, complex64, and complex128 arrays. Furthermore, the BLAS API only includes matrix-matrix, matrix-vector, and vector-vector products. Products of arrays with larger dimensionalities use the built in functions and are not accelerated.

So it looks like ATLAS fine-tunes certain functions, but it's only applicable to certain data types. Very interesting.
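As a side note, you can also check which BLAS NumPy was linked against without poking at module internals: np.show_config() has been around for a long time, though its output format varies between versions.

>>> import numpy as np
>>> np.show_config()  # prints the BLAS/LAPACK libraries NumPy was built with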

So yeah, it looks like I'll be using floats more often ...

OTHER TIPS

Using int vs float data types causes different code paths to be executed:

The stack trace for float looks like this:

(gdb) backtr
#0  0x007865a0 in dgemm_ () from /usr/lib/libblas.so.3gf
#1  0x007559d5 in cblas_dgemm () from /usr/lib/libblas.so.3gf
#2  0x00744108 in dotblas_matrixproduct (__NPY_UNUSED_TAGGEDdummy=0x0, args=(<numpy.ndarray at remote 0x85d9090>, <numpy.ndarray at remote 0x85d9090>), 
kwargs=0x0) at numpy/core/blasdot/_dotblas.c:798
#3  0x08088ba1 in PyEval_EvalFrameEx ()
...

...while the stack trace for int looks like this:

(gdb) backtr
#0  LONG_dot (ip1=0xb700a280 "\t", is1=4, ip2=0xb737dc64 "\a", is2=4000, op=0xb6496fc4 "", n=1000, __NPY_UNUSED_TAGGEDignore=0x85fa960)
at numpy/core/src/multiarray/arraytypes.c.src:3076
#1  0x00659d9d in PyArray_MatrixProduct2 (op1=<numpy.ndarray at remote 0x85dd628>, op2=<numpy.ndarray at remote 0x85dd628>, out=0x0)
at numpy/core/src/multiarray/multiarraymodule.c:847
#2  0x00742b93 in dotblas_matrixproduct (__NPY_UNUSED_TAGGEDdummy=0x0, args=(<numpy.ndarray at remote 0x85dd628>, <numpy.ndarray at remote 0x85dd628>), 
kwargs=0x0) at numpy/core/blasdot/_dotblas.c:254
#3  0x08088ba1 in PyEval_EvalFrameEx ()
...

Both calls lead to dotblas_matrixproduct, but it appears that the float call stays in the BLAS library (probably accessing some well-optimized code), while the int call gets kicked back out to numpy's PyArray_MatrixProduct2.
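You can see the same split from Python, without gdb, by timing np.dot across dtypes; per the docstring quoted above, only float32, float64, complex64, and complex128 should take the fast path. A rough sketch (matrix size reduced so the integer case stays bearable):

import timeit
import numpy as np

# Time a small matrix product for several dtypes; the float/complex
# dtypes should come out orders of magnitude faster than the integer ones.
for dtype in (np.float32, np.float64, np.complex64, np.complex128, np.int32, np.int64):
    a = np.ones((500, 500), dtype=dtype)
    t = timeit.timeit(lambda: np.dot(a, a), number=3)
    print('%-11s %.3f s for 3 runs' % (np.dtype(dtype).name, t))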

So this is not really a bug: the BLAS API simply doesn't define integer matrix products (its GEMM routines only cover single- and double-precision real and complex types), so integer arrays fall back to NumPy's generic loop.

Here's an easy and inexpensive workaround:

af = a.astype(float)            # cast once so np.dot takes the BLAS path
np.dot(af, af).astype(int)      # cast the result back to int
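If you want that packaged up, here's a sketch of a small helper (the name int_dot is mine, not NumPy's); keep in mind that float64 holds integers exactly only up to 2**53, so the round trip is exact only as long as every entry of the result stays below that:

import numpy as np

def int_dot(a, b):
    # Integer matrix product routed through the fast float64 BLAS path.
    # Exact only while the result's entries fit in float64's 53-bit
    # integer range (absolute value <= 2**53).
    return np.dot(a.astype(np.float64), b.astype(np.float64)).astype(a.dtype)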