Why is the hyped Intel MKL NumPy build slower than the ATLAS build on my PC?

https://stackoverflow.com/questions/10371829

04-06-2021

Question

I "dual boot" Ubuntu 11.04, Ubuntu 12.04 and Windows XP SP3, all fully up to date. The PC is a rather old Intel Celeron D (3.06 GHz) with 2 GB of RAM.

In Ubuntu 11.04 I have NumPy compiled with ATLAS (ATLAS compiled from source).
In Ubuntu 12.04 I have NumPy built with the latest available MKL, icc and ifort.
In Windows XP I have NumPy with MKL (from the Python packages kindly provided by Christoph Gohlke).
More details here: http://pastebin.com/raw.php?i=wxuFbyVg

I tried a simple test:
%timeit np.dot(np.ones((1000,1000)), np.ones((1000,1000)))

and got these results:

Ubuntu ATLAS: 1 loops, best of 3: 457 ms per loop
Windows MKL:  1 loops, best of 3: 680 ms per loop
Ubuntu MKL:   1 loops, best of 3: 1.04 s per loop
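For anyone reproducing this outside IPython, here is a minimal sketch of the same measurement using the standard `timeit` module. The helper name `time_dot` is hypothetical. Unlike the `%timeit` line above, it builds the operands once, so only the multiplication is timed. `np.show_config()` prints which BLAS/LAPACK libraries the NumPy build was linked against, which is worth checking before comparing builds.

```python
import timeit

import numpy as np


def time_dot(n=1000, repeat=3):
    """Return the best wall-clock time (seconds) of one n x n float64 np.dot."""
    # Build the operands once so that allocation is not part of the timing.
    a = np.ones((n, n))
    b = np.ones((n, n))
    # number=1 runs the product once per repetition; min() mirrors "best of 3".
    return min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=repeat))


if __name__ == "__main__":
    # Shows the BLAS/LAPACK configuration this NumPy build was compiled with.
    np.show_config()
    print("best of 3: %.3f s per loop" % time_dot())
```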

I thought the above might be a bad example, so I looked for one of the many comparisons available, e.g. the first Google hit: http://dpinte.wordpress.com/2010/01/15/numpy-performance-improvement-with-the-mkl/

I tested the same functions:

%timeit test_eigenvalue()
Ubuntu Atlas: 1 loops, best of 3: 6.38 s per loop
Windows MKL:  1 loops, best of 3: 2.22 s per loop
Ubuntu MKL:   1 loops, best of 3: 3.58 s per loop

%timeit test_svd()
Ubuntu Atlas: 1 loops, best of 3: 2.13 s per loop
Windows MKL:  1 loops, best of 3: 2.06 s per loop
Ubuntu MKL:   1 loops, best of 3: 3.09 s per loop

%timeit test_inv()
Ubuntu Atlas: 1 loops, best of 3: 964 ms per loop
Windows MKL:  1 loops, best of 3: 1.02 s per loop
Ubuntu MKL:   1 loops, best of 3: 1.59 s per loop

%timeit test_det()
Ubuntu Atlas: 1 loops, best of 3: 308 ms per loop
Windows MKL:  1 loops, best of 3: 322 ms per loop
Ubuntu MKL:   1 loops, best of 3: 491 ms per loop

%timeit test_dot()
Ubuntu Atlas: 1 loops, best of 3: 1.5 s per loop        
Windows MKL:  1 loops, best of 3: 1.77 s per loop
Ubuntu MKL:   1 loops, best of 3: 2.77 s per loop

So the ATLAS-compiled NumPy is fastest in most of these tests (all but the eigenvalue and SVD ones), for some reason.
Does anyone know what the problem could be?

Answer

Intel® MKL is designed and optimized primarily for server and high-performance desktop and mobile processors. The Celeron D was a relatively low-performance processor, so MKL was never optimized for it. For example, if you check the SVD performance on a recent Intel Core i7 desktop, MKL-enabled NumPy can run as much as 80% faster than ATLAS-enabled NumPy. See here: http://software.intel.com/en-us/articles/numpy-scipy-with-mkl/
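The answer does not say which processor features MKL relies on. One assumption is that the available SIMD instruction sets are part of the story. Under that assumption, this Linux-only sketch lists what the CPU reports in `/proc/cpuinfo`. The helper names are hypothetical, and the flag list is only a sample.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags\t\t: fpu vme ... sse2 ...".
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags from /proc/cpuinfo; return an empty set if unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    # Linux reports SSE3 under the name "pni".
    for name in ("sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"):
        print("%-7s %s" % (name, "yes" if name in flags else "no"))
```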

By the way, to get faster responses to MKL related questions please join the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/

Other tips

I have also used NumPy with MKL on my desktop (4th-generation Intel Core i3, 2.3 GHz, 4 GB RAM). I tested the dot product of two 4096x4096 matrices on three Anaconda distributions: Python 3.5 64-bit (no MKL support to date), Python 2.7 64-bit without MKL, and Python 2.7 64-bit with MKL. The results were almost identical. All three took ~73 seconds (± 0.5 s) for the integer data type, 260 ms (± 5 ms) for float64, and ~1 second (± 100 ms) for complex data types.
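The large gap between the integer and floating-point timings fits how NumPy dispatches matrix products. Float and complex products are handed to the linked BLAS library, while integer products go through NumPy's own generic loop, so the choice of BLAS cannot affect them. A minimal sketch for checking this on another machine follows. The helper name `time_dot_by_dtype` and the default size are hypothetical.

```python
import timeit

import numpy as np


def time_dot_by_dtype(n=256, dtypes=(np.int64, np.float64, np.complex128)):
    """Return {dtype name: best time in seconds} for an n x n np.dot."""
    timings = {}
    for dtype in dtypes:
        a = np.ones((n, n), dtype=dtype)
        b = np.ones((n, n), dtype=dtype)
        # Best of three single runs, with the operands already allocated.
        best = min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=3))
        timings[np.dtype(dtype).name] = best
    return timings


if __name__ == "__main__":
    for name, seconds in time_dot_by_dtype().items():
        print("%-11s %.4f s" % (name, seconds))
```

Converting integer matrices to float64 before the product may be much faster, as long as the values stay within the range float64 represents exactly.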

I have also found that numpy.dot is the best option even for complex matrix multiplication. The Gauss improvements are already implemented.
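The post does not spell out what "the Gauss improvements" are. Assuming it means Gauss's trick of computing a complex product with three real multiplications instead of four, the idea can be sketched as below. This only illustrates the technique. The source does not confirm whether `np.dot` or its BLAS backend actually uses it, and the function name is hypothetical.

```python
import numpy as np


def gauss_complex_matmul(a, b):
    """Multiply complex matrices a and b using three real matrix products."""
    ar, ai = a.real, a.imag
    br, bi = b.real, b.imag
    p1 = np.dot(ar, br)            # Re(a) Re(b)
    p2 = np.dot(ai, bi)            # Im(a) Im(b)
    p3 = np.dot(ar + ai, br + bi)  # (Re(a) + Im(a)) (Re(b) + Im(b))
    # Real part: p1 - p2. Imaginary part: p3 - p1 - p2, which equals
    # Re(a) Im(b) + Im(a) Re(b).
    return (p1 - p2) + 1j * (p3 - p1 - p2)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    a = rng.random_sample((64, 64)) + 1j * rng.random_sample((64, 64))
    b = rng.random_sample((64, 64)) + 1j * rng.random_sample((64, 64))
    print(np.allclose(gauss_complex_matmul(a, b), np.dot(a, b)))
```

The trick trades one matrix multiplication for a few extra additions, which pays off for large matrices because addition is far cheaper than multiplication.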

I have also tested Cython using BLAS, BLAS from Python, and einsum in Python, but dot is the best.

I need to multiply matrices faster.
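The einsum comparison can be rerun with a sketch like the one below. It covers only the `np.dot` versus `np.einsum` case, not the Cython or direct BLAS variants. The helper name and default size are hypothetical.

```python
import timeit

import numpy as np


def compare_dot_einsum(n=128, repeat=3):
    """Return best-of-`repeat` timings (seconds) for np.dot and np.einsum."""
    rng = np.random.RandomState(0)
    a = rng.random_sample((n, n))
    b = rng.random_sample((n, n))
    # Both expressions compute the same matrix product.
    if not np.allclose(np.dot(a, b), np.einsum("ij,jk->ik", a, b)):
        raise AssertionError("dot and einsum disagree")
    return {
        "dot": min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=repeat)),
        "einsum": min(
            timeit.repeat(
                lambda: np.einsum("ij,jk->ik", a, b), number=1, repeat=repeat
            )
        ),
    }


if __name__ == "__main__":
    for name, seconds in compare_dot_einsum().items():
        print("%-7s %.5f s" % (name, seconds))
```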

License: CC-BY-SA with attribution