Question

According to NVIDIA, cublasZgemm is 6x faster than Intel MKL.

However, on my PC (i7 2600, NVIDIA GTX560, OS: 64-bit Linux), cublasZgemm is slightly slower than MKL.

I use the numpy.dot() that comes with the Enthought Python Distribution, which links numpy against MKL 10.3.

A matrix multiplication function using cublasZgemm is compiled into a shared library and called via ctypes from a Python script.

When multiplying two 1024x1024 complex matrices, numpy.dot() took 84 ms. The ctypes function call took 110 ms in total, of which the cublasZgemm() part took 97 ms.

I wonder why cublasZgemm is not as fast as NVIDIA stated?

Solution

I wonder why cublasZgemm is not as fast as NVIDIA stated?

The short answer is that you benchmarked zgemm on a much slower GPU than the one NVIDIA used to generate their performance figures. Your GTX560 is probably about eight times slower in double precision than the Tesla M2090 that NVIDIA used in your link.
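
To put the timings from the question on the same scale as a vendor's peak figures, it can help to convert them to achieved GFLOP/s. Below is a rough sketch that assumes the usual operation count for a complex matrix product, 8*n^3 real floating point operations (each complex multiply-add is 4 real multiplies plus 4 real adds). The helper name zgemm_gflops is made up for illustration, and the only inputs are the three timings quoted in the question.

```python
def zgemm_gflops(n, seconds):
    """Achieved GFLOP/s for an n x n complex double matrix product.

    Assumes the conventional count of 8 * n**3 real flops:
    n**3 complex multiply-adds, each costing 4 real multiplies
    and 4 real adds.
    """
    flops = 8.0 * n ** 3
    return flops / seconds / 1e9


# Timings reported in the question, in milliseconds.
timings_ms = {
    "numpy.dot (MKL)": 84.0,
    "cublasZgemm": 97.0,
    "ctypes call (total)": 110.0,
}

# Convert each timing to an achieved rate for n = 1024.
rates = {name: zgemm_gflops(1024, ms / 1e3) for name, ms in timings_ms.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} GFLOP/s")
```

With these numbers the CPU lands at roughly 102 GFLOP/s and the GPU kernel at roughly 89 GFLOP/s. Comparing the GPU figure against the published double precision peak of your card, rather than against the card in NVIDIA's benchmark, should show how close cublasZgemm already is to what the hardware can deliver.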

Licensed under: CC-BY-SA with attribution