Question

I wrote this small subroutine that compares simple vector mathematical functions, performed either with a loop:

f(i) = a(i) + b(i)

or direct:

f = a + b

or using Intel MKL VML:

vdAdd(n,a,b,f)

The timing results for n=50000000 are:

VML: 0.9 sec, direct: 0.4 sec, loop: 0.4 sec

I don't understand why VML takes twice as long as the other methods. (The loop is sometimes faster than the direct form.)

The subroutine can be found at http://paste.ideaslabs.com/show/L6dVLdAOIf and is called via:

program test

  use vmltests
  implicit none

  call vmlTest()

end program

Solution

Your sample code has a potential L2 cache issue, which can be overcome with a blocking optimization. See the Intel® Software Networks Forum answer for details: http://software.intel.com/en-us/forums/showthread.php?t=80041
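To illustrate what a blocking optimization might look like, here is a minimal sketch that processes the arrays in fixed-size chunks instead of in one pass over the whole vector. It is not the code from the linked forum thread. The names blocked_add, add_chunk and block_size are hypothetical, the block size of 4096 elements is an arbitrary assumption that would need tuning for the actual cache, and add_chunk is a plain-Fortran stand-in for the vdAdd call so that the example compiles without MKL.

```fortran
program blocked_add
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000        ! smaller than the question's n, to keep the demo quick
  integer, parameter :: block_size = 4096  ! assumed value; tune so a chunk of a, b and f fits in cache
  real(dp), allocatable :: a(:), b(:), f(:)
  integer :: first, last, i

  allocate(a(n), b(n), f(n))
  do i = 1, n
    a(i) = real(i, dp)
    b(i) = 2.0_dp * real(i, dp)
  end do

  ! Walk over the vectors one block at a time; the final block may be shorter.
  do first = 1, n, block_size
    last = min(first + block_size - 1, n)
    ! With MKL linked in, this is where vdAdd would be called on the chunk.
    call add_chunk(last - first + 1, a(first:last), b(first:last), f(first:last))
  end do

  ! Check the blocked result against the direct whole-array form.
  if (any(f /= a + b)) error stop "blocked result differs from a + b"
  print *, "blocked add matches direct add"

contains

  ! Hypothetical stand-in for vdAdd: adds two chunks of length m.
  subroutine add_chunk(m, x, y, z)
    integer, intent(in) :: m
    real(dp), intent(in) :: x(m), y(m)
    real(dp), intent(out) :: z(m)
    z = x + y
  end subroutine add_chunk

end program blocked_add
```

Blocking typically pays off most when several vector operations are applied to the same chunk while it is still in cache, so the gain for a single add may be modest; the linked thread has the specifics for this case.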

Intel® Optimization Notice:

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Licensed under: CC-BY-SA with attribution