Question

I'm trying to optimize my exercise application in VS2010. The core loop contains several sqrt, pow, and memset calls. More specifically, this is what I do:

// in a cpp file ...
#include <cmath>

#pragma intrinsic(sqrt, pow, memset)
void Simulator::calculate() 
{
  for( int i=0; i<NUM; i++ )
  {
    ...
    float len = std::sqrt(lenSq);
    distrib[0] = std::pow(baseVal, expVal);
    ...
    clearQuad(i); // invokes memset
  }
}

After building, the disassembly shows that the sqrt call, for example, still compiles to "call _CIsqrt(0x####)". Nothing changes whether or not the /Oi flag is enabled.

Can anybody kindly explain how I can enable the intrinsic version and how I can verify it in the disassembly? (I have also enabled /O2 in the project settings.)

Thank you

Edit: Problem solved by adding /fp:fast. For sqrt, as an example, the intrinsic version uses a single "fsqrt" to replace the std version "call __CIsqrt()". Sadly, in my case, the intrinsic version is 5% slower.
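A timing comparison like the 5% figure above can be reproduced with a small harness built once with and once without /fp:fast. The following is only a minimal sketch: sum_of_roots and time_sum_of_roots are hypothetical names that do not come from the original application, and std::clock is used because it is available in VS2010. The result is accumulated into a caller-supplied variable so the optimizer cannot discard the loop.

```cpp
#include <cmath>
#include <cstddef>
#include <ctime>

// Hypothetical workload: sums sqrt(i) for i in [1, count].
double sum_of_roots(std::size_t count)
{
  double total = 0.0;
  for (std::size_t i = 1; i <= count; ++i)
  {
    total += std::sqrt(static_cast<double>(i));
  }
  return total;
}

// Runs the workload `repeats` times and returns the elapsed seconds
// as reported by std::clock. Each result is added to `sink` so the
// compiler has to keep the computation.
double time_sum_of_roots(std::size_t count, int repeats, double& sink)
{
  std::clock_t start = std::clock();
  for (int r = 0; r < repeats; ++r)
  {
    sink += sum_of_roots(count);
  }
  std::clock_t stop = std::clock();
  return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum_of_roots with a large count from both builds and comparing the returned values gives a rough picture; differences of a few percent are within the noise of a single run, so several runs per build are advisable.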

Many thanks to Zan Lynx and mch.


Solution

You are compiling to native machine code and not to the .NET CLR, right?

If you compile to .NET, the code won't be optimized until it is run through the JIT compiler, which applies its own intrinsics and optimizations at that point.

If you are compiling to native machine code, you might want to play with the /arch option and the /fp:fast option.

OTHER TIPS

The use of the C++ std namespace might be causing the compiler not to use the intrinsics. Try removing std:: from your sqrt, pow, and memset calls.

The MSDN Library documentation for #pragma intrinsic gives an example of how to test whether the intrinsic truly is being used: compile with the /FAs flag and look at the resulting .asm file.

Looking at the disassembly in the debugger, as you seem to already be doing, should also show the intrinsic rather than a call.
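To make the assembly listing easy to read, it can help to isolate one call in a tiny translation unit. The file below is a minimal sketch; the file name test.cpp and the function vector_length are made up for illustration. Compiling it with something like "cl /c /O2 /Oi /fp:fast /FAs test.cpp" should produce test.asm next to the object file.

```cpp
#include <cmath>

#ifdef _MSC_VER
// Ask MSVC to expand sqrt inline where it is able to.
#pragma intrinsic(sqrt)
#endif

// Hypothetical helper with a single sqrt call, so the generated
// listing contains only one place to look.
double vector_length(double x, double y)
{
  return std::sqrt(x * x + y * y);
}
```

In the listing, search the body of vector_length: going by the question's edit, an x86 build that uses the intrinsic shows an fsqrt instruction, while a build that does not shows a call to __CIsqrt. Building with and without /fp:fast and comparing the two listings shows which flag makes the difference.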

Licensed under: CC-BY-SA with attribution