Question

We know that floating-point operations have high latency and take many clock cycles to execute, which may cause the pipeline to stall. What are the different methods to optimize the following code?

int main()
{
  float fsum[50], a = 10.45;
  int isum[100], b = 20;

  for (int i = 0; i < 100; i++)
  {
    if (i < 50)
    {
      fsum[i] = a * a;
    }
    isum[i] = b * b;
  }
  return 0;
}

Solution

If, for whatever reason, your compiler cannot be trusted to exhibit basic optimization competence, and the code it generates runs with lower performance than you were expecting based on machine limits (you're measuring performance, and you know those limits, right?), then you can start optimizing manually:

Lift loop-invariant calculation outside the loop:

int main()
{
  float fsum[50],a=10.45;
  float aa = a * a;
  int isum[100],b=20;
  int bb = b * b;

  for(int i=0;i<100;i++)
  {
    if(i<50) {
         fsum[i] = aa;
    }
    isum[i] = bb;
  }

  return 0;
}

Split the loop, and set the bounds to match the enclosed condition:

int main()
{
  float fsum[50],a=10.45;
  float aa = a * a;
  int isum[100],b=20;
  int bb = b * b;

  for(int i=0; i < 50; i++)
  {
    fsum[i] = aa;
  }

  for(int i=0;i<100;i++)
  {
    isum[i] = bb;
  }

  return 0;
}
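
The advice above to measure before optimizing by hand can be sketched roughly as follows. This is only a minimal sketch, not a rigorous benchmark: `kernel` and `time_kernel` are hypothetical names introduced here, the `volatile` sinks are an assumption meant to keep the compiler from discarding the stores, and `clock()` has coarse resolution on many systems.

```c
#include <time.h>

/* Hypothetical kernel under test: the original loop wrapped in a function. */
static void kernel(float *fsum, int *isum, float a, int b)
{
    for (int i = 0; i < 100; i++) {
        if (i < 50) {
            fsum[i] = a * a;
        }
        isum[i] = b * b;
    }
}

/* Returns the CPU time, in seconds, spent on `reps` calls of the kernel. */
static double time_kernel(int reps)
{
    float fsum[50];
    int isum[100];
    volatile float fsink = 0.0f;  /* volatile writes keep the results live */
    volatile int isink = 0;

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        /* Vary b slightly so the call is not trivially identical each time. */
        kernel(fsum, isum, 10.45f, 20 + (r & 1));
        fsink = fsum[r % 50];
        isink = isum[r % 100];
    }
    clock_t end = clock();

    (void)fsink;
    (void)isink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_kernel` with a large repetition count for each variant of the loop, at the same optimization level, gives a rough comparison. An aggressive optimizer may still simplify parts of this, so inspecting the generated assembly remains the more reliable check.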

Now, if the compiler can't manage to unroll and vectorize a simple single-level loop or two, then unrolling and vectorization are your next problems. Go look them up.
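
For reference, manual unrolling of one of these fill loops might look roughly like the sketch below. The helper name `fill_int_unrolled` and the unroll factor of 4 are assumptions chosen for illustration, not something from the answer above; a competent compiler would normally do this, and vectorize it, without help.

```c
#include <stddef.h>

/* Hypothetical helper: fill dst[0..n) with value, unrolled by a factor of 4. */
static void fill_int_unrolled(int *dst, size_t n, int value)
{
    size_t i = 0;

    /* Main body: four independent stores per iteration. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }

    /* Remainder: handle the last n % 4 elements. */
    for (; i < n; i++) {
        dst[i] = value;
    }
}
```

With the arrays from the answer, this would be called as `fill_int_unrolled(isum, 100, bb);`. Whether it beats the plain loop depends on the compiler and target, so it is worth timing both.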

License: CC-BY-SA with attribution
Not affiliated with StackOverflow