Question

We know that floating-point operations have high latency and take many clock cycles to execute, which may cause the pipeline to stall. What are the different methods to optimize the following code?

int main()
{
  float fsum[50],a=10.45;
  int isum[100],b=20;

  for(int i=0;i<100;i++)
  {
    if(i<50) {
      fsum[i] = a*a;
    }
    isum[i] = b*b;
  }

  return 0;
}

Solution

If, for whatever reason, your compiler cannot be trusted to exhibit basic optimization competence, and the code it generates runs with lower performance than you were expecting based on machine limits (you're measuring performance, and you know those limits, right?), then you can start optimizing manually:

Lift loop-invariant calculation outside the loop:

int main()
{
  float fsum[50],a=10.45;
  float aa = a * a;
  int isum[100],b=20;
  int bb = b * b;

  for(int i=0;i<100;i++)
  {
    if(i<50) {
         fsum[i] = aa;
    }
    isum[i] = bb;
  }

  return 0;
}

Split the loop, and set the bounds to match the enclosed condition:

int main()
{
  float fsum[50],a=10.45;
  float aa = a * a;
  int isum[100],b=20;
  int bb = b * b;

  for(int i=0; i < 50; i++)
  {
    fsum[i] = aa;
  }

  for(int i=0;i<100;i++)
  {
    isum[i] = bb;
  }

  return 0;
}
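
Since all of this only matters when measurements show a problem, here is a rough sketch of how the original loop and the hoisted, split version might be timed against each other. The names `fill_naive`, `fill_split` and `time_fill`, the `N_FLOAT`/`N_INT` macros, and the `volatile` sink are assumptions made for illustration and are not part of the original code. `clock()` reports CPU time at coarse resolution, so treat the numbers as indicative only.

```c
#include <time.h>

#define N_FLOAT 50
#define N_INT   100

/* Original shape: one loop, with the branch and both multiplications inside. */
void fill_naive(float *fsum, int *isum, float a, int b)
{
    for (int i = 0; i < N_INT; i++) {
        if (i < N_FLOAT) {
            fsum[i] = a * a;
        }
        isum[i] = b * b;
    }
}

/* Hand-optimized shape: invariants hoisted, loop split, bounds match the data. */
void fill_split(float *fsum, int *isum, float a, int b)
{
    float aa = a * a;
    int bb = b * b;

    for (int i = 0; i < N_FLOAT; i++) {
        fsum[i] = aa;
    }
    for (int i = 0; i < N_INT; i++) {
        isum[i] = bb;
    }
}

typedef void (*fill_fn)(float *, int *, float, int);

/* Calls fn `reps` times and returns the elapsed CPU time in seconds.
   One element per repetition is copied into a volatile sink so the results
   are observable, which discourages the compiler from deleting the work. */
double time_fill(fill_fn fn, long reps)
{
    float fsum[N_FLOAT];
    int isum[N_INT];
    volatile float fsink = 0.0f;
    volatile int isink = 0;

    clock_t start = clock();
    for (long r = 0; r < reps; r++) {
        fn(fsum, isum, 10.45f, 20);
        fsink = fsum[r % N_FLOAT];
        isink = isum[r % N_INT];
    }
    clock_t end = clock();

    (void)fsink;
    (void)isink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_fill(fill_naive, reps)` and `time_fill(fill_split, reps)` from `main` with a large `reps`, once per optimization level, should show whether the manual rewrite buys anything over what the compiler already does. With optimization enabled, the two will often come out about the same.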

Now, if the compiler can't manage to unroll and vectorize a simple single-level loop or two, then loop unrolling and vectorization are the next things to do by hand. Go look them up.
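
As a starting point for the unrolling half of that, here is a minimal sketch of a fill loop unrolled by a factor of four. The helper name `fill_float_unrolled4` and the unroll factor are assumptions chosen for illustration. The best factor depends on the target, and an optimizing compiler will usually do this (and vectorize) on its own.

```c
#include <stddef.h>

/* Hypothetical helper: stores `value` into dst[0..n-1], four elements per trip. */
void fill_float_unrolled4(float *dst, size_t n, float value)
{
    size_t i = 0;

    /* Main body: four independent stores per iteration, so the loop counter
       is updated and tested a quarter as often as in the rolled loop. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }

    /* Remainder: handles the last n % 4 elements (two of them when n is 50). */
    for (; i < n; i++) {
        dst[i] = value;
    }
}
```

In the split version above, the first loop would become `fill_float_unrolled4(fsum, 50, aa);`, and the `int` array could be handled by an analogous function.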

Licensed under: CC-BY-SA with attribution