Question

I'm profiling my code and have optimized everything I could; it comes down to a function that looks something like this:

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b || i < 11 && a < b)
        return abs(a-b)/c;
    else
        return d/c;
}

It is called millions of times during the run of the program, and the profiler shows me that ~80% of all time is spent calling abs().

  1. I replaced abs() with fabs() and it gave about a 10% speed-up, which doesn't make much sense to me, as I've heard multiple times that they are identical for floating-point numbers and that abs() should always be used. Is that untrue, or am I missing something?

  2. What would be the quickest way to evaluate the absolute value of a double, to further improve the performance?

If that matters, I use g++ on Linux x86_64.

Solution

Do all three computations. Stick the results in a three-element array. Use non-branching arithmetic to find the correct array index. Return that result.

I.e.,

bool icheck = i > 10;
bool zero = icheck & (a > b);   // case 0: i > 10 and a > b  ->  (a-b)/c
bool one = !icheck & (b > a);   // case 1: i <= 10 and a < b ->  (b-a)/c
bool two = !zero & !one;        // case 2: everything else   ->  d/c
int idx = one | (two << 1);     // maps the three cases to indices 0, 1, 2
return val[idx];

Where val holds the results of the three computations, in that order: (a-b)/c, (b-a)/c and d/c. The use of & instead of && is important: unlike &&, & does not short-circuit, so evaluating the condition introduces no branches.

This removes your branch prediction problems. Finally, make sure the looping code can see the implementation, so the call overhead can be eliminated.
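
Putting the pieces together, a self-contained sketch of the whole function (my own assembly of the snippets above, untested, and note that it always pays for all three divisions up front) might look like this:

inline double func(double a, double b, double c, double d, int i){
    // All three candidate results, in the index order used below.
    const double val[3] = { (a - b) / c, (b - a) / c, d / c };
    bool icheck = i > 10;
    bool zero = icheck & (a > b);   // case 0
    bool one = !icheck & (b > a);   // case 1
    bool two = !zero & !one;        // case 2
    int idx = one | (two << 1);
    return val[idx];
}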

Other tips

Interesting question.

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b || i < 11 && a < b)
        return abs(a-b)/c;
    else
        return d/c;
}

First thoughts are that:

  • where's the "inline" qualifier?
  • there's lots of potential for branch misprediction, and
  • lots of short-circuit boolean evaluation.

I'm going to assume a is never equal to b; my gut instinct is that there's a 50% chance that's true of your data set, and it allows some interesting optimisations. If it's not true, then I have nothing to suggest that Yakk hasn't already.

double amb = a - b;
bool altb = a < b;                          // or signbit(amb) if it proves faster for you
double abs_amb = (1 - (altb << 1)) * amb;   // multiply by +1 or -1, i.e. fabs(a - b) when a != b
bool use_amb = (i > 10) != altb;            // true exactly when the original condition holds
return (use_amb * abs_amb + !use_amb * d) / c;

One of the aims I was mindful of when structuring the work was to permit some concurrency in a CPU execution pipeline; this could be illustrated like this:

amb    altb    i > 10
   \  /    \     /
  abs_amb  use_amb
        \  /      \
 use_amb*abs_amb  !use_amb*d
             \    /
              + /c
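
If you go down this route, it is worth verifying the rewritten arithmetic against the original branchy logic before trusting it. A minimal check harness (the names func_ref and func_fast are mine, and the loop bounds are arbitrary) could be:

#include <cmath>
#include <cstdio>

// Reference: the original, branchy logic.
static double func_ref(double a, double b, double c, double d, int i){
    if((i > 10 && a > b) || (i < 11 && a < b))
        return std::fabs(a - b) / c;
    return d / c;
}

// The branch-free arithmetic version from above (assumes a != b).
static double func_fast(double a, double b, double c, double d, int i){
    double amb = a - b;
    bool altb = a < b;
    double abs_amb = (1 - (altb << 1)) * amb;
    bool use_amb = (i > 10) != altb;
    return (use_amb * abs_amb + !use_amb * d) / c;
}

int main(){
    for(int i = 5; i <= 15; ++i)
        for(double a = -2; a <= 2; a += 0.5)
            for(double b = -2; b <= 2; b += 0.5){
                if(a == b) continue;   // excluded by the a != b assumption
                double r = func_ref(a, b, 3.0, 7.0, i);
                double f = func_fast(a, b, 3.0, 7.0, i);
                if(r != f) std::printf("mismatch: a=%g b=%g i=%d\n", a, b, i);
            }
    return 0;
}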

Have you tried unrolling the if like so:

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b)
        return (a-b)/c;
    if (i < 11 && a < b)
        return (b-a)/c;
    return d/c;
}

I would look at the assembly produced by calling fabs(). It could be the overhead of a function call; if so, replace it with an inlined solution. If it's really the computation of the absolute value that's expensive, try a bitwise AND (&) with a bitmask that is 1 everywhere except for the sign bit. I doubt that this would be cheaper than what the standard library vendor's fabs() generates, though.
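
For what it's worth, a well-defined way to express that bitmask trick in C++ looks like the sketch below (the abs_bits name is mine; a decent compiler typically inlines fabs() to a single bitwise AND on the sign bit already, so treat this as an experiment rather than an expected win):

#include <cstdint>
#include <cstring>

// Clear the IEEE-754 sign bit. memcpy avoids the undefined behaviour of
// pointer type-punning and is optimised away by gcc/clang at -O2.
inline double abs_bits(double x){
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffffffffffULL;   // every bit set except the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}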

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow