Question

I'm profiling my code and have optimized everything I could; it comes down to a function that looks something like this:

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b || i < 11 && a < b)
        return abs(a-b)/c;
    else
        return d/c;
}

It is called millions of times during the run of the program, and the profiler shows me that ~80% of all time is spent calling abs().

  1. I replaced abs() with fabs() and it gave about a 10% speed-up, which doesn't make much sense to me, as I've heard multiple times that they are identical for floating-point numbers and that abs() should always be used. Is that untrue, or am I missing something?

  2. What would be the quickest way to evaluate the absolute value of a double, to further improve the performance?

If that matters, I use g++ on Linux x86_64.

Solution

Do all three computations. Stick the results in a three-element array. Use non-branching arithmetic to find the correct array index. Return that result.

I.e.,

bool icheck = i > 10;
bool zero = icheck & (a > b);   // case 0: i > 10 and a > b  ->  (a-b)/c
bool one = !icheck & (b > a);   // case 1: i <= 10 and a < b ->  (b-a)/c
bool two = !zero & !one;        // case 2: everything else   ->  d/c
int idx = one | (two << 1);     // maps the three cases to indices 0, 1, 2
return val[idx];

Where val holds the results of the three computations, in that order: (a-b)/c, (b-a)/c and d/c. The use of & instead of && is important: unlike &&, & does not short-circuit, so evaluating the condition introduces no branches.

This removes your branch prediction problems. Finally, make sure the looping code can see the implementation, so the call overhead can be eliminated.
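
Putting the pieces together, a self-contained sketch of the whole function (my own assembly of the snippets above, untested, and note that it always pays for all three divisions up front) might look like this:

inline double func(double a, double b, double c, double d, int i){
    // All three candidate results, in the index order used below.
    const double val[3] = { (a - b) / c, (b - a) / c, d / c };
    bool icheck = i > 10;
    bool zero = icheck & (a > b);   // case 0
    bool one = !icheck & (b > a);   // case 1
    bool two = !zero & !one;        // case 2
    int idx = one | (two << 1);
    return val[idx];
}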

Other tips

Interesting question.

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b || i < 11 && a < b)
        return abs(a-b)/c;
    else
        return d/c;
}

First thoughts are that:

  • where's the "inline" qualifier?
  • there's lots of potential for branch misprediction, and
  • lots of short-circuit boolean evaluation.

I'm going to assume a is never equal to b; my gut instinct is that there's a 50% chance that's true of your data set, and it allows some interesting optimisations. If it's not true, then I have nothing to suggest that Yakk hasn't already.

double amb = a - b;
bool altb = a < b;                          // or signbit(amb) if it proves faster for you
double abs_amb = (1 - (altb << 1)) * amb;   // multiply by +1 or -1, i.e. fabs(a - b) when a != b
bool use_amb = (i > 10) != altb;            // true exactly when the original condition holds
return (use_amb * abs_amb + !use_amb * d) / c;

One of the aims I was mindful of when structuring the work was to permit some concurrency in a CPU execution pipeline; this could be illustrated like this:

amb    altb    i > 10
   \  /    \     /
  abs_amb  use_amb
        \  /      \
 use_amb*abs_amb  !use_amb*d
             \    /
              + /c
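
If you go down this route, it is worth verifying the rewritten arithmetic against the original branchy logic before trusting it. A minimal check harness (the names func_ref and func_fast are mine, and the loop bounds are arbitrary) could be:

#include <cmath>
#include <cstdio>

// Reference: the original, branchy logic.
static double func_ref(double a, double b, double c, double d, int i){
    if((i > 10 && a > b) || (i < 11 && a < b))
        return std::fabs(a - b) / c;
    return d / c;
}

// The branch-free arithmetic version from above (assumes a != b).
static double func_fast(double a, double b, double c, double d, int i){
    double amb = a - b;
    bool altb = a < b;
    double abs_amb = (1 - (altb << 1)) * amb;
    bool use_amb = (i > 10) != altb;
    return (use_amb * abs_amb + !use_amb * d) / c;
}

int main(){
    for(int i = 5; i <= 15; ++i)
        for(double a = -2; a <= 2; a += 0.5)
            for(double b = -2; b <= 2; b += 0.5){
                if(a == b) continue;   // excluded by the a != b assumption
                double r = func_ref(a, b, 3.0, 7.0, i);
                double f = func_fast(a, b, 3.0, 7.0, i);
                if(r != f) std::printf("mismatch: a=%g b=%g i=%d\n", a, b, i);
            }
    return 0;
}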

Have you tried unrolling the if like so:

double func(double a, double b, double c, double d, int i){
    if(i > 10 && a > b)
        return (a-b)/c;
    if (i < 11 && a < b)
        return (b-a)/c;
    return d/c;
}

I would look at the assembly produced by calling fabs(). It could be the overhead of a function call; if so, replace it with an inlined solution. If it's really the computation of the absolute value that's expensive, try a bitwise AND (&) with a bitmask that is 1 everywhere except for the sign bit. I doubt that this would be cheaper than what the standard library vendor's fabs() generates, though.
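
For what it's worth, a well-defined way to express that bitmask trick in C++ looks like the sketch below (the abs_bits name is mine; a decent compiler typically inlines fabs() to a single bitwise AND on the sign bit already, so treat this as an experiment rather than an expected win):

#include <cstdint>
#include <cstring>

// Clear the IEEE-754 sign bit. memcpy avoids the undefined behaviour of
// pointer type-punning and is optimised away by gcc/clang at -O2.
inline double abs_bits(double x){
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffffffffffULL;   // every bit set except the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}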

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow