Question

A program repeats some calculation over an array of doubles. Then something unfortunate happens and NaN get produced... It starts running much slower after this.

-ffast-math does not change a thing.

Why does it happen with -ffast-math? Shouldn't it prevent throwing floating-point exceptions and just proceed and churn out NaNs at the same rate as usual numbers?

Simple example:

nan.c

#include <stdio.h>
#include <math.h>

int main() {
    long long int i;
    double a=-1,b=0,c=1;

    for(i=0; i<100000000; ++i) {
        a+=0.001*(b+c)/1000;
        b+=0.001*(a+c)/1000;
        c+=0.001*(a+b)/1000;
        if(i%1000000==0) { fprintf(stdout, "%g\n", a); fflush(stdout); }
        if(i==50000000) b=NAN;
    }
    return 0;
}

running:

$ gcc -ffast-math -O3 nan.c -o nan && ./nan  | ts '%.s'
...
1389025567.070093 2.00392e+33
1389025567.085662 1.48071e+34
1389025567.100250 1.0941e+35
1389025567.115273 8.08439e+35
1389025567.129992 5.9736e+36
1389025568.261108 nan
1389025569.385904 nan
1389025570.515169 nan
1389025571.657104 nan
1389025572.805347 nan

Update: Tried various -O3, -ffast-math, -msse, -msse3 - no effect. Hovewer when I tried building for 64-bits instead of usual 32-bits, it started to process NaNs as fast as other numbers (in addition to general 50% speedup), even without any optimisation options. Why NaNs are so slow in 32-bit mode with -ffast-math?

Était-ce utile?

La solution

Your compiler defaults to using x87 (which incurs a stall for processing NaNs) when producing a 32-bit executable. Pass -mfpmath=sse to tell it to use SSE (which can handle NaNs at speed) instead.

Autres conseils

Floating point operations on NaN are exceptional cases and definitely take longer to execute. It's important to remember when vectorizing with SSE because any NaNs that sneak into don't-care columns in the registers can still make your code run much slower.

This page includes some performance measurements of math on NaN which is even worse than I thought!

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top