Question

Consider the following code:

// Filename fputest.cpp

#include <cmath>
#include <cstdio>

int main()
{
    double x;
    *(__int64 *) &x = 0xc01448ec3aaa278di64; // -5.0712136427263319
    double sine1 = sin(x);
    printf("%016llX\n", sine1);
    double sine2;
    __asm {
    fld x
    fsin
    fstp sine2
    }
    printf("%016llX\n", sine2);
    return 0;
}

When compiled with Visual C++ 2012 (cl fputest.cpp) and the program is executed, the output is the following:

3FEDF640D8D36174
3FEDF640D8D36175

Questions:

  • Why are these two values different?
  • Is it possible to issue some compiler options so that the computed sine values will be exactly the same?
Was it helpful?

Solution

This problem is not caused by conversion from long double to double. It may be due to inaccuracy in the sin routine in the math library.

The fsin instruction is specified to produce a result within 1 ULP (in the long double format) for operands within its range (per Intel 64 and IA-32 Architectures Software Developer's Manual, October 2011, volume 1, 8.3.10), in round-to-nearest mode. On an Intel Core i7, fsin of the questioner’s value, −5.07121364272633190495298549649305641651153564453125 or -0x1.448ec3aaa278dp+2, produces 0xe.fb206c69b0ba402p-4. We can easily see from this hexadecimal that the last 11 bits are 100 0000 0010. Those are the bits that will be rounded when converting from long double. If they are greater than 100 0000 0000, the number will be rounded up. They are greater. Therefore, the result of converting this long double value to double is 0xe.fb206c69b0ba8p-4, which equals 0x1.df640d8d36175p-1 and 0.93631021832247418590355891865328885614871978759765625. Also note that even if the result were one ULP lower, the last 11 bits would still be greater than 100 0000 0000 and would still round up. Therefore this result should not vary on Intel CPUs conforming to the above documentation.

Compare this to computing a double-precision sine directly, using an ideal sin routine that produces correctly rounded results. The sine of the value is approximately 0.93631021832247413051857150785044253634581268961333520518023697738674775240815140702992025520721336793516756640679315765619707343171517531053811196321335899848286682535203710849065933755262347468763562 (computed with Maple 10). The double closest to this is 0x1.df640d8d36175p-1. That is the same value we obtained by converting the fsin result to double.

Therefore, the discrepancy is not caused by conversion of long double to double; converting the long double fsin result to double produces exactly the same result as an ideal double-precision sin routine.

We do not have a specification for the accuracy of the sin routine used by the questioner’s Visual Studio package. In commercial libraries, allowing errors of 1 ULP or several ULP is common. Observe how close the sine is to a point where the double-precision value is rounded: It is .498864 ULP (double-precision ULP) away from a double, so it is .001136 ULP away from the point where rounding changes. Therefore, even a very slight inaccuracy in the sin routine will cause it to return 0x1.df640d8d36174p-1 instead of the closer 0x1.df640d8d36175p-1.

Therefore, I conjecture the source of the discrepancy is a very small inaccuracy in the sin routine.

OTHER TIPS

(Note: As mentioned in the comments, this does not work on VC2012. I've left it here for general information. I wouldn't recommend relying on anything that depends on the optimization level anyway!)

I don't have VS2012, but on the VS2010 compiler you can specify /fp:fast on the command line and then I get the same results. This causes the compiler to generate "fast" code that doesn't necessarily quite follow the required rounding and rules in C++ but which matches your assembly language calculation.

I can't try this in VS2012 but I imagine it has the same option.

This only seems to work in an optimized build too with /Ox as an option.

See Why is cos(x) != cos(y) even though x == y?

As David mentioned in the comment, the discrepancy comes from moving the data in an FP register to a memory location (register/ram) of a different size. And it isn't always assignment either; even another nearby floating point operation can be enough to flush an FP register, rendering any attempt to guarantee a particular value futile. If you need to do a comparison, you might be able to mitigate some of this by forcing all results to a memory location as follows:

float F1 = sin(a); float F2 = sin(b); if (F1 == F2)

However, even this might not work. The best approach is to accept that any floating point operation will only be 'mostly accurate' and that from the programmers' perspective this error will effectively be unpredictable and random, even if the same operation is done repeatedly. Instead of

if (F1 == F2)

you should use something to the effect of

if (isArbitrarilyClose(F1, F2))

or

if (absf(F1 - F2) <= n)

where n is a tiny number.

The code generator in VS2012 was substantially changed to support automatic vectorization. Part of that change is that x86 floating point math is now done in SSE2 and no longer uses the FPU, necessary because FPU code cannot be vectorized. SSE2 computes with 64-bit precision instead of 80-bit precision, giving good odds that the results are off by one bit due to round-off. Also the reason that @J99 can get a consistent result with /fp:fast in VS2010, its compiler still uses the FPU and /fp:fast uses the FSIN result directly.

There's a lot of bang in this feature, check Jim Hogg's video at the linked url to find out how to take advantage of it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top