Possible loss of precision between two different compiler configurations

https://stackoverflow.com/questions/22531209

18-06-2023

Question

I am currently stuck on a problem at work that involves a possible loss of precision when the compiler configuration is changed from Debug to Release, which have different levels of optimization. For some reason, elsewhere in our code, extremely large values have been used for covariance matrices (and things of that sort), values somewhere along the lines of 1e90. The problem I'm encountering is that whenever there is any sort of loss of precision in a calculation and one of these extremely large values is still around, the product of the two introduces some instability. I'm not sure why more reasonable values aren't used, but I'm not the one that wrote this code, so yeah... As of now, I believe I have tracked down the problem to a specific location. The exact numbers I have at that location are shown below:

DBL sum = 6.000000040000000400e-004; // same for debug and release configurations
const DBL dinv = 2.000000020000000300e-004; // same for debug and release configurations

Note that DBL is your ordinary double:

typedef double DBL;

Then, the following operation is performed:

sum /= dinv;

This yields:

sum = 2.999999990000000100e+000 // (for debug configuration)
sum = 2.999999989999999600e+000 // (for release configuration)

I took a look at the disassembly for the two configurations and found some differences (expected because of different amounts of optimization).

--DEBUG--

1D91FF73  movsd       xmm0,mmword ptr [sum]
1D91FF78  divsd       xmm0,mmword ptr [dinv]
1D91FF7D  movsd       mmword ptr [sum],xmm0

I haven't ever really read disassembly, but my understanding is as follows: sum is moved to xmm0, then xmm0 is divided in-place by dinv (result is in xmm0 since division is in-place), then xmm0 is moved to sum.

As expected, the disassembly for release is different.

--RELEASE--

1D7557AB  movsd       xmm1,mmword ptr [esp+50h]  
1D7557B1  xorps       xmm0,xmm0  
1D7557B4  mulsd       xmm1,mmword ptr [esp+68h]

The disassembly for the assignment of sum to dinv is:

1D7B55B7  movsd       xmm1,mmword ptr [esp+68h]

Am I correct in thinking that dinv is the value pointed to by the pointer represented by [esp+68h] and sum is the value pointed to by the pointer represented by [esp+50h]? If not, what is the case?

Does anybody know why I am losing precision? What is the purpose of xorps?

The x86 Instruction Set Reference at this link may be helpful: http://x86.renejeschke.de/

--UPDATE--
As the answer below mentioned, the Debug configuration was using /fp:precise and the Release configuration was using /fp:fast. (I was using Microsoft Visual Studio 2013; to get to the build configuration settings for a project, right-click the project, click Properties, then navigate to C/C++.) For me, this resulted in round-off errors on the order of 1e-15, give or take an order of magnitude. This was a problem because elsewhere in the code, some people were using extremely large values (on the order of 1e90, give or take an order of magnitude). One thing I did to "break" the Debug configuration for testing purposes was to split the sum /= dinv computation into two steps: first take the reciprocal of dinv by computing 1.0/dinv (the answer below mentions this as a bad operation to perform), then multiply that result by sum and store the product back into sum. When I did this, Debug and Release both behaved poorly.
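The two-step split described in this update can be sketched as follows. This is a minimal sketch, assuming the two literals from the question and a build with strict floating-point semantics (for example /fp:precise). The function names and the scale factor are illustrative and not taken from the original code base.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

typedef double DBL;

// One division, one rounding: what divsd computes in the Debug build.
DBL divide_direct(DBL a, DBL b) {
    assert(b != 0.0);
    return a / b;
}

// Two steps, two roundings: the reciprocal first, then the product.
DBL divide_via_reciprocal(DBL a, DBL b) {
    assert(b != 0.0);
    const DBL inv = 1.0 / b;
    return a * inv;
}

// Absolute gap between the two quotients after multiplying by a large
// factor (a hypothetical stand-in for the 1e90-sized values in the question).
DBL scaled_gap(DBL a, DBL b, DBL scale) {
    return std::fabs(divide_direct(a, b) - divide_via_reciprocal(a, b)) * scale;
}

// Prints both quotients with 17 significant digits so a one-ulp gap is visible.
void print_comparison(DBL a, DBL b) {
    std::printf("direct     = %.17g\n", divide_direct(a, b));
    std::printf("reciprocal = %.17g\n", divide_via_reciprocal(a, b));
    std::printf("gap * 1e90 = %.17g\n", scaled_gap(a, b, 1e90));
}
```

Calling print_comparison(6.000000040000000400e-004, 2.000000020000000300e-004) shows the two results side by side. A gap of one unit in the last place near 3.0 is about 4.4e-16, which is harmless on its own but grows to roughly 4.4e74 in absolute terms once it is scaled by 1e90.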

Solution

If you are using

- GCC with -freciprocal-math (directly, or indirectly via -funsafe-math-optimizations, -ffast-math or -Ofast), or
- Visual Studio with the /fp:fast mode for Floating-Point Semantics,

the compiler may generate a standard division instruction in debug mode:

1D91FF78  divsd       xmm0,mmword ptr [dinv]

or a "division by multiplicative inverse" in release mode:

1D7557B4  mulsd       xmm1,mmword ptr [esp+68h]
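Which of these two forms a build produces depends on the floating-point model it was compiled with. As a rough sketch, the predefined macros _M_FP_FAST, _M_FP_STRICT and _M_FP_PRECISE (MSVC) and __FAST_MATH__ (GCC and Clang) can be used to report that model from inside the program. The function names are hypothetical. As far as I know, __FAST_MATH__ is tied to -ffast-math and is not set when only -freciprocal-math is passed, so this check can miss that case.

```cpp
#include <cassert>
#include <cstdio>

// Returns a label for the floating-point model this translation unit was
// compiled with, based on compiler-predefined macros only.
const char* fp_model_name() {
#if defined(_M_FP_FAST)
    return "MSVC /fp:fast";
#elif defined(_M_FP_STRICT)
    return "MSVC /fp:strict";
#elif defined(_M_FP_PRECISE)
    return "MSVC /fp:precise";
#elif defined(__FAST_MATH__)
    return "GCC/Clang -ffast-math";
#else
    return "no fast-math macro detected";
#endif
}

// True when one of the macros above indicates a fast (non-strict) model.
bool fast_math_detected() {
#if defined(_M_FP_FAST) || defined(__FAST_MATH__)
    return true;
#else
    return false;
#endif
}

// Prints the detected model, e.g. once at startup of each configuration.
void report_fp_model() {
    const char* name = fp_model_name();
    assert(name != nullptr);
    std::printf("floating-point model: %s\n", name);
}
```

Calling report_fp_model() at startup in both the Debug and the Release configuration is a cheap way to confirm that they really were built with different settings.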

Mathematically

a / b = a * (1 / b)

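but in floating-point arithmetic multiplying by the reciprocal rounds twice (once for 1 / b, once for the product) instead of once, so it generally introduces more error. By default, compilers are not allowed to perform this optimization because the results would be different and non-conformant (wrt IEEE-754); the options listed above explicitly permit it.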
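A small concrete case makes the difference visible. The sketch below assumes IEEE-754 binary64 doubles and a build without any of the reciprocal-math options; the helper names are illustrative.

```cpp
#include <cassert>

// n / n: a single correctly rounded division, exactly 1.0 for finite nonzero n.
double self_quotient(double n) {
    assert(n != 0.0);
    return n / n;
}

// n * (1 / n): the reciprocal is rounded first, then the product is rounded again.
double self_quotient_via_reciprocal(double n) {
    assert(n != 0.0);
    const double inv = 1.0 / n;
    return n * inv;
}

// Counts the integers in [1, n_max] for which the reciprocal form misses 1.0.
int count_reciprocal_misses(int n_max) {
    int misses = 0;
    for (int n = 1; n <= n_max; ++n) {
        if (self_quotient_via_reciprocal(static_cast<double>(n)) != 1.0) {
            ++misses;
        }
    }
    return misses;
}
```

With n = 49, self_quotient returns exactly 1.0, while self_quotient_via_reciprocal returns the next double below 1.0: 1.0 / 49.0 is rounded down, and the product then lands closer to 1 - 2^-53 than to 1. For powers of two such as 8.0 the reciprocal is exact and both forms agree.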

Licensed under: CC-BY-SA with attribution
