There are quite a few code generation settings involved here that affect the outcome. The difference that you report is observable in non-optimized code under the default floating-point model (i.e. the "precise" model) when the "classic" FPU instructions are used for floating-point computations.
The compiler translates the first call literally: the original integer value is first converted to float (a 4-byte floating-point value) and stored in memory as the function argument. This conversion rounds the value to +6.7975504e+7, which is already imprecise. Later that float value is read from memory inside the first function and used for further computations.
The second call passes an int value to the function, where it is loaded directly into a high-precision FPU register and used for further computations. Even though you specified an explicit conversion from int to float inside the second function, the compiler decided to ignore your request. The value is never actually converted to float, so the aforementioned loss of precision never occurs. That is what causes the difference you observed.
If you rewrite your second function as
float divide_1000(int y)
{
    float fy = y;
    float v = fy / 1000.0f;
    return v;
}
i.e. add an explicit step that saves the float value to a named location in memory, the compiler will perform that store in non-optimized code, and the results will become identical.
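If you need the two calls to agree even in optimized builds without touching compiler settings, one portable trick (my suggestion, not part of the original code) is to force the value through a float object that the compiler cannot keep in a register:

```cpp
// Sketch: volatile forces an actual 4-byte store and reload, so the
// value is rounded to float precision before the division, regardless
// of optimization level or FPU register width.
float divide_1000(int y)
{
    volatile float fy = (float)y; // forced store: rounding happens here
    return fy / 1000.0f;          // reloads the already-rounded value
}
```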
Again, the above applies to code compiled without optimizations, where the compiler normally attempts to translate each statement very closely (but not always exactly). In optimized code the compiler eliminates the "unnecessary" intermediate conversions to float and the "unnecessary" intermediate memory stores in both cases, producing identical results.
You might also want to experiment with the other floating-point models (i.e. "strict" and "fast") to see how they affect the results. These floating-point models exist specifically to deal with issues like the one you observed.
If you change the code generation settings of the compiler and make it use SSE instructions for floating-point arithmetic, the results might also change (in my experiment the difference disappears when the SSE2 instruction set is used instead of FPU instructions).
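For reference, with MSVC these knobs are controlled by documented command-line switches (the exact defaults depend on the toolset version and whether you target x86 or x64; x64 always uses SSE2):

```shell
:: 32-bit builds: choose x87 vs SSE2 code generation
cl /arch:IA32 example.cpp   :: classic FPU instructions
cl /arch:SSE2 example.cpp   :: SSE2 for floating point

:: floating-point model
cl /fp:precise example.cpp  :: the default model
cl /fp:strict  example.cpp
cl /fp:fast    example.cpp
```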