Problem converting from int to float

https://stackoverflow.com/questions/1018231

06-07-2019

Question

There is some strange behavior I cannot understand. I agree that floating-point numbers are approximations, so even an operation that obviously should return a whole number can come back with a fractional part.

I'm doing this:

int num = (int)(195.95F * 100);

and since it's a floating-point operation I get 19594 instead of 19595, which is more or less expected.

What puzzles me is that if I do

float flo = 195.95F * 100;
int num = (int) flo;

I get the correct result of 19595.

Any idea why this happens?

Solution

I looked to see if this was the compiler doing the math at compile time, but it behaves this way even if you force the calculation out to run time:

// Requires: using System.Runtime.CompilerServices;
static void Main()
{
    int i = (int)(GetF() * GetI()); // 19594
    float f = GetF() * GetI();
    int j = (int)f; // 19595
}
[MethodImpl(MethodImplOptions.NoInlining)]
static int GetI() { return 100; }
[MethodImpl(MethodImplOptions.NoInlining)]
static float GetF() { return 195.95F; }

It looks like the difference is whether the value stays in the registers (which are wider than a normal r4) or is forced into a float variable:

L_0001: call float32 Program::GetF()
L_0006: call int32 Program::GetI()
L_000b: conv.r4 
L_000c: mul 
L_000d: conv.i4 
L_000e: stloc.0

L_000f: call float32 Program::GetF()
L_0014: call int32 Program::GetI()
L_0019: conv.r4 
L_001a: mul 
L_001b: stloc.1 
L_001c: ldloc.1 
L_001d: conv.i4 
L_001e: stloc.2

The only difference is the stloc.1 / ldloc.1.

This is supported by the fact that in an optimised build (which removes the local variable) I get the same answer (19594) for both.
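The two paths can be modelled deterministically. This is only a sketch: it uses double as a stand-in for the wider register, and the class and method names are made up for illustration. 195.95F is stored as 195.949996948242..., so the product is 19594.99969482421875 when held at higher precision, and it only becomes 19595 once it is rounded to float32.

```csharp
using System;

static class WideVsNarrow
{
    // Truncates the product while it is still held at higher precision.
    // double is an assumption here: it stands in for the wider register.
    public static int TruncateWide(float f)
    {
        double wide = (double)f * 100; // 19594.99969482421875 for 195.95F
        return (int)wide;              // the cast truncates toward zero
    }

    // Rounds the product to float32 first (the effect of storing it in a
    // float local), then truncates.
    public static int TruncateNarrow(float f)
    {
        double wide = (double)f * 100;
        float narrow = (float)wide;    // nearest float is exactly 19595
        return (int)narrow;
    }

    static void Main()
    {
        Console.WriteLine(TruncateWide(195.95F));   // 19594
        Console.WriteLine(TruncateNarrow(195.95F)); // 19595
    }
}
```

Near 19595 adjacent float32 values are 2^-9 (about 0.00195) apart, so the 0.0003 shortfall disappears when the value is narrowed.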

OTHER TIPS

This code...

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            float result = 195.95F*100;
            int intresult = (int)(195.95F * 100);
        }
    }
}

gives this IL:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       14 (0xe)
  .maxstack  1
  .locals init ([0] float32 result,
           [1] int32 intresult)
  IL_0000:  nop
  IL_0001:  ldc.r4     19595.
  IL_0006:  stloc.0
  IL_0007:  ldc.i4     0x4c8a
  IL_000c:  stloc.1
  IL_000d:  ret
} // end of method Program::Main

Look at IL_0001 and IL_0007: the compiler has already done the calculation, storing 19595 for the float and 0x4c8a (19594) for the int. Otherwise it comes down to the decimal-to-binary conversion problem.

Mark's answer is correct: the difference comes from the conversion between the native float representation and float32/float64.

This is covered in the CLR ECMA spec, but David Notario explains it far better than I could.
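One way to ask for the narrowing without relying on a local variable is an explicit (float) cast on the product. As far as I know the C# compiler keeps such a cast and emits conv.r4 for it, even though the expression is already typed float. The sketch below reuses the GetF/GetI helpers from the accepted answer; the class and method names are illustrative only.

```csharp
using System;
using System.Runtime.CompilerServices;

static class ExplicitCastDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static float GetF() { return 195.95F; }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int GetI() { return 100; }

    public static int TruncateAfterNarrowing()
    {
        // (float) rounds the product to float32 (19595) before the
        // (int) cast truncates it.
        return (int)(float)(GetF() * GetI());
    }

    static void Main()
    {
        Console.WriteLine(TruncateAfterNarrowing()); // 19595
    }
}
```

The result without the cast still depends on the runtime and build settings, as the answers above show; the cast only pins down the narrowed case.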

Try converting float to double in your second example:

double flo = 195.95F * 100;
int num = (int) flo;

I'm guessing in your first example the compiler is using double to hold the intermediate result, and so in the float case you're losing precision.

When you multiply by 100, that is an integer, so it is doing an implicit conversion at that step. If you put an "F" behind the 100, I'll bet they'd be the same.

I typically use a cast with parentheses when it is a reference type. When it is a value type, I try to use the Convert static methods.

Try Convert.ToSingle(YourNumber); for a more reliable conversion.
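For the float-to-int step specifically, the relevant difference is that a cast truncates toward zero while Convert.ToInt32 rounds to the nearest integer. A minimal sketch, with a hard-coded double standing in for the slightly-too-small product (the class and method names are hypothetical):

```csharp
using System;

static class ConvertVsCast
{
    // A cast drops the fractional part.
    public static int Cast(double value) { return (int)value; }

    // Convert.ToInt32 rounds to the nearest integer; exact midpoints
    // go to the even neighbour.
    public static int Rounded(double value) { return Convert.ToInt32(value); }

    static void Main()
    {
        double product = 19594.9997;
        Console.WriteLine(Cast(product));    // 19594
        Console.WriteLine(Rounded(product)); // 19595
        Console.WriteLine(Rounded(2.5));     // 2
    }
}
```

Rounding hides the symptom in this case, but the stored float is still an approximation of 195.95.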

HTH

I can't answer why the second one works and the first one doesn't. However, I can tell you that 195.95 has a non-terminating expansion in binary, and as such round-off errors like this one are bound to happen.

Try converting to a double rather than float. You could also use the decimal type rather than a float (C# has no money type; decimal is the usual choice for currency). It stores the number in base 10, so 195.95 is represented exactly.
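A minimal sketch of the decimal suggestion, assuming the value is something like a price being converted to hundredths (the class and method names are illustrative):

```csharp
using System;

static class DecimalDemo
{
    // 195.95m is stored exactly in base 10, so the product is exactly
    // 19595.00 and the truncating cast loses nothing.
    public static int ToHundredths(decimal amount)
    {
        return (int)(amount * 100);
    }

    static void Main()
    {
        Console.WriteLine(ToHundredths(195.95m)); // 19595
    }
}
```

Decimal arithmetic is slower than float or double, so this fits best where exact base-10 values matter.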

For more on floating point numbers and the IEEE representation, go here:

http://en.wikipedia.org/wiki/IEEE_754

Licensed under: CC-BY-SA with attribution