Question

Edit: I know floating point arithmetic is not exact. And the arithmetic isn't even my problem. The addition gives the result I expected. 8099.99975f doesn't.


So I have this little program:

public class Test {
    public static void main(String[] args) {
        System.out.println(8099.99975f); // 8099.9995
        System.out.println(8099.9995f + 0.00025f); // 8100.0
        System.out.println(8100f == 8099.99975f); // false
        System.out.println(8099.9995f + 0.00025f == 8099.99975f); // false
        // I know comparing floats with == can be troublesome
        // but here they really should be equal in every bit.
    }
}

I wrote it to check if 8099.99975 is rounded to 8100 when written as an IEEE 754 single precision float. To my surprise Java converts it to 8099.9995 when written as a float literal (8099.99975f). I checked my calculations and the IEEE standard again but couldn't find any mistakes. 8100 is just as far away from 8099.99975 as 8099.9995 but the last bit of 8100 is 0 which should make it the right representation.

So I checked the Java language spec to see if I missed something. After a quick search I found two things:

  • The Java programming language requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen.

  • The Java programming language uses round toward zero when converting a floating value to an integer [...].

I noticed here that nothing was said about float literals. So I thought that float literals might just be doubles which, when cast to float, are rounded toward zero, similarly to the float-to-int cast. That would explain why 8099.99975f was rounded toward zero.

I wrote the little program you can see above to check my theory and indeed found that when adding two float literals that should result in 8100, the correct float is computed. (Note here that 8099.9995 and 0.00025 can be represented exactly as single floats, so there's no rounding that could lead to a different result.) This confused me, since it didn't make much sense to me that float literals and computed floats behaved differently, so I dug around in the language spec some more and found this:

A floating-point literal is of type float if it is suffixed with an ASCII letter F or f [...]. The elements of the types float [...] are those values that can be represented using the IEEE 754 32-bit single-precision [...] binary floating-point formats.

This ultimately states that the literal should be rounded according to the IEEE standard which in this case is to 8100. So why is it 8099.9995?


Solution

The key point to realise is that the value of a floating point number can be worked out in two different ways that aren't in general equal.

  • There's the exact value encoded by the bits of the floating point number.
  • There's the "decimal display value" of a floating point number, which is the decimal number with the fewest digits that is still closer to that floating point number than to any other floating point number.

To understand the difference, consider the number whose exponent is 10001011 and whose significand is 1.11111010001111111111111. This is the exact binary representation of 8099.99951171875. But the decimal value 8099.9995 has fewer decimal places, and is closer to this floating point number than to any other floating point number. Therefore, 8099.9995 is the value that will be displayed when you print out that number.
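The two values can be inspected directly. The following is a minimal sketch, not part of the original answer: the class name FloatValueDemo and its helper methods are made up for illustration, and it relies only on the standard BigDecimal(double) constructor (exact, since widening a float to a double loses nothing) and Float.floatToIntBits.

```java
import java.math.BigDecimal;

// Hypothetical helper class, introduced only for this illustration.
final class FloatValueDemo {
    // Exact decimal expansion of the value stored in the float's bits.
    static String exactValue(float f) {
        // float -> double widening is exact, and BigDecimal(double) is exact too.
        return new BigDecimal(f).toPlainString();
    }

    // Raw IEEE 754 single-precision bit pattern as lowercase hex.
    static String bits(float f) {
        return Integer.toHexString(Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        // Sign 0, exponent 10001011, fraction 11111010001111111111111.
        float f = Float.intBitsToFloat(0x45FD1FFF);
        System.out.println(exactValue(f));     // 8099.99951171875
        System.out.println(Float.toString(f)); // 8099.9995
        System.out.println(bits(8099.9995f));  // 45fd1fff
    }
}
```

Float.toString gives the short display value, while the BigDecimal conversion shows what is actually stored.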

Note that this particular floating point number is the largest one below 8100; there is no other single-precision value between the two.

Now consider 8099.99975. The midpoint between 8099.99951171875 and 8100 is 8099.999755859375, so 8099.99975 is slightly closer to 8099.99951171875 than it is to 8100; it is not a tie. Therefore, to represent it in single precision floating point, Java will pick the floating point number which is the exact binary representation of 8099.99951171875. If you try to print it, you'll see 8099.9995.
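To check which neighbour the literal is closer to, one can compare it with the halfway point between the two candidate floats. This is only a sketch under assumptions: LiteralRoundingDemo, exact and midpointBelow are hypothetical names, and it uses Math.nextDown(float), which is available from Java 8 on.

```java
import java.math.BigDecimal;

// Hypothetical helper class, introduced only for this illustration.
final class LiteralRoundingDemo {
    // Exact decimal value of a float (widening to double loses nothing).
    static BigDecimal exact(float f) {
        return new BigDecimal(f);
    }

    // Halfway point between a float and the next float below it.
    static BigDecimal midpointBelow(float upper) {
        BigDecimal sum = exact(upper).add(exact(Math.nextDown(upper)));
        return sum.multiply(new BigDecimal("0.5"));
    }

    public static void main(String[] args) {
        BigDecimal literal = new BigDecimal("8099.99975");
        BigDecimal midpoint = midpointBelow(8100f);
        System.out.println(exact(Math.nextDown(8100f)));         // 8099.99951171875
        System.out.println(midpoint);                            // 8099.999755859375
        // The decimal literal lies below the midpoint, so it rounds down.
        System.out.println(literal.compareTo(midpoint) < 0);     // true
        System.out.println(8099.99975f == Math.nextDown(8100f)); // true
    }
}
```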

Lastly, when you do 8099.9995 + 0.00025 in single precision floating point, the numbers involved are the exact binary representations of 8099.99951171875 and 0.00025000001187436282634735107421875. Because the latter is slightly more than 1/2^12, which is half the spacing between adjacent floats at this magnitude, the exact sum lies above the midpoint between 8099.99951171875 and 8100. The result of the addition is therefore closer to 8100, and so it is rounded up, not down, making it 8100.
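The same comparison can be made for the sum. This sketch assumes the hypothetical names FloatSumDemo, exactSum and midpoint; it computes the mathematically exact sum of the two stored float values with BigDecimal and compares it with the halfway point below 8100.

```java
import java.math.BigDecimal;

// Hypothetical helper class, introduced only for this illustration.
final class FloatSumDemo {
    // Exact (unrounded) sum of the values stored in two floats.
    static BigDecimal exactSum(float a, float b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Halfway point between 8100 and the float just below it.
    static BigDecimal midpoint() {
        BigDecimal sum = new BigDecimal(8100f).add(new BigDecimal(Math.nextDown(8100f)));
        return sum.multiply(new BigDecimal("0.5"));
    }

    public static void main(String[] args) {
        float a = 8099.9995f;
        float b = 0.00025f;
        System.out.println(exactSum(a, b));
        // The exact sum lies above the midpoint, so float addition rounds up.
        System.out.println(exactSum(a, b).compareTo(midpoint()) > 0); // true
        System.out.println(a + b);                                    // 8100.0
    }
}
```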

OTHER TIPS

The decimal value 8099.99975 has nine significant digits. This is more than a float can represent exactly. If you use the floating point analysis tool at CUNY you'll see that the binary representation closest to 8099.9995 is 45FD1FFF. When you add 0.00025 you run into a "loss of significance". To add the two numbers, the significand of the smaller has to be shifted right to match the scale (exponent) of the larger, and all of its bits end up to the right of the larger number's last significand bit.

Decimal     Exponent        Significand
---------   --------------  -------------------------
8099.9995   10001011 (+12)  1.11111010001111111111111
   0.00025  01110011 (-12)  1.00000110001001001101111

To line these up for addition, the second one has to shift right 24 bits, but there are only 23 fraction bits in the significand of a single-precision float, so none of its digits survive in the result. The addition is not a no-op, though: IEEE 754 rounds the exact sum to the nearest float, and the shifted value is slightly more than half a unit in the last place, so the result rounds up to 8100 (the 8100.0 printed in the question). Only the digits of 0.00025 are lost.

If you want the sum to keep those digits, switch to double-precision arithmetic.
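As a rough illustration of the difference, the sketch below does the same addition once in float and once in double. The class and method names are hypothetical. The exact digits printed for the double sum are not shown here because they depend on double rounding; the point is only that it stays close to 8099.99975 instead of landing on 8100.

```java
// Hypothetical helper class, introduced only for this illustration.
final class DoubleSumDemo {
    // Single-precision sum: rounded to the nearest float.
    static float floatSum() {
        return 8099.9995f + 0.00025f;
    }

    // Double-precision sum: far finer spacing near 8100, so the small addend survives.
    static double doubleSum() {
        return 8099.9995 + 0.00025;
    }

    public static void main(String[] args) {
        System.out.println(floatSum());  // 8100.0
        System.out.println(doubleSum()); // close to 8099.99975
    }
}
```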

Licensed under: CC-BY-SA with attribution