Question

On gcc 4.7.3, fegetround() returns FE_TONEAREST. As I understand the C++ reference, this means rounding away from zero. Essentially, it means saving the last bit that was shifted out when adjusting the precision of the mantissa after multiplication (since the product is twice as long as the mantissa that can be stored). Afterwards, the saved bit is added to the final mantissa result.

For example, floating point multiplication gives the following results:

0x38b7aad5 * 0x38b7aad5 = 0x3203c5af

The mantissa after multiplication is

  1011 0111 1010 1010 1101 0101
x 1011 0111 1010 1010 1101 0101
-------------------------------
1[000 0011 1100 0101 1010 1110] [1]000 0101 1001 0101 0011 1001

The [23'b] set holds the significant digits, whereas the [1'b] set holds the last bit shifted out. Note that the mantissa for the result is

[000 0011 1100 0101 1010 1111]

The last bit switched to 1 because the [1'b1] set was added to the spliced mantissa (the [23'b] set) due to the rounding mode.

Here is an example that is stumping me, because it looks to me like the hardware isn't rounding correctly.

0x20922800 * 0x20922800 = 0x01a6e34c (check this on your machine)

  1001 0010 0010 1000 0000 0000
x 1001 0010 0010 1000 0000 0000
-------------------------------
01[01 0011 0111 0001 1010 0110 0][1]00 0000 0000 0000 0000 0000

Final Mantissas:       
Their Result:      01 0011 0111 0001 1010 0110 0
Correct Result(?): 01 0011 0111 0001 1010 0110 1

I've been crunching binary all day, so it's possible I'm missing something simple here. Which answer is correct with the given rounding mode?


Solution

When rounding to nearest, IEEE specifies that ties round to even. 0 is even, 1 is odd, so Intel is correct.
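
Both multiplications from the question can be reproduced without the original hardware. The following is a minimal sketch, not the asker's code: the helper names are made up for illustration, and it assumes that Python's struct module converts a double to a 32-bit float with the default round-to-nearest-even behaviour of the underlying C cast. The product of two 24-bit significands needs at most 48 bits, so it is exact in a Python float (binary64) and the only rounding happens in the final conversion.

```python
import struct

def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def float_to_bits(value: float) -> int:
    """Round a binary64 value to binary32 and return the resulting bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def square_as_float32(bits: int) -> int:
    x = bits_to_float(bits)
    # x * x is exact in binary64 (at most 48 significant bits), so the
    # conversion below is the single rounding step.
    return float_to_bits(x * x)

for operand in (0x38B7AAD5, 0x20922800):
    print(f"0x{operand:08x} squared -> 0x{square_as_float32(operand):08x}")
# 0x38b7aad5 squared -> 0x3203c5af
# 0x20922800 squared -> 0x01a6e34c
```

The second line ends in ...4c rather than ...4d, matching what the hardware produced.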

Other answers

First, "rounding to nearest" leaves out one detail here: the default mode is round to nearest, ties to even.

IEEE 754 standard (Section 4.3.1) quote:

roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered
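
The quoted rule can be written down for plain integers. This is an illustrative sketch only (the function name round_ties_to_even and its parameters are not from any library): it drops the low bits of a non-negative integer and rounds what is left according to roundTiesToEven.

```python
def round_ties_to_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits (drop >= 1) of a non-negative integer,
    rounding to nearest with ties going to the even result."""
    kept = value >> drop                      # bits that survive
    remainder = value & ((1 << drop) - 1)     # bits that are cut off
    half = 1 << (drop - 1)                    # exactly halfway
    if remainder > half:
        kept += 1                             # nearer to the upper neighbour
    elif remainder == half and kept & 1:
        kept += 1                             # tie: move to the even neighbour
    # Note: kept + 1 may need one more bit; a real float implementation
    # would renormalise and bump the exponent in that case.
    return kept

print(bin(round_ties_to_even(0b10001, 2)))  # 0b100  (below half, round down)
print(bin(round_ties_to_even(0b10111, 2)))  # 0b110  (above half, round up)
print(bin(round_ties_to_even(0b10010, 2)))  # 0b100  (tie, 100 is already even)
print(bin(round_ties_to_even(0b10110, 2)))  # 0b110  (tie, 101 is odd, go up)
```

Only the last two calls are ties; in those the discarded bits alone do not decide the direction, the least significant kept bit does.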

In your first example you compute the square of 8.75794e-5, which as a 32-bit float has the bit pattern 0x38b7aad5.

All 24 significand bits of 8.75794e-5 (including the implicit leading 1) are:

0xb7aad5 = 1.0110111_10101010_11010101

Now after squaring that you get:

1.0000011_11000101_10101110_10000101_10010101_00111001

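Depending on the target, the computation may be performed on the x87 FPU (the GCC default for 32-bit x86), which operates on the 80-bit double extended format; on x86-64, GCC uses SSE by default, which multiplies in single precision directly. Either way, the exact 48-bit product of two 24-bit significands is rounded only once to a 32-bit float.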

Intel® 64 and IA-32 Architectures Software Developer’s Manual (Programming with the x87 FPU):

When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format).

Now you want to store your result in 32 bit float:

1.[0000011_11000101_10101110]10000101_10010101_00111001

and this is where rounding modes matter. IEEE 754 requires four of them for binary formats, but let's focus on the default one, round to nearest (ties to even), since that is the one in question.

Now that the FPU has computed the whole result (the 80-bit format has 64 significand bits, enough for the full 48-bit product), it must round it to fit a 32-bit float (24 significand bits). The 23 bits that are stored explicitly are placed within brackets above.

In this particular case the "even" part of the rule does not come into play, since the bits to the right of the bracket are not exactly halfway between:

1.[0000011_11000101_10101111]
and
1.[0000011_11000101_10101110]

but they are nearer to

1.[0000011_11000101_10101111]

This is why your result is 0x3203C5AF (24-bit significand 0x83C5AF).
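
The same steps can be followed with exact integer arithmetic. The sketch below uses only variable names chosen for illustration; it assumes a positive, normal operand and relies on the fact that, for this input, rounding neither carries out of the significand nor overflows or underflows the exponent.

```python
operand_bits = 0x38B7AAD5
biased_exp = (operand_bits >> 23) & 0xFF              # 113
significand = (operand_bits & 0x7FFFFF) | 0x800000     # 0xb7aad5, implicit 1 restored

product = significand * significand                    # exact: 0x83c5ae859539
drop = product.bit_length() - 24                       # 48 - 24 = 24 bits do not fit
kept = product >> drop                                 # 0x83c5ae
discarded = product & ((1 << drop) - 1)                # 0x859539
half = 1 << (drop - 1)                                 # 0x800000

# The discarded bits are above the halfway point, so this is not a tie
# and the value simply rounds up to the nearer neighbour.
rounded = kept + (1 if discarded > half else 0)        # 0x83c5af

# Reassemble the 32-bit pattern: a 48-bit product means the result is >= 2
# and needs one extra exponent step; a 47-bit product would need none.
exp = 2 * (biased_exp - 127) + (product.bit_length() - 47) + 127
result_bits = (exp << 23) | (rounded & 0x7FFFFF)

print(f"discarded 0x{discarded:06x} vs half 0x{half:06x}")
print(f"result 0x{result_bits:08x}")                   # result 0x3203c5af
```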

Now for the problematic result of squaring 2.4759832E-19 (0x20922800).

The 24 significand bits of 2.4759832E-19 are:

0x922800 = 1.0010010_00101000_00000000

and squared:

1.[0100110_11100011_01001100]10000000_00000000_0000000

And here is where the "even" part really matters. Your value now lies exactly halfway between:

1.[0100110_11100011_01001101]
and
1.[0100110_11100011_01001100]

These two values are said to bracket your value. Of the two, you must choose the even one (the latter, since its least significant bit is 0).

Now you know why the 24 significand bits of your result are 0xA6E34C (nearest, even) and not 0xA6E34D (nearest, but odd).
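
That the square really sits exactly halfway can be checked with exact rational arithmetic. The following is a small sketch using Python's fractions module; the helper name exact_value is made up for illustration, and the two candidate bit patterns are the ones discussed in the question.

```python
import struct
from fractions import Fraction

def exact_value(bits: int) -> Fraction:
    """Exact rational value of a 32-bit float bit pattern (hypothetical helper)."""
    return Fraction(struct.unpack(">f", struct.pack(">I", bits))[0])

x = exact_value(0x20922800)
exact = x * x                       # infinitely precise square, no rounding

lower_bits, upper_bits = 0x01A6E34C, 0x01A6E34D
lower, upper = exact_value(lower_bits), exact_value(upper_bits)

print(lower < exact < upper)                 # True: the two floats bracket the square
print(exact - lower == upper - exact)        # True: both are equally near, a tie
print(lower_bits & 1, upper_bits & 1)        # 0 1: only the lower one is even
```

With the distances equal, roundTiesToEven delivers the pattern ending in 4c. Picking the one ending in 4d would correspond to rounding ties away from zero, which is not what FE_TONEAREST selects.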

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow