Question

I found myself today doing some bit manipulation and I decided to refresh my floating-point knowledge a little!

Things were going great until I saw this:

... 23 fraction bits of the significand appear in the memory format but the total precision is 24 bits

I read it again and again, but I still can't figure out where the 24th bit is. I noticed something about a binary point, so I assumed it's a point between the mantissa and the exponent.

I'm not really sure, but I believe the author was talking about this bit:

         Binary point?
             |
s------e-----|-------------m----------
0 - 01111100 - 01000000000000000000000
           ^ this

The solution

The 24th bit is implicit due to normalization.

The significand is shifted left (with the exponent decremented by one for each bit shifted) until the leading bit of the significand is a 1.

Then, since the leading bit is a 1, only the other 23 bits are actually stored.
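
Here is a minimal C sketch (my illustration, not part of the original answer) that decodes the exact bit pattern from the question and reattaches the implicit leading 1 to recover the full 24-bit significand:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* The bit pattern from the question: 0 01111100 01000000000000000000000 */
    float f = 0.15625f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret the float's bits */

    uint32_t sign     = bits >> 31;           /* 1 bit */
    uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, biased */
    uint32_t fraction = bits & 0x7FFFFF;      /* the 23 stored bits */

    /* For a normal number, prepend the implicit 1 to get all 24 bits. */
    uint32_t significand = (1u << 23) | fraction;

    printf("sign=%u exponent=%u fraction=0x%06X\n", sign, exponent, fraction);
    printf("full 24-bit significand = 0x%X\n", significand);  /* 0xA00000 */
    return 0;
}

Run on the question's pattern, this prints exponent=124 and significand 0xA00000, i.e. 1.01 in binary with the implicit bit restored.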

There is also the possibility of a denormal number. The exponent is stored in a "bias" format: an unsigned number where a value near the middle of the range is defined to mean 0. So, with 8 bits, it's stored as a number from 0..255, and a stored value e represents an actual exponent of e - 127: 127 means 0, 1 means -126, and 254 means +127. The two end values are reserved (0 for zero and denormals, 255 for infinities and NaNs).
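
For instance, the exponent field in the question's bit pattern is 01111100 in binary, which is 124, so the actual exponent is 124 - 127 = -3.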

If, in the process of normalization, the exponent field reaches 0 (its lowest stored value), then normalization stops, and the significand is stored as-is. In this case, the implicit bit from normalization is taken to be a 0 instead of a 1, and the number is interpreted with a fixed exponent of -126.
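
As a quick illustration (again my sketch, not from the original answer), the smallest positive denormal has exponent field 0 and only the lowest fraction bit set:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* Exponent field 0, fraction 0x000001: the smallest positive denormal. */
    uint32_t bits = 0x00000001;
    float f;
    memcpy(&f, &bits, sizeof f);

    /* The implicit bit is 0 and the exponent is fixed at -126, so the
       value is (1 / 2^23) * 2^-126 = 2^-149. */
    printf("%g\n", f);          /* ~1.4013e-45 */
    printf("%g\n", 0x1p-149);   /* the same value as a hex-float literal */
    return 0;
}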

Most floating point hardware is designed to basically assume numbers will be normalized, so they assume that implicit bit is a 1. During the computation, they check for the possibility of a denormal number, and in that case they do roughly the equivalent of throwing an exception, and re-start the calculation with that taken into account. This is why computation with denormals often gets drastically slower than otherwise.
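
If you want to see this effect yourself, a rough micro-benchmark along these lines can show it (my sketch; the exact slowdown depends heavily on the CPU, and compiler flags like -ffast-math may flush denormals to zero and hide it entirely). One loop's values stay normal, the other's stay denormal:

#include <stdio.h>
#include <time.h>

/* x converges to 2*start and stays there, so with a denormal 'start'
   every iteration multiplies and adds denormal operands. */
static double run(float start, long iters) {
    volatile float x = start;   /* volatile keeps the loop from being optimized away */
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        x = x * 0.5f + start;
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    long n = 100000000L;
    printf("normal:   %.3f s\n", run(1.0f, n));    /* values near 2.0 */
    printf("denormal: %.3f s\n", run(1e-41f, n));  /* values near 2e-41, denormal */
    return 0;
}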


  1. In case you wonder why it uses this strange format: IEEE floating point (like many others) is designed so that if you treat the bit patterns as integers of the same size, they sort into the same order as the floating point numbers they represent -- with one caveat: because floats store negatives in sign-magnitude form rather than 2's complement, two negative numbers compare in reversed order (see the sketch below). The sign of the number is in the most significant bit, which is where a 2's complement integer keeps its sign bit too. The bits of the exponent are stored as the next most significant bits -- but if we used 2's complement for them, an exponent less than 0 would set the second most significant bit of the number, which would look like a big number when compared as an integer. By using bias format, a smaller exponent leaves that bit clear, and a larger exponent sets it, so the order as an integer reflects the order as a floating point.
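
A small C sketch (my illustration, not part of the original footnote) showing both the correct ordering for non-negative values and the reversed ordering for two negatives:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bit pattern as a signed 32-bit integer. */
static int32_t as_int(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

int main(void) {
    /* For non-negative floats, integer order matches float order. */
    printf("%d\n", as_int(1.0f) < as_int(1.5f));    /* 1 */
    printf("%d\n", as_int(1.5f) < as_int(2.0f));    /* 1 */

    /* Negative floats are sign-magnitude, so two negatives
       compare in reverse as integers. */
    printf("%d\n", as_int(-2.0f) < as_int(-1.0f));  /* 0: reversed! */
    return 0;
}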

Other tips

Normally (pardon the pun), the leading bit of a floating point number is always 1; thus, it doesn't need to be stored anywhere. The reason is that, if it weren't 1, that would mean you had chosen the wrong exponent to represent it; you could get more precision by shifting the mantissa bits left and using a smaller exponent.
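
For example, 0.011 x 2^5 (binary) wastes two leading zero bits; shifting the mantissa left twice and subtracting 2 from the exponent gives the normalized form 1.1 x 2^3, freeing those two bit positions for extra precision at the low end.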

The one exception is denormal/subnormal numbers, which are represented by all zero bits in the exponent field (the lowest possible exponent). In this case, there is no implicit leading 1 in the mantissa, and you have diminishing precision as the value approaches zero.
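
At the extreme, the smallest positive single-precision value, 2^-149, has only a single significant bit left.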

For normal floating point numbers, the number stored in the floating point variable is (ignoring sign) 1.mantissa * 2^(exponent - offset). The leading 1 is not stored in the variable.
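
Applied to the question's bit pattern: 1.01 (binary) * 2^(124 - 127) = 1.25 * 2^-3 = 0.15625.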

Licensed under: CC-BY-SA with attribution