Question

According to the Wikipedia link of IBM hexadecimal floating point:

Consider encoding the value −118.625 as an IBM single-precision floating-point value.

The value is negative, so the sign bit is 1.

The value $118.625_{10}$ in binary is $1110110.101_2$. This value is normalized by moving the radix point left four bits (one hexadecimal digit) at a time until the leftmost digit is zero, yielding $0.01110110101_2$. The remaining rightmost digits are padded with zeros, yielding a 24-bit fraction of $.0111\,0110\,1010\,0000\,0000\,0000_2$.

The normalized value moved the radix point two digits to the left, yielding a multiplier and exponent of $16^{+2}$. A bias of $+64$ is added to the exponent ($+2$), yielding $+66$, which is $100\,0010_2$.

Combining the sign, exponent plus bias, and normalized fraction produces this encoding:

S   Exp        Fraction
1   100 0010   0111 0110 1010 0000 0000 0000
In other words, the number represented is $-0.76A000_{16} \times 16^{66-64} = -0.4633789\ldots \times 16^{+2} = -118.625$.
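As a sanity check, here is a minimal C sketch (the helper name decode_ibm_single is purely illustrative) that decodes that bit pattern by the rules quoted above — sign bit, 7-bit excess-64 exponent, 24-bit hexadecimal fraction — and does recover $-118.625$:

```c
/* Minimal sketch: decode an IBM System/360 single-precision pattern
 * using the rules quoted above. Helper name is illustrative only. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

static double decode_ibm_single(uint32_t bits)
{
    int sign = (bits >> 31) & 1;
    int exponent = (int)((bits >> 24) & 0x7F) - 64;     /* remove the +64 bias */
    double fraction = (bits & 0x00FFFFFF) / 16777216.0; /* 24-bit fraction / 2^24 */
    double value = fraction * pow(16.0, exponent);
    return sign ? -value : value;
}

int main(void)
{
    /* 1 100 0010 0111 0110 1010 0000 0000 0000 from the worked example */
    printf("%g\n", decode_ibm_single(0xC276A000u));     /* prints -118.625 */
    return 0;
}
```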

Now, the definition of normalization according to Wikipedia says that

In base $b$ a normalized number will have the form $\pm d_0.d_1d_2d_3\ldots \times b^n$ where $d_0 \neq 0$, and the digits $d_0, d_1, d_2, d_3, \ldots$ are integers between $0$ and $b-1$

So, how is $0.01110110101_2 \times 16^2$ a normalized number?
In fact, this number cannot be represented as a normalized one with a base-$16$ exponent, because the closest we can get is $1.110110101_2 \times 16^1 \times 2^2$. What am I missing here?


Solution

I'm going to start with this famous quote from James Wilkinson's 1970 Turing Award Lecture, Some Comments from a Numerical Analyst.

In the early days of the computer revolution computer designers and numerical analysts worked closely together and indeed were often the same people. Now there is a regrettable tendency for numerical analysts to opt out of any responsibility for the design of the arithmetic facilities and a failure to influence the more basic features of software. It is often said that the use of computers for scientific work represents a small part of the market and numerical analysts have resigned themselves to accepting facilities "designed" for other purposes and making the best of them. I am not convinced that this is inevitable, and if there were sufficient unity in expressing their demands there is no reason why they could not be met. After all, one of the main virtues of an electronic computer from the point of view of the numerical analyst is its ability to "do arithmetic fast." Need the arithmetic be so bad!

In 1970, Fortran had only been cross-platform for four years. A lot of numerical analysis was being done on IBM hardware, and System/360 in particular, but every CPU vendor had its own floating point format, and most of them (as Wilkinson indicated) were not designed by those who demanded high-quality floating point arithmetic.

Today, the industry has standardised on IEEE-754. It's imperfect, because everything is, but it's well specified and can be relied upon.

Wikipedia's definition of "normalised number" (or "normalized number"; I'm going to type it as my dialect spells it) is the modern definition. It's the one you learned in high school when discussing scientific notation. It's the one used by IEEE-754. It's the one that's in all the modern textbooks.

But it's not the only normal form for floating point that has existed, and I think that's the source of the confusion here.

Interestingly, it's not the only normal form in use today! There is at least one place in modern programming where a different normal form is still in use, because it pre-dates IEEE-754.

The C standard library has a function to extract the parts of a floating-point number's normal form, called frexp. But the normal form it uses puts the mantissa in the range $\left[ 0.5, 1 \right)$, not the modern $\left[ 1, 2 \right)$. This is a notorious "gotcha" for numerical analysts working in C or C++ today.
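For example, this small sketch shows that older convention in action (assuming a standard C environment; the decomposition in the comment follows directly from frexp's definition):

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    int exp;
    /* frexp returns a mantissa m with 0.5 <= |m| < 1 and x == m * 2^exp:
     * the pre-IEEE-754 normal form, not the modern [1, 2) one. */
    double m = frexp(-118.625, &exp);
    printf("-118.625 = %.10g * 2^%d\n", m, exp);   /* -0.9267578125 * 2^7 */
    return 0;
}
```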

Licensed under: CC-BY-SA with attribution