Displaying IEEE-754 quadruple-precision (binary128) floating point values in scientific notation in C#

StackOverflow https://stackoverflow.com/questions/22918330

Question

I'm trying to translate the raw binary data from a thread context into a human-readable format, and have come up empty when trying to translate quadruple-precision floating point values into a readable format in C#.

Ultimately, I'd like to display it in standard scientific notation, e.g. 1.234567×10^89. I'm not worried about loss of precision in the process - I just want a reasonable idea of what the value is.

My first thought was to manually compute the value as a double by raising the exponent, but of course I'm going to exceed the maximum value in many cases. I don't mind losing precision, but not being able to display it at all isn't acceptable.

Is there some kind of simple mathematical hack I can use for this?

Solution

You could install a third-party library that handles that. For example, it looks like QPFloat gives you a new struct called System.Quadruple that overrides ToString, so you could try that.

(I wonder when .NET will support something like System.Quadruple.)

OTHER TIPS

So here's an answer to expand on the comment I made earlier. I hope you don't mind that I'm using Python, since I know where to find everything I need in that language; maybe someone else can translate this into a suitable answer in C#.

Suppose that you've got a sequence of 128 bits representing a number in IEEE 754 binary128 format, and that we've currently read those 128 bits in the form of an unsigned integer x. For example:

>>> x = 0x4126f07c18386f74e697bd57a865a9d0

(I guess this would be a bit messier in C#, since as far as I can tell it doesn't have a 128-bit integer type; you'd need to either use two 64-bit integers for the high and low words, or use the BigInteger type.)
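
For illustration, here's how the example value above might be held in C# as two 64-bit halves; the names hi and lo are just my own choice for this sketch, not anything standard:

    // Illustrative only: the 128 bits of the example value, split into
    // high and low 64-bit words (as the answer notes, there's no handy
    // built-in 128-bit integer type to hold them in one piece).
    ulong hi = 0x4126F07C18386F74UL;
    ulong lo = 0xE697BD57A865A9D0UL;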

We can extract the exponent and significand via bit operations as usual (I'm assuming that you already got this far, but I wanted to include the computation for completeness):

>>> significand_mask = (1 << 112) - 1
>>> exponent_mask = (1 << 127) - (1 << 112)
>>> trailing_significand = x & significand_mask
>>> significand = 1.0 + float(trailing_significand) / (2.0**112) 
>>> biased_exponent = (x & exponent_mask) >> 112
>>> exponent = biased_exponent - 16383

Note that while the exponent is exact, we've lost most of the precision of the significand at this point, keeping only 52-53 bits.

>>> significand
1.9393935334951098
>>> exponent
295
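
On the C# side, the same extraction could look roughly like this, continuing the hi/lo sketch above (a transliteration of the Python, under the assumption that hi and lo hold the two words as shown earlier):

    // Sign bit, 15-bit biased exponent, 112-bit trailing significand.
    int sign = (int)(hi >> 63);
    int biasedExponent = (int)((hi >> 48) & 0x7FFF);
    int exponent = biasedExponent - 16383;

    // The top 48 fraction bits live in hi, the low 64 bits in lo, so the
    // fraction is sigHigh * 2^64 + lo, and the significand is
    // 1 + fraction / 2^112 = 1 + sigHigh / 2^48 + lo / 2^112.
    ulong sigHigh = hi & 0x0000FFFFFFFFFFFFUL;
    double significand = 1.0
        + sigHigh * Math.Pow(2.0, -48)
        + lo * Math.Pow(2.0, -112);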

So the value represented is around 1.9393935334951098 * 2**295, or around 1.234567e+89. But you can't do the computation directly at this stage because it might overflow a Double (in this case it doesn't, but if the exponent were bigger you'd have a problem). So here's where the logs come in: let's compute the natural log of the value represented by x:

>>> from math import log, exp
>>> log_of_value = log(significand) + exponent*log(2)
>>> log_of_value
205.14079357778544
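
In C# this step is a one-liner with System.Math, continuing the same sketch:

    // ln(significand * 2^exponent) = ln(significand) + exponent * ln(2)
    double logOfValue = Math.Log(significand) + exponent * Math.Log(2.0);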

Then we can divide by log(10) to get the exponent and mantissa for the decimal part: the quotient of the division gives the decimal exponent, while the remainder gives the log of the significand, so we have to apply exp to it to retrieve the actual significand:

>>> exp10, mantissa10 = divmod(log_of_value, log(10))
>>> exp10
89.0
>>> significand10 = exp(mantissa10)
>>> significand10
1.234566999999967
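
C# has no divmod for doubles, but Math.Floor gives the same quotient/remainder split (again continuing the sketch):

    double log10 = Math.Log(10.0);
    double exp10 = Math.Floor(logOfValue / log10);            // decimal exponent
    double significand10 = Math.Exp(logOfValue - exp10 * log10);  // decimal significand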

And formatting the answer nicely:

>>> print("{:.10f}e{:+d}".format(significand10, int(exp10)))
1.2345670000e+89
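
A C# equivalent of that format string might be the following; the "+0;-0" custom pattern forces an explicit sign on the exponent:

    // F10 uses the current culture's decimal separator; pass
    // CultureInfo.InvariantCulture if you need "." guaranteed.
    Console.WriteLine("{0:F10}e{1}",
        significand10, ((int)exp10).ToString("+0;-0"));
    // prints: 1.2345670000e+89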

That's the basic idea: to do this generally you'd also need to handle the sign bit and the special bit patterns for zeros, subnormal numbers, infinities and NaNs. Depending on the application, you may not need all of those.
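
For completeness, here's a rough sketch of those checks in C#, assuming the variables from the earlier snippets and that this code sits at the top of some method that returns the formatted string:

    string signPrefix = sign == 1 ? "-" : "";
    if (biasedExponent == 0x7FFF)
    {
        // All-ones exponent: infinity when the fraction is zero, else NaN.
        return (sigHigh | lo) == 0 ? signPrefix + "Infinity" : "NaN";
    }
    if (biasedExponent == 0)
    {
        if ((sigHigh | lo) == 0)
            return signPrefix + "0";
        // Subnormal: the implicit leading bit is 0, not 1, and the
        // effective exponent is fixed at 1 - 16383.
        significand = sigHigh * Math.Pow(2.0, -48) + lo * Math.Pow(2.0, -112);
        exponent = 1 - 16383;
    }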

There's some precision loss involved firstly in the conversion of the integer significand to a double precision float, but also in taking logs and exponents. The worst case for precision loss occurs when the exponent is large, since a large exponent magnifies the absolute error involved in the log(2) computation, which in turn contributes a larger relative error when taking exp to get the final significand. But since the (unbiased) exponent doesn't exceed 16384, it's not hard to bound the error. I haven't done the formal computations, but this should be good for around 12 digits of precision across the range of the binary128 format, and precision should be a bit better for numbers with small exponent.

There are a few hacks for that...

  1. compute hex string for number

    The mantissa and exponent are already in binary, so there should be no problem; just do not forget to add a zero hex digit for each 2^4 part of the exponent, and to shift the mantissa by the remaining exponent & 3 bits. Negative exponents need a few tweaks but are handled very similarly.

    All of this can be done with bit and shift operations, so there is no precision loss if it is coded right...

  2. convert hex string to dec string

    There are quite a few examples here on SO; here is mine. You can also tweak it a little to skip zero processing for more speed... (A C# BigInteger sketch of the same overall idea appears after this list.)

  3. now scan the dec string

    If you look at my dec2hex and hex2dec conversions in the link above, the scan is already there; you need to find:

    • the position of first nonzero decimal from left and right
    • position of decimal point

    From these you can easily compute the exponent.

  4. convert dec string to mantissa * 10^exponent form

    It is quite straightforward: just remove the zeros..., translate the decimal point to its new position, and then add the exponent part...

  5. add the sign to the mantissa

    You can add it directly in steps #1 and #2, but if you do it at the end, it will spare you a few ifs...
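
The linked hex/dec string conversion code isn't reproduced here, but as a rough C# illustration of the same end-to-end idea (get exact decimal digits without precision loss, then scan them into mantissa×10^exponent form), System.Numerics.BigInteger can stand in for the string machinery, using the identity s * 2^e = (s * 5^|e|) / 10^|e| for negative e. This is my own sketch, not the linked code; significand113 and exponent are assumed to be the full 113-bit integer significand (implicit bit included) and the unbiased binary exponent:

    using System;
    using System.Numerics;

    // Value represented: significand113 * 2^(exponent - 112).
    // The sign is left to the caller, as in step #5 above.
    static string ToScientific(BigInteger significand113, int exponent)
    {
        if (significand113.IsZero) return "0";

        int e2 = exponent - 112;

        // Exact decimal digits: shift left when e2 >= 0; otherwise multiply
        // by 5^(-e2) and remember that the last -e2 digits of the result
        // fall after the decimal point.
        BigInteger digitsValue = e2 >= 0
            ? significand113 << e2
            : significand113 * BigInteger.Pow(5, -e2);
        int fracDigits = e2 >= 0 ? 0 : -e2;

        string digits = digitsValue.ToString();

        // The leading digit has place value 10^(digits.Length - fracDigits - 1),
        // which is exactly the decimal exponent.
        int exp10 = digits.Length - fracDigits - 1;

        string mantissa = digits.TrimEnd('0');
        string exponentPart = "e" + exp10.ToString("+0;-0");
        return mantissa.Length == 1
            ? mantissa + exponentPart
            : mantissa[0] + "." + mantissa.Substring(1) + exponentPart;
    }

Since everything stays in integers, this route is exact (no rounding at all), at the cost of some very long intermediate numbers when the exponent is large.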

Hope this helps ...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow