Question

So I know about floating point precision (and how things like 1.1 can't be expressed exactly in binary) and all that, but I'm wondering: how, then, do math-related libraries implement infinite precision? In other words, how would you represent, for example, 1.1 accurately in binary? Just a brief description would be great; I can figure out the exact details myself. Thanks. :)

Solution

There are no infinite precision libraries, but there are arbitrary precision libraries. For details of how these are implemented, read some documentation :-)

To represent 1.1 exactly in binary, floating point can't be used as you correctly point out. It can be represented if you store the integral part (1) as an integer, and the fractional part (.1) as another integer, and then you need to create the logic to deal with these structures. Alternatively, it could be stored as a fraction (11/10) with both the denominator and the numerator stored as integers.
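As a sketch of the fraction approach, Python's standard `fractions.Fraction` type stores exactly such a numerator/denominator pair of integers:

```python
from fractions import Fraction

# 1.1 stored exactly as the integer pair (11, 10)
x = Fraction(11, 10)

# Fraction also parses decimal strings exactly, avoiding any
# float rounding on the way in
y = Fraction("1.1")

print(x == y)                          # True: both are exactly 11/10
print(x + x)                           # arithmetic stays exact: 11/5
print(0.1 + 0.1 + 0.1 == 0.3)          # False: floats drift
print(Fraction("0.1") * 3 == Fraction("0.3"))  # True: fractions don't
```

The logic "to deal with these structures" is just integer arithmetic on the numerator and denominator, plus a normalization step (dividing out the gcd) after each operation.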

OTHER TIPS

If you really mean infinite precision there are two options:

  • Use some form of lazy computation. Then you can ask a number for as much precision as you like "after" setting up the computation (since it's lazy, it is actually only carried out then). The drawback is that this is very inefficient. You can do this in a language like Haskell using a special number system where representations overlap, e.g. base 2 with digits -1, 0, 1. The usual representation is unsuitable because at, say, 1 you would need infinite precision to decide between outputting 0 for 0.999... and 1 for 1.000...

  • Do computation symbolically. Represent integers, rationals, roots, etc. exactly. This is needed if you want to decide equality, but it is also rather inefficient and limited to special cases.
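To make the lazy idea concrete, here is a minimal sketch in Python (my own illustration, using ordinary long division rather than the redundant base-2 digit set the answer describes): a rational is represented as a generator that yields decimal digits only when the consumer asks for them.

```python
from itertools import islice

def lazy_decimal_digits(p, q):
    """Yield the decimal digits of p/q (p, q positive integers) on
    demand via long division. The computation is lazy: each digit is
    only produced when the consumer requests it."""
    yield p // q          # integer part first
    r = p % q
    while True:           # then fractional digits, one per request
        r *= 10
        yield r // q
        r %= q

# ask for as much precision as you like: integer part of 1/3,
# then ten fractional digits
print(list(islice(lazy_decimal_digits(1, 3), 11)))
# → [0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

Note this dodges the 0.999.../1.000... problem only because long division of a known rational never needs to look ahead; general lazy real arithmetic does, which is why the overlapping digit set -1, 0, 1 is used there.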

Mathematical libraries with infinite precision are not implemented. It cannot be done. The number 1/3 cannot be represented in a finite number of bits other than as a fraction. Transcendental numbers like pi and e cannot be represented completely in any fashion.

On the other hand, it is possible to create math libraries with huge precision. It's all a matter of allocating enough bits for the mantissa of the floating point value.
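Python's standard `decimal` module is one example of this "allocate enough digits" approach: you choose the precision of the context, and every operation rounds to that many significant digits.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50          # 50 significant decimal digits

# huge precision, but still finite: 1/3 is rounded at digit 50
print(Decimal(1) / Decimal(3))

# 1.1 is exact here because the radix is 10, not 2
print(Decimal("1.1") + Decimal("2.2"))   # → 3.3 exactly
```

The trade-off is the one the answer names: more digits in the "mantissa" means more memory and slower arithmetic, and any non-terminating value is still truncated somewhere.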

There are certain geometric algorithms that depend on exact arithmetic, so if you look in the CGAL library you will find a variety of exact numeric types that are "closed" under various operations. That is, there is no way to use the supported operations to produce a result that can't be represented exactly.

Some examples:

  • Integers are closed under addition and multiplication.

  • Rationals are also closed under division, except for the special case of division by zero. A rational can be represented as a pair of integers, e.g. 1.1 = 11/10, stored as (11, 10). See also the rational number functions in GMP.

  • CGAL also provides a number type that is closed under square root.
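These closure properties can be checked with an exact rational type; a sketch using Python's `fractions.Fraction` (square root is the operation that escapes the rationals):

```python
from fractions import Fraction

a, b = Fraction(11, 10), Fraction(1, 3)

# rationals are closed under +, -, *, and / (for b != 0):
# every result is again an exact pair of integers
for r in (a + b, a - b, a * b, a / b):
    assert isinstance(r, Fraction)
print(a / b)   # 33/10, still exact

# square root is not closed: sqrt(2) is irrational, so squaring
# the exact rational value of the float approximation does not
# give back exactly 2
s = Fraction(2 ** 0.5)
print(s * s == 2)   # False
```

A type closed under square root (as in CGAL) has to carry the root symbolically rather than as a ratio of integers.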

You could also represent numbers in decimal and do decimal arithmetic. The underlying representation is binary in the sense that each digit is represented with a binary code. Each digit -- whether to the left or right of the decimal point -- is treated as an integer. Arithmetic is then done "manually", digit-by-digit.

One example of a decimal-based library is BCMath in PHP.

While Pax is totally right here when we're talking about floating points and numbers, I believe there is a solution, but it's very inefficient.
You can use a string to represent your number; strings do not lose precision.
Whenever you have a number like "0.0001" + "0.1", you iterate over both strings and convert only the digits at the current position to ints, adding with a carry.
Step 1:
0 + 0 = 0 -> convert to a string and assign to data[0].
Step 2:
0 + 1 = 1 -> convert to a string and assign to data[1].
Step 3:
iter > "0.1".length() -> stop (after padding the shorter operand with zeros).
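The digit walk above can be turned into working code. A sketch in Python (the name `add_decimal_strings` is mine, not from the answer): pad both operands to the same shape, then add digit by digit from the right, carrying as needed.

```python
def add_decimal_strings(a, b):
    """Add two non-negative decimal strings like "0.0001" and "0.1"
    digit by digit, never converting the whole number to a float."""
    # split into integer and fractional parts
    ai, _, af = a.partition(".")
    bi, _, bf = b.partition(".")
    # pad so both numbers have the same number of digits on each side
    width_int = max(len(ai), len(bi))
    width_frac = max(len(af), len(bf))
    ai, bi = ai.zfill(width_int), bi.zfill(width_int)
    af, bf = af.ljust(width_frac, "0"), bf.ljust(width_frac, "0")
    # walk right to left over all digit positions, carrying as needed
    digits, carry = [], 0
    for da, db in zip(reversed(ai + af), reversed(bi + bf)):
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    s = "".join(reversed(digits))
    return s[:-width_frac] + "." + s[-width_frac:] if width_frac else s

print(add_decimal_strings("0.0001", "0.1"))   # → 0.1001
print(add_decimal_strings("99.95", "0.05"))   # → 100.00
```

This is essentially what decimal string libraries like BCMath do internally, which is also why they are slow compared to native floats.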

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow