biggest integer that can be stored in a double

https://stackoverflow.com/questions/1848700

13-09-2019
|

Question

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision ?

Solution

The biggest/largest integer that can be stored in a double without losing precision is the same as the largest possible value of a double. That is, DBL_MAX or approximately 1.8 × 10³⁰⁸ (if your double is an IEEE 754 64-bit double). It's an integer. It's represented exactly. What more do you want?

Go on, ask me what the largest integer is, such that it and all smaller integers can be stored in IEEE 64-bit doubles without losing precision. An IEEE 64-bit double has 52 bits of mantissa, so I think it's 2⁵³:

2⁵³ + 1 cannot be stored, because the 1 at the start and the 1 at the end have too many zeros in between.
Anything less than 2⁵³ can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one.
2⁵³ obviously can be stored, since it's a small power of 2.

Or another way of looking at it: once the bias has been taken off the exponent, and ignoring the sign bit as irrelevant to the question, the value stored by a double is a power of 2, plus a 52-bit integer multiplied by 2^{exponent − 52}. So with exponent 52 you can store all values from 2⁵² through to 2⁵³ − 1. Then with exponent 53, the next number you can store after 2⁵³ is 2⁵³ + 1 × 2^{53 − 52}. So loss of precision first occurs with 2⁵³ + 1.

OTHER TIPS

9007199254740992 (that's 9,007,199,254,740,992) with no guarantees :)

Program

#include <math.h>
#include <stdio.h>

int main(void) {
  double dbl = 0; /* I started with 9007199254000000, a little less than 2^53 */
  while (dbl + 1 != dbl) dbl++;
  printf("%.0f\n", dbl - 1);
  printf("%.0f\n", dbl);
  printf("%.0f\n", dbl + 1);
  return 0;
}

Result

9007199254740991
9007199254740992
9007199254740992

Wikipedia has this to say in the same context with a link to IEEE 754:

On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.

2^53 is just over 9 * 10^15.

The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.

This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:

The sign bit 0 (positive) rather than 1 (negative)
The maximum exponent 0x7FE (2046 which represents 1023 after the bias is subtracted) rather than 0x7FF (2047 which indicates a NaN or infinity).
The maximum mantissa 0xFFFFFFFFFFFFF which is 52 bits all 1.

In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.

The exact decimal value is:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

This is approximately 1.8 x 10³⁰⁸.

You need to look at the size of the mantissa. An IEEE 754 64 bit floating point number (which has 52 bits, plus 1 implied) can exactly represent integers with an absolute value of less than or equal to 2^53.

1.7976931348623157 × 10^308

http://en.wikipedia.org/wiki/Double_precision_floating-point_format

DECIMAL_DIG from <float.h> should give at least a reasonable approximation of that. Since that deals with decimal digits, and it's really stored in binary, you can probably store something a little larger without losing precision, but exactly how much is hard to say. I suppose you should be able to figure it out from FLT_RADIX and DBL_MANT_DIG, but I'm not sure I'd completely trust the result.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow