Question

My code:

#include <stdio.h>
#include <math.h>

int main()
{
    long long a = pow(2,63) - 1;
    long long b = pow(2,63);
    double c = pow(2,63) - 1;
    double d = pow(2,63);
    printf("%lld %lld \n%f %f \n%lld %lld\n", a, b, c, d, (long long)c, (long long)d);

    return 0;
}

and the output is (Code::Blocks with gcc on Windows 7 x64):

9223372036854775807 9223372036854775807
9223372036854775800.000000 9223372036854775800.000000
-9223372036854775808 -9223372036854775808

Questions:

Why is a == b?

I know that c == d because of the precision of double.

But why are (long long)c and (long long)d not 9223372036854775800?

And why is (long long)c != a and (long long)d != b?


Solution 2

why a == b? I know that c == d because of the precision of double.

For exactly the same reason. There are no overloads of pow that compute in integer types, so the arithmetic is done using double. Since double typically has 53 bits of significand precision (52 of them stored explicitly), adding 1 to or subtracting 1 from a value as large as 2^63 has no effect.
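To make the precision argument concrete, here is a minimal sketch, assuming an IEEE 754 binary64 double with arithmetic actually evaluated in double (as on x86-64 with SSE2). The names kDoubleBits, minus_one_is_lost and gap_below_two_pow_63 are made up for this illustration, and std::ldexp is used only to build 2^63 exactly.

```cpp
#include <cmath>
#include <limits>

// Number of significand bits in double (53 for IEEE 754 binary64).
constexpr int kDoubleBits = std::numeric_limits<double>::digits;

// Returns true when subtracting 1 from 2^63 leaves the double unchanged.
bool minus_one_is_lost() {
    const double big = std::ldexp(1.0, 63);  // exactly 2^63
    const double less = big - 1.0;           // result is rounded to a representable double
    return less == big;
}

// Distance between 2^63 and the next smaller representable double.
double gap_below_two_pow_63() {
    const double big = std::ldexp(1.0, 63);
    return big - std::nextafter(big, 0.0);
}
```

Under those assumptions the neighbouring doubles just below 2^63 are 1024 apart, so a difference of 1 is rounded away.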

why (long long)c and (long long)d is not 9223372036854775800?

Because long long is a 64-bit signed type, and the maximum representable value is 2^63 - 1. c and d might both have the value 2^63 (or even a slightly larger value), which is out of range. On a typical two's-complement platform, this is likely to overflow to give a value around -2^63, as you observe. But note that this is undefined behaviour; you cannot rely on anything if a floating-point conversion overflows.
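One way to stay clear of the undefined behaviour is to test the range before converting. This is only a sketch, assuming a 64-bit long long; to_long_long_checked is a hypothetical helper, not a standard function.

```cpp
// Hypothetical helper: converts x to long long only when the truncated value fits.
// Returns false and leaves *out untouched for NaN, infinities and out-of-range values.
bool to_long_long_checked(double x, long long* out) {
    // Both bounds are powers of two, so they are exactly representable as double.
    const double upper = 9223372036854775808.0;   // 2^63, the first value that does NOT fit
    const double lower = -9223372036854775808.0;  // -2^63, the smallest value that fits
    if (!(x >= lower && x < upper)) {  // written in negated form so NaN is rejected too
        return false;
    }
    *out = static_cast<long long>(x);  // in range, so the conversion is well defined
    return true;
}
```

With this helper, passing pow(2,63) reports failure instead of producing an arbitrary value.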

why (long long)c != a and (long long)d != b?

I don't know; for me, a and b have the same large negative values. It looks like some quirk of your implementation caused a and b to end up with the value 2^63 - 1 rather than the expected 2^63. As always when dealing with floating-point numbers, you should expect small rounding errors like that.

You could get the exact result by using integer arithmetic:

long long a = (1ULL << 63) - 1;
unsigned long long b = 1ULL << 63;

Note the use of unsigned arithmetic since, as mentioned above, the signed (1LL << 63) would overflow.
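If the goal is simply the largest long long, the limits headers already provide it. A small sketch, assuming a 64-bit long long; the function names are invented for the example.

```cpp
#include <climits>
#include <limits>

// (1ULL << 63) - 1 matches LLONG_MAX when long long is 64 bits wide.
static_assert((1ULL << 63) - 1 == static_cast<unsigned long long>(LLONG_MAX),
              "this sketch assumes a 64-bit long long");

// Largest long long, taken from the library instead of being computed by hand.
long long max_long_long() {
    return std::numeric_limits<long long>::max();
}

// 2^63 itself only fits in the unsigned type.
unsigned long long two_pow_63() {
    return 1ULL << 63;
}
```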

OTHER TIPS

pow(2,63) - 1 is all done in double-precision floating-point arithmetic. In particular, the 1 is converted to 1.0, and subtracting it from a value this large is too small to matter.

why a == b

Because your compiler (gcc) calculated the values to initialize a and b with, and found (proved ?) both were matching or exceeding the maximum possible value for a long long, so it initialized both with that maximum value LLONG_MAX (or 0x7FFFFFFFFFFFFFFF, or 9223372036854775807 on your platform).

Note that (as pointed out by Pascal Cuoq) this is undefined behaviour, caused by an overflow while converting a double to a long long when initializing a and b. While gcc deals with this as described above, other compilers can deal with this differently.
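If the clamped result that gcc happened to produce is actually what you want, it can be requested explicitly so that it no longer depends on the compiler. A rough sketch, assuming a 64-bit long long; to_long_long_saturating is a hypothetical helper, and mapping NaN to 0 is an arbitrary choice made for this example.

```cpp
#include <limits>

// Hypothetical helper: clamps out-of-range values instead of relying on undefined behaviour.
long long to_long_long_saturating(double x) {
    const double two_pow_63 = 9223372036854775808.0;  // exactly representable as double
    if (x != x) {
        return 0;  // NaN: 0 is an arbitrary choice for this sketch
    }
    if (x >= two_pow_63) {
        return std::numeric_limits<long long>::max();
    }
    if (x <= -two_pow_63) {
        return std::numeric_limits<long long>::min();
    }
    return static_cast<long long>(x);  // in range: truncates toward zero
}
```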

I know that c ==d because of the precision of double

The reason c and d hold the same value is indeed because of the precision of a double:

  • pow(2, 63) can be accurately represented with fraction 1 and exponent 63
  • pow(2, 63) - 1 cannot be accurately represented

The reason it's not showing 9223372036854775808 (the precise value stored in c and d) is the printf precision, which on your platform apparently only shows 17 significant digits. You might be able to force it to show more using e.g. %20.0f, but on Windows that will likely not make a difference due to this bug.
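A way to see the stored value without depending on how printf rounds %f is to convert to unsigned long long, which can hold 2^63, and print that. A minimal sketch, assuming a 64-bit unsigned long long; exact_integer_text and two_pow_63_as_double are names made up for the example.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Formats a whole-number double exactly by going through unsigned long long.
// Precondition (assumed, not checked): 0 <= d < 2^64 and d has no fractional part.
std::string exact_integer_text(double d) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%llu", static_cast<unsigned long long>(d));
    return buf;
}

// Builds 2^63 exactly, without relying on the accuracy of pow.
double two_pow_63_as_double() {
    return std::ldexp(1.0, 63);
}
```

Because 2^63 is within the range of unsigned long long, this conversion is well defined, unlike the cast to long long.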

why (long long)c and (long long)d is not 9223372036854775800 ?

Because c and d hold the value 9223372036854775808, or 0x8000000000000000, which when printed as a signed value becomes -9223372036854775808.

Note that this is again undefined behaviour (the value is out of range for long long, so the conversion overflows).

why (long long)c != a and (long long)d != b?

Because they were calculated in different ways. a and b were calculated by the compiler, while (long long) c and (long long) d were calculated at runtime.

While normally, these different ways of calculating should yield the same results, we're dealing with undefined behaviour here (as explained earlier), so anything goes. And in your case, the compiler's results are different from the runtime results.

Because pow returns a double, and a double loses precision at this magnitude. That's why a == b.

pow(2, 63) is equivalent to pow((double) 2, (double) 63).

Indeed, C++11 26.8 [c.math] paragraph 3 says that <cmath> provides the declaration of double pow(double, double) and paragraph 11 says that (emphasis mine)

  1. If any argument corresponding to a double parameter has type long double, then all arguments corresponding to double parameters are effectively cast to long double.
  2. Otherwise, if any argument corresponding to a double parameter has type double or an integer type, then all arguments corresponding to double parameters are effectively cast to double.
  3. Otherwise, all arguments corresponding to double parameters are effectively cast to float.

Now, the literals 2 and 63 are ints; therefore, pow(2, 63) is equivalent to pow((double) 2, (double) 63). The return type is then double, which doesn't have the 63 bits of precision required to "see" the difference between 2^63 and 2^63 - 1.
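The promotion rules quoted above can be checked at compile time. This is a small sketch, assuming a C++11 (or later) standard library that provides the additional overloads; int_and_double_calls_agree is a name invented for the example.

```cpp
#include <cmath>
#include <type_traits>

// Integer arguments are treated as double, so the result type is double.
static_assert(std::is_same<decltype(std::pow(2, 63)), double>::value,
              "int, int -> double");

// Any long double argument makes the whole call long double.
static_assert(std::is_same<decltype(std::pow(2, 63.0L)), long double>::value,
              "int, long double -> long double");

// Only when every argument is float does the call stay in float.
static_assert(std::is_same<decltype(std::pow(2.0f, 63.0f)), float>::value,
              "float, float -> float");

// pow(2, 63) and pow(2.0, 63.0) go through the same double computation.
bool int_and_double_calls_agree() {
    return std::pow(2, 63) == std::pow(2.0, 63.0);
}
```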

I recommend reading this post and the excellent answer by Howard Hinnant.

long long -> %lld

long double -> %Lf

double -> %f

float -> %f

int -> %d
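As a quick illustration of matching each type to its conversion specifier, here is a small sketch using snprintf; the show_* helpers are made up for the example. A float argument is promoted to double when passed to a printf-style function, which is why it shares %f with double.

```cpp
#include <cstdio>
#include <string>

// long long -> %lld
std::string show_long_long(long long v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld", v);
    return buf;
}

// double -> %f (a float argument would be promoted to double and use %f too)
std::string show_double(double v) {
    char buf[512];  // large enough for any finite double printed with %f
    std::snprintf(buf, sizeof buf, "%f", v);
    return buf;
}

// int -> %d
std::string show_int(int v) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%d", v);
    return buf;
}
```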

Read Chapter 15 of "Pointers on C" for more details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow