Bizarre floating-point behavior with vs. without extra variables, why?

Question 1

You are converting out-of-range double values to unsigned long long. Standard C++ does not allow this (the behavior is undefined), and Visual C++ appears to handle it particularly badly in SSE2 mode: it leaves a number on the FPU stack, which eventually overflows and makes later code that uses the FPU fail in really interesting ways.

A reduced sample is

double d = 1E20;
unsigned long long ull[] = { d, d, d, d, d, d, d, d };
if (floor(d) != floor(d)) abort();

This aborts if ull has eight or more elements, but passes if it has up to seven.

The solution is to avoid converting floating-point values to an integer type unless you know that the value is in range.
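
A minimal sketch of such a range check, assuming a 64-bit unsigned long long. The helper name to_ull_checked is made up for illustration and is not part of any library:

```cpp
#include <optional>

// Hypothetical helper: converts only when the truncated value fits.
// Assumes unsigned long long is 64 bits wide.
constexpr std::optional<unsigned long long> to_ull_checked(double d) {
    // 2^64 as a double, built from 2^63 so that every step is exact.
    constexpr double two_pow_64 = 2.0 * static_cast<double>(1ULL << 63);
    // Truncation discards the fraction, so anything in (-1, 2^64) is fine.
    // NaN fails both comparisons and is rejected as well.
    if (!(d > -1.0 && d < two_pow_64)) {
        return std::nullopt;
    }
    return static_cast<unsigned long long>(d);
}
```

With the sample value above, to_ull_checked(1E20) would return an empty optional instead of performing the undefined conversion.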

4.9 Floating-integral conversions [conv.fpint]

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]

The rule that out-of-range values wrap when converted to an unsigned type only applies if the value is already of some integer type.
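
A small sketch of the difference (wrap_to_ull is a hypothetical name used only for this illustration): the integer conversion is well defined and wraps, while the same value held in a double has no such guarantee.

```cpp
// Integer-to-unsigned conversion is well defined: the result is the source
// value modulo 2^N, where N is the width of the destination type.
constexpr unsigned long long wrap_to_ull(long long v) {
    return static_cast<unsigned long long>(v);
}

// -1 wraps to the largest unsigned long long (all bits set).
static_assert(wrap_to_ull(-1) == ~0ULL, "integers wrap");

// By contrast, static_cast<unsigned long long>(-1.0) is undefined behaviour:
// the truncated value -1 cannot be represented in the destination type.
```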

For whatever it's worth, though, this doesn't seem like it's intentional, so even though the standard permits this behaviour, it may still be worth reporting this as a bug.

Question 2

9223372036854775808 is 0x8000000000000000; that is, it is equal to INT64_MIN cast to uint64_t.
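
This identity can be checked at compile time. The following is only a sketch, and the constant name int64_min_as_u64 is chosen here for illustration:

```cpp
#include <cstdint>

// INT64_MIN is -2^63; converting it to uint64_t wraps modulo 2^64 to 2^63.
constexpr std::uint64_t int64_min_as_u64 =
    static_cast<std::uint64_t>(INT64_MIN);

static_assert(int64_min_as_u64 == 0x8000000000000000ULL, "bit pattern");
static_assert(int64_min_as_u64 == 9223372036854775808ULL, "decimal value");
```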

It looks like your compiler is casting the return value of floor to long long and then casting that result to unsigned long long.

Note that it is quite usual for overflow in floating-point-to-integral conversion to yield the least representable value (e.g. cvttsd2siq on x86-64):

When a conversion is inexact, a truncated result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

(This is from the doubleword documentation, but the quadword behaviour is the same.)

Question 3

Hypothesis: It is a bug. The compiler converts double to unsigned long long correctly but converts extended-precision floating-point (possibly long double) to unsigned long long incorrectly. Details:

double x = std::floor(9710908999.0089989 * 1.0E9);

This computes the value on the right-hand side and stores it in x. The value on the right-hand side might be computed with extended precision, but it is, as the rules of C++ require, converted to double when stored in x. The exact mathematical value would be 9710908999008998870, but rounding it to the double format produces 9710908999008999424.
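
The rounding step can be reproduced on its own. This sketch assumes IEEE 754 doubles, a 64-bit unsigned long long, and a compiler that rounds decimal literals to the nearest double; the names rounded and stored are illustrative. Doubles between 2⁶³ and 2⁶⁴ are spaced 2¹¹ = 2048 apart, which is why the exact value cannot be stored.

```cpp
// The exact value quoted above, written as a literal. It is not
// representable as a double, so it is rounded when the constant is formed.
constexpr double rounded = 9710908999008998870.0;

// The stored value is below 2^64, so this conversion is well defined.
constexpr unsigned long long stored =
    static_cast<unsigned long long>(rounded);

static_assert(stored == 9710908999008999424ULL, "nearest double");
static_assert(stored % 2048 == 0, "multiple of the spacing 2^11");
```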

unsigned long long y1 = x;

This converts the double value in x to unsigned long long, producing the expected 9710908999008999424.

unsigned long long y2 = std::floor(9710908999.0089989 * 1.0E9);

This computes the value on the right-hand side using extended precision, producing 9710908999008998870. When the extended-precision value is converted to unsigned long long, there is a bug, producing 2⁶³ (9223372036854775808). This value is likely the “out of range” error value produced by an instruction that converts the extended-precision format to a 64-bit integer. The compiler has used an incorrect instruction sequence to convert its extended-precision format to an unsigned long long.

Question 4

For y1, the value is first stored as a double and only then converted to an integer type. The value of x isn't the exact "floor" value but the floor value rounded to double.

The same logic applies when casting between integers and floats: float x = (float)((int) 1.5) gives a different value from float x = 1.5.
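
A compile-time sketch of that integer/float comparison (the names via_int and direct are illustrative):

```cpp
// Going through int truncates the fraction before the float is formed.
constexpr float via_int = static_cast<float>(static_cast<int>(1.5));
constexpr float direct = 1.5f;

static_assert(via_int == 1.0f, "fraction discarded by the int conversion");
static_assert(direct == 1.5f, "1.5 is exactly representable as a float");
```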
