Question

I don't understand how the following C conversion functions work (and why they're written this way); I'm fairly certain that the original author knew what he was doing:

typedef union TValue {
  uint64_t u64;
  double n;
  struct {
    uint32_t lo;    /* Lower 32 bits of number. */
    uint32_t hi;    /* Upper 32 bits of number. */
  } u32;
  [...]
} TValue;


static int32_t num2bit(double n)
{
  TValue o;
  o.n = n + 6755399441055744.0;  /* 2^52 + 2^51 */
  return (int32_t)o.u32.lo;
}

static uint64_t num2u64(double n)
{
#ifdef _MSC_VER
  if (n >= 9223372036854775808.0)  /* They think it's a feature. */
    return (uint64_t)(int64_t)(n - 18446744073709551616.0);
  else
#endif
  return (uint64_t)n;
}
  • Does num2bit really just cast a double to int32_t? Why the addition? Why write it like this?
  • What is the "feature" alluded to in num2u64? (I believe _MSC_VER means it's the code path for Microsoft's C compiler.)

Note that these functions are not always used (it depends on the CPU architecture); this is the little-endian variant (I resolved some preprocessor macros to simplify).

Links to online browseable mirror (the code is from the LuaJIT project): Surrounding Header file (or whole project).

Every hint is appreciated.

Solution

num2bit is designed to implement the Lua BitOp semantics, especially w.r.t. modular arithmetic. The implementation-defined behavior is well under control, since LuaJIT only works on specific CPUs, platforms and compilers anyway. Don't use this code anywhere else.

num2u64 is a workaround for a bug/misfeature of MSVC, which always converts double to uint64_t via int64_t. That doesn't give the desired result for numbers >= 2^63. MS considers this abomination a 'feature'. Duh.

OTHER TIPS

num2bit: Adding 2^52 + 2^51 (i.e. setting bits 51 and 52) forces the result into a fixed exponent (otherwise the addition would overflow the mantissa), so after the addition the lower 32 bits of the double hold the integer value of n. Returning (int32_t)o.u32.lo then gives you back an integer with the same value (modulo 2^32) as the original double, since the exponent is fixed. So, this is a trick to get the integer value of most doubles quickly. Note that the addition rounds fractional values to the nearest integer rather than truncating them, and it has unexpected effects if the input is 2^51 or larger to begin with.

>>> math.frexp(1.0 + 6755399441055744.0)
(0.7500000000000001, 53)
>>> math.frexp(0.0 + 6755399441055744.0)
(0.75, 53)
>>> math.frexp(564563465 + 6755399441055744.0)
(0.7500000626791358, 53)
>>> math.frexp(-564563465 + 6755399441055744.0)
(0.7499999373208642, 53)
>>> math.frexp(1.5 + 6755399441055744.0)
(0.7500000000000002, 53)
>>> math.frexp(1.6 + 6755399441055744.0)
(0.7500000000000002, 53)
>>> math.frexp(1.4 + 6755399441055744.0)
(0.7500000000000001, 53)

EDIT: The reason both bit 51 and bit 52 are set is that if you only set bit 52, negative numbers would cause the exponent to change:

>>> math.frexp(0 + 4503599627370496.0)
(0.5, 53)
>>> math.frexp(-543635634 + 4503599627370496.0)
(0.9999998792886404, 52)

num2u64: No clue. But the first number is 2^63 and the second is 2^64. It's probably there to prevent overflow or a signedness failure when casting a double of 2^63 or more to its integer representation, but I can't tell you more.

num2bit manually converts the in-memory representation of an IEEE-754 double to a 32-bit, two's-complement signed format, rounding to the nearest integer.

Converting through a union like this is a portability hazard: in C++, reading a union member other than the one last written is undefined behaviour, and while C99 and later explicitly permit type punning through a union (see the footnote to §6.5.2.3), the result still depends on the representation and byte order. It would be more proper to do something like

static int32_t num2bit(double n)
{
  int32_t o;
  n += 6755399441055744.0;  /* 2^52 + 2^51 */
  memcpy(&o, &n, sizeof o); /* OK with strict aliasing, but must mind endianness. */
  return o;
}

This function is probably intended as an optimization, but its value as such is dubious: you would need to re-benchmark it on every new microprocessor and ensure it's only used on hardware where it's actually faster.

Note also that a plain C floating-to-integer conversion truncates (rounds toward zero), while the addition trick rounds to nearest. This function is perhaps not intended to handle fractional values at all.


num2u64 is a Windows-specific workaround (note the #ifdef). When converting a double value greater than or equal to 2^63 to an unsigned integer, "something bad" happens (perhaps saturation), so the author subtracts 2^64 to make it a negative number, casts that to a signed, negative integer, then casts the result to an unsigned integer, which will have a value greater than or equal to 2^63.

In any case, you can tell the intent is simply to convert a double to a uint64_t, since that's all it does on non-Windows platforms.

These functions "work" by magic.

This comes from §6.2.6.1p7 of n1570.pdf, which is the C standard draft: "When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values."

Note how the code presented relies on unspecified values by assigning to o.n and then reading o.u32.lo.

This comes from §6.3.1.3p3 of n1570.pdf, which is the C standard draft: "Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised."

Note how the code presented invokes implementation-defined behaviour, as it converts from unsigned to signed 32-bit integer multiple times. Suppose that it were to instead raise an implementation-defined computational exception signal. If the default signal handler were to return, this would also result in undefined behaviour. /* They think it's a feature. */

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow