Question

Each programming language has its own way to convert an integer to a float, turning one pattern of bits like 01010 into a different one. If you look at the generated assembly, it uses coprocessor instructions that hide the real work from the user.

But how does it really work? How are the mantissa and the exponent calculated, algorithmically?


Solution

If you know the floating point format, you should have been able to work out the algorithm yourself.

  1. If the input is 0, the result is all 0 bits.
  2. If the input is negative, set the sign bit to 1, and negate the input.
  3. Find the highest bit set. Add the bias to its index; that will be your exponent.
  4. Shift the highest set bit into bit #23 and clear it; what remains is the mantissa (see the C sketch after this list).
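For comparison, here is a minimal C sketch of those four steps for single precision. It truncates rather than rounds, just like the assembly below; the function name and structure are my own:

#include <stdint.h>

/* Build the IEEE-754 single-precision bit pattern for a 32-bit int. */
uint32_t int_to_float_bits(int32_t x) {
    if (x == 0) return 0;                      /* step 1: all 0 bits */
    uint32_t sign = 0;
    uint32_t mag = (uint32_t)x;
    if (x < 0) {                               /* step 2: sign bit + negate */
        sign = 0x80000000u;
        mag = (uint32_t)(-(int64_t)x);         /* safe even for INT32_MIN */
    }
    int index = 31;
    while (!(mag & (1u << index))) index--;    /* step 3: highest bit set */
    uint32_t exponent = (uint32_t)(index + 127) << 23;
    uint32_t mantissa = (index > 23)
        ? mag >> (index - 23)                  /* step 4: align to bit #23, */
        : mag << (23 - index);                 /* dropping any excess bits  */
    return sign | exponent | (mantissa & 0x007fffff);  /* clear hidden 1 */
}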

Since this question has been tagged, here is a sample implementation for x86:

int_to_float:
    xor eax, eax
    mov edx, [esp+4]    ; the integer argument (cdecl)
    test edx, edx
    jz .done            ; zero input -> all 0 bits
    jns .pos
    or eax, 0x80000000  ; set sign bit
    neg edx             ; continue with the magnitude
.pos:
    bsr ecx, edx        ; index of the highest bit set
    ; shift the highest bit set into bit #23
    sub ecx, 23
    ror edx, cl         ; works for cl < 0 too (ror masks the count,
                        ; so a negative count rotates left)
    and edx, 0x007fffff ; chop off the highest bit and any wrapped-around
                        ; low bits (i.e. the mantissa is truncated)
    or eax, edx         ; mantissa
    add ecx, 127 + 23   ; ecx held index-23, so this is index+127 = biased exponent
    shl ecx, 23
    or eax, ecx         ; exponent
.done:
    ret

Note: this returns the float in eax, while the calling convention usually mandates st0. I just wanted to avoid FPU code totally.
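A quick way to sanity-check such a converter is to compare its output against the compiler's own conversion. This hypothetical harness reuses int_to_float_bits from the C sketch above; note that 2147483647 comes out one step low, because the sketch truncates where the compiler rounds to nearest:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    int32_t tests[] = { 0, 1, -1, 10, -10, 2147483647, -2147483647 - 1 };
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        float f = (float)tests[i];           /* the compiler's conversion */
        uint32_t expect;
        memcpy(&expect, &f, sizeof expect);  /* its bit pattern */
        printf("%11d -> 0x%08x (compiler: 0x%08x)\n",
               (int)tests[i], int_to_float_bits(tests[i]), expect);
    }
    return 0;
}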

Other tips

When converting an integer to a floating point number, it's just shifted until the mantissa is within the right range, i.e. 1 ≤ m < 2, and the exponent records how many steps it was shifted.

The number 1010, for example, is shifted until it is 1.010, and the exponent is 3, as that is how many bits it was shifted.

The first digit of the mantissa, the 1 before the binary point, is not stored in the number, as it is always one. (The value zero is treated as a separate case.)

The exponent (for a double-precision number) is stored with a bias of 1023 (01111111111 in the 11-bit exponent field), so the exponent 3 is stored as 1026 (10000000010).

That makes the representation of 1010 as a double-precision floating point number:

0 10000000010 0100000000000000000000000000000000000000000000000000

(the sign bit, then the 11 exponent bits, then the 52 mantissa bits). All those zeroes after 010 fill up the rest of the 52-bit mantissa.
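A few lines of C can confirm this layout by extracting the three fields from the stored bits (field positions as in the IEEE-754 double format described above):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double d = 10.0;                          /* 1010 in binary */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    printf("sign     = %llu\n", (unsigned long long)(bits >> 63));
    printf("exponent = %llu\n",               /* prints 1026 */
           (unsigned long long)((bits >> 52) & 0x7FF));
    printf("mantissa = 0x%013llx\n",          /* prints 0x4000000000000 */
           (unsigned long long)(bits & 0xFFFFFFFFFFFFFull));
    return 0;
}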


You can read more about the floating point format here:
Wikipedia: Double-precision floating-point format
(https://en.wikipedia.org/wiki/Double-precision_floating-point_format)

For 32-bit ints, 64-bit int64s, and IEEE 64-bit doubles, the following trick works (apart from violating aliasing rules and whatnot):

#include <cstdint>
typedef std::int64_t int64;

double convert(int x) {
  double tricky = 0x1.8p52;          // 2^52 + 2^51; its ulp is exactly 1
  int64 hack = (int64 &)tricky + x;  // type-pun the bits and add x
  return (double &)hack - 0x1.8p52;  // pun back, subtract the constant off
}

Here I take tricky = 2^52 + 2^51. The smallest representable change in this value is 1, meaning the significand is measured in units of 1. The significand is stored in the low-order 52 bits of a double. I won't overflow or underflow the significand by adding x to it (since x is 32-bit), so hack is the binary representation of 2^52 + 2^51 + x as a double. Subtracting off 2^52 + 2^51 gives me x, but as a double.
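Since the cast above violates strict aliasing, the same trick can be phrased with memcpy for the type punning, which is a standard alternative (the name convert_safe is my own):

#include <stdint.h>
#include <string.h>

double convert_safe(int x) {
    double tricky = 0x1.8p52;             /* 2^52 + 2^51 again */
    int64_t bits;
    memcpy(&bits, &tricky, sizeof bits);  /* reinterpret without aliasing UB */
    bits += x;                            /* add x to the stored significand */
    memcpy(&tricky, &bits, sizeof bits);
    return tricky - 0x1.8p52;
}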

(What follows, I think, is sorta close to x86-64 assembly code. I don't see why it wouldn't do the right thing, but I haven't tested it. Or even assembled it.)

movsx rax, dword ptr [x]     ; sign-extend the 32-bit int to 64 bits
add rax, [tricky]            ; add it to the bit pattern of the constant
mov [hack], rax
fld qword ptr [hack]         ; load the punned bits as a double
fsub qword ptr [tricky]      ; subtract the constant back off
fstp qword ptr [answer]