Question

I need to convert a 32-bit IEEE 754 float to a signed Q19.12 fixed-point format. The problem is that the conversion must be fully deterministic, so the usual (int)(f * (1 << FRACTION_SHIFT)) is not suitable, since it uses non-deterministic floating-point math. Are there any "bit fiddling" or similar deterministic conversion methods?

Edit: Here, deterministic means that the same floating-point input produces exactly the same conversion result on every platform.
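For reference, a minimal sketch of what Q19.12 means here, assuming the raw value is stored in a 32-bit int with FRACTION_SHIFT = 12 (the constant name comes from the question; the Q19_12 class and its members are hypothetical). It shows the resolution and the representable range:

```csharp
using System;

public static class Q19_12
{
    // 12 fractional bits; 19 integer bits plus the sign bit fill a 32-bit int
    public const int FRACTION_SHIFT = 12;

    // Smallest step between two raw values: 2^-12 = 0.000244140625
    public const double Resolution = 1.0 / (1 << FRACTION_SHIFT);

    // Representable range: int.MinValue / 4096 up to int.MaxValue / 4096
    public const double MaxValue = int.MaxValue / (double)(1 << FRACTION_SHIFT);
    public const double MinValue = int.MinValue / (double)(1 << FRACTION_SHIFT);

    // Interpret a raw Q19.12 integer as a real number
    // (exact, because Resolution is a power of two and a double holds every int)
    public static double ToDouble(int raw) => raw * Resolution;
}
```

So a raw value of 6144 stands for 1.5, and the usable range is roughly ±524288.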


Solution 2

While @StephenCanon's answer might be right about this particular case being fully deterministic, I've decided to stay on the safe side and still do the conversion manually. This is the code I ended up with (thanks to @CodesInChaos for pointers on how to do this):

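public static Fixed FromFloatSafe(float f) {
    // Reinterpret the float's bits as an unsigned integer
    uint fb = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
    bool negative = (fb >> 31) != 0;
    int exponent = (int)((fb >> 23) & 0xFF);
    uint mantissa = fb & 0x007FFFFF;

    // Exponent 255 encodes Infinity, SNaN and QNaN
    if (exponent == 255) {
        throw new ArgumentException("Infinity and NaN cannot be converted.", nameof(f));
    } else if (exponent != 0) {
        // Normal number: add the mantissa's implicit leading 1
        mantissa |= 0x800000;
    } else {
        // Subnormal number: no implicit 1, and the effective exponent is 1, not 0
        exponent = 1;
    }

    // value = mantissa * 2^(exponent - 127 - 23) and raw = value * 2^FRACTION_SHIFT,
    // so this is the radix point shift needed to reach fixed point
    int shift = exponent - 127 - 23 + FRACTION_SHIFT;

    // Shift the magnitude first and apply the sign afterwards, so the result
    // truncates toward zero like (int)(f * (1 << FRACTION_SHIFT))
    long magnitude;
    if (shift > 30) {
        throw new OverflowException();
    } else if (shift >= 0) {
        magnitude = (long)mantissa << shift;
    } else if (shift > -32) {
        magnitude = mantissa >> -shift;
    } else {
        // C# masks 32-bit shift counts to 5 bits, so large shifts are handled explicitly
        magnitude = 0;
    }

    // Apply the sign and check for overflow
    long raw = negative ? -magnitude : magnitude;
    if (raw > int.MaxValue || raw < int.MinValue) {
        throw new OverflowException();
    }

    return Fixed.FromRaw((int)raw);
}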

OTHER TIPS

Floating-point is not non-deterministic. Where did you get that preposterous hypothesis?

Expanding a little bit: 1 << FRACTION_SHIFT is an exact power of two, and therefore represented exactly in floating point. Multiplication by an exact power of two is exact (unless overflow/underflow occurs, but in that case there isn't a meaningful fixed-point representation anyway, so you don't care). So the only possible source of rounding is the conversion to integer, which is fully specified by C# (it truncates toward zero). The result is therefore not only deterministic but also identical across platforms.
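To make the argument concrete, here is a small sketch (the ExactScaling class and its methods are hypothetical, and FRACTION_SHIFT = 12 is assumed). ToRaw is the straightforward conversion in a checked context, and ScalingIsExact checks that multiplying by 2^12 leaves the sign and mantissa bits untouched and only adds 12 to the biased exponent field:

```csharp
using System;

public static class ExactScaling
{
    // Assumption: Q19.12, i.e. 12 fractional bits
    const int FRACTION_SHIFT = 12;

    static uint Bits(float f) => BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);

    // The straightforward conversion: scale by 2^12, then truncate toward zero.
    // 'checked' turns an out-of-range result (or NaN) into an OverflowException.
    public static int ToRaw(float f) => checked((int)(f * (1 << FRACTION_SHIFT)));

    // For a normal float whose product is still normal, scaling by 2^FRACTION_SHIFT
    // only adds FRACTION_SHIFT to the biased exponent field (bits 23..30),
    // so no mantissa bits change and nothing is rounded.
    public static bool ScalingIsExact(float f)
    {
        uint before = Bits(f);
        uint after = Bits(f * (1 << FRACTION_SHIFT));
        return after == before + ((uint)FRACTION_SHIFT << 23);
    }
}
```

Note that ToRaw(-0.0001f) yields 0, not -1: the cast truncates toward zero rather than rounding down.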

If determinism is absolutely required, I'd reinterpret the float's bits as an integer and do the conversion manually.

First, extract the exponent. If it's too small, return 0; if it's too large, throw an overflow exception.

Next, extract the sign and mantissa (remember the implicit leading 1). If the sign bit is 1, negate the mantissa. Finally, shift the mantissa by the exponent combined with a bias (the exponent bias of 127 and the 23 mantissa bits, offset by the number of fraction bits).
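The three steps above can be sketched as a self-contained method that returns the raw Q19.12 integer instead of a Fixed. The ManualConversion class name and FRACTION_SHIFT = 12 are assumptions. Shifting the magnitude before applying the sign truncates toward zero, so the result should agree with (int)(f * 4096) wherever that cast is in range:

```csharp
using System;

public static class ManualConversion
{
    // Assumption: Q19.12, i.e. 12 fractional bits in a 32-bit int
    const int FRACTION_SHIFT = 12;

    // Converts a float to a raw Q19.12 value using integer operations only.
    public static int ToRaw(float f)
    {
        uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
        int biasedExponent = (int)((bits >> 23) & 0xFF);
        if (biasedExponent == 255)
            throw new ArgumentException("Infinity and NaN have no fixed-point value.", nameof(f));

        // Step 1: exponent. The 24-bit integer mantissa must be shifted by this
        // amount (exponent bias 127, 23 mantissa bits, FRACTION_SHIFT fraction bits).
        int shift = biasedExponent - 127 - 23 + FRACTION_SHIFT;
        if (biasedExponent == 0 || shift <= -24)
            return 0;                       // subnormal or |f| < 2^-12: truncates to 0
        if (shift > 8)
            throw new OverflowException();  // |f| >= 2^20, far outside Q19.12

        // Step 2: sign and mantissa, restoring the implicit leading 1
        bool negative = (bits >> 31) != 0;
        long magnitude = (bits & 0x007FFFFF) | 0x00800000;

        // Step 3: shift into place, then apply the sign
        magnitude = shift >= 0 ? magnitude << shift : magnitude >> -shift;
        long raw = negative ? -magnitude : magnitude;

        // shift == 8 can still overflow, except for exactly -524288.0f (int.MinValue)
        if (raw > int.MaxValue || raw < int.MinValue)
            throw new OverflowException();
        return (int)raw;
    }
}
```

For example, ManualConversion.ToRaw(1.5f) returns 6144, and ManualConversion.ToRaw(-524288f) returns int.MinValue, while 524288f throws an OverflowException.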

I also wrote a soft-float implementation that guarantees determinism. It's far from complete, but the parts you need are implemented.

Licensed under: CC-BY-SA with attribution