Question

Short version: how can I detect overflow using the fixed-point multiplication described here but for a signed type?

Long version:

I still have some overflow issues with my Q31.32 fixed point type. To make it easier to work out examples on paper, I've made a much smaller type using the same algorithm, a Q3.4 based on sbyte. I figure that if I can work out all the kinks for a Q3.4 type, the same logic should apply for a Q31.32 one.

Note that I could very easily implement Q3.4 multiplication by performing it on a 16-bit integer, but I'm doing as if that didn't exist, because for the Q31.32 I'd need a 128-bit integer which doesn't exist (and BigInteger is too slow).

I want my multiplication to handle overflow by saturation, that is when overflow happens, the result is the highest or smallest value that can be represented depending on the sign of the operands.

This is basically how the type is represented:

struct Fix8 {
    sbyte m_rawValue;
    public static readonly Fix8 One = new Fix8(1 << 4);
    public static readonly Fix8 MinValue = new Fix8(sbyte.MinValue);
    public static readonly Fix8 MaxValue = new Fix8(sbyte.MaxValue);

    Fix8(sbyte value) {
        m_rawValue = value;
    }

    public static explicit operator decimal(Fix8 value) {
        return (decimal)value.m_rawValue / One.m_rawValue;
    }

    public static explicit operator Fix8(decimal value) {
        var nearestExact = Math.Round(value * 16m) * 0.0625m;
        return new Fix8((sbyte)(nearestExact * One.m_rawValue));
    }
}

And this is how I currently handle multiplication:

    public static Fix8 operator *(Fix8 x, Fix8 y) {
        sbyte xl = x.m_rawValue;
        sbyte yl = y.m_rawValue;

        // split x and y into their highest and lowest 4 bits
        byte xlo = (byte)(xl & 0x0F);
        sbyte xhi = (sbyte)(xl >> 4);
        byte ylo = (byte)(yl & 0x0F);
        sbyte yhi = (sbyte)(yl >> 4);

        // perform cross-multiplications
        byte lolo = (byte)(xlo * ylo);
        sbyte lohi = (sbyte)((sbyte)xlo * yhi);
        sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
        sbyte hihi = (sbyte)(xhi * yhi);

        // shift results as appropriate
        byte loResult = (byte)(lolo >> 4);
        sbyte midResult1 = lohi;
        sbyte midResult2 = hilo;
        sbyte hiResult = (sbyte)(hihi << 4);

        // add everything
        sbyte sum = (sbyte)((sbyte)loResult + midResult1 + midResult2 + hiResult);

        // if the top 4 bits of hihi (unused in the result) are neither all 0s or 1s,
        // then this means the result overflowed.
        sbyte topCarry = (sbyte)(hihi >> 4);
        bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
        if (topCarry != 0 && topCarry != -1) {
            return opSignsEqual ? MaxValue : MinValue;
        }

        // if signs of operands are equal and sign of result is negative,
        // then multiplication overflowed upwards
        // the reverse is also true
        if (opSignsEqual) {
            if (sum < 0) {
                return MaxValue;
            }
        }
        else {
            if (sum > 0) {
                return MinValue;
            }
        }

        return new Fix8(sum);
    }

This gives result accurate within the precision of the type and handles most overflow cases. It doesn't however handle these ones, for example:

Failed -8 * 2 : expected -8 but got 0
Failed 3.5 * 5 : expected 7,9375 but got 1,5

Let's work out how the multiplication happens for the first one.

-8 and 2 are represented as x = 0x80 and y = 0x20.
xlo = 0x80 & 0x0F = 0x00
xhi = 0x80 >> 4 = 0xf8
ylo = 0x20 & 0x0F = 0x00
yhi = 0x20 >> 4 = 0x02

lolo = xlo * ylo = 0x00
lohi = xlo * yhi = 0x00
hilo = xhi * ylo = 0x00
hihi = xhi * yhi = 0xf0

The sum is obviously 0 as all terms are 0 save for hihi, but only the lowest 4 bits of hihi are used in the final sum.

My usual overflow detection magic doesn't work here: the result is zero so the sign of the result is meaningless (e.g. 0.0625 * -0.0625 == 0 (by rounding down), 0 is positive yet signs of operands differ); also the high bits of hihi are 1111 which often happens even when there's no overflow.

Basically I don't know how to detect that overflow happened here. Is there a more general method?

Était-ce utile?

La solution 2

This took me a long time, but I eventually figured everything out. This code is tested to work for every possible combination of x and y in the range allowed by sbyte. Here is the commented code:

    static sbyte AddOverflowHelper(sbyte x, sbyte y, ref bool overflow) {
        var sum = (sbyte)(x + y);
        // x + y overflows if sign(x) ^ sign(y) != sign(sum)
        overflow |= ((x ^ y ^ sum) & sbyte.MinValue) != 0;
        return sum;
    }

    /// <summary>
    /// Multiplies two Fix8 numbers.
    /// Deals with overflow by saturation.
    /// </summary>
    public static Fix8 operator *(Fix8 x, Fix8 y) {
        // Using the cross-multiplication algorithm, for learning purposes.
        // It would be both trivial and much faster to use an Int16, but this technique
        // won't work for a Fix64, since there's no Int128 or equivalent (and BigInteger is too slow).

        sbyte xl = x.m_rawValue;
        sbyte yl = y.m_rawValue;

        byte xlo = (byte)(xl & 0x0F);
        sbyte xhi = (sbyte)(xl >> 4);
        byte ylo = (byte)(yl & 0x0F);
        sbyte yhi = (sbyte)(yl >> 4);

        byte lolo = (byte)(xlo * ylo);
        sbyte lohi = (sbyte)((sbyte)xlo * yhi);
        sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
        sbyte hihi = (sbyte)(xhi * yhi);

        byte loResult = (byte)(lolo >> 4);
        sbyte midResult1 = lohi;
        sbyte midResult2 = hilo;
        sbyte hiResult = (sbyte)(hihi << 4);

        bool overflow = false;
        // Check for overflow at each step of the sum, if it happens overflow will be true
        sbyte sum = AddOverflowHelper((sbyte)loResult, midResult1, ref overflow);
        sum = AddOverflowHelper(sum, midResult2, ref overflow);
        sum = AddOverflowHelper(sum, hiResult, ref overflow);

        bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;

        // if signs of operands are equal and sign of result is negative,
        // then multiplication overflowed positively
        // the reverse is also true
        if (opSignsEqual) {
            if (sum < 0 || (overflow && xl > 0)) {
                return MaxValue;
            }
        }
        else {
            if (sum > 0) {
                return MinValue;
            }
            // If signs differ, both operands' magnitudes are greater than 1,
            // and the result is greater than the negative operand, then there was negative overflow.
            sbyte posOp, negOp;
            if (xl > yl) {
                posOp = xl;
                negOp = yl;
            }
            else {
                posOp = yl;
                negOp = xl;
            }
            if (sum > negOp && negOp < -(1 << 4) && posOp > (1 << 4)) {
                return MinValue;
            }
        }

        // if the top 4 bits of hihi (unused in the result) are neither all 0s nor 1s,
        // then this means the result overflowed.
        sbyte topCarry = (sbyte)(hihi >> 4);
        // -17 (-1.0625) is a problematic value which never causes overflow but messes up the carry bits
        if (topCarry != 0 && topCarry != -1 && xl != -17 && yl != -17) {
            return opSignsEqual ? MaxValue : MinValue;
        }

        // Round up if necessary, but don't overflow
        var lowCarry = (byte)(lolo << 4);
        if (lowCarry >= 0x80 && sum < sbyte.MaxValue) {
            ++sum;
        }

        return new Fix8(sum);
    }

I'm putting all this together into a properly unit tested fixed-point math library for .NET, which will be available here: https://github.com/asik/FixedMath.Net

Autres conseils

You should examine hihi to see whether it contains any relevant bits outside the range of the result. You can also compare the highest bit of the result with the corresponding bit in hihi to see whether a carry propagated that far, and if it did (i.e. the bit changed), whether that indicates an overflow (i.e. the bit changed in the wrong direction). All of this would probably be easier to formulate if you were using one's complement notation, and treat the sign bits separately. But in that case, your example of −8 would be pointless.

Looking at your example, you have hihi = 0xf0.

hihi   11110000
result     ±###.####

So in this case, if there were no overflow in hihi alone, then the first 5 bits would all be the same, and the sign of the result would match the sign of hihi. This is not the case here. You can check this using

if ((hihi & 0x08) * 0x1f != (hihi & 0xf8))
  handle_overflow();

The carry into hihi can probably be detected most easily by adding the result one summand at a time and performing common overflow detection after each step. Haven't got a nice piece of code for that ready.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top