Question

When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed?

I thought that maybe a flag on the MXCSR control register may get set after an overflow, but I don't see that happening. For example, _mm_getcsr() prints the same value in both cases below (8064):

#include <iostream>
#include <emmintrin.h>

using namespace std;

int main()
{
    __m128i a = _mm_set_epi32(1, 0, 0, 0);
    __m128i b = _mm_add_epi32(a, a);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << b.m128i_i32[3] << endl;

    __m128i c = _mm_set_epi32((1<<31)-1, 3, 2, 1);
    __m128i d = _mm_add_epi32(c, c);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << d.m128i_i32[3] << endl;
}

Is there some other way to check for overflow with SSE2?


Solution

Here is a somewhat more efficient version of @hirschhornsalz's sum_and_overflow function:

void sum_and_overflow(__v4si a, __v4si b, __v4si& sum, __v4si& overflow)
{
    __v4si sa, sb;

    sum = _mm_add_epi32(a, b);                  // calculate sum
    sa = _mm_xor_si128(sum, a);                 // compare sign of sum with sign of a
    sb = _mm_xor_si128(sum, b);                 // compare sign of sum with sign of b
    overflow = _mm_and_si128(sa, sb);           // get overflow in sign bit
    overflow = _mm_srai_epi32(overflow, 31);    // convert to SIMD boolean (-1 == TRUE, 0 == FALSE)
}

It uses an expression for overflow detection from Hacker's Delight page 27:

sum = a + b;
overflow = (sum ^ a) & (sum ^ b);               // overflow flag in sign bit
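As a scalar illustration of that expression, here is a minimal sketch. The helper name add_overflows_i32 is made up for this example, and the arithmetic is done on unsigned values because signed overflow is undefined behaviour in C++.

```cpp
#include <cstdint>

// Hypothetical helper (not from the answer): returns true if a + b would
// overflow a signed 32-bit integer.
static inline bool add_overflows_i32(int32_t a, int32_t b)
{
    // Work on the unsigned bit patterns so the addition wraps modulo 2^32
    // instead of invoking undefined behaviour.
    uint32_t ua = static_cast<uint32_t>(a);
    uint32_t ub = static_cast<uint32_t>(b);
    uint32_t sum = ua + ub;

    // The sign bit of (sum ^ a) & (sum ^ b) is set only when the sign of the
    // sum differs from the sign of both inputs.
    return (((sum ^ ua) & (sum ^ ub)) >> 31) != 0;
}
```

Inputs with different signs can never set the sign bit in both XOR terms, which is why no separate check for that case is needed.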

Note that the overflow vector will contain the more conventional SIMD boolean values of -1 for TRUE (overflow) and 0 for FALSE (no overflow). If you only need the overflow in the sign bit and the other bits are "don't care" then you can omit the last line of the function, reducing the number of SIMD instructions from 5 to 4.

NB: this solution, like the previous solution on which it is based, is for signed integer values. A solution for unsigned values requires a slightly different approach (see @Stephen Canon's answer).
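If your compiler does not provide GCC's __v4si type, the same function can be sketched with plain __m128i. The names sum_and_overflow_m128i and store_lanes are illustrative only, not part of any library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Illustrative variant using __m128i instead of the GCC-specific __v4si.
// Each lane of 'overflow' is -1 if that signed addition overflowed, else 0.
static inline void sum_and_overflow_m128i(__m128i a, __m128i b,
                                          __m128i& sum, __m128i& overflow)
{
    sum = _mm_add_epi32(a, b);                        // lane-wise wrapping sum
    __m128i sa = _mm_xor_si128(sum, a);               // sign set if sign(sum) != sign(a)
    __m128i sb = _mm_xor_si128(sum, b);               // sign set if sign(sum) != sign(b)
    overflow = _mm_srai_epi32(_mm_and_si128(sa, sb), 31);  // broadcast sign bit
}

// Illustrative helper: copies the four 32-bit lanes to an array, lane 0 first.
static inline void store_lanes(__m128i v, int32_t out[4])
{
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Note that _mm_set_epi32 takes its arguments from the highest lane down, so the first argument ends up in out[3] after store_lanes.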

Other tips

Since a single PADDD has 4 possible overflows, a control register would very quickly run out of bits, especially if you also wanted carry, sign, etc., and even more so for a vector addition of 16 separate bytes :-)

Signed overflow occurs if both input sign bits are equal and the result's sign bit differs from them.

This function calculates sum = a+b and the overflow manually. For every lane that overflows, 0x80000000 is returned in overflow.

void sum_and_overflow(__v4si a, __v4si b, __v4si& sum, __v4si& overflow) {
    __v4si signmask = _mm_set1_epi32(0x80000000);
    sum = a+b;
    a &= signmask;
    b &= signmask;
    overflow = sum & signmask;
    overflow = ~(a^b) & (overflow^a); // overflow is 1 if (a==b) and (resultbit has changed)
}

Note: If you don't have gcc, you have to replace the ^, & and + operators with the appropriate SSE intrinsics, such as _mm_and_si128(), _mm_add_epi32() etc.

Edit: I just noticed that the AND with the mask can of course be done at the very end of the function, saving two AND operations. But the compiler will very likely be smart enough to do that by itself.
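For reference, a sketch of that rearrangement written with intrinsics only, so that it does not rely on gcc's vector operators. The function name is hypothetical, and folding ~x & y into _mm_andnot_si128 is my own choice rather than part of the answer.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical variant with the sign mask applied once, at the end.
// Each lane of the result is 0x80000000 if that signed addition overflowed,
// and 0 otherwise.
static inline __m128i sum_and_overflow_signbit(__m128i a, __m128i b, __m128i& sum)
{
    const __m128i signmask = _mm_set1_epi32(INT32_MIN);  // 0x80000000 per lane

    sum = _mm_add_epi32(a, b);
    __m128i differ  = _mm_xor_si128(a, b);      // sign set if input signs differ
    __m128i changed = _mm_xor_si128(sum, a);    // sign set if result sign != sign of a
    // _mm_andnot_si128(x, y) computes (~x) & y, i.e. ~(a^b) & (sum^a)
    __m128i overflow = _mm_andnot_si128(differ, changed);
    return _mm_and_si128(overflow, signmask);   // keep only the sign bit
}
```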

I notice you asked for a solution for unsigned as well; fortunately, that's pretty easy too:

__v4si mask = _mm_set1_epi32(0x80000000);
sum = _mm_add_epi32(a, b);
overflow = _mm_cmpgt_epi32(_mm_xor_si128(mask, a), _mm_xor_si128(mask, sum));

Normally, to detect unsigned overflow you simply check either sum < a or sum < b. However, SSE2 has no unsigned integer comparisons; XOR-ing both operands with 0x80000000 lets you use a signed comparison to get the same result.
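Wrapped up as a function, the unsigned check might look like the sketch below. The name add_epu32_overflow is invented for this example; the body is the three lines above with __m128i in place of __v4si.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: lane-wise unsigned 32-bit add. Each lane of the
// returned mask is -1 if that addition wrapped around, else 0.
static inline __m128i add_epu32_overflow(__m128i a, __m128i b, __m128i& sum)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);  // 0x80000000 per lane

    sum = _mm_add_epi32(a, b);
    // Flipping the top bit maps unsigned order onto signed order, so the
    // signed compare below evaluates the unsigned condition a > sum.
    return _mm_cmpgt_epi32(_mm_xor_si128(a, bias), _mm_xor_si128(sum, bias));
}
```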

No flags are touched by the underlying PADDD instruction.

So to test this, you have to write additional code, depending on what you want to do.
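If all you need is a single yes/no answer to "did any lane overflow?", one possible approach is to collapse a per-lane overflow mask into a scalar. This is a sketch only: the helper name is made up, and it assumes the mask carries the overflow indication in the sign bit of each 32-bit lane.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)

// Hypothetical helper: true if any 32-bit lane of 'overflow' has its sign
// bit set. Works for masks of -1/0 as well as masks of 0x80000000/0.
static inline bool any_lane_overflowed(__m128i overflow)
{
    // _mm_movemask_ps gathers the sign bit of each 32-bit lane into the low
    // four bits of an int; the cast only reinterprets the register.
    return _mm_movemask_ps(_mm_castsi128_ps(overflow)) != 0;
}
```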

Note: You are a bit hindered by the lack of epi32 intrinsics.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow