_mm_mul_epu32 vs. _mm_mul_epi32

https://stackoverflow.com/questions/19576515

01-07-2022

Question

To start the discussion, the basic differences between _mm_mul_epu32 and _mm_mul_epi32 are:

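_mm_mul_epu32 is available in SSE2 and treats its inputs as unsigned integers: it multiplies the low 32 bits of each 64-bit lane and produces two unsigned 64-bit products (32 bit -> 64 bit)
_mm_mul_epi32 is available in SSE4.1 and does the same with signed integers, producing two signed 64-bit products (32 bit -> 64 bit)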

What I don't understand is under what circumstances one should use _mm_mul_epu32. There doesn't seem to be an unsigned counterpart to the set intrinsics such as _mm_set[1]_epi32. For example, in the question "SSE multiplication of 4 32-bit integers", the best answer writes:

static inline __m128i muly(const __m128i &a, const __m128i &b)
{
    /* 64-bit products of elements 0 and 2 */
    __m128i tmp1 = _mm_mul_epu32(a, b);
    /* shift right by 4 bytes so elements 1 and 3 move into the multiplied positions */
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
    /* keep the low 32 bits of each product and interleave them back into element order */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0,0,2,0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0,0,2,0)));
}

Here _mm_mul_epu32 is used together with _epi32 instructions. Isn't it risky to ignore the difference between signed and unsigned integers?

Can someone please provide an example of where _mm_mul_epu32 can be safely used? Thanks!

Solution

Use _mm_mul_epu32 when the operands should be considered unsigned integers, and _mm_mul_epi32 otherwise.
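
A minimal sketch of the unsigned case, assuming an x86 target with SSE2. The helper widening_mul_u32 and the Products struct are hypothetical names made up for this illustration. Since there is no _mm_set_epu32, the values go through _mm_set_epi32 with a cast, which keeps the bit pattern unchanged.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, including _mm_mul_epu32
#include <cstdint>

// Hypothetical result type: the two 64-bit products from lanes 0 and 2.
struct Products {
    std::uint64_t lane0;
    std::uint64_t lane2;
};

// Hypothetical helper: multiplies a0*b0 and a2*b2 as UNSIGNED 32-bit values
// and returns the full 64-bit products.
inline Products widening_mul_u32(std::uint32_t a0, std::uint32_t a2,
                                 std::uint32_t b0, std::uint32_t b2)
{
    // _mm_set_epi32 takes its arguments highest element first.
    // Elements 1 and 3 are ignored by _mm_mul_epu32, so they are set to 0.
    __m128i a = _mm_set_epi32(0, static_cast<int>(a2), 0, static_cast<int>(a0));
    __m128i b = _mm_set_epi32(0, static_cast<int>(b2), 0, static_cast<int>(b0));

    __m128i p = _mm_mul_epu32(a, b);  // two unsigned 64-bit products

    std::uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), p);
    return Products{out[0], out[1]};
}
```

With a0 = 0xFFFFFFFF and b0 = 2, the first product is 0x1FFFFFFFE. A signed multiply of the same bits would compute -1 * 2 instead, which is why the choice of instruction matters when the full 64-bit result is kept.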

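In a 32-bit -> 64-bit multiplication, treating the operands as unsigned or as signed yields different results, so there are separate instructions. Add, sub and mov produce the same bits either way, so they don't need separate instructions. The low 32 bits of a product are also the same either way, which is why the muly routine in the question can safely use _mm_mul_epu32: it keeps only the low half of each result. There is no separate __m128u type. Just use __m128i and remember that it contains unsigned numbers.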
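
To illustrate when mixing _mm_mul_epu32 with signed data is safe, here is a sketch that applies the same technique as the muly routine quoted in the question to signed inputs. It assumes SSE2 and two's complement integers; the names mullo32_sse2 and mul4_signed are hypothetical. Only the low 32 bits of each product are kept, and those bits do not depend on whether the inputs are read as signed or unsigned.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Same approach as muly in the question: four 32-bit products, low halves only.
static inline __m128i mullo32_sse2(__m128i a, __m128i b)
{
    // 64-bit unsigned products of elements 0 and 2
    __m128i even = _mm_mul_epu32(a, b);
    // shift by 4 bytes so elements 1 and 3 are multiplied
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
    // pick the low 32 bits of each product, then interleave: [p0, p1, p2, p3]
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0)));
}

// Hypothetical wrapper: element-wise product of two arrays of four signed ints.
// Products that do not fit in 32 bits wrap around, as with the unsigned case.
inline void mul4_signed(const std::int32_t a[4], const std::int32_t b[4],
                        std::int32_t out[4])
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), mullo32_sse2(va, vb));
}
```

The unsafe case is the opposite one: if the full 64-bit products of negative inputs are needed, the upper halves produced by _mm_mul_epu32 are wrong for signed data and _mm_mul_epi32 is required.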

Licensed under: CC-BY-SA with attribution
