You can use min/max operations to clamp each element to the range [-1, 1], which gives the desired result, e.g.
    #include <emmintrin.h> // SSE2

    inline __m128i _mm_sgn_epi16(__m128i v)
    {
        v = _mm_min_epi16(v, _mm_set1_epi16(1));  // positive values -> 1
        v = _mm_max_epi16(v, _mm_set1_epi16(-1)); // negative values -> -1, zero stays 0
        return v;
    }
This is probably a little more efficient than explicitly comparing with zero, shifting, and combining the results.
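If you want to convince yourself that the clamp really yields the sign for every input, including the extremes, a small check against a scalar reference is enough. The following is only a sketch: the names sgn_epi16_minmax, sgn_scalar and sgn_lanes_match are made up for this illustration, and it assumes an x86 target with SSE2 enabled.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Same min/max clamp as above, under a hypothetical name. */
static __m128i sgn_epi16_minmax(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    return v;
}

/* Scalar reference: -1, 0 or +1. */
static int16_t sgn_scalar(int16_t x)
{
    return (int16_t)((x > 0) - (x < 0));
}

/* Returns 1 if all eight lanes agree with the scalar reference, else 0. */
static int sgn_lanes_match(const int16_t in[8])
{
    int16_t out[8];
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, sgn_epi16_minmax(v));
    for (int i = 0; i < 8; ++i)
        if (out[i] != sgn_scalar(in[i]))
            return 0;
    return 1;
}
```

Calling sgn_lanes_match on a vector such as { -32768, -2, -1, 0, 1, 2, 1000, 32767 } exercises both ends of the 16-bit range as well as zero.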
Note that there is already an _mm_sign_epi16 intrinsic in SSSE3 (PSIGNW - see tmmintrin.h), which behaves somewhat differently: it takes two operands and applies the sign of the second to the first. For that reason I changed the name of the required function to _mm_sgn_epi16. Using _mm_sign_epi16 might be more efficient when SSSE3 is available, however, so you could do something like this:
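    #include <emmintrin.h> // SSE2
    #ifdef __SSSE3__       // defined by GCC/Clang when SSSE3 is enabled (e.g. -mssse3)
    #include <tmmintrin.h> // SSSE3
    #endif

    inline __m128i _mm_sgn_epi16(__m128i v)
    {
    #ifdef __SSSE3__
        v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
    #else
        v = _mm_min_epi16(v, _mm_set1_epi16(1));  // use PMINSW/PMAXSW on SSE2/SSE3
        v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    #endif
        return v;
    }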
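To see why passing a constant 1 as the first operand works, it may help to look at a scalar model of what PSIGNW does to a single 16-bit lane. This is just an illustrative sketch; psignw_lane and sgn_via_psignw are hypothetical names, not part of any header.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of _mm_sign_epi16(a, b) (PSIGNW):
   b < 0  -> -a
   b == 0 -> 0
   b > 0  -> a */
static int16_t psignw_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)(0 - a); /* negate a */
    if (b == 0)
        return 0;
    return a;
}

/* With a fixed at 1 the result is simply the sign of v. */
static int16_t sgn_via_psignw(int16_t v)
{
    return psignw_lane(1, v);
}
```

So _mm_sign_epi16(a, b) is a conditional negate/zero of a controlled by b, whereas _mm_sgn_epi16(v) takes a single operand and returns its sign; the SSSE3 branch above is the special case a = 1.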