SSE2 instruction to convert an 8x16 register to two 4x32 registers holding the even- and odd-indexed elements

StackOverflow https://stackoverflow.com/questions/16732730

  •  30-05-2022

Question

Is there an SSE2 instruction that converts an 8x16 register into two 4x32 registers, one holding the odd-indexed elements of the 8x16 register and the other holding the even-indexed elements? Please suggest.
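
For reference, the requested operation can be written as a plain scalar loop. This is a minimal C sketch. The function name `split_even_odd_ref` is hypothetical, and treating the elements as signed `int16_t` is an assumption, since the question does not say.

```c
#include <stdint.h>

/* Hypothetical scalar reference: split 8 int16_t lanes into the
   even-indexed and odd-indexed ones, each widened to int32_t.
   Signed elements are an assumption; the question does not specify. */
static void split_even_odd_ref(const int16_t in[8], int32_t even[4], int32_t odd[4])
{
    for (int i = 0; i < 4; ++i) {
        even[i] = in[2 * i];     /* elements 0, 2, 4, 6 */
        odd[i]  = in[2 * i + 1]; /* elements 1, 3, 5, 7 */
    }
}
```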

Solution

Untested (the arithmetic shifts sign-extend, so this treats the 16-bit elements as signed):

movdqa xmm1, xmm0
pslld xmm0, 16
psrad xmm1, 16  ; odd words
psrad xmm0, 16  ; even words

Should be easy enough to convert to intrinsics.
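
A possible intrinsics translation follows. It is a C sketch that assumes signed 16-bit elements and an x86 target with SSE2 (`emmintrin.h`), and the helper name `split_even_odd_sse2` is hypothetical. It relies on x86 being little-endian: each 32-bit lane holds the even-indexed word in its low half and the odd-indexed word in its high half.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: intrinsics version of the shift trick.
   Assumes signed 16-bit input. In each 32-bit lane the even-indexed
   word is the low half and the odd-indexed word is the high half. */
static inline void split_even_odd_sse2(__m128i v, __m128i *even, __m128i *odd)
{
    /* psrad 16: arithmetic shift sign-extends the high (odd) word. */
    *odd = _mm_srai_epi32(v, 16);
    /* pslld 16 moves the low (even) word to the high half,
       then psrad 16 sign-extends it the same way. */
    *even = _mm_srai_epi32(_mm_slli_epi32(v, 16), 16);
}
```

If the elements are unsigned, the same idea works with a logical shift and a mask instead: `_mm_srli_epi32(v, 16)` for the odd words and `_mm_and_si128(v, _mm_set1_epi32(0xFFFF))` for the even words.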

There is no single instruction for this, not even in later versions of SSE. Instructions with multiple outputs are very rare, and mostly limited to old ones.

pmovsxwd from SSE4.1 does widen words to dwords, but it uses the wrong subset of elements for this problem: the bottom 4 words, rather than the even or the odd ones.
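
To make that concrete, here is a small C sketch using `_mm_cvtepi16_epi32`, the intrinsic for `pmovsxwd`. The helper name `widen_low4` is hypothetical. Given elements e0..e7, it returns e0, e1, e2, e3 sign-extended, not e0, e2, e4, e6. The `target("sse4.1")` attribute is a GCC/Clang extension used here so the file compiles without `-msse4.1`. The CPU still has to support SSE4.1.

```c
#include <smmintrin.h> /* SSE4.1 */

/* Hypothetical helper: pmovsxwd sign-extends the lowest four 16-bit lanes
   (elements 0..3), not the even-indexed ones.
   target("sse4.1") is GCC/Clang-specific; the CPU must support SSE4.1. */
__attribute__((target("sse4.1")))
static __m128i widen_low4(__m128i v)
{
    return _mm_cvtepi16_epi32(v);
}
```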

OTHER TIPS

Not sure if there's a single instruction for this, but something like this ought to work (untested):

; Assume that the 8 16-bit values are in xmm0
PSHUFLW xmm1,xmm0,0D8h  ; Change word order to 3120 in the low qword
PSHUFHW xmm1,xmm1,0D8h  ; Change word order to 3120 in the high qword
PSHUFD xmm1,xmm1,0D8h   ; Change dword order to 3120
                        ; xmm1 = w0 w2 w4 w6 w1 w3 w5 w7 (lowest word first)
MOVDQA xmm0,xmm1        ; Copy to xmm0 (integer-domain move)
PUNPCKLWD xmm0,xmm0     ; Expand even words to dwords
PUNPCKHWD xmm1,xmm1     ; Expand odd words to dwords
PSLLD xmm0,16           ; Sign-extend: shift left,
PSRAD xmm0,16           ; then arithmetic shift right
PSLLD xmm1,16
PSRAD xmm1,16

xmm0 should now contain the 4 even-indexed words sign-extended to 32 bits, and xmm1 the 4 odd-indexed words, likewise sign-extended.
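
The same sequence with SSE2 intrinsics follows. It is a sketch that assumes signed elements, and the name `split_even_odd_shuffle` is hypothetical. `_MM_SHUFFLE(3, 1, 2, 0)` evaluates to `0xD8`, the immediate used above. There is one deviation from the assembly. Unpacking a register with itself already places each word in both halves of a dword, so a single arithmetic right shift by 16 sign-extends it, and the `PSLLD` is dropped.

```c
#include <emmintrin.h> /* SSE2; also provides _MM_SHUFFLE */

/* Hypothetical helper: shuffle-then-unpack approach, signed input assumed.
   _MM_SHUFFLE(3, 1, 2, 0) == 0xD8. */
static inline void split_even_odd_shuffle(__m128i v, __m128i *even, __m128i *odd)
{
    __m128i t = _mm_shufflelo_epi16(v, _MM_SHUFFLE(3, 1, 2, 0)); /* w0 w2 w1 w3 w4 w5 w6 w7 */
    t = _mm_shufflehi_epi16(t, _MM_SHUFFLE(3, 1, 2, 0));         /* w0 w2 w1 w3 w4 w6 w5 w7 */
    t = _mm_shuffle_epi32(t, _MM_SHUFFLE(3, 1, 2, 0));           /* w0 w2 w4 w6 w1 w3 w5 w7 */
    /* Unpacking t with itself puts each word in both halves of a dword,
       so one arithmetic shift right by 16 sign-extends it (no PSLLD needed). */
    *even = _mm_srai_epi32(_mm_unpacklo_epi16(t, t), 16);
    *odd  = _mm_srai_epi32(_mm_unpackhi_epi16(t, t), 16);
}
```

The shift-only approach from the first answer needs fewer instructions than this one. The shuffle version becomes more attractive mainly when the rearranged words are needed for something besides widening.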

If you can use SSE4.1 instructions, the sign-extension part can be simplified a bit. For the even words (xmm0), you could replace the unpack and the two shifts with:

PMOVSXWD xmm0,xmm0
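
A sketch of the full SSE4.1 variant follows. After the three shuffles, the even words are in the low qword and the odd words in the high qword, so `_mm_cvtepi16_epi32` (PMOVSXWD) can widen each half. The answer only covers the even words. Moving the high qword down with `_mm_unpackhi_epi64` to handle the odd words is this sketch's own assumption. The helper name `split_even_odd_sse41` is hypothetical, and the `target` attribute is GCC/Clang-specific.

```c
#include <smmintrin.h> /* SSE4.1 */

/* Hypothetical helper: SSE4.1 variant, signed input assumed.
   target("sse4.1") is GCC/Clang-specific; the CPU must support SSE4.1. */
__attribute__((target("sse4.1")))
static void split_even_odd_sse41(__m128i v, __m128i *even, __m128i *odd)
{
    __m128i t = _mm_shufflelo_epi16(v, _MM_SHUFFLE(3, 1, 2, 0));
    t = _mm_shufflehi_epi16(t, _MM_SHUFFLE(3, 1, 2, 0));
    t = _mm_shuffle_epi32(t, _MM_SHUFFLE(3, 1, 2, 0));     /* w0 w2 w4 w6 w1 w3 w5 w7 */
    *even = _mm_cvtepi16_epi32(t);                         /* sign-extend low 4 words */
    *odd  = _mm_cvtepi16_epi32(_mm_unpackhi_epi64(t, t)); /* high qword moved down first */
}
```
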

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow