You could emulate it with pshufb and a lookup table that holds one 16-byte shuffle mask per shift count:
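shl eax, 4                 ; eax = shift count * 16 (each LUT row is 16 bytes)
pshufb xmm0, [lut + eax]   ; lut must be 16-byte aligned for the non-VEX memory operand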
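The lookup table would start like this (a mask byte of 0x80 makes pshufb write a zero, so row n shifts the register left by n bytes, like pslldq):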
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
80 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E
80 80 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D
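To make the table's logic concrete, here is a portable scalar model in C (not SIMD code) of what the two instructions above compute. The names `pshufb_model`, `build_shift_lut` and `shift_left_bytes` are hypothetical, and the sketch assumes shift counts from 0 to 16, so the full table has 17 rows, the first three of which match the rows listed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SSSE3 pshufb: for each mask byte, bit 7 set means
   "write zero"; otherwise the low 4 bits select a source byte. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    uint8_t out[16];
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, out, 16); /* dst may alias src, like pshufb xmm0, ... */
}

/* Hypothetical 17-row table for shift counts 0..16 (assumed range).
   Row n is 0x80 for the first n bytes, then 00, 01, 02, ... */
static void build_shift_lut(uint8_t lut[17][16])
{
    for (int n = 0; n <= 16; n++)
        for (int i = 0; i < 16; i++)
            lut[n][i] = (i < n) ? 0x80 : (uint8_t)(i - n);
}

/* Models:  shl eax, 4  /  pshufb xmm0, [lut + eax]
   lut[n] sits at byte offset n * 16, which is what shl eax, 4 computes. */
static void shift_left_bytes(uint8_t v[16], unsigned n)
{
    uint8_t lut[17][16]; /* real code would use a constant, aligned table */
    assert(n <= 16);
    build_shift_lut(lut);
    pshufb_model(v, v, lut[n]);
}
```

With SSSE3 intrinsics the shuffle step itself would be `_mm_shuffle_epi8(v, mask)`, with the mask loaded from the table row for the shift count.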
You could also use plain old unaligned loads and avoid anything "weird" (not tested):
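movdqa [temp + 16], xmm0         ; temp: 32-byte, 16-byte aligned buffer; upper half = data
pxor xmm0, xmm0                  ; xmm0 = 0
movdqa [temp], xmm0              ; lower half = zeros
neg eax                          ; eax = -(shift count), count in 0..16
movdqu xmm0, [eax + temp + 16]   ; load 16 bytes starting count bytes before the data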
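Why the unaligned load produces a shift can be seen in a scalar C model of the same memory layout. This is a sketch, not SIMD code: `shift_left_bytes_unaligned` is a hypothetical name, `memcpy` stands in for the aligned stores and the unaligned load, and shift counts are assumed to be 0 to 16. Reading 16 bytes that start n bytes before the data yields n zero bytes followed by the first 16 - n data bytes, the same left shift the pshufb table gives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models:
     movdqa [temp + 16], xmm0        ; data in the upper half
     pxor   xmm0, xmm0
     movdqa [temp], xmm0             ; zeros in the lower half
     neg    eax
     movdqu xmm0, [eax + temp + 16]  ; unaligned load n bytes earlier */
static void shift_left_bytes_unaligned(uint8_t v[16], unsigned n)
{
    uint8_t temp[32];
    assert(n <= 16);              /* assumed range of the shift count */
    memset(temp, 0, 16);          /* lower half: zeros */
    memcpy(temp + 16, v, 16);     /* upper half: the vector */
    memcpy(v, temp + 16 - n, 16); /* the "unaligned load" */
}
```

The model only shows the data movement; it says nothing about the cost of the load, which is where the real instructions can run into trouble.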
But for shift counts 1 to 15 the load overlaps both 16-byte stores, so it may suffer a store-forwarding failure, which can cost a dozen cycles or so.