Question

I have a 128-bit variable holding 4 separate integers: [1,2,3,4]. I want to shift right so that I get [2,3,4,0]. What's the fastest way to do this?

My current code:

__m128 v1;
v1 = (__m128)_mm_srli_si128(  _mm_castps_si128(v1) , 4 );

This succeeds in shifting the bits, but I am trying to optimize for speed and cache use (i.e. as few variables as possible). Is there any way to improve this code to avoid casting to and from a __m128i?

Thanks


Solution

Don't worry about it. __m128 and __m128i are two different ways of interpreting the contents of an XMM register, so the cast disappears in compilation. My compiler (clang on Mac OS 10.9) compiles the whole thing down to a single instruction as it stands:

psrldq $0x4, %xmm0
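
For reference, here is a minimal sketch of the same shift written with the cast intrinsics in both directions. As far as I know the C-style `(__m128)` cast in the question is a GCC/clang extension, so `_mm_castsi128_ps` is the more portable spelling. The names `shift_lanes_down` and `shift_array` are made up for this example, and it assumes an x86 target with SSE2 available.

```c
#include <emmintrin.h>  /* SSE2: _mm_srli_si128 and the cast intrinsics */

/* Hypothetical helper: drop lane 0 and move the remaining lanes down by one,
 * filling the top lane with zero. */
static inline __m128 shift_lanes_down(__m128 v)
{
    /* The casts only reinterpret the register; they emit no instructions. */
    __m128i bits = _mm_castps_si128(v);
    bits = _mm_srli_si128(bits, 4);  /* byte shift right by 4 bytes = one 32-bit lane */
    return _mm_castsi128_ps(bits);
}

/* Hypothetical wrapper so the helper can be tried on plain float arrays. */
static void shift_array(const float in[4], float out[4])
{
    _mm_storeu_ps(out, shift_lanes_down(_mm_loadu_ps(in)));
}
```

Calling `shift_array` on {1, 2, 3, 4} should give {2, 3, 4, 0}. With optimization on, the helper should inline to the same single `psrldq` shown above, though that is worth checking in your own compiler's disassembly.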
Licensed under: CC-BY-SA with attribution