I have a 128 bit variable filled with 4 separate integers. [1,2,3,4]. I want to shift right, so I can get [2,3,4,0]. What's the fastest way to do this.

My current code:

__m128 v1;
v1 = (__m128)_mm_srli_si128(  _mm_castps_si128(v1) , 4 );

this succeeds in shifting the bits, but I am trying to go for speed and cache optimization (aka fewest variables as possible). Is there anyway to improve this code to avoid casting to and from a __m128i?

thanks

有帮助吗?

解决方案

Don't worry about it. __m128 and __m128i are two different ways of interpreting the contents of an XMM register, so the cast disappears in compilation. My compiler (clang on Mac OS 10.9) compiles the whole thing down to a single instruction as it stands:

psrldq $0x4, %xmm0
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top