Don't worry about it. __m128 and __m128i are two different ways of interpreting the contents of an XMM register, so the cast disappears in compilation. My compiler (clang on Mac OS 10.9) compiles the whole thing down to a single instruction as it stands:
psrldq $0x4, %xmm0
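
For illustration, here is a minimal sketch of the kind of code this applies to. The original function is not reproduced here, so the helper name shift_right_one_lane and the 4-byte shift amount (taken from the instruction above) are assumptions. The two cast intrinsics only change the type the compiler sees, so the byte shift is the only real operation:

```c
#include <emmintrin.h>  /* SSE2: _mm_castps_si128, _mm_srli_si128, _mm_castsi128_ps */

/* Hypothetical helper: shift four packed floats right by one 32-bit lane.
   Lane 0 receives the old lane 1, and the top lane is filled with zero. */
static __m128 shift_right_one_lane(__m128 v)
{
    __m128i as_int  = _mm_castps_si128(v);        /* reinterpret only, no instruction */
    __m128i shifted = _mm_srli_si128(as_int, 4);  /* byte shift by 4: this is psrldq */
    return _mm_castsi128_ps(shifted);             /* reinterpret only, no instruction */
}
```

With optimization enabled, a compiler would be expected to reduce the body to the single psrldq shown above, though the exact output depends on the compiler and flags.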