Question

I was researching quaternion SSE implementations to understand how they worked (since I'm implementing my own) and I came across this Bullet implementation for quaternion multiplication:

VECTORMATH_FORCE_INLINE const Quat Quat::operator *( const Quat &quat ) const
{
    __m128 ldata, rdata, qv, tmp0, tmp1, tmp2, tmp3;
    __m128 product, l_wxyz, r_wxyz, xy, qw;
    ldata = mVec128;
    rdata = quat.mVec128;
    tmp0 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,0,2,1) );
    tmp1 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,1,0,2) );
    tmp2 = _mm_shuffle_ps( ldata, ldata, _MM_SHUFFLE(3,1,0,2) );
    tmp3 = _mm_shuffle_ps( rdata, rdata, _MM_SHUFFLE(3,0,2,1) );
    qv = vec_mul( vec_splat( ldata, 3 ), rdata );
    qv = vec_madd( vec_splat( rdata, 3 ), ldata, qv );
    qv = vec_madd( tmp0, tmp1, qv );
    qv = vec_nmsub( tmp2, tmp3, qv );
    product = vec_mul( ldata, rdata );
    l_wxyz = vec_sld( ldata, ldata, 12 );
    r_wxyz = vec_sld( rdata, rdata, 12 );
    qw = vec_nmsub( l_wxyz, r_wxyz, product );
    xy = vec_madd( l_wxyz, r_wxyz, product );
    qw = vec_sub( qw, vec_sld( xy, xy, 8 ) );
    VM_ATTRIBUTE_ALIGN16 unsigned int sw[4] = {0, 0, 0, 0xffffffff};
    return Quat( vec_sel( qv, qw, sw ) );
}

The bit I am concerned about is these two lines:

l_wxyz = vec_sld( ldata, ldata, 12 );
r_wxyz = vec_sld( rdata, rdata, 12 );

The macro implementations:

#define _mm_ror_ps(vec,i)       \
    (((i)%4) ? (_mm_shuffle_ps(vec,vec, _MM_SHUFFLE((unsigned char)(i+3)%4,(unsigned char)(i+2)%4,(unsigned char)(i+1)%4,(unsigned char)(i+0)%4))) : (vec))

#define vec_sld(vec,vec2,x) _mm_ror_ps(vec, ((x)/4))

If I understand it correctly, when the value passed to _mm_ror_ps is not divisible by 4 (and 3 is not; 12/4 = 3), the vec_sld macro will reduce to:

l_wxyz = ldata;//vec_sld( ldata, ldata, 12 );
r_wxyz = rdata;//vec_sld( rdata, rdata, 12 );

Which is effectively doing nothing.

And if the value is divisible by 4:

q = vec_sld( x, x, 16 );

The macro will reduce to:

q = _mm_shuffle_ps( x, x, _MM_SHUFFLE(3,2,1,0) );

Which, again, is like doing nothing, since _MM_SHUFFLE(3,2,1,0) is leaving x, y, z, and w in their current places.

If vec_sld is not doing anything, what is its purpose?

Am I missing anything?

EDIT: Here are the two files the source code comes from


Solution

I think the confusion is that ((i)%4) evaluates to true when i is not a multiple of 4. So for non-multiples of 4 you get an _mm_shuffle_ps, and otherwise you just get the original vector back (a rotate by a multiple of 4 lanes is a no-op).
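To make the branch selection concrete, here is a minimal sketch of just the condition, with no SSE involved. The helper names takes_shuffle_branch and lanes_from_bytes are made up for illustration; they only mirror the arithmetic in the two macros from the question.

```cpp
#include <cassert>

// Hypothetical helper mirroring the division in vec_sld:
// a byte count is turned into a count of float lanes.
constexpr int lanes_from_bytes(int bytes) { return bytes / 4; }

// Hypothetical helper mirroring the condition in _mm_ror_ps:
// a nonzero remainder converts to true, so the shuffle branch is taken;
// a zero remainder converts to false, so the vector is returned unchanged.
constexpr bool takes_shuffle_branch(int i) { return (i % 4) ? true : false; }

// 12 bytes -> 3 lanes -> 3 % 4 == 3 (nonzero) -> shuffle branch.
static_assert(takes_shuffle_branch(lanes_from_bytes(12)), "12 bytes should shuffle");
// 16 bytes -> 4 lanes -> 4 % 4 == 0 -> vector returned as-is.
static_assert(!takes_shuffle_branch(lanes_from_bytes(16)), "16 bytes should be a no-op");
```

So the two cases in the question are the other way around: a byte count of 12 takes the shuffle, and a byte count of 16 is the one that returns the vector untouched.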

Some background which may be useful:

The vec_XXX macros indicate that this code was originally ported from PowerPC/AltiVec. vec_sld is an AltiVec intrinsic which shifts a pair of vectors by a given number of bytes. In this context it appears that vec_sld is being used to rotate a single vector, since the two input vectors are the same, and it appears that 12 is being passed as a byte shift (i.e. rotate by 3 floats).
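As a rough illustration of what the original AltiVec operation does, here is a scalar model of the byte shift. It assumes AltiVec's byte numbering (byte 0 is the first byte of the first element), and the names Bytes16 and sld_bytes are invented for this sketch; it is not the real intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16-byte vector register.
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of vec_sld(a, b, sh): concatenate a and b (a first),
// then keep the 16 bytes starting at byte offset sh (0..15).
inline Bytes16 sld_bytes(const Bytes16& a, const Bytes16& b, std::size_t sh) {
    Bytes16 out{};
    for (std::size_t k = 0; k < 16; ++k) {
        const std::size_t src = k + sh;
        out[k] = (src < 16) ? a[src] : b[src - 16];
    }
    return out;
}
```

If both inputs are the same vector v and every byte of v holds its own index, sld_bytes(v, v, 12) starts with byte 12 (the first byte of the fourth float) followed by bytes 0 to 11, which is the (w, x, y, z) order that the name l_wxyz suggests.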

So vec_sld(v, v, 12) gets translated to _mm_ror_ps(v, 12/4) = _mm_ror_ps(v, 3) which then gets expanded to:

_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 1, 0, 3));

so it does look as if the code is doing the right thing.
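If you want to check the expansion without an x86 compiler, the following is a scalar sketch of the two macros. Vec4, shuffle_same, ror_model and sld_model are hypothetical names for this example; shuffle_same models _mm_shuffle_ps only for the case where both inputs are the same vector, with lane 0 holding x.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar stand-in for __m128: lane 0 = x, lane 3 = w.
using Vec4 = std::array<float, 4>;

// Models _mm_shuffle_ps(v, v, _MM_SHUFFLE(d, c, b, a)):
// result lane 0 = v[a], lane 1 = v[b], lane 2 = v[c], lane 3 = v[d].
inline Vec4 shuffle_same(const Vec4& v, int d, int c, int b, int a) {
    return Vec4{v[a], v[b], v[c], v[d]};
}

// Models the _mm_ror_ps macro: shuffle unless i is a multiple of 4.
inline Vec4 ror_model(const Vec4& v, int i) {
    return (i % 4)
        ? shuffle_same(v, (i + 3) % 4, (i + 2) % 4, (i + 1) % 4, (i + 0) % 4)
        : v;
}

// Models the vec_sld macro: the second vector is ignored and the
// byte count is converted to a lane count.
inline Vec4 sld_model(const Vec4& v, int bytes) {
    return ror_model(v, bytes / 4);
}
```

With v = (x, y, z, w) = (1, 2, 3, 4), sld_model(v, 12) gives (4, 1, 2, 3), i.e. (w, x, y, z), while sld_model(v, 16) returns v unchanged.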

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow