Question

I usually work with 3D vectors using the following type:

typedef float vec3_t[3];

initializing vectors with something like:

vec3_t x_basis = {1.0, 0.0, 0.0};
vec3_t y_basis = {0.0, 1.0, 0.0};
vec3_t z_basis = {0.0, 0.0, 1.0};

and accessing them with something like:

x_basis[X] * y_basis[X] + ...

Now I need vector arithmetic using SSE instructions. I have the following code:

#include <stdio.h>

typedef float v4sf __attribute__ ((mode(V4SF)));
int main(void)
{
    v4sf   a,b,c;
    a = (v4sf){0.1f,0.2f,0.3f,0.4f};
    b = (v4sf){0.1f,0.2f,0.3f,0.4f};
    c = (v4sf){0.1f,0.2f,0.3f,0.4f};
    a = b + c;
    printf("a=%f \n", a);
    return 0;
}

GCC supports this syntax, but I have two problems. First, the program prints 0.00000 as the result. Second, I cannot access the individual elements of such vectors. My question is: how can I access them? I need something like a[0] for the X element, a[1] for the Y element, and so on.

PS: I compile this code using:

gcc -msse testgcc.c -o testgcc

Solution

The safe and recommended way to access the elements is through a union rather than pointer type punning. Casting the vector's address to float * can break the compiler's strict-aliasing assumptions and may lead to miscompiled, unstable code.

union Vec4 {
    v4sf v;
    float e[4];
};

union Vec4 vec;
vec.v = (v4sf){0.1f,0.2f,0.3f,0.4f};
printf("%f %f %f %f\n", vec.e[0], vec.e[1], vec.e[2], vec.e[3]);
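Applied to the addition from the question, the union approach might look like the sketch below. The helper names vec4_add and vec4_print are made up for illustration, and the typedef uses vector_size(16), which is assumed here to be interchangeable with the mode(V4SF) spelling from the question.

```c
#include <stdio.h>

/* Four packed floats (16 bytes). Assumption: vector_size(16) stands in
   for the mode(V4SF) typedef used in the question. */
typedef float v4sf __attribute__((vector_size(16)));

union Vec4 {
    v4sf  v;     /* used for SIMD arithmetic */
    float e[4];  /* used for per-element access */
};

/* Hypothetical helper: adds two vectors with a single vector operation. */
union Vec4 vec4_add(union Vec4 b, union Vec4 c)
{
    union Vec4 a;
    a.v = b.v + c.v;
    return a;
}

/* Hypothetical helper: prints the four elements through the float view. */
void vec4_print(const char *name, union Vec4 a)
{
    printf("%s=[%f %f %f %f]\n", name, a.e[0], a.e[1], a.e[2], a.e[3]);
}
```

Calling vec4_print("a", vec4_add(b, c)) then passes four ordinary floats to printf. The original program passed the whole vector to a single %f, which does not match the format specifier.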

OTHER TIPS

Note that GCC 4.6 and later support subscripted vectors:

In C vectors can be subscripted as if the vector were an array with the same number of elements and base type. Out of bound accesses invoke undefined behavior at runtime. Warnings for out of bound accesses for vector subscription can be enabled with -Warray-bounds.
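With subscripting available, the dot-product pattern from the question can be written directly on the vector type. This is only a sketch: it assumes GCC 4.6 or later, the dot3 helper is a made-up name, and the fourth lane is simply ignored.

```c
/* Four packed floats; assumes a GCC version with vector subscripting (4.6+). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: dot product of the X, Y and Z lanes; lane 3 is unused. */
float dot3(v4sf a, v4sf b)
{
    v4sf p = a * b;             /* one element-wise vector multiply */
    return p[0] + p[1] + p[2];  /* lanes read with ordinary [] syntax */
}
```

Lanes can be written the same way, for example a[1] = 5.0f, so no union or pointer cast is needed.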

You are forgetting that you need to reinterpret a as an array of floats. The following code works properly:

int main(){
    v4sf a,b,c;
    a = (v4sf){0.1f,0.2f,0.3f,0.4f};
    b = (v4sf){0.1f,0.2f,0.3f,0.4f};
    c = (v4sf){0.1f,0.2f,0.3f,0.4f};
    a = b + c;
    float* pA = (float*) &a;
    printf("a=[%f %f %f %f]\n",pA[0], pA[1], pA[2], pA[3]);
    return 0;
}

P.S.: thanks for this question, I didn't know that gcc has such SSE support.

UPDATE: This solution fails once the arrays become unaligned. The solution provided by @drhirsh does not have this problem.
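Another way to avoid both the aliasing and the alignment concerns is to copy the bytes with memcpy. Neither answer above proposes this; it is an alternative sketch, and the helper names v4sf_load and v4sf_store are made up for illustration.

```c
#include <string.h>

/* Four packed floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: builds a vector from any float pointer.
   memcpy places no alignment requirement on the source. */
v4sf v4sf_load(const float in[4])
{
    v4sf v;
    memcpy(&v, in, sizeof v);
    return v;
}

/* Hypothetical helper: copies the four lanes of v into a plain float array. */
void v4sf_store(float out[4], v4sf v)
{
    memcpy(out, &v, sizeof v);
}
```

Compilers commonly turn such fixed-size copies into plain load and store instructions, although that is an optimization rather than a guarantee.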

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow