If you want your code to work on other compilers, don't use those GCC extensions. Use the set/load/store intrinsics instead. _mm_setr_ps is fine for setting constant values but should not be used in a loop. To access individual elements, I normally store the values to an array first and then read the array.
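As a rough illustration of the store-then-read approach, here is a minimal sketch. The helper names get_lane and make_ramp are made up for this example and are not part of any library; only the _mm_* calls are real intrinsics.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_setr_ps, _mm_storeu_ps */

/* Hypothetical helper: build a constant vector once, outside any hot loop.
   _mm_setr_ps takes its arguments in memory order, so lane 0 is 1.0f. */
static __m128 make_ramp(void)
{
    return _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
}

/* Hypothetical helper: return lane i (0..3) of v by storing the whole
   vector to a temporary array and reading the array, instead of relying
   on compiler-specific element access. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);  /* unaligned store, so tmp needs no special alignment */
    return tmp[i];
}
```

With these helpers, get_lane(make_ramp(), 0) yields 1.0f and get_lane(make_ramp(), 3) yields 4.0f.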
If you have an array a, you should load from it and store to it with
__m128 t = _mm_loadu_ps(a);
_mm_storeu_ps(a, t);
If the array is 16-byte aligned, you can use an aligned load/store, which is slightly faster on newer systems and much faster on older ones.
__m128 t = _mm_load_ps(a);
_mm_store_ps(a, t);
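To show how the load/store intrinsics fit into a loop, here is a small sketch that adds two float arrays four elements at a time. The function name add_arrays is hypothetical. It uses the unaligned variants so that it works for any pointers; if all three pointers are known to be 16-byte aligned, _mm_load_ps/_mm_store_ps could be swapped in. Note that the aligned variants fault when given an address that is not 16-byte aligned.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
static void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: process 4 floats per iteration with unaligned load/store. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail: handle the last n % 4 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```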
To get 16-byte aligned memory on the stack use:
__declspec(align(16)) const float a[] = ... // MSVC
__attribute__((aligned(16))) const float a[] = ... // GCC, ICC
For 16-byte aligned dynamic arrays use:
float *a = (float*)_mm_malloc(sizeof(float)*n, 16); //MSVC, GCC, ICC, MinGW
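Memory obtained from _mm_malloc has to be released with _mm_free, not free. The following is a minimal sketch of the allocate/use/free cycle. The function name aligned_fill_demo is hypothetical, and it assumes n is a positive multiple of 4. It also assumes a GCC- or Clang-style toolchain where <xmmintrin.h> pulls in the _mm_malloc declaration; on MSVC the declaration lives in <malloc.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; also declares _mm_malloc/_mm_free on GCC/Clang */

/* Hypothetical helper: allocate n floats on a 16-byte boundary, fill them
   with `value` using aligned stores, and report whether everything checked out.
   Assumes n is a positive multiple of 4. Returns 1 on success, 0 otherwise. */
static int aligned_fill_demo(size_t n, float value)
{
    float *a = (float *)_mm_malloc(sizeof(float) * n, 16);
    if (a == NULL)
        return 0;

    /* The returned pointer should be a multiple of 16. */
    int ok = ((uintptr_t)a % 16 == 0);

    /* Aligned stores are safe here because a is 16-byte aligned
       and i advances in steps of 4 floats (16 bytes). */
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i + 4 <= n; i += 4)
        _mm_store_ps(a + i, v);

    ok = ok && a[0] == value && a[n - 1] == value;

    _mm_free(a);  /* must pair with _mm_malloc */
    return ok;
}
```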