Question

I've noticed that accessing __m128 fields by index is possible in gcc, without using the union trick.

__m128 t;

float r(t[0] + t[1] + t[2] + t[3]);

I can also load a __m128 just like an array:

__m128 t{1.f, 2.f, 3.f, 4.f};

This is all in line with GCC's vector extensions. These, however, may not be available elsewhere. Are the loading and accessing features supported by the Intel compiler and MSVC?


Solution

To load a __m128, you can write _mm_setr_ps(1.f, 2.f, 3.f, 4.f), which is supported by GCC, ICC, MSVC and clang.

As far as I know, clang and recent versions of GCC support accessing __m128 elements by index. I don't know how to do this in ICC or MSVC. I believe _mm_extract_ps works in all four compilers, but it requires SSE4.1 and returns the element's bit pattern as an int rather than a float, which makes it painful to use.
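One way to read a single element without vector subscripting and without _mm_extract_ps is to shuffle the wanted lane into position 0 and read it with _mm_cvtss_f32. The sketch below uses only SSE1 intrinsics; get_lane is a hypothetical helper name, not something the compilers provide, and the lane index has to be a compile-time constant because _mm_shuffle_ps takes an immediate.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_shuffle_ps, _mm_cvtss_f32

// Hypothetical helper (not a library function): returns lane I of v.
// I is a template parameter because _mm_shuffle_ps needs an immediate.
template <int I>
float get_lane(__m128 v) {
    static_assert(I >= 0 && I < 4, "lane index must be 0..3");
    // Copy lane I into every position, then read the lowest lane as a float.
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
}
```

With t = _mm_setr_ps(1.f, 2.f, 3.f, 4.f), get_lane<0>(t) reads the first value passed to _mm_setr_ps and get_lane<3>(t) reads the last.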

Other tips

If you want your code to work on other compilers, don't use those GCC extensions. Use the set/load/store intrinsics instead. _mm_setr_ps is fine for setting constant values but should not be used in a loop. To access elements, I normally store the values to an array first and then read the array.
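The store-then-read approach could look like the following sketch, which rewrites the sum from the question without indexing the __m128 directly. horizontal_sum is a hypothetical name chosen for illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: sums the four lanes of v by first copying them
// into an ordinary float array and then reading that array.
float horizontal_sum(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);  // unaligned store, so no alignment requirement
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

This relies only on intrinsics, so it should compile unchanged on GCC, clang, ICC and MSVC.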

If you have an array a, load from it and store back to it with

__m128 t = _mm_loadu_ps(a);
_mm_storeu_ps(a, t);
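As an illustration of using these loads and stores inside a loop, here is a minimal sketch that adds two float arrays four elements at a time. The function name add_arrays and the scalar tail loop are assumptions made for the example, not part of the answer above.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
// Uses unaligned loads/stores, so the pointers need no special alignment.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop: process four floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    // Scalar tail: handle the remaining 0..3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```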

If the array is 16-byte aligned, you can use an aligned load/store, which is slightly faster on newer systems but much faster on older systems.

__m128 t = _mm_load_ps(a);
_mm_store_ps(a, t);
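The aligned variants require the address to be a multiple of 16, so when the alignment of a pointer is not known up front it can be checked at run time. The sketch below shows one way to do that; is_aligned_16 and load4 are hypothetical helper names.

```cpp
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned_16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: aligned load when the address allows it,
// unaligned load otherwise.
inline __m128 load4(const float* p) {
    return is_aligned_16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}
```

In a real loop it is probably better to test the alignment once before the loop than to branch on every load.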

To get 16-byte aligned memory on the stack use

__declspec(align(16)) const float a[] = ... // MSVC
__attribute__((aligned(16))) const float a[] = ... // GCC, ICC
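If C++11 is available, the standard alignas specifier can replace both compiler-specific spellings. The following is a sketch under that assumption; sum_constants is a hypothetical function used only to show an aligned array feeding an aligned load and store.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical example function: loads four constants from a 16-byte
// aligned stack array, stores them back out, and returns their sum.
float sum_constants() {
    // alignas is standard C++11, so no per-compiler attribute is needed.
    alignas(16) const float a[4] = {1.f, 2.f, 3.f, 4.f};
    __m128 t = _mm_load_ps(a);   // aligned load is safe: a is 16-byte aligned

    alignas(16) float out[4];
    _mm_store_ps(out, t);        // aligned store is safe: out is 16-byte aligned
    return out[0] + out[1] + out[2] + out[3];
}
```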

For 16-byte aligned dynamic arrays use:

float *a = (float*)_mm_malloc(sizeof(float)*n, 16); //MSVC, GCC, ICC, MinGW 
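Memory obtained from _mm_malloc has to be released with _mm_free rather than free or delete. The sketch below shows the pairing in a small hypothetical function, sum_iota, which assumes n is a multiple of 4 to keep the loop simple.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics; also brings in _mm_malloc/_mm_free

// Hypothetical example: fills an aligned buffer with 0, 1, ..., n-1 and
// returns the sum. Assumption: n is a multiple of 4.
float sum_iota(std::size_t n) {
    float* a = static_cast<float*>(_mm_malloc(sizeof(float) * n, 16));
    if (a == nullptr) {
        return 0.f;  // allocation failed
    }
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
    }

    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(a + i));  // aligned loads are safe here
    }
    _mm_free(a);  // must match _mm_malloc

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```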

You can also use macros to access individual elements (shown here for the AVX types):

//Test to see if we are using MSVC as MSVC AVX types are slightly different to the GCC ones
#ifdef _MSC_VER 
#define GET_F32_AVX_MULTIPLATTFORM(vector,index) (vector).m256_f32[index]
#define GET_F64_AVX_MULTIPLATTFORM(vector,index) (vector).m256d_f64[index]
#else 
#define GET_F32_AVX_MULTIPLATTFORM(vector,index) (vector)[index]
#define GET_F64_AVX_MULTIPLATTFORM(vector,index) (vector)[index]
#endif
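
The same idea can be applied to the __m128 type from the question. The macro below is a hypothetical SSE counterpart, not part of the original answer: it assumes MSVC exposes the m128_f32 member of its __m128 union and that any other compiler supports GCC-style vector subscripting, which may not hold everywhere.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical SSE counterpart of the AVX macros above.
// Assumption: non-MSVC compilers accept GCC-style subscripting on __m128.
#ifdef _MSC_VER
#define GET_F32_SSE_MULTIPLATFORM(vector, index) (vector).m128_f32[index]
#else
#define GET_F32_SSE_MULTIPLATFORM(vector, index) (vector)[index]
#endif

// Hypothetical usage: adds the first and last lanes of v.
float first_plus_last(__m128 v) {
    return GET_F32_SSE_MULTIPLATFORM(v, 0) + GET_F32_SSE_MULTIPLATFORM(v, 3);
}
```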
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow