Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?

StackOverflow https://stackoverflow.com/questions/18717775

  •  28-06-2022

Question

Does the following code incur the AVX-to-SSE transition penalty? If so, how can I store the sum of a __m256 vector without incurring this penalty?

__m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x);  // now all fields of x contain the sum

float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));

Thank you.

Solution

As long as you compile your code with -mavx, you should not see any AVX-SSE transition penalties. With -mavx the compiler emits the VEX-encoded forms of the SSE instructions (for example vmovss instead of movss for _mm_store_ss), and mixing these with 256-bit AVX instructions incurs no penalty. The penalty arises only when legacy (non-VEX) SSE instructions are mixed with AVX instructions, which typically happens only in hand-written assembly or when linking modules that were compiled with different flags.
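
As a minimal sketch of the point above, the reduction from the question can be wrapped in a function that is built with AVX enabled. The name sum8_avx is hypothetical, and the target("avx") attribute (a GCC/Clang extension) is an assumption used here so the snippet builds even without -mavx on the command line; compiling the whole file with -mavx should have the same effect for this function.

```c
#include <immintrin.h>

/* Sum eight floats with the reduction from the question.
   sum8_avx is a hypothetical name. The target attribute is assumed to
   stand in for -mavx, so the 128-bit store below is expected to be
   emitted in its VEX-encoded form rather than as a legacy SSE instruction. */
__attribute__((target("avx")))
float sum8_avx(const float *p)
{
    __m256 x = _mm256_loadu_ps(p);                    /* unaligned load of 8 floats */
    __m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  /* swap the 128-bit halves */
    x = _mm256_add_ps(x, x_swap);                     /* each half: p[i] + p[i+4] */
    x = _mm256_hadd_ps(x, x);                         /* pairwise sums within each half */
    x = _mm256_hadd_ps(x, x);                         /* every element now holds the total */

    float sum;
    _mm_store_ss(&sum, _mm256_castps256_ps128(x));    /* store the lowest element */
    return sum;
}
```

One way to check this on your own toolchain is to inspect the generated assembly (for example with gcc -O2 -S): the scalar store should appear with a v prefix, such as vmovss, if the compiler behaves as described.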

Licensed under: CC-BY-SA with attribution