Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?

https://stackoverflow.com/questions/18717775

simd
avx

28-06-2022

Question

Does the following code incur the AVX-to-SSE transition penalty? If so, how can I store the sum of a __m256 vector without incurring this penalty?

__m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x);  // now all fields of x contain the sum

float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));

Thank you.

Answer

As long as you compile your code with -mavx, you shouldn't see any AVX-SSE transition penalties. With -mavx the compiler automatically emits the newer VEX-encoded, non-destructive forms of the SSE instructions (for example, _mm_store_ss becomes vmovss), and there is no penalty for mixing these with 256-bit AVX instructions. The penalty is only incurred when you mix legacy (non-VEX) SSE instructions with AVX, which typically happens only with hand-written assembly code or when mixing modules that were compiled with different flags.
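As a rough illustration, here is a minimal sketch of a horizontal sum that mixes 256-bit and 128-bit intrinsics. It is not the code from the question: it narrows to 128 bits first and then uses _mm_hadd_ps, which is one common alternative. The function name hsum8_ps and the pointer-based interface are hypothetical. The target attribute is a GCC/Clang-specific assumption that enables AVX for this one function; it is redundant if the whole file is built with -mavx.

```c
#include <immintrin.h>

/* Sum the 8 floats at p (hypothetical helper, not from the question).
   The target attribute is GCC/Clang-specific and enables AVX for this
   function only; with -mavx on the command line it is redundant. */
__attribute__((target("avx")))
float hsum8_ps(const float *p)
{
    __m256 x  = _mm256_loadu_ps(p);            /* unaligned load of 8 floats */
    __m128 lo = _mm256_castps256_ps128(x);     /* lower 4 floats (no instruction) */
    __m128 hi = _mm256_extractf128_ps(x, 1);   /* upper 4 floats */
    __m128 s  = _mm_add_ps(lo, hi);            /* 4 partial sums */
    s = _mm_hadd_ps(s, s);                     /* 2 distinct partial sums */
    s = _mm_hadd_ps(s, s);                     /* total in every lane */
    return _mm_cvtss_f32(s);                   /* extract lane 0 */
}
```

Because AVX is enabled for this function, the 128-bit intrinsics should come out VEX-encoded as well. You can check this by disassembling the object file (for example with objdump -d) and confirming that the SSE-style operations appear with a v prefix, such as vaddps and vhaddps.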

License: CC-BY-SA with attribution

Not affiliated with StackOverflow