Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?

StackOverflow https://stackoverflow.com/questions/18717775

  •  28-06-2022

Question

Does the following code incur the AVX-to-SSE transition penalty? If so, how can I store the sum of a __m256 vector without incurring this penalty?

__m256 x_swap = _mm256_permute2f128_ps(x, x, 1);  // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x);  // now all fields of x contain the sum

float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));

Thank you.


Solution

As long as you compile your code with -mavx, you should not see any AVX-SSE transition penalty. With -mavx the compiler automatically emits the newer VEX-encoded (non-destructive) forms of the SSE instructions, including the one generated for _mm_store_ss, and mixing these with 256-bit AVX instructions costs nothing. The penalty is incurred only when legacy-encoded SSE instructions are mixed with AVX instructions, which typically happens only in hand-written assembly or when combining modules that were compiled with different flags.
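For illustration, here is a minimal sketch of a horizontal sum that mixes 256-bit and 128-bit intrinsics freely. It is not the asker's code: it swaps the permute/hadd sequence for an extract-and-add reduction on the 128-bit halves. The names hsum8_ps and TARGET_AVX are hypothetical, and the target attribute assumes GCC or Clang; it only makes the function build with VEX encoding when -mavx is missing from the command line, and is redundant otherwise.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. The attribute enables AVX for this one function,
   so every intrinsic below is emitted in its VEX-encoded form. */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* Hypothetical helper: returns the sum of the 8 floats at p
   (no alignment requirement, unaligned load is used). */
TARGET_AVX
static float hsum8_ps(const float *p)
{
    __m256 x  = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(x);    /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(x, 1);  /* elements 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);           /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));   /* 2 partial sums in elements 0 and 1 */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x55)); /* element 0 += element 1 */
    return _mm_cvtss_f32(s);                  /* extract element 0 */
}
```

Calling hsum8_ps on an array holding 1.0f through 8.0f returns 36.0f. To check the encoding yourself, inspect the generated assembly (for example with gcc -S): the 128-bit instructions should appear with a v prefix such as vaddps and vmovss, which indicates the VEX forms.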

License: CC-BY-SA with attribution
Not affiliated with StackOverflow