Question

I want to broadcast-load an __m256 vector so that it holds four copies of the same pair of floats, where the two floats are adjacent in memory.

That is, if I have a pointer x to a float array {a, b}, I want to end up with an __m256 containing {a, b, a, b, a, b, a, b}.

My question is: are there any potential problems with using _mm256_broadcast_sd to achieve this after casting x to a double pointer?

So:

__m256 vect = (__m256)_mm256_broadcast_sd((double *)x);

Solution

Yes, you can do this safely; I have done this in the past. In my case, I was doing math with complex numbers where each component was stored as a float. _mm256_broadcast_sd() can be used to insert a single complex number into each of the 4 positions of the resulting __m256d, which you can then cast to a __m256 (for example with _mm256_castpd_ps) if you want to do float operations on it.
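As a rough sketch of the approach described above: the helper names below are hypothetical and not from the original posts, and the target attribute is a GCC/Clang-specific assumption that avoids needing -mavx on the command line. The cast uses _mm256_castpd_ps rather than a C-style (__m256) cast, since the C-style vector cast is a compiler extension and may not be accepted everywhere.

```c
#include <immintrin.h>

/* Hypothetical helper: broadcast the two adjacent floats x[0], x[1]
   into all four 64-bit lanes of a __m256.
   The target attribute (GCC/Clang) enables AVX for this function only;
   it must still only be called on a CPU that supports AVX. */
__attribute__((target("avx")))
static __m256 broadcast_float_pair(const float *x)
{
    /* The broadcast copies 64 bits as-is; the bits are never
       interpreted as a double value. */
    __m256d pair = _mm256_broadcast_sd((const double *)x);

    /* Reinterpret the register as 8 floats; no instruction is generated. */
    return _mm256_castpd_ps(pair);
}

/* Hypothetical helper: store the broadcast result so it can be inspected. */
__attribute__((target("avx")))
static void broadcast_float_pair_to_array(const float *x, float out[8])
{
    _mm256_storeu_ps(out, broadcast_float_pair(x));
}
```

Calling broadcast_float_pair_to_array with x pointing at {a, b} should leave {a, b, a, b, a, b, a, b} in out.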

OTHER TIPS

That will work just fine.

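There is a small detail regarding alignment: if alignment checking is enabled, the 8-byte load needs an 8-byte-aligned address, whereas a float array only guarantees 4-byte alignment. However, practically everyone runs with alignment checking off, so this is not an issue in practice.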
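If the alignment detail (or the pointer cast itself) is a concern, one possible alternative is to copy the 8 bytes with memcpy and broadcast them as a 64-bit integer. This is only a sketch: the function name is hypothetical, the target attribute is a GCC/Clang-specific assumption, and the compiler may or may not turn this into a single broadcast instruction.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: broadcast x[0], x[1] into out[0..7] without
   casting the float pointer to a double pointer. */
__attribute__((target("avx")))
static void broadcast_float_pair_memcpy(const float *x, float out[8])
{
    long long bits;

    /* Copy the raw 8 bytes of x[0] and x[1]; memcpy has no alignment
       requirement on its source. */
    memcpy(&bits, x, sizeof bits);

    /* Replicate the 64-bit pattern into all four lanes, then
       reinterpret the register as 8 floats. */
    __m256i lanes = _mm256_set1_epi64x(bits);
    _mm256_storeu_ps(out, _mm256_castsi256_ps(lanes));
}
```

This version also accepts a pointer such as buf + 1 into a float array, which is 4-byte aligned but not necessarily 8-byte aligned.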

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow