Question

I was reading Intel's manual on the Xeon Phi instruction set and couldn't work out how the scatter/gather instructions work.

Suppose I have the following vector of doubles:

A-> |b4|a4|b3|a3|b2|a2|b1|a1|

Is it possible to create 4 vectors as follows:

V1->|b1|a1|b1|a1|b1|a1|b1|a1|
V2->|b2|a2|b2|a2|b2|a2|b2|a2|
V3->|b3|a3|b3|a3|b3|a3|b3|a3|
V4->|b4|a4|b4|a4|b4|a4|b4|a4|

using these instructions? Is there any other way to achieve this?


Solution

Got this from the Intel Forums (answered by Evgueni Petrov):

__m512d V1 = (__m512d)_mm512_extload_epi32(&Addr, _MM_UPCONV_EPI32_NONE, _MM_BROADCAST_4X16, _MM_HINT_NONE);

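where 'Addr' is the location in memory from which we loaded the doubles into vector 'A', so that &Addr is the address of a1. The load treats the data as 32-bit elements, and _MM_BROADCAST_4X16 replicates the first four of them (128 bits, i.e. the pair a1, b1) across the whole 512-bit register.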

We can do a similar operation for V2, V3 and V4 by passing the addresses of the third, fifth and seventh doubles respectively, i.e. &Addr[2], &Addr[4] and &Addr[6] if Addr is an array of doubles.
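To make the effect of the broadcast concrete, here is a minimal sketch in plain C that emulates it without any Xeon Phi intrinsics. The helper names broadcast_4x16_emulated and build_pair_vectors are hypothetical and only for illustration, and a 512-bit register is modeled as an array of 8 doubles with element 0 holding a1 (the rightmost slot in the diagrams above).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates a 4-to-16 broadcast of 32-bit elements.
 * The 16 bytes at src (one pair of doubles) are copied into each of the
 * four 128-bit lanes of a 512-bit destination, modeled as double[8]. */
void broadcast_4x16_emulated(const double *src, double dst[8])
{
    uint32_t lane[4];                 /* 4 x 32 bits = 2 doubles */

    assert(src != NULL && dst != NULL);
    memcpy(lane, src, sizeof lane);   /* read the pair as raw 32-bit words */

    for (int i = 0; i < 4; ++i) {     /* replicate into lanes 0..3 */
        memcpy((unsigned char *)dst + i * sizeof lane, lane, sizeof lane);
    }
}

/* Hypothetical helper: builds V1..V4 from A = {a1,b1,a2,b2,a3,b3,a4,b4},
 * using the same address offsets (0, 2, 4, 6 doubles) as the answer. */
void build_pair_vectors(const double A[8], double V[4][8])
{
    for (int k = 0; k < 4; ++k) {
        broadcast_4x16_emulated(&A[2 * k], V[k]);
    }
}
```

Calling build_pair_vectors(A, V) leaves V[k] holding the pair starting at A[2*k] repeated four times, which is the layout asked for in the question. On the actual hardware the single extload instruction does this in one step; the loop here only mirrors its data movement.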

Licensed under: CC-BY-SA with attribution