Question

I want to achieve this:

xmm0[0..63] = mem[0..63]
xmm0[64..127] = 0
xmm1[0..63] = mem[64..127]
xmm1[64..127] = 0

Actually, it doesn't have to be exactly like this; it's fine as long as:

xmm0[0..63] + xmm0[64..127] = mem[0..63]
xmm1[0..63] + xmm1[64..127] = mem[64..127]
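The relaxed condition says that the two 64-bit lanes of each register must add up to the corresponding half of memory. Assuming the data are doubles (the question doesn't say), a minimal sketch for checking it follows. `lane_sum` and `satisfies_relaxed` are hypothetical helper names, not part of any API.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns low lane + high lane of v. */
static double lane_sum(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);      /* both lanes = high lane of v */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi)); /* low lane = low + high */
}

/* Hypothetical helper: nonzero if v0/v1 satisfy the relaxed condition,
   assuming mem[0] holds bits 0..63 and mem[1] holds bits 64..127 as doubles. */
static int satisfies_relaxed(__m128d v0, __m128d v1, const double mem[2])
{
    return lane_sum(v0) == mem[0] && lane_sum(v1) == mem[1];
}
```

Putting the value in either lane and 0.0 in the other passes this check, while broadcasting the value into both lanes doubles the sum and fails it.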

How should I do this using SSE intrinsics?


Solution

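I would simply use the _mm_set_pd or _mm_set_sd intrinsics and see what your compiler generates. It should be reasonably efficient, and if it isn't, the generated code may give you an idea of how to improve on it with more explicit intrinsics. Note that _mm_set_pd takes its arguments from the high element down to the low element, so the value you want in bits 0..63 goes last, e.g.:

#include <emmintrin.h> // SSE2

double d[2]; // d[0] = mem[0..63], d[1] = mem[64..127]

__m128d v0 = _mm_set_pd(0.0, d[0]); // low = d[0], high = 0.0 (same as _mm_set_sd(d[0]))
__m128d v1 = _mm_set_pd(0.0, d[1]); // low = d[1], high = 0.0 (same as _mm_set_sd(d[1]))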
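Because the argument order of _mm_set_pd is easy to get backwards, it can help to store the registers back to memory and look at the lanes. A minimal sketch, assuming doubles; `store_lanes` and `split_halves` are hypothetical helper names. `_mm_set_pd(x, 0.0)` puts x in bits 64..127 instead, which still satisfies the relaxed condition from the question.

```c
#include <emmintrin.h> /* SSE2: _mm_set_sd, _mm_set_pd, _mm_storeu_pd */

/* Hypothetical helper: out[0] = bits 0..63 of v, out[1] = bits 64..127. */
static void store_lanes(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v); /* unaligned store, lane 0 goes to out[0] */
}

/* Hypothetical helper: low lane = d[i], high lane = 0.0 for each register. */
static void split_halves(const double d[2], __m128d *v0, __m128d *v1)
{
    *v0 = _mm_set_sd(d[0]); /* same as _mm_set_pd(0.0, d[0]) */
    *v1 = _mm_set_sd(d[1]); /* same as _mm_set_pd(0.0, d[1]) */
}
```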

Alternatively, as pointed out by @Mysticial and @Anycorn, you can just use _mm_load_sd, which loads a single double into the low element and zeroes the high element:

double d[2];

__m128d v0 = _mm_load_sd(&d[0]); // low = d[0], high = 0.0
__m128d v1 = _mm_load_sd(&d[1]); // low = d[1], high = 0.0
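The answer assumes the 128 bits of memory hold two doubles. If they hold two 64-bit integers instead (an assumption, since the question only talks about bits), _mm_loadl_epi64 does the same job for __m128i: it loads 64 bits into the low half and zeroes the high half. The two intrinsics correspond to the MOVSD and MOVQ load instructions. A minimal sketch of both; the helper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2: _mm_load_sd, _mm_loadl_epi64 */

/* Hypothetical helper for doubles: low lane = p[i], high lane = 0.0. */
static void load_halves_pd(const double p[2], __m128d *v0, __m128d *v1)
{
    *v0 = _mm_load_sd(&p[0]);
    *v1 = _mm_load_sd(&p[1]);
}

/* Hypothetical helper for 64-bit integers: low 64 bits = p[i], high 64 bits = 0. */
static void load_halves_epi64(const long long p[2], __m128i *v0, __m128i *v1)
{
    *v0 = _mm_loadl_epi64((const __m128i *)&p[0]);
    *v1 = _mm_loadl_epi64((const __m128i *)&p[1]);
}
```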
Licensed under: CC-BY-SA with attribution