Question

I have a bunch of packed floats inside an XMM register (using SSE intrinsics):

__m128 xmm = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

I'd like to convert all of these to integers in one go. I found an intrinsic, that does what I want (_mm_cvtps_pi16()), but it yields 4x16-bit short instead of full-blown int. An intrinsic called _mm_cvtps_pi32() yields int, but only for the two lower values in xmm. I can use it, extract the values, move things around and use it again, but is there a simpler way? Why wouldn't there be a straightforward 32bit packed float -> 32bit integer instruction? Surely both fit in the same space of an XMM register?

EDIT: Okay, I see now that _mm_cvtps_pi32() returns __m64 instead of __m128, which means it operates on a MMX-style MM... register. That would explain why it returns just two ints, but now I'm wondering:

  • Will I have trouble when compiling for x64? Reportedly, __m64 isn't supported there...
  • Why didn't they extend this instruction when SSE rolled out?

Thanks!

Était-ce utile?

La solution

According to this documentation: __m128d _mm_cvtps_epi32(__m128d a) generates a cvtps2dq instruction, which does what you want.

Autres conseils

Use documentation (_mm_cvtps_epi32):

Magic documentation.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top