Use intrinsics if you value your sanity (and free time):
int32_t *dest;
__m128 vf = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
__m128i vi = _mm_cvttps_epi32(vf); // 4 x float -> 4 x int (with truncation)
_mm_store_si128((__m128i *)dest, vi); // NB: use _mm_storeu_si128 if `dest` is not 16-byte aligned
If you must use asm for some reason, the corresponding instruction for _mm_cvttps_epi32 is cvttps2dq.