Question

I'm trying to sum every element of an array of 8 floats using SSE intrinsics, just to learn how to use them. However, when I attempt to write it like this:

alignas(16) float Numbers[8] = 
{0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};    

__m128 Group1 = _mm_load_ps(Numbers);
__m128 Group2 = _mm_load_ps(Numbers + 4*sizeof(float));
__m128 Zero = _mm_setzero_ps();

__m128 Sum1 = _mm_add_ps(Group1, Group2);     // Sum1 = Group1 + Group2
__m128 Sum2 = _mm_hadd_ps(Sum1, Zero);        // Sum2[31:0] = Sum1[31:0] + Sum1[63:32]
                                              // Sum2[63:32] = Sum1[95:64] + Sum1[127:96]
__m128 Sum3 = _mm_hadd_ps(Sum2, Zero);        // Sum3[31:0] = Sum2[31:0] + Sum2[63:32]

float Result;
_mm_store_ss(&Result, Sum3);

Result comes out to be 6, when it should be 28. I've been consulting a reference for these intrinsics, but I haven't been able to figure out what is wrong with my logic here. Any suggestions?


Solution

Try changing this line

__m128 Group2 = _mm_load_ps(Numbers + 4*sizeof(float));

to

__m128 Group2 = _mm_load_ps(Numbers + 4);

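(Numbers is a float[], not a char[], so pointer arithmetic on it already counts in floats rather than bytes. With 4-byte floats, Numbers + 4*sizeof(float) is Numbers + 16, which points past the end of the 8-element array.)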
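To make the scaling visible, here is a minimal sketch in plain C. The helper names element_index and byte_distance are made up for illustration, and the byte counts in the comments assume sizeof(float) is 4, as on typical x86 targets:

```c
#include <stddef.h>

/* Hypothetical helper: how many elements does `base + offset` skip?
   Pointer arithmetic on float* counts in floats, so the answer is `offset`. */
size_t element_index(const float *base, size_t offset)
{
    const float *p = base + offset;
    return (size_t)(p - base);
}

/* Hypothetical helper: how many bytes does the same addition skip?
   The compiler multiplies by sizeof(float) for us. */
size_t byte_distance(const float *base, size_t offset)
{
    return (size_t)((const char *)(base + offset) - (const char *)base);
}
```

For a sufficiently large array, element_index(a, 4) is 4 and byte_distance(a, 4) is 16 with 4-byte floats, whereas an offset of 4*sizeof(float) lands on element 16. In the original 8-element array that is out of bounds, so the second load read unrelated memory.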

Other tips

@twin has already pointed out the main problem, but I thought I'd just add a couple of further points: (a) you don't need a zero vector and (b) you don't need separate sum vectors - you can do this all in-place, which should be more efficient. Here is the simplified code, which I've tested with gcc:

#include <stdio.h>
#include <pmmintrin.h>

int main()
{
    float Numbers[8] __attribute__ ((aligned(16))) =
        {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};

    __m128 Group1 = _mm_load_ps(Numbers);
    __m128 Group2 = _mm_load_ps(Numbers + 4);

    __m128 Sum = _mm_add_ps(Group1, Group2);
    Sum = _mm_hadd_ps(Sum, Sum);
    Sum = _mm_hadd_ps(Sum, Sum);

    float Result;
    _mm_store_ss(&Result, Sum);

    printf("Result = %g\n", Result);

    return 0;
}

Test it:

$ gcc -Wall -msse3 sum_ps.c && ./a.out
Result = 28
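As a side note, _mm_hadd_ps is an SSE3 instruction, which is why the code above needs pmmintrin.h and -msse3. If you want to stay within SSE1, the same reduction can probably be written with shuffles instead. This is only a sketch: the function name sum8_sse is hypothetical, and it assumes the input pointer is 16-byte aligned like the array above:

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only */

/* Hypothetical helper: sums 8 floats from a 16-byte-aligned array. */
float sum8_sse(const float *numbers)
{
    /* Vertical add: sum = [s0, s1, s2, s3] */
    __m128 sum = _mm_add_ps(_mm_load_ps(numbers), _mm_load_ps(numbers + 4));

    /* Copy the upper two lanes down: hi = [s2, s3, s2, s3] */
    __m128 hi = _mm_movehl_ps(sum, sum);
    sum = _mm_add_ps(sum, hi);            /* lanes 0,1 = s0+s2, s1+s3 */

    /* Broadcast lane 1 and add it into lane 0 */
    __m128 lane1 = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1));
    sum = _mm_add_ss(sum, lane1);         /* lane 0 = s0+s1+s2+s3 */

    return _mm_cvtss_f32(sum);            /* extract lane 0 */
}
```

Calling sum8_sse(Numbers) on the array from the example should give the same 28, and it builds with plain gcc on x86-64 without -msse3.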
Licensed under: CC-BY-SA with attribution