Question

I've been having trouble with this toy example while learning to program with SSE intrinsics. I read in other threads here that segmentation faults from _mm_load_ps are sometimes caused by misaligned data, but I think the __attribute__((aligned(16))) I added should take care of that. Also, when I comment out either line 23 or 24 (or both) in my code, the problem goes away, but obviously the code then no longer does what it should.

#include <xmmintrin.h>                 // SSE
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8];
        __m128 m, *m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, *m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        m_result++;
        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, *m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}

Line 23 is arr1 = _mm_load_ps(temp1+4). It's weird to me that I can do one or the other but not both. Any help would be appreciated, thanks!


Solution

Your problem is that you declare a pointer, __m128 *m_result, but never make it point at any storage, so writing through *m_result is undefined behavior. Later you also do m_result++, which moves it to yet another address that has not been allocated. There is no reason to use a pointer here; a plain __m128 variable is enough.

#include <xmmintrin.h>                 // SSE
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8] __attribute__((__aligned__(16)));  // _mm_store_ps requires 16-byte alignment
        __m128 m, m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}
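
The two copy-pasted halves of the code above can also be folded into a loop. The following is only a minimal sketch: the helper name twice_product is hypothetical (it does not appear in the original post), and it uses the unaligned load/store intrinsics so that it makes no assumption about how the caller's arrays are aligned. It also writes each group of four results to its own slot in the output instead of overwriting the first four elements.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): computes
// out[i] = 2 * a[i] * b[i] for n floats, four at a time.
// Unaligned loads/stores are used, so no alignment is required of the caller.
void twice_product(const float* a, const float* b, float* out, std::size_t n)
{
    assert(n % 4 == 0);  // this sketch has no scalar loop for leftover elements
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 m  = _mm_mul_ps(va, vb);             // a * b
        _mm_storeu_ps(out + i, _mm_add_ps(m, m));   // (a * b) + (a * b)
    }
}
```

With the arrays from the question, this would be called as twice_product(temp1, temp2, temp3, 8), after which all eight results are in temp3.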

Other tips

(1) m_result is just a wild pointer:

     __m128 m, *m_result;

Change all occurrences of *m_result to m_result and remove the m_result++; statement. (m_result is just a temporary vector variable whose value you subsequently store to temp3.)
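
If you really do want to step a pointer through several result vectors, it has to point at real storage first. Here is a rough sketch of that idea; the function name fill_results and its parameters are made up for illustration, and it assumes the caller passes an array of at least two __m128 elements.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: writes two result vectors through a pointer.
// Assumption: `results` points at storage for at least two __m128 values,
// and `a` and `b` each hold at least eight floats.
void fill_results(const float* a, const float* b, __m128* results)
{
    __m128* p = results;  // points at valid storage, unlike a wild pointer
    for (int block = 0; block < 2; ++block) {
        __m128 m = _mm_mul_ps(_mm_loadu_ps(a + 4 * block),
                              _mm_loadu_ps(b + 4 * block));
        *p = _mm_add_ps(m, m);  // safe: p refers to an element of the array
        ++p;                    // ends at most one past the last element
    }
}
```

A caller would declare something like __m128 results[2]; and pass results, so that both the write through the pointer and the increment stay inside an object that actually exists.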

(2) Your two stores are potentially misaligned, since temp3 has no guaranteed alignment - either change:

    float temp3[8];

to:

    float temp3[8] __attribute__((__aligned__(16)));

or use _mm_storeu_ps:

    _mm_storeu_ps(temp3, m_result); 
            ^^^
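
To see whether a given address satisfies the 16-byte requirement, you can test it directly. The sketch below is illustrative only: is_aligned16 and store4 are hypothetical helper names, not part of any SSE header. It picks the aligned store when the destination allows it and falls back to the unaligned one otherwise. In C++11 and later, alignas(16) is a standard alternative to the GCC-specific __attribute__ spelling.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: stores four floats to dst, using the aligned
// store only when dst is actually 16-byte aligned.
void store4(float* dst, __m128 v)
{
    if (is_aligned16(dst))
        _mm_store_ps(dst, v);   // would fault on a misaligned address
    else
        _mm_storeu_ps(dst, v);  // valid for any address
}
```

For example, with alignas(16) float buf[8]; the call store4(buf, v) takes the aligned path, while store4(buf + 1, v) takes the unaligned one.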
Licensed under: CC-BY-SA with attribution