Your data is not guaranteed to be 16 byte aligned as required by SSE loads. Either use _mm_loadu_pd
:
a = _mm_loadu_pd(A);
...
a = _mm_loadu_pd(A2ptr);
b = _mm_loadu_pd(B2ptr);
or make sure that your data is correctly aligned where possible, e.g. for static or locals:
alignas(16) double A2[2], B2[2], C[2]; // C++11, or C11 with <stdalign.h>
or without C++11, using compiler-specific language extensions:
__attribute__ ((aligned(16))) double A2[2], B2[2], C[2]; // gcc/clang/ICC/et al
__declspec (align(16)) double A2[2], B2[2], C[2]; // MSVC
You could use #ifdef
to #define
an ALIGN(x)
macro that works on the target compiler.