Question

I am working with two-dimensional arrays of 16-bit integers defined as

int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
int16_t C[MAX_SIZE][MAX_SIZE];

where MAX_SIZE and MAX_NODE are compile-time constants. I'm not a professional programmer, but with the help of people on Stack Overflow I managed to write a piece of code that uses SSE instructions on my data and achieved a significant speed-up. Currently, I am using the intrinsics that do not require data alignment (mainly _mm_loadu_si128 and _mm_storeu_si128).

__m128i v1, v2, v3;
for (b = 0; b < n; b += 8){                       // 8 x int16_t = 128 bits per iteration
    v1 = _mm_loadu_si128((__m128i*)&C[level][b]); // level defined elsewhere.
    v2 = _mm_loadu_si128((__m128i*)&e[node][b]);  // node defined elsewhere.
    v3 = _mm_and_si128(v1, v2);                   // bitwise AND of the two vectors
    _mm_storeu_si128((__m128i*)&C[level+1][b], v3);
}

When I change the intrinsics to their counterparts for aligned data (i.e. _mm_load_si128 and _mm_store_si128), I get run-time errors, which leads me to believe that my data is not aligned properly.

My question is now: if my data is not aligned properly, how can I align it so that I can use the aligned intrinsics? I would have thought that since the integers are 16 bits, they are automatically aligned, but I seem to be wrong!

Any insight on this will be highly appreciated.

Thanks!


Solution

SSE needs data to be aligned on a 16-byte boundary, not 16 bits; that's your problem.
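A quick way to verify this at run time is to check the address modulo 16. Here is a minimal sketch (the helper name is mine, not part of any API):

#include <stdint.h>

/* Returns 1 if p sits on a 16-byte boundary, which is what
   _mm_load_si128 / _mm_store_si128 require. */
static int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

Calling this on &C[level][b] before the loop tells you whether the aligned intrinsics would be safe for that access.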

How you align your static arrays is compiler-dependent.

If you're using MSVC, you'll have to use __declspec(align(16)); with GCC, this would be __attribute__((aligned(16))).
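As a sketch, the declarations might look like the following (the MAX_SIZE and MAX_NODE values here are placeholders; use your actual constants):

#include <stdint.h>

#define MAX_SIZE 64  /* placeholder value */
#define MAX_NODE 16  /* placeholder value */

#if defined(_MSC_VER)
/* MSVC: the alignment specifier goes before the declaration. */
__declspec(align(16)) int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
__declspec(align(16)) int16_t C[MAX_SIZE][MAX_SIZE];
#else
/* GCC (and Clang): the attribute goes after the declarator. */
int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE] __attribute__((aligned(16)));
int16_t C[MAX_SIZE][MAX_SIZE] __attribute__((aligned(16)));
#endif

Note that this only aligns the start of each array. For &C[level][b] to be 16-byte aligned at every level, each row must also span a multiple of 16 bytes, i.e. MAX_SIZE must be a multiple of 8 (8 × sizeof(int16_t) = 16 bytes); with MAX_SIZE = 64 as above, that holds.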

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow