Question

In Intel's Compiler Autovectorization Guide there is an example related to alignment that I don't understand. The code is:

double a[N], b[N];
...
for(i = 0; i < N; i++)
  a[i+1] = b[i] * 3;

And it says

If the first element of both arrays is aligned at a 16-byte boundary, then either an unaligned load of elements from b or an unaligned store of elements into a has to be used after vectorization. However, the programmer can enforce the alignment shown below, which will result in two aligned access patterns after vectorization (assuming an 8-byte size for doubles):

__declspec(align(16, 8)) double a[N];
__declspec(align(16, 0)) double b[N];
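
If I understand it correctly, with both arrays 16-byte aligned the vectorized loop would have to look something like this hand-written SSE2 sketch (my own reconstruction, not Intel's code; I assume N is even, and note that a needs room for N+1 doubles, since the loop writes a[N]):

#include <emmintrin.h>  /* SSE2 intrinsics */

/* Roughly what the vectorizer has to do when both a and b start on a
 * 16-byte boundary: the loads from b are aligned, but every store
 * lands at an odd element of a, 8 bytes past a 16-byte boundary, so
 * it has to be the unaligned store. */
void mul3_shifted(double *a, const double *b, int n)
{
    const __m128d three = _mm_set1_pd(3.0);
    for (int i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(&b[i]);                 /* aligned load    */
        _mm_storeu_pd(&a[i + 1], _mm_mul_pd(v, three)); /* unaligned store */
    }
}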

How can one see where the misalignment comes in after vectorization? And wouldn't the alignment depend on the size of the arrays?

The solution

Hans Passant essentially covers all the right ideas, but let me explain a bit more:

Say a and b are both aligned to 16 bytes; for the sake of example, say they sit at addresses 0x100 and 0x300.

Now, let's see how the code looks with i=3 (odd) and i=6 (even)...

a[i+1] = b[i] * 3; will do [0x120] = [0x318] * 3 (i=3: with sizeof(double) being 8, a[4] is at 0x100 + 4*8 = 0x120 and b[3] is at 0x300 + 3*8 = 0x318)

or

a[i+1] = b[i] * 3; will do [0x138] = [0x330] * 3 (i=6: a[7] is at 0x100 + 7*8 = 0x138 and b[6] is at 0x300 + 6*8 = 0x330)

In both cases, either the left-hand side or the right-hand side is aligned while the other one is misaligned (a 16-byte-aligned address always ends in 0 in hex; a misaligned one ends in something else, here 8).
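
A quick way to see this alternation for yourself is to print the low four bits of each address. A minimal demo (my own, using C11 _Alignas to force the 16-byte alignment from the example):

#include <stdio.h>
#include <stdint.h>

#define N 8

_Alignas(16) static double a[N + 1];  /* plays the role of 0x100 */
_Alignas(16) static double b[N];      /* plays the role of 0x300 */

int main(void)
{
    /* For every i, one address below ends in 0 (aligned) and the
     * other ends in 8 (misaligned), alternating with the parity of i. */
    for (int i = 0; i < N; i++)
        printf("i=%d: a[%d] ends in %lx, b[%d] ends in %lx\n",
               i, i + 1, (unsigned long)((uintptr_t)&a[i + 1] & 0xF),
               i,        (unsigned long)((uintptr_t)&b[i]     & 0xF));
    return 0;
}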

Now... let's purposefully misalign a to an address that is 8 modulo 16 (say 0x108, to keep our example).

Let's see how the code looks with i=3 (odd) and i=6 (even)...

a[i+1] = b[i] * 3; will do [0x128] = [0x318] * 3 (i=3: a[4] is now at 0x108 + 4*8 = 0x128, while b[3] is still at 0x318)

or

a[i+1] = b[i] * 3; will do [0x140] = [0x330] * 3 (i=6: a[7] is now at 0x108 + 7*8 = 0x140, b[6] still at 0x330)

In both cases the two addresses now end in the same hex digit: both misaligned (ending in 8) for odd i, both aligned (ending in 0) for even i. Because a[i+1] and b[i] now sit at the same offset modulo 16 on every iteration, the vectorizer can pair up consecutive iterations so that every 16-byte load from b and every 16-byte store into a hits an aligned address.
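
Taking the hand-vectorized sketch from the question and applying the misalignment trick (so &a[0] is 8 mod 16 and &a[1] falls on a 16-byte boundary), both memory operations can now use the aligned forms. Still only a sketch, under the same assumptions (n even, a sized for n + 1 doubles):

#include <emmintrin.h>

/* With &a[0] deliberately placed 8 bytes past a 16-byte boundary,
 * &a[1] is 16-byte aligned, so the stores to a[i + 1] for even i are
 * aligned, and so are the loads from b: no unaligned access remains
 * in the vector loop. */
void mul3_shifted_aligned(double *a, const double *b, int n)
{
    const __m128d three = _mm_set1_pd(3.0);
    for (int i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(&b[i]);                /* aligned load  */
        _mm_store_pd(&a[i + 1], _mm_mul_pd(v, three)); /* aligned store */
    }
}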

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow