Do Intel array notation and elemental functions vectorize well with the Xeon Phi ISA?

Question

Here is a corrected version of the example.

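#include <stdlib.h>

// N and STRIDE are compile-time constants assumed to be defined elsewhere.

__attribute__((vector(linear(sum),linear(a))))
inline void compute_seq(float *sum, float *a) {
  int i;
  *sum = 0.0f;
  for(i=0; i<N; i++)
    *sum += a[i*STRIDE];
}

int main() {
  // Allocate an N-by-N matrix (initialization of its contents omitted)
  float *A = malloc(N*N*sizeof(float));
  float sums[N];
  // Elemental call: one invocation per pair of section elements
  compute_seq(&sums[:],&A[0:N:N]);
  free(A);
  return 0;
}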

The important change is at the call site. The expression &sums[:] creates an array section consisting of &sums[0], &sums[1], &sums[2], ..., &sums[N-1]. The expression &A[0:N:N] creates an array section consisting of &A[0*N], &A[1*N], &A[2*N], ..., &A[(N-1)*N]. The call therefore applies compute_seq to each corresponding pair of elements, that is, once per row of A.
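For readers unfamiliar with array sections, the call at the call site behaves roughly like the scalar loop below. This is only a sketch in plain C, not the compiler's actual output: the sizes ROWS and COLS and the helper names row_sum and sum_rows are made up for illustration, and the in-row stride is assumed to be 1. A vectorizing compiler is free to run the iterations of the outer loop in lockstep across SIMD lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes standing in for N in the example. */
enum { ROWS = 4, COLS = 4 };

/* Scalar body: sums COLS consecutive floats starting at a. */
static void row_sum(float *sum, const float *a) {
    *sum = 0.0f;
    for (size_t i = 0; i < COLS; i++)
        *sum += a[i];
}

/* Element-by-element meaning of compute_seq(&sums[:], &A[0:N:N]):
   iteration k receives &sums[k] and &A[k*COLS]. */
static void sum_rows(float *sums, const float *A) {
    assert(sums != NULL && A != NULL);
    for (size_t k = 0; k < ROWS; k++)
        row_sum(&sums[k], &A[k * COLS]);
}
```

Note how the first argument advances by one element per iteration and the second by one row (COLS elements); both are arithmetic sequences.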

I added two linear clauses to the vector attribute. They tell the compiler to generate a clone optimized for the case where the arguments form arithmetic sequences, as they do in this example. Here the clauses (and the vector attribute itself) are redundant, because the compiler can see both the callee and the call site in the same translation unit and work out the particulars for itself. But if compute_seq were defined in another translation unit, the attribute might help.
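A similar hint can be written with OpenMP 4.0's declare simd directive. That is a different mechanism from the Cilk Plus vector attribute used in this answer, so treat the sketch below as an assumption about how the example might translate, not as something verified with icc. The names row_total, all_totals, and DIM are hypothetical. The step DIM on a states that consecutive lanes see pointers DIM elements apart, which corresponds to the &A[0:N:N] section.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM = 4 };  /* hypothetical stand-in for N */

/* Request a SIMD clone in which lane k gets sum + k and a + k*DIM. */
#pragma omp declare simd linear(sum : 1) linear(a : DIM)
void row_total(float *sum, const float *a) {
    *sum = 0.0f;
    for (int i = 0; i < DIM; i++)
        *sum += a[i];
}

/* Caller: each iteration can map to one SIMD lane when OpenMP SIMD is on. */
void all_totals(float *sums, const float *A) {
    assert(sums != NULL && A != NULL);
    #pragma omp simd
    for (int k = 0; k < DIM; k++)
        row_total(&sums[k], &A[k * DIM]);
}
```

If the compiler's OpenMP SIMD support is not enabled (for example, GCC without -fopenmp-simd), the pragmas are ignored and the code still runs as an ordinary scalar loop with the same results.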

Array notation is a work in progress. icc 14.0 beta compiled my example for Intel(R) Xeon Phi(TM) without complaint. icc 13.0 update 3 reported that it couldn't vectorize the function ("dereference too complex"). Perversely, leaving the vector attribute off silenced the report, probably because the compiler can vectorize the loop after inlining the function.

I use the compiler option "-opt-assume-safe-padding" when compiling for Intel(R) Xeon Phi(TM). It may improve vector code quality. It lets the compiler assume that the page beyond any accessed address is safe to touch, thus enabling certain instruction sequences that would otherwise be disallowed.