Auto vectorization not working

Question 1

The error 1305 happens because the optimizer did not vectorize the loop since the value sum is not used. Simply adding printf("%d\n", sum) fixes that. But then you get a new error code 1105 "Loop includes a non-recognized reduction operation". To fix this you need you need to set /fp:fast

The reason is that floating point arithmetic is not associative and reductions using SIMD or MIMD (i.e. using multiple threads) need to be associative. By using a looser floating point model you can do the reduction.

I just tested it with the following code and the default fp:precise does not vectorize and when I use fp:fast it does.

#include <stdio.h>
int main() {
    const int N = 4096;
    float x[N];
    float y[N];
    float sum = 0;
    for (int i = 0; i < N; i++){
        sum += x[i] * y[i];
    }
    printf("sum %f\n", sum);
}

In regards to your question about the loop with the rand() function the rand() function is not a SIMD function. It can't be vectorized. You need to find a SIMD rand() function. I don't know of one. An alternative is pre-compute an array of random numbers and use the array instead. In any case rand() is a horrible random number generate and is only useful for some toy cases. Consider using the Mersenne twister PRNG.

Question 2

One problem could be that your stack allocation isn't necessarily aligned by your compiler. If your compiler supports c++11 you could use:

float x[N] alignas(16);
float y[N] alignas(16);

To explicitly get 16 byte aligned memory, which is required by most SSE operations.

EDIT:

Even if alignment isn't the issue and your compiler is vectorizing unaligned code you should make this optimization as unaligned SSE operations are very slow compared to their aligned counterparts.