Question

I am wondering if it is possible to use SSE (1,2,3,4,...) to optimise the following loop:

// u and v are allocated through new double[size*size]
for (int j = l; j < size-1; ++j)
{
    for (int k = 1; k < size-1; ++k)
    {
        v[j*size + k] = (u[j*size + k-1] + u[j*size + k+1] 
                       + u[(j-1)*size + k]+ u[(j+1)*size + k]) / 4.0;
    }
}

The [j*size + k] idiom is used to treat the block of memory as if it were a multi-dimensional array.

Sadly the -ftree-vectorize flag for GCC (4.5) does not believe that the loop is amenable to SIMD-type optimisation. (Although saying that I've never seen -ftree-vectorize optimise anything but the most trivial of loops.)

While I am aware that there are many other ways to improve the performance of the loop (OpenMP, unrolling, in-place algorithms, etc) I am specifically interested to know if SIMD can be used. I am perhaps more interested in the general outline of how (if at all) such a loop could be transformed, as opposed to a concrete implementation.

Was it helpful?

Solution

It looks like it should be possible, but since (a) you're using doubles, (b) you're doing very little computation relative to I/O, (c) most modern x86-64 CPUs have two FPUs anyway, then you may not get much return on your SIMD coding investment.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top