Question

I have 3 nested loops like this:

      !$omp parallel do schedule(runtime) private(s1)
      DO  k = 0, z
         !$omp simd collapse( 2 ) reduction( +: s1 )
         DO  i = 0, x
            DO  j =  0, z
               s1 = s1 + array(k,j,i)
            ENDDO
         ENDDO
         sums_l(k) = s1
      ENDDO

But the Intel compiler complains: warning #13379: loop was not vectorized with "simd". Why is that? How should I write this instead?

EDIT3: This is the code that produces the warning, reduced to the minimum that still triggers it. If you remove literally anything, it vectorizes.

SUBROUTINE simdTest

  IMPLICIT NONE

  INTEGER ::  i, j, k, sr, tn,nzb,nzt,nxl,nxr,nys,nyn
  REAL    ::  s1, s2, s3, s4
  REAL, DIMENSION(:,:,:), ALLOCATABLE :: u,v,pt,rmask,sums_l
  REAL, DIMENSION(:,:), ALLOCATABLE :: usws,vsws,shf

  !$omp parallel do schedule(runtime) private(s1,s2,s3)
  DO  k = nzb, nzt+1
    !$omp simd collapse( 2 ) reduction( +: s1, s2, s3 )
    DO  i = nxl, nxr
       DO  j =  nys, nyn
          s1 = s1 + u(k,j,i)  * rmask(j,i,sr)
          s2 = s2 + v(k,j,i)  * rmask(j,i,sr)
          s3 = s3 + pt(k,j,i) * rmask(j,i,sr)
       ENDDO
    ENDDO
    sums_l(k,1,tn) = s1
    sums_l(k,2,tn) = s2
    sums_l(k,4,tn) = s3
  ENDDO

  !$omp parallel do reduction( +: s1, s2, s3, s4) schedule(runtime)
  DO  i = nxl, nxr
   DO  j =  nys, nyn
      s1 = s1 + usws(j,i) * rmask(j,i,sr)
      s2 = s2 + vsws(j,i) * rmask(j,i,sr)
      s3 = s3 + shf(j,i)  * rmask(j,i,sr)
      s4 = s4 + 0.0
   ENDDO
  ENDDO
  sums_l(nzb,12,tn) = s1
  sums_l(nzb,14,tn) = s2
  sums_l(nzb,16,tn) = s3

END SUBROUTINE

Solution

There is not enough room for this in the comments:

I get the output below when I compile it on an Ivy Bridge CPU. Vectorizing the loop on line 15 is not profitable on the CPU, but notice that it is vectorized for the Intel MIC architecture. The loop on line 16 is also vectorized on the CPU with the target directives removed.

The reason for the vectorization problem is given in the first remark: "subscript too complex".

ifort -openmp simd.f90 -warn -O3 -c -vec-report=3 -xHOST -fpp 
ifort: command line remark #10382: option '-xHOST' setting '-xCORE-AVX-I'
simd.f90(17): (col. 33) remark: loop was not vectorized: subscript too complex
simd.f90(15): (col. 5) warning #13379: loop was not vectorized with "simd"
simd.f90(16): (col. 8) remark: LOOP WAS VECTORIZED
simd.f90(13): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(13): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(31): (col. 4) remark: LOOP WAS VECTORIZED
simd.f90(30): (col. 3) remark: loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: BLOCK WAS VECTORIZED
ifort: warning #10362: Environment configuration problem encountered.  Please check for proper MPSS installation and environment setup.
simd.f90(15): (col. 5) remark: *MIC* OpenMP SIMD LOOP WAS VECTORIZED
simd.f90(13): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(13): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(31): (col. 4) remark: *MIC* LOOP WAS VECTORIZED
simd.f90(31): (col. 4) remark: *MIC* PEEL LOOP WAS VECTORIZED
simd.f90(31): (col. 4) remark: *MIC* REMAINDER LOOP WAS VECTORIZED
simd.f90(30): (col. 3) remark: *MIC* loop was not vectorized: not inner loop
simd.f90(29): (col. 7) remark: *MIC* loop was not vectorized: not inner loop
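
Since the question also asks how to write the loop instead, here is one possible restructuring, offered as an untested sketch rather than a confirmed fix. It drops collapse( 2 ) and puts the simd directive on the innermost loop only, so the compiler does not have to recover i and j from a single collapsed index. The array bounds, the fill values, the single accumulator and the two-dimensional rmask are assumptions made up to keep the example self-contained; it has not been checked against the ifort version in the log above, so whether the "subscript too complex" remark goes away would need to be confirmed with the vectorization report.

```fortran
! Sketch only: bounds and data are made up for illustration.
PROGRAM simd_inner_sketch

  IMPLICIT NONE

  INTEGER, PARAMETER ::  nzb = 0, nzt = 7, nxl = 0, nxr = 15, nys = 0, nyn = 15
  INTEGER ::  i, j, k
  REAL    ::  s1
  REAL, DIMENSION(:,:,:), ALLOCATABLE :: u
  REAL, DIMENSION(:,:),   ALLOCATABLE :: rmask
  REAL, DIMENSION(:),     ALLOCATABLE :: sums_l

  ALLOCATE( u(nzb:nzt+1,nys:nyn,nxl:nxr) )
  ALLOCATE( rmask(nys:nyn,nxl:nxr) )
  ALLOCATE( sums_l(nzb:nzt+1) )
  u     = 1.0
  rmask = 1.0

  !$omp parallel do schedule(runtime) private(i,j,s1)
  DO  k = nzb, nzt+1
     s1 = 0.0                      ! reset the private accumulator for each k
     DO  i = nxl, nxr
        ! simd applies to the innermost loop only, no collapse clause
        !$omp simd reduction( +: s1 )
        DO  j = nys, nyn
           s1 = s1 + u(k,j,i) * rmask(j,i)
        ENDDO
     ENDDO
     sums_l(k) = s1
  ENDDO

  PRINT *, sums_l(nzb)

END PROGRAM simd_inner_sketch
```

With every element set to 1.0, each entry of sums_l should equal the number of (i,j) points. Note that u(k,j,i) is still accessed with a non-unit stride in j, because k is the fastest-varying index in Fortran, so the compiler may still judge vectorization unprofitable on the CPU.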
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow