Question

I am working on this simple piece of code. By the way, I initialized all elements of the c array to 0.

#pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
{
  c[0]=11;
  for(i=0; i<n; i++) {
    if(c[i]==11) c[i]=123;
    c[i] = a[i] + b[i];
  }
}

When I looked at the generated codelet, I saw that c[0] was assigned in the host code (CPU). This means the loop iterations work with the old value of c (initialized to 0), so they never reach the c[i]=123 assignment. In other words, the code returned wrong results :( Have you ever encountered anything like this?

Solution

According to the OpenACC v1.0 reference, the acc kernels directive surrounds loops to be executed on the accelerator, typically as a sequence of kernel operations. In other words, the compiler turns the loops inside the region into accelerator kernels, but a statement outside any loop, such as c[0]=11, is not guaranteed to run on the accelerator, which is why you found it in the host code. In your case, it would be better to use the acc parallel directive:

#pragma acc parallel copyin(a[0:n],b[0:n]), copyout(c[0:n])
{
  c[0]=11;
  #pragma acc loop 
  for(i=0; i<n; i++) {
    if(c[i]==11) c[i]=123;
    c[i] = a[i] + b[i];
  }
}

The acc loop directive tells the compiler to distribute the iterations of the loop across the accelerator threads; without it, every gang in the parallel region would execute the whole loop redundantly.
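If the order between the one-off assignment and the loop matters, one option is to make it explicit. The following is only a sketch under a few assumptions that are not in the original post: the helper name add_with_marker is hypothetical, the arrays are assumed to hold int, copy is used instead of copyout so the device starts from the host's values of c (copyout alone does not transfer them), and the scalar assignment is placed in its own single-gang parallel region so that it completes before the loop region starts.

```c
#include <assert.h>

/* Hypothetical helper for illustration; the name and the int element type
 * are assumptions, not part of the original post. */
void add_with_marker(const int *a, const int *b, int *c, int n)
{
    assert(n > 0);

    /* copy (not copyout) so the device copy of c starts with the host values. */
    #pragma acc data copyin(a[0:n], b[0:n]) copy(c[0:n])
    {
        /* Separate single-gang region: the assignment runs once on the
         * accelerator and is finished before the next region begins. */
        #pragma acc parallel num_gangs(1) present(c[0:n])
        {
            c[0] = 11;
        }

        /* Loop body kept as in the question; iterations are shared out
         * across the accelerator threads by the loop directive. */
        #pragma acc parallel loop present(a[0:n], b[0:n], c[0:n])
        for (int i = 0; i < n; i++) {
            if (c[i] == 11) c[i] = 123;
            c[i] = a[i] + b[i];   /* overwrites whatever the line above stored */
        }
    }
}
```

Note that, with the loop body as written in the question, the last statement always overwrites c[i], so every element ends up as a[i] + b[i] whether or not the c[i]=123 branch was taken. Built without OpenACC support, the pragmas are ignored and the function runs serially on the host with the same result.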

Licensed under: CC-BY-SA with attribution