Nested data environment with different subparts of the same array

StackOverflow https://stackoverflow.com/questions/21159931

  •  28-09-2022

Question

Here is my question about OpenACC. I read the API specifications (v1 and v2), and the behavior of nested data environments that reference different subparts of the same array is unclear to me.

Code example:

#pragma acc data pcopyin(a[0:20])
{
  #pragma acc data pcopyin(a[100:20])
  {
    #pragma acc parallel loop
    for(i=0; i<20; i++) {
      a[i] = i;
      a[i+100] = i;
    }
  }
}

My understanding is that this should work (or at least the two acc data parts):

  • The first pragma checks if a[0:20] is on the accelerator
  • NO -> the data are allocated on the device and transferred
  • The second pragma checks if a[100:20] is on the accelerator
  • The pointer a is on the accelerator, but not the data from a[100:20]
  • The data are allocated on the device and transferred

I tried this with the CAPS compiler (v3.3.0, which is the only one available right now on my test machine), and the second pragma acc data returns an error (my second subarray doesn't have the correct shape). So what happens in my test (I suppose) is that the pointer "a" was found on the accelerator, but the shape associated with it ([0:20]) is not the same as the one in my second pragma ([100:20]).

Is this the normal behavior planned in the API, or should my example work?

Moreover, if this is supposed to work, is there some sort of coherence between the subparts of the same array (that is, will they be positioned as on the host, so that I can write a[i] += a[100+i] in my kernel)?


Solution

The present test looks up whether "a" is on the device. Hence, when the second data region is encountered, "a" is found on the device, but only partially: the device copy covers a[0:20], not a[100:20]. Instead, a better method would be to add a second pointer that points into "a" and reference this pointer on the device. Something like:

#include <stdio.h>

int main () {
   int a[200];
   int *b;
   int i;

   /* Initialize the whole host array to zero. */
   for(i=0; i<200; i++) a[i] = 0;

   /* b aliases the second block: b[0:20] is a[100:20]. */
   b=a+100;

   #pragma acc data pcopy(a[0:20])
   {
      #pragma acc data pcopy(b[0:20])
      {
         #pragma acc parallel loop
         for(i=0; i<20; i++) {
            a[i] = i;
            b[i] = i;
         }
      }
   }

   /* Print two elements past each block; those were never written. */
   for(i=0; i<22; i++) printf("%d = %d \n", i, a[i]);
   for(i=100; i<122; i++) printf("%d = %d \n", i, a[i]);
   return 0;
}

If you had just copied "a[100:20]", then accessing outside this range would be considered a programmer error.
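For the a[i] += a[100+i] case from the question, one alternative to a second pointer is a single data region that covers the whole span containing both blocks. The sketch below is only an illustration, not part of the original answer: the function name add_upper_into_lower is hypothetical, it assumes "a" points to at least 120 ints, and it assumes that also transferring the unused gap a[20:80] is acceptable.

```c
/* Hypothetical helper (not from the original answer): adds the block
   a[100:20] into the block a[0:20] inside one data region.
   Assumes a points to at least 120 ints. */
void add_upper_into_lower(int *a)
{
    int i;

    /* One region covering a[0:120], so both blocks live in the same
       device allocation. The gap a[20:80] is transferred as well. */
    #pragma acc data pcopy(a[0:120])
    {
        /* a[0:120] is already present here because of the enclosing region. */
        #pragma acc parallel loop present(a[0:120])
        for (i = 0; i < 20; i++) {
            a[i] += a[100 + i];
        }
    }
}
```

Because both blocks belong to one contiguous subarray, indexing inside the kernel matches the host layout. The cost is the extra transfer of the elements in between, so this only makes sense when the gap is small relative to the blocks.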

Hope this helps, Mat

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow