Question

I am asking this question to clarify how the create and pcopyin (present_or_copyin) clauses behave in nested data environments.

When I first read the OpenACC API, I thought that, given the following code:

#pragma acc data create(a[0:20])
{
  #pragma acc data pcopyin(a[0:20])
  {
    ...
  }
}

the first clause allocated the necessary memory on the accelerator, and then the pcopyin clause copied the data from the host to the accelerator (without allocating).

Now, reading the draft of v2 of the API, I understand that the inner pcopyin clause does absolutely nothing: since the data is already present on the accelerator, neither allocation nor transfer should take place. Is that right?

When I tested this sort of example with the CAPS compiler, I think I obtained the behaviour I expected. Was that due to some ambiguity in the API? With v2, if I want to do this sort of thing, should I replace my copyin clause with an update?

Solution

I think you have misread the v1.0 spec (it is easy to misread; you are not alone). A "present_or_something" clause does the "something" only if the listed variables are not already present.

So in your case, the inner "pcopyin(a[0:20])" should not do anything (because of the enclosing "create"), which is what happens when I check the behavior here with CAPS Compiler 3.3.2.
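The rule can be pictured with a toy host-side model. This is only a sketch of the bookkeeping, not the OpenACC runtime: the struct device_array and the model_* functions are hypothetical names invented for illustration, and the "device" buffer is ordinary host memory (zeroed on creation purely so the result is deterministic).

```c
#include <stdbool.h>
#include <string.h>

#define N 20

/* Hypothetical stand-in for the runtime's record of one array. */
struct device_array {
    bool present;   /* true once the array has device storage */
    int  data[N];   /* pretend device copy */
};

/* Models "create": allocate only, no host-to-device transfer.
   Real device memory would be uninitialized; it is zeroed here
   only to make the model deterministic. */
void model_create(struct device_array *d)
{
    memset(d->data, 0, sizeof d->data);
    d->present = true;
}

/* Models "present_or_copyin": if the array is already present, do
   nothing at all; otherwise allocate and copy the host values in. */
void model_present_or_copyin(struct device_array *d, const int *host)
{
    if (d->present) {
        return;                      /* no allocation, no transfer */
    }
    memcpy(d->data, host, sizeof d->data);
    d->present = true;
}
```

In this model, calling model_create and then model_present_or_copyin leaves the device copy untouched, which mirrors the nested create/pcopyin case from the question.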

Here is the example I wrote to check the behavior with the compiler. Change present_or_copyin(a[0:20]) into copyin(a[0:20]) to see the difference: I modify the array "a" on the host between the "copyin" and the "present_or_copyin", so "present_or_copyin" and "copyin" lead to different results:

#include <stdio.h>

int main(void) {

    int a[20], b[20], i;

    for (i = 0; i < 20; i++) {
        a[i] = 42;
    }


    #pragma acc data create(a[0:20]) copyout(b[0:20])
    {
        // will upload the array of 42
        #pragma acc data copyin(a[0:20])
        {
            // executed on the host, not seen on GPU
            for (i = 0; i < 20; i++) {
                a[i] = 666;
            }
            // has no effect: already present
            #pragma acc data present_or_copyin(a[0:20])
            {
                #pragma acc kernels
                #pragma acc loop independent private(i)
                for (i = 0; i < 20; i++) {
                    a[i] += i;
                }
                #pragma acc kernels
                #pragma acc loop independent private(i)
                for (i = 0; i < 20; i++) {
                    b[i] = a[i];
                }

            }
        }
    }
    for (i = 0; i < 20; i++) {
        printf("%d, ", b[i]);
    }
    printf("\n");

}

With "present_or_copyin" this sample program writes:

42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,

With "copyin" this sample program writes:

666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685,
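On the last part of the question (whether an update should replace the copyin): the update directive is the explicit way to refresh data that is already present on the device. Below is a minimal sketch, assuming a compiler that supports "#pragma acc update device(...)"; the function name run_with_update is hypothetical and not part of the example above. If the pragmas are ignored (no OpenACC support), the code still runs on the host and gives the same values.

```c
#define N 20

/* Hypothetical helper: fills b[i] with a[i] + i, where the device copy
   of a is refreshed by an explicit update instead of a nested copyin. */
void run_with_update(int b[N])
{
    int a[N];

    #pragma acc data create(a[0:N]) copyout(b[0:N])
    {
        /* Executed on the host; the device copy of a is still uninitialized. */
        for (int i = 0; i < N; i++) {
            a[i] = 666;
        }

        /* Explicit host-to-device transfer into the already allocated array. */
        #pragma acc update device(a[0:N])

        #pragma acc kernels
        #pragma acc loop independent
        for (int i = 0; i < N; i++) {
            b[i] = a[i] + i;
        }
    }
}
```

After the call, b should hold 666 + i for each index i, because the update pushes the host values to the device regardless of whether "a" was already present.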

Licensed under: CC-BY-SA with attribution