This example copies a two-dimensional array of integers from the device to the host.
First, create a host array of device pointers with one entry per row (here blockSizeX rows), and allocate device memory for each row.
    CUdeviceptr[] hostDevicePointers = new CUdeviceptr[blockSizeX];
    for (int i = 0; i < blockSizeX; i++) {
        // Each row holds `size` ints on the device.
        hostDevicePointers[i] = new CUdeviceptr();
        cuMemAlloc(hostDevicePointers[i], size * Sizeof.INT);
    }
Allocate device memory for the array of row pointers, then copy the pointers from the host to the device.
    CUdeviceptr hostDevicePointersArray = new CUdeviceptr();
    cuMemAlloc(hostDevicePointersArray, blockSizeX * Sizeof.POINTER);
    cuMemcpyHtoD(hostDevicePointersArray, Pointer.to(hostDevicePointers), blockSizeX * Sizeof.POINTER);
Launch the kernel.
    kernelLauncher.call(........, hostDevicePointersArray);
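On the device side, hostDevicePointersArray arrives as a pointer to pointers. The original does not show the kernel, so the following is only a sketch of what a matching kernel might look like, kept as a Java string in the way JCuda programs often carry kernel source. The kernel name fillRows, its parameters, and the one-block-per-row launch layout are all assumptions made for illustration.

```java
// Hypothetical kernel source that could receive hostDevicePointersArray.
// The name fillRows, the rowLength parameter, and the one-block-per-row
// layout are assumptions, not taken from the original.
class RowKernelSource {
    static final String SOURCE =
        "extern \"C\" __global__ void fillRows(int** rows, int rowLength)\n" +
        "{\n" +
        "    // One block per row; threads stride across the columns.\n" +
        "    int row = blockIdx.x;\n" +
        "    for (int col = threadIdx.x; col < rowLength; col += blockDim.x) {\n" +
        "        rows[row][col] = row * rowLength + col;\n" +
        "    }\n" +
        "}\n";

    public static void main(String[] args) {
        System.out.println(SOURCE);
    }
}
```

Inside the kernel, rows[row] is the device address that was stored in hostDevicePointers[row] on the host, so rows[row][col] indexes directly into that row's allocation.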
Transfer the output from the device to the host, one row at a time.
    int sum = 0;
    for (int i = 0; i < blockSizeX; i++) {
        // Each row was allocated with `size` ints, so copy back exactly that many.
        int hostOutputData[] = new int[size];
        cuMemcpyDtoH(Pointer.to(hostOutputData), hostDevicePointers[i], size * Sizeof.INT);
        for (int j = 0; j < size; j++) {
            sum = sum + hostOutputData[j];
        }
    }
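If the rows are needed together on the host rather than just summed, they can be packed into one row-major buffer, with row i starting at index i * rowLength. The sketch below shows only that index arithmetic, in plain Java, with ordinary int[] arrays standing in for rows already copied back from the device. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Host-side sketch: pack per-row results into one row-major buffer.
// The int[][] input stands in for rows already copied back from the device.
class RowMajorGather {
    // Copies row i (rowLength ints) into flat[i * rowLength .. (i + 1) * rowLength - 1].
    static int[] gather(int[][] rows, int rowLength) {
        int[] flat = new int[rows.length * rowLength];
        for (int i = 0; i < rows.length; i++) {
            System.arraycopy(rows[i], 0, flat, i * rowLength, rowLength);
        }
        return flat;
    }

    // Sums all elements, using long to avoid int overflow on large arrays.
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2, 3}, {4, 5, 6} };
        int[] flat = gather(rows, 3);
        System.out.println(Arrays.toString(flat)); // [1, 2, 3, 4, 5, 6]
        System.out.println(sum(flat));             // 21
    }
}
```

With JCuda, the same layout could probably be filled without the intermediate rows by passing Pointer.to(flat).withByteOffset((long) i * rowLength * Sizeof.INT) as the destination of each cuMemcpyDtoH call. Treat that as an untested suggestion.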