In the last line

    cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

you are trying to copy data from the device to the host (NOTE: I assume that you allocated host memory for the Mtx_on_GPU pointers!).

However, the pointers in d_ptr are stored in device memory, so you can't access them directly from the host side. The line should be

    cudaMemcpy(Mtx_on_GPU[i], temp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This may become clearer when using "overly elaborate" variable names:

    int** devicePointersStoredInDeviceMemory;
    cudaMalloc((void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N);

    int* devicePointersStoredInHostMemory[N];
    for (int i=0; i<N; i++)
        cudaMalloc((void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE);

    cudaMemcpy(
        devicePointersStoredInDeviceMemory,
        devicePointersStoredInHostMemory,
        sizeof(int*)*N, cudaMemcpyHostToDevice);

    // Invoke kernel here, passing "devicePointersStoredInDeviceMemory"
    // as an argument
    ...

    int* hostPointersStoredInHostMemory[N];
    for (int i=0; i<N; i++) {
        // Allocate the host buffer that receives the data
        hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int)*SIZE);
        int* hostPointer = hostPointersStoredInHostMemory[i];
        int* devicePointer = devicePointersStoredInHostMemory[i];
        cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
    }

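For reference, the fragments above can be assembled into one compilable program. This is only a sketch under assumptions that are not in the question: N and SIZE are arbitrary constants, fillKernel is a made-up kernel that writes i*1000 + j into element j of array i, roundTripMatches is a hypothetical helper name, and a single host buffer is reused for all arrays to keep it short.

```cuda
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed sizes; the question does not state them.
const int N = 4;       // number of arrays
const int SIZE = 256;  // elements per array

// Hypothetical kernel: block i fills array i through the pointer table.
// The table must itself live in device memory to be readable here.
__global__ void fillKernel(int** devicePointersStoredInDeviceMemory, int size) {
    int i = (int)blockIdx.x;
    int* array = devicePointersStoredInDeviceMemory[i];
    for (int j = (int)threadIdx.x; j < size; j += (int)blockDim.x)
        array[j] = i * 1000 + j;
}

// Returns true if every value copied back matches what the kernel wrote.
bool roundTripMatches() {
    bool ok = true;

    // Table of device pointers, stored in device memory (for the kernel).
    int** devicePointersStoredInDeviceMemory = nullptr;
    ok = ok && cudaMalloc((void**)&devicePointersStoredInDeviceMemory,
                          sizeof(int*) * N) == cudaSuccess;

    // The same device pointers, stored in host memory (for cudaMemcpy).
    int* devicePointersStoredInHostMemory[N] = {};
    for (int i = 0; i < N; i++)
        ok = ok && cudaMalloc((void**)&devicePointersStoredInHostMemory[i],
                              sizeof(int) * SIZE) == cudaSuccess;

    // Upload the pointer table itself.
    ok = ok && cudaMemcpy(devicePointersStoredInDeviceMemory,
                          devicePointersStoredInHostMemory,
                          sizeof(int*) * N, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        fillKernel<<<N, 128>>>(devicePointersStoredInDeviceMemory, SIZE);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }

    // Host allocation is assumed to succeed in this sketch.
    int* hostBuffer = (int*)malloc(sizeof(int) * SIZE);
    assert(hostBuffer != nullptr);

    // Copy each array back, taking the device pointer from the HOST-side table.
    for (int i = 0; ok && i < N; i++) {
        ok = cudaMemcpy(hostBuffer, devicePointersStoredInHostMemory[i],
                        sizeof(int) * SIZE, cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int j = 0; ok && j < SIZE; j++)
            ok = (hostBuffer[j] == i * 1000 + j);
    }

    free(hostBuffer);
    for (int i = 0; i < N; i++)
        cudaFree(devicePointersStoredInHostMemory[i]);
    cudaFree(devicePointersStoredInDeviceMemory);
    return ok;
}
```

Calling roundTripMatches() from a main function and compiling with nvcc should be enough to try it on a machine with a CUDA device. The point to notice is that the kernel receives the device-side table, while every cudaMemcpy on the host reads its device pointer from the host-side table.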
EDIT in response to the comment:

The d_ptr is "an array of pointers", but the memory of this array is allocated with cudaMalloc. That means that it is located on the device. In contrast to that, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations:

    int** pointersStoredInDeviceMemory;
    cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N);

    int** pointersStoredInHostMemory;
    pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*));

    // This is not possible on the host, because the array was allocated with cudaMalloc:
    int *pointerA = pointersStoredInDeviceMemory[0];

    // This is possible because the array was allocated with malloc:
    int *pointerB = pointersStoredInHostMemory[0];

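If the host really needs one of the pointers that live in the device-side array, that pointer has to be copied to the host first, like any other device data. A minimal sketch, assuming the array was allocated with cudaMalloc as above; fetchDevicePointer and its parameters are hypothetical names, not part of the CUDA API:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: reads entry `index` of a pointer table that is stored
// in device memory and returns it to the host. The returned value is still a
// device pointer: the host may pass it to cudaMemcpy or to a kernel, but must
// not dereference it. Returns nullptr if the copy fails.
int* fetchDevicePointer(int** pointersStoredInDeviceMemory, int index, int count) {
    assert(index >= 0 && index < count);
    int* pointer = nullptr;
    // Pointer arithmetic on the device address is fine on the host;
    // only dereferencing it is not.
    cudaError_t err = cudaMemcpy(&pointer,
                                 pointersStoredInDeviceMemory + index,
                                 sizeof(int*), cudaMemcpyDeviceToHost);
    return err == cudaSuccess ? pointer : nullptr;
}
```

With this, the "not possible" line could be written as int *pointerA = fetchDevicePointer(pointersStoredInDeviceMemory, 0, N);. In practice it is usually simpler to keep a host-side copy of the pointers, as in the first example, and avoid the extra transfer.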
It may be a little bit brain-twisting to keep track of
- the type of the memory where the pointers are stored
- the type of the memory that the pointers are pointing to
but fortunately, there are rarely more than two levels of indirection.