CUDA Double pointer memory copy [duplicate]

Question

In the last line

cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

you are trying to copy data from the device to the host (NOTE: I assume that you allocated host memory for the Mtx_on_GPU pointers!)

However, the pointers are stored in device memory, so you can't access the directly from host side. The line should be

cudaMemcpy(Mtx_on_GPU[i], temp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This may become clearer when using "overly elaborate" variable names:

int ** devicePointersStoredInDeviceMemory;
cudaMalloc( (void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N);

int* devicePointersStoredInHostMemory[N];
for(int i=0; i<N; i++)
    cudaMalloc( (void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE );

cudaMemcpy(
    devicePointersStoredInDeviceMemory, 
    devicePointersStoredInHostMemory,
    sizeof(int*)*N, cudaMemcpyHostToDevice);

// Invoke kernel here, passing "devicePointersStoredInDeviceMemory"
// as an argument
...

int* hostPointersStoredInHostMemory[N];
for(int i=0; i<N; i++) {
    int* hostPointer = hostPointersStoredInHostMemory[i]; 
    // (allocate memory for hostPointer here!)

    int* devicePointer = devicePointersStoredInHostMemory[i];

    cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
}

EDIT in response to the comment:

The d_ptr is "an array of pointers". But the memory of this array is allocated with cudaMalloc. That means that it is located on the device. In contrast to that, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations:

int** pointersStoredInDeviceMemory;
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N);

int** pointersStoredInHostMemory;
pointersStoredInHostMemory = (void**)malloc(N * sizeof(int*));

// This is not possible, because the array was allocated with cudaMalloc:
int *pointerA = pointersStoredInDeviceMemory[0];

// This is possible because the array was allocated with malloc:    
int *pointerB = pointersStoredInHostMemory[0];

It may be a little bit brain-twisting to keep track of

the type of the memory where the pointers are stored
the type of the memory that the pointers are pointing to

but fortunately, it hardly becomes more than 2 indirections.