EDIT: See the comments and the EDIT in the original question! This solution is probably not necessary:
Not (necessarily) an answer, but too long for a comment: I don't see a reason why it should not be possible to simply copy the data from the (raw) pointer to the host with cudaMemcpy:
float* devPtr = thrust::raw_pointer_cast(thrust_dev_ptr_Cols);
float* hostPtr = (float*)malloc (numbers*sizeof(float));
cudaMemcpy(hostPtr, devPtr, numbers*sizeof(float), cudaMemcpyDeviceToHost);
EDIT: BTW, if devCols is still known at this point, then you could probably use devCols directly instead of devPtr. This is not obvious from the posted code.
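For completeness, here is a minimal sketch of the same copy wrapped in a function with basic error checking. The helper name copyToHost and its parameters are my own and do not appear in the question. It assumes the device data is available as a thrust::device_ptr<float> (as thrust_dev_ptr_Cols appears to be) and that the element count is known:

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>

#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the question): copies `count` floats from
// the device memory referenced by `src` into a newly allocated host vector.
std::vector<float> copyToHost(thrust::device_ptr<float> src, std::size_t count)
{
    std::vector<float> host(count);
    if (count == 0) {
        return host; // nothing to copy
    }

    // Unwrap the thrust pointer to obtain the plain device address.
    float* devPtr = thrust::raw_pointer_cast(src);

    // Blocking device-to-host copy, as in the snippet above.
    cudaError_t err = cudaMemcpy(host.data(), devPtr,
                                 count * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
    return host;
}
```

With the names from the question, the call would presumably look like copyToHost(thrust_dev_ptr_Cols, numbers). Returning a std::vector instead of a malloc'ed buffer only avoids the manual free; the copy itself is unchanged.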