CUDA NPP - Error on printing output

https://stackoverflow.com/questions/22571312

19-06-2023
|

Frage

Following my previous post here: CUDA NPP - unknown error upon GPU error check

I have tried to sum all the pixels in the image by using the CUDA NPP library, and with the help of some developers, I finally got my code to compile. However, when I try and print out the value which is stored in partialSum by copying it into a double variable (consistent with the NPP guide for CUDA v4.2), I get this error:

Unhandled exception at 0x00fdf7f4 in MedianFilter.exe: 0xC0000005: Access violation reading location 0x40000000.

I've been trying to get rid of it but, I have been unsuccessful so far. Please help! I have at this small piece of code for about two days now.

Code:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, char *file, int line, bool abort=true)
{
    if (code != cudaSuccess) 
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) getchar();
    }
}

// processing image starts here 

// device_pointer initializations
unsigned char *device_input;
unsigned char *device_output;    

size_t d_ipimgSize = input.step * input.rows;
size_t d_opimgSize = output.step * output.rows;

gpuErrchk( cudaMalloc( (void**) &device_input, d_ipimgSize) );
gpuErrchk( cudaMalloc( (void**) &device_output, d_opimgSize) );

gpuErrchk( cudaMemcpy(device_input, input.data, d_ipimgSize, cudaMemcpyHostToDevice) );


// Median filter the input image here
// .......


// allocate data on the host for comparing the sum of all pixels in image with CUDA implementation

// 1st argument - allocate data for pSrc - copy device_output into this pointer
Npp8u *odata; 
gpuErrchk( cudaMalloc( (void**) &odata, sizeof(Npp8u)*output.rows*output.cols ) );
gpuErrchk( cudaMemcpy(odata, device_output, sizeof(Npp8u)*output.rows*output.cols, cudaMemcpyDeviceToDevice) ); 

// 2nd arg - set step 
int ostep = output.step;  

// 3rd arg - set nppiSize
NppiSize imSize; 
imSize.width = output.cols; 
imSize.height = output.rows;

// 4th arg - set npp8u scratch buffer size
Npp8u *scratch; 
int bytes = 0;
nppiReductionGetBufferHostSize_8u_C1R( imSize, &bytes);

gpuErrchk( cudaMalloc((void **)&scratch, bytes) );

// 5th arg - set npp64f partialSum (64 bit double will be the result)
Npp64f *partialSum; 
gpuErrchk( cudaMalloc( (void**) &partialSum, sizeof(Npp64f) ) );

//                 nnp8u, int, nppisize, npp8u, npp64f    
nppiSum_8u_C1R( odata, ostep, imSize, scratch, partialSum );

double *dev_result;
    dev_result = (double*)malloc(sizeof(double)); // EDIT
gpuErrchk( cudaMemcpy(&dev_result, partialSum, sizeof(double), cudaMemcpyDeviceToHost) );
//int tot = output.rows * output.cols;
printf( "\n Total Sum cuda %f \n",  *dev_result) ;   // <---- access violation here

Lösung

The problem here seems to be basic pointer misuse (I say seems because we have incomplete, uncompilable code, so it is hard to say for certain).

This should work:

double *dev_result = (double*)malloc(sizeof(double));
gpuErrchk( cudaMemcpy(dev_result, partialSum, sizeof(double), cudaMemcpyDeviceToHost) );
printf( "\n Total Sum cuda %f \n",  *dev_result);

this should also work:

double dev_result;
gpuErrchk( cudaMemcpy(&dev_result, partialSum, sizeof(double), cudaMemcpyDeviceToHost) );
printf( "\n Total Sum cuda %f \n",  dev_result);

This assumes that everything else in the incomplete code is correct. I leave it as an exercise to the reader to spot the differences between the three variants.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow