CUDA output variables are always 0 [closed]

https://stackoverflow.com/questions/23188274

06-07-2023

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.

Closed 9 years ago.


I have written a serial version of code that calculates a histogram, and I know the algorithm works. The problem is that when I port it to CUDA, the only results I get back are all 0. I can copy the input array dev_x into the output variable h, and then I do see the input values of x.

The input data is a list of x and y positions, each with a corresponding color (an integer from 1 to 5).

The arguments are the input file name, the output file name, cellWidth and cellHeight, where cellWidth and cellHeight are the number of regions the input is divided into. A 1000000 x 1000000 array is divided into 1000 x 1000 regions, and I need to count the occurrences of each color in each region.
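The counting described above can be written as a serial reference that a CUDA version can later be checked against. This is a minimal sketch under stated assumptions. The `Point` struct, the name `histogram_serial`, and the bin layout `(row * columns + column) * numColors + (color - 1)` are all hypothetical. `cellWidth`/`cellHeight` are taken to be region sizes in coordinate units, which is how the answer's `histSize` formula (`xMax/cellWidth * yMax/cellHeight`) uses them. For the stated sizes the two readings coincide, since 1000000 / 1000 = 1000.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input record: an x, y position and a color in 1..numColors. */
typedef struct {
    unsigned int x, y;
    int color;
} Point;

/* Serial reference: count occurrences of each color in each region.
 * Assumes cellWidth/cellHeight are region sizes in coordinate units, so
 * there are xMax/cellWidth columns and yMax/cellHeight rows of regions.
 * h must hold columns * rows * numColors counters and be zeroed first. */
void histogram_serial(const Point *pts, size_t n,
                      unsigned int xMax, unsigned int yMax,
                      unsigned int cellWidth, unsigned int cellHeight,
                      unsigned int numColors, unsigned int *h)
{
    unsigned int cols = xMax / cellWidth;
    unsigned int rows = yMax / cellHeight;
    for (size_t i = 0; i < n; i++) {
        unsigned int cx = pts[i].x / cellWidth;
        unsigned int cy = pts[i].y / cellHeight;
        /* Catch bad input here rather than writing outside h. */
        assert(cx < cols && cy < rows);
        assert(pts[i].color >= 1 && (unsigned int)pts[i].color <= numColors);
        size_t cell = (size_t)cy * cols + cx;
        h[cell * numColors + (size_t)(pts[i].color - 1)]++;
    }
}
```

The `assert` calls turn an out-of-range coordinate or color into an immediate failure instead of a silent write past the end of `h`.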

Solution

There are at least two gigantic, basic problems in this code, neither of which has anything to do with CUDA:

histSize = sizeof(unsigned int) * xMax/cellWidth * yMax/cellHeight * numColors;

//....

 h = (unsigned int*) malloc(histSize);

//.....

for(i=0; i<histSize; i++)
    h[i]=0; // <-- buffer overflow

which is probably killing the program before it ever even gets to launch the kernel, and:

cudaMalloc( (void**) &dev_h, histSize );

// .......

cudaMemcpy(dev_h, h, size, cudaMemcpyHostToDevice); // buffer overflow

which would kill the CUDA context if the program ever got that far.
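Both fixes can be sketched on the host side. `histSize` is a byte count, so the zeroing loop must run over `histSize / sizeof(unsigned int)` elements. The device copy must also pass `histSize` rather than `size`. `alloc_histogram` and its parameters are hypothetical names. The CUDA calls appear only as comments so the block compiles as plain C.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate and zero the host histogram, returning
 * the byte count through histSizeOut for use with the device buffer. */
unsigned int *alloc_histogram(size_t xMax, size_t yMax,
                              size_t cellWidth, size_t cellHeight,
                              size_t numColors, size_t *histSizeOut)
{
    assert(cellWidth > 0 && cellHeight > 0);
    size_t numBins = (xMax / cellWidth) * (yMax / cellHeight) * numColors;
    size_t histSize = sizeof(unsigned int) * numBins; /* bytes, not elements */
    unsigned int *h = malloc(histSize);
    if (h == NULL)
        return NULL;
    /* Loop over elements, not bytes: a loop bounded by histSize indexes
     * sizeof(unsigned int) times as many elements as the buffer holds. */
    for (size_t i = 0; i < numBins; i++)
        h[i] = 0;
    /* The device side must use the same byte count:
     *   cudaMalloc((void **)&dev_h, histSize);
     *   cudaMemcpy(dev_h, h, histSize, cudaMemcpyHostToDevice);
     * calloc(numBins, sizeof *h) would allocate and zero in one step. */
    *histSizeOut = histSize;
    return h;
}
```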

These are elementary mistakes, and you haven't detected them because your only test case is apparently a program that attempts to process a 150 MB input file and emit a large histogram from it, and your only method of detecting errors is looking at a file containing that histogram. That is a completely insane way to develop and debug code. If you had done any of the following:

Hardcoded a trivially small test case you already knew the answers for
Added CUDA API error checking
Run valgrind
Used cuda-memcheck
Used a host debugger
Run nvprof

you would probably have detected the problems instantly (there might well be more, but I don't care enough to look for them; that is your job), and this Stack Overflow question wouldn't exist.
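The first suggestion, a trivially small test case with known answers, needs a way to compare results automatically instead of reading a large output file. This is a minimal sketch of the checking side. It assumes the device histogram has already been copied back into a host array and that the expected counts were worked out by hand. `first_mismatch` is a hypothetical helper and does not come from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: compare a histogram copied back from the device
 * (actual) with counts worked out by hand for a tiny input (expected).
 * Prints and returns the index of the first mismatching bin, or -1 if
 * every bin agrees. */
long first_mismatch(const unsigned int *expected, const unsigned int *actual,
                    size_t numBins)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < numBins; i++) {
        if (expected[i] != actual[i]) {
            fprintf(stderr, "bin %zu: expected %u, got %u\n",
                    i, expected[i], actual[i]);
            return (long)i;
        }
    }
    return -1;
}
```

With a handful of points placed in known regions, an all-zero result from the kernel is reported at the first bin that should be nonzero.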

License: CC-BY-SA with attribution

Not affiliated with StackOverflow