Question

I'm writing an app in C that converts a .png image to grayscale. I'm using C and CUDA. I have a problem with the CUDA code and I don't know why. (I'm a CUDA beginner.)

My transformation function looks like this:

__global__
void setPixelToGrayscale(unsigned char *image)
{
    int i = threadIdx.x*4;
    float gray;
    float r, g, b;
    r = image[i + 0];
    g = image[i + 1];
    b = image[i + 2];
    gray = .299f*r + .587f*g + .114f*b;
    image[i + 0] = gray;
    image[i + 1] = gray;
    image[i + 2] = gray;
    image[i + 3] = 255;
}


void transformToGrayCuda(rgb_image *img)
{

    unsigned char* image = img->image;
    unsigned char* image_d;
    unsigned width = img->width;
    unsigned height = img->height;
    int N = (int)width * (int)height; 
    size_t size = N * sizeof(unsigned char);
    cudaMalloc((void **) image_d, size);
    cudaMemcpy(image_d, image,  size, cudaMemcpyHostToDevice);
    setPixelToGrayscale<<<1, N>>>(image_d);
    cudaMemcpy(image, image_d, size, cudaMemcpyDeviceToHost);
    cudaFree(image_d);

/* this works fine if cuda code is commented
int j=0;
for(j=0; j<N; j++)
{
    int i = j*4;
    float gray;
    float r, g, b;
    r = image[i + 0];
    g = image[i + 1];
    b = image[i + 2];
    gray = .299f*r + .587f*g + .114f*b;
    image[i + 0] = gray;
    image[i + 1] = gray;
    image[i + 2] = gray;
    image[i + 3] = 255;
}
*/

}

I've done something wrong in the CUDA version, because when the CUDA code is commented out and I run the plain C loop instead, everything works fine. Why doesn't my CUDA code work properly?

EDIT: This is my test image: https://i.stack.imgur.com/B3yJu.png

This is my result with CUDA: https://i.stack.imgur.com/bzmWJ.png

This is my result with only the C code: http://i.imgur.com/lU4vIiK.png

That's what I meant when I wrote that my CUDA code does not work properly.

Solution

The problem in your code is:

cudaMalloc((void **) image_d, size);

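cudaMalloc stores the device address through its first argument, so it needs a pointer to your pointer. Casting the uninitialized variable itself to void ** passes a garbage address instead. The correct call is: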

cudaMalloc(&image_d, size);
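
As a minimal sketch, the corrected allocation can also be wrapped with an error check. The helper name `allocDeviceImage` is hypothetical (it is not in the question). The sketch assumes 4 bytes per pixel, which the kernel's `i*4` indexing suggests; note that the question's `size` covers only `N` bytes, so it is probably too small as well.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper (not from the question): allocates a device buffer
// for an image and returns NULL on failure.
// Assumption: 4 bytes (RGBA) per pixel, matching the kernel's i*4 indexing.
unsigned char *allocDeviceImage(unsigned width, unsigned height)
{
    unsigned char *image_d = NULL;
    size_t size = (size_t)width * height * 4 * sizeof(unsigned char);

    // Pass the ADDRESS of the pointer so cudaMalloc can write the
    // device address into image_d.
    cudaError_t err = cudaMalloc((void **)&image_d, size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return NULL;
    }
    return image_d;
}
```

The same `size` would then be used for both `cudaMemcpy` calls, and the returned pointer released with `cudaFree`.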

Other tips

How large is N? You are launching all N threads in a single block, but a block is limited to 512 or 1024 threads, depending on the GPU. If N > 512, use more blocks: for example, 1 + N/512 blocks with 512 threads per block. The kernel then has to compute a global index from the block and thread indices and check that it is less than N, to avoid accessing out-of-bounds memory.
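
The multi-block launch and the bounds check described above can be sketched as follows. This is only an illustration under stated assumptions: the extra `numPixels` parameter and the helpers `blocksFor` and `launchGrayscale` are hypothetical, the buffer is assumed to hold 4 bytes per pixel, and the block count uses ceiling division rather than `1 + N/512` (both cover every pixel; ceiling division just avoids an empty extra block when N is a multiple of 512).

```cuda
#include <cuda_runtime.h>

// Same per-pixel math as the question's kernel, but indexed by a global
// thread id and guarded so surplus threads in the last block do nothing.
__global__ void setPixelToGrayscale(unsigned char *image, int numPixels)
{
    int pixel = blockIdx.x * blockDim.x + threadIdx.x;
    if (pixel >= numPixels) return;   // bounds check

    int i = pixel * 4;                // assumption: 4 bytes (RGBA) per pixel
    float r = image[i + 0];
    float g = image[i + 1];
    float b = image[i + 2];
    unsigned char gray = (unsigned char)(.299f * r + .587f * g + .114f * b);
    image[i + 0] = gray;
    image[i + 1] = gray;
    image[i + 2] = gray;
    image[i + 3] = 255;
}

// Hypothetical helper: blocks needed to cover n items (ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical launch wrapper; image_d must point to numPixels * 4 bytes
// of device memory.
void launchGrayscale(unsigned char *image_d, int numPixels)
{
    const int threadsPerBlock = 512;
    int blocks = blocksFor(numPixels, threadsPerBlock);
    setPixelToGrayscale<<<blocks, threadsPerBlock>>>(image_d, numPixels);
}
```

For a 1000-pixel image this launches 2 blocks of 512 threads; the last 24 threads fail the bounds check and return without touching memory.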

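Also, kernel launches are asynchronous. The blocking cudaMemcpy that follows does wait for the kernel to finish, but calling cudaDeviceSynchronize() right after the launch and checking its return value makes kernel errors visible instead of letting them pass silently.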
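
One possible way to surface such errors is a small check called right after the kernel launch. The helper name `checkKernel` is hypothetical; the runtime calls it uses (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are standard CUDA runtime API functions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: call immediately after a kernel launch.
// Returns 0 on success, -1 if the launch or the execution failed.
int checkKernel(const char *what)
{
    // Launch-time errors, e.g. an invalid grid/block configuration.
    cudaError_t err = cudaGetLastError();

    // Wait for the kernel to finish and pick up errors raised while it ran.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

With a check like this, a launch such as `<<<1, N>>>` with N above the per-block thread limit should be reported as an invalid configuration rather than leaving the image unchanged without any message.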

If you post the exact error or issue you are seeing, I can provide more help.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow