Question

I have a CUDA-related question for you :). Since I am relatively new to CUDA, I would like to know whether this "performance" is OK or not.

I am using C# and Cudafy.Net!

I have a grayscale image (represented as float[]) that I calculated from a screenshot (the image size is 1920x1018 pixels).

Now I use a Sobel filter running on the GPU (through Cudafy.Net) which looks like this:

    [Cudafy]
    public static void PenaltyKernel(GThread thread, Single[] data, Single[] res, Int32 width, Int32 height)
    {
        Single[] shared_data = thread.AllocateShared<Single>("shared_data", BLOCK_WIDTH * BLOCK_WIDTH);
        ///Map from threadIdx/BlockIdx to Pixel Position
        int x = thread.threadIdx.x - FILTER_WIDTH + thread.blockIdx.x * TILE_WIDTH;
        int y = thread.threadIdx.y - FILTER_WIDTH + thread.blockIdx.y * TILE_WIDTH;
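        ///Halo threads at the image border map outside the image; load 0 there instead of reading out of bounds
        Single value = 0;
        if (x >= 0 && x < width && y >= 0 && y < height)
            value = data[x + y * width];
        shared_data[thread.threadIdx.x + thread.threadIdx.y * BLOCK_WIDTH] = value;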
        thread.SyncThreads();

        if (thread.threadIdx.x >= FILTER_WIDTH && thread.threadIdx.x < (BLOCK_WIDTH - FILTER_WIDTH) &&
            thread.threadIdx.y >= FILTER_WIDTH && thread.threadIdx.y < (BLOCK_WIDTH - FILTER_WIDTH))
        {
            ///Horizontal Filtering (detects horizontal Edges)
            Single diffHorizontal = 0;
            int idx = GetIndex(thread.threadIdx.x - 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffHorizontal -= shared_data[idx];
            idx++;
            diffHorizontal -= 2 * shared_data[idx];
            idx++;
            diffHorizontal -= shared_data[idx];
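            ///Jump from the top-right neighbour (x + 1, y - 1) to the bottom-left neighbour (x - 1, y + 1)
            idx += 2 * BLOCK_WIDTH - 2;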
            diffHorizontal += shared_data[idx];
            idx++;
            diffHorizontal += 2 * shared_data[idx];
            idx++;
            diffHorizontal += shared_data[idx];

            ///Vertical Filtering (detects vertical Edges)
            Single diffVertical = 0;
            idx = GetIndex(thread.threadIdx.x - 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffVertical -= shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical -= 2 * shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical -= shared_data[idx];
            idx = GetIndex(thread.threadIdx.x + 1, thread.threadIdx.y - 1, BLOCK_WIDTH);
            diffVertical += shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical += 2 * shared_data[idx];
            idx += BLOCK_WIDTH;
            diffVertical += shared_data[idx];

            ///Convert the "edgyness" for the Pixel and cut off at 1.0
            Single diff = GMath.Min(1.0f, GMath.Sqrt(diffHorizontal * diffHorizontal + diffVertical * diffVertical));

            ///Set the value; the last row/column of blocks may extend past the image, so skip those pixels
            if (x < width && y < height)
                res[x + y * width] = diff;
        }
    }

Constant values set before runtime:

    TILE_WIDTH = 16;
    FILTER_WIDTH = 1;
    BLOCK_WIDTH = TILE_WIDTH + 2 * FILTER_WIDTH;

When I run this "PenaltyKernel" function, including memory allocation for the arrays and copying data to and from the device, I get an average runtime of about 6.2 ms (using a GTX 680 GT!).

So my question is whether this speed is OK (that would be about 161 frames per second) or whether I am missing something. Is my Sobel filter even OK (I mean, the result looks good :) )?

Any help is appreciated!

Solution

I think that's an OK speed. A large share of the time goes to copying data between host and device (the transfer back from GPU to CPU in particular is slow).
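To put rough numbers on the copying, here is a back-of-the-envelope estimate. The image size and the 6.2 ms figure come from the question; the bandwidth B = 6 GB/s is only an assumed, illustrative value for the effective host-device transfer rate, not something I measured on your machine.

```latex
\begin{align*}
N &= 1920 \times 1018 = 1\,954\,560 \text{ pixels} \\
S &= 4N = 7\,818\,240 \text{ bytes} \approx 7.8 \text{ MB per direction} \\
t_{\text{copy}} &\approx \frac{2S}{B}
   = \frac{2 \times 7.8 \text{ MB}}{6 \text{ GB/s}} \approx 2.6 \text{ ms}
   \quad (B = 6 \text{ GB/s is an assumption}) \\
\text{fps} &= \frac{1000 \text{ ms}}{6.2 \text{ ms}} \approx 161
\end{align*}
```

Under that assumption the two copies alone would account for a sizeable part of the 6.2 ms, so timing the kernel separately from the transfers is the way to see where the time really goes.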

A note on speed: for small images, image processing on a GPU may be slower than on a CPU, because the fixed cost of allocation and transfers dominates (I haven't tested your code, so I don't know whether this applies in your case). The larger the image, the more the GPU gains over the CPU.
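To check both points for your image (whether the GPU version actually beats the CPU, and whether the kernel output is right), a plain CPU reference is handy. The following is a minimal sketch of the standard 3x3 Sobel magnitude clamped at 1.0, not your exact method: the names `SobelReference`, `Apply` and `MaxAbsDiff` are hypothetical, and leaving border pixels at 0 is my assumption, since the question doesn't say how the image border should be treated.

```csharp
using System;

public static class SobelReference
{
    // CPU reference: 3x3 Sobel gradient magnitude, clamped at 1.0.
    // Assumption: the one-pixel image border is left at 0.
    public static float[] Apply(float[] data, int width, int height)
    {
        var res = new float[data.Length];
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int up = (y - 1) * width + x;   // pixel above
                int mid = y * width + x;        // centre pixel
                int down = (y + 1) * width + x; // pixel below

                // Bottom row minus top row: responds to horizontal edges.
                float gy = (data[down - 1] + 2 * data[down] + data[down + 1])
                         - (data[up - 1] + 2 * data[up] + data[up + 1]);

                // Right column minus left column: responds to vertical edges.
                float gx = (data[up + 1] + 2 * data[mid + 1] + data[down + 1])
                         - (data[up - 1] + 2 * data[mid - 1] + data[down - 1]);

                res[mid] = Math.Min(1.0f, (float)Math.Sqrt(gx * gx + gy * gy));
            }
        }
        return res;
    }

    // Largest absolute per-pixel difference between two equally sized results.
    public static float MaxAbsDiff(float[] a, float[] b)
    {
        float max = 0f;
        for (int i = 0; i < a.Length; i++)
            max = Math.Max(max, Math.Abs(a[i] - b[i]));
        return max;
    }
}
```

Run `Apply` on the same float[] you send to the GPU, then compare with `MaxAbsDiff` against the array copied back from the device. Away from the image border the difference should be within floating-point rounding; a larger difference there points to an indexing problem in the kernel. Wrapping the `Apply` call in a `System.Diagnostics.Stopwatch` also gives you a CPU time to set against the 6.2 ms.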

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow