Question

I apologize in advance for the vagueness of this question.

Background:

I am attempting to write a morphological image processing function in OpenCL. I have a __local buffer which I use to store data for every pixel (each pixel is represented by a work-item, no loop unrolling yet). Also, since I am early in testing, I am only using a single work-group (8x8 pixel image so I can manually validate results).

Problem:

There are occasions when data from one, two, three, or even four pixels must be added into the pixel buffer of another. Since these are adjacent pixels in the same work-group, I am sure I am causing local memory bank conflicts. That's OK, speed isn't my top priority (yet!). However, these bank conflicts seem to be dropping data and even corrupting data. I've been very careful not to overflow or overrun the buffers.

So, my first question is: is it, in fact, possible that the bank conflicts are causing data corruption and loss? The OpenCL spec seems to indicate that conflicting accesses should simply be serialized, reducing bandwidth, but there is no mention of data loss.

My second question is: Help! - What can I do about this?

Any guidance will be greatly appreciated - thanks!

No correct solution

OTHER TIPS

Maybe the NVIDIA whitepaper "Prefix Sum (Scan) with CUDA" can put you on the right track. It is about the all-prefix-sums algorithm, which is a good example of a computation that seems inherently sequential, but for which there is an efficient parallel algorithm.

The all-prefix-sums operation (in its exclusive form) replaces each element of a list with the sum of all the elements before it, so [3,4,1,2] becomes [0,3,7,8].
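To make that concrete, here is a rough sketch in plain C (my own illustration, not code taken from the paper). The function names are made up for this example. The first function is the obvious sequential version; the second walks through the up-sweep/down-sweep structure usually attributed to Blelloch, which I believe is the work-efficient variant the paper describes. I assume the length is a power of two. Each inner loop stands in for what separate work-items would do in parallel, with a barrier between passes.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential reference: out[i] = in[0] + ... + in[i-1], so out[0] = 0. */
void exclusive_scan_serial(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}

/* In-place exclusive scan using an up-sweep and a down-sweep.
 * Assumption for this sketch: n is a power of two.
 * Every iteration of an inner loop touches a disjoint pair of elements,
 * so on a GPU each iteration could be one work-item, with a barrier
 * after every pass of the outer loop. */
void exclusive_scan_blelloch(int *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    /* Up-sweep: build partial sums; data[n-1] ends up holding the total. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
        /* a work-group barrier would go here */
    }

    /* Replace the total with the identity before sweeping back down. */
    data[n - 1] = 0;

    /* Down-sweep: push the partial sums back down the tree. */
    for (size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            int left = data[i - stride];
            data[i - stride] = data[i];
            data[i] += left;
        }
        /* a work-group barrier would go here */
    }
}
```

Calling either function on {3, 4, 1, 2} should give {0, 3, 7, 8}. The point of the second version is that no two iterations within a pass write to the same element, which is the property you want when several work-items update a shared buffer.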

I know the paper is about CUDA, but I found that the resulting kernels are very similar in OpenCL, as both technologies use similar concepts.

I hope the paper helps.

Cheers

Licensed under: CC-BY-SA with attribution