Question

First of all, I just want to say that I'm a newbie in OpenCL and I don't have a strong background in computer science, since it's not what I studied.

So, I'm writing a tool that calculates horizon lines given a digital terrain model (DTM). To do that, I use OpenCL's task-parallel approach, since data parallelization is either not possible or I couldn't find a way to do it.

I have 8 kernels, each one calculating a portion of a 360-degree horizon (or panorama, or whatever you want to call it). The maths behind it are super simple: just trace a line from a pixel in a certain direction and look for the highest elevation, then repeat that for all pixels in 360 directions.

The point is that I succeeded in doing it, but I found out one thing: with a smaller DTM, it looks like I get correct results, but with a very large DTM, execution doesn't even get inside the kernels.

The big question is: is there any reason why this is happening? Is it possible to send 3-4 GB of data to the GPU? Am I just neglecting some basic fact, such as there being no way to have 4 GB of global data? I'm sending the data as pointers to the kernel, so I don't know what is wrong.

Thanks!!

UPDATE:

The error was indeed that I wasn't checking all the steps. I got a CL_MEM_OBJECT_ALLOCATION_FAILURE error, so I guess I need to reduce the size of my memory objects somehow. Thanks to everyone!
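For reference, one common way to shrink the memory objects is to upload the DTM in bands and process them one at a time, rather than as a single huge buffer. The following is a hypothetical sketch, not the asker's actual code: the band size, the `process_in_bands` name, and the per-band processing step are all assumptions, and for horizon tracing the bands would in practice need overlap (or a different decomposition), since rays cross band boundaries.

```c
/* Hypothetical sketch: upload the DTM in horizontal bands instead of
 * one huge buffer. Names and band logic are assumptions for
 * illustration only. */
#include <CL/cl.h>

void process_in_bands(cl_context ctx, cl_command_queue queue,
                      const float *dtm, size_t width, size_t height,
                      size_t band_rows)
{
    cl_int err;
    for (size_t row = 0; row < height; row += band_rows) {
        size_t rows  = (row + band_rows <= height) ? band_rows : height - row;
        size_t bytes = width * rows * sizeof(float);

        /* Allocate only one band's worth of device memory at a time. */
        cl_mem band = clCreateBuffer(ctx,
                                     CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                     bytes, (void *)(dtm + row * width), &err);
        if (err != CL_SUCCESS)
            break; /* still too large: e.g. halve band_rows and retry */

        /* ... set kernel args, enqueue the kernel, read back the
         * results for this band ... */

        clReleaseMemObject(band);
    }
}
```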


Solution

First, you really need to check the results of the OpenCL API calls. If "it doesn't even get inside the kernels", then one of the API calls has returned an error value that you've missed.
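A minimal sketch of that kind of checking follows, assuming a hypothetical DTM buffer and kernel. Every OpenCL call either returns a `cl_int` status directly or writes one through an out-parameter, and both kinds must be checked:

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Abort with a message if an OpenCL call did not succeed. */
#define CL_CHECK(err, what)                                    \
    do {                                                       \
        if ((err) != CL_SUCCESS) {                             \
            fprintf(stderr, "%s failed: %d\n", (what), (err)); \
            exit(EXIT_FAILURE);                                \
        }                                                      \
    } while (0)

/* Hypothetical example: create the DTM buffer and launch one kernel,
 * checking each step. */
void run_checked(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                 const float *dtm, size_t dtm_bytes, size_t global_size)
{
    cl_int err;

    /* clCreateBuffer reports errors through its last argument. */
    cl_mem dtm_buf = clCreateBuffer(ctx,
                                    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                    dtm_bytes, (void *)dtm, &err);
    CL_CHECK(err, "clCreateBuffer");

    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &dtm_buf);
    CL_CHECK(err, "clSetKernelArg");

    /* Enqueue calls return the status directly. */
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                 &global_size, NULL, 0, NULL, NULL);
    CL_CHECK(err, "clEnqueueNDRangeKernel");
}
```

With a 3-4 GB buffer, the `clCreateBuffer` step above is exactly where a `CL_MEM_OBJECT_ALLOCATION_FAILURE` (or `CL_INVALID_BUFFER_SIZE`) would surface.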

Second, there is indeed a limit on buffer sizes (and on many other values). OpenCL mandates a minimum for each limit, but beyond that they are device-dependent, and you need to query your specific device for its maximums using clGetDeviceInfo.
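Here is a sketch of querying the two limits relevant in this case: the maximum size of a single buffer (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`) and the total global memory (`CL_DEVICE_GLOBAL_MEM_SIZE`):

```c
#include <stdio.h>
#include <CL/cl.h>

/* Print the memory limits of a given device. */
void print_memory_limits(cl_device_id device)
{
    cl_ulong max_alloc = 0, global_mem = 0;

    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);

    /* The spec only guarantees CL_DEVICE_MAX_MEM_ALLOC_SIZE to be at
     * least max(global_mem / 4, 128 MB), so a single buffer may be far
     * smaller than the total device memory. */
    printf("Max single allocation: %llu MB\n",
           (unsigned long long)(max_alloc / (1024 * 1024)));
    printf("Total global memory:   %llu MB\n",
           (unsigned long long)(global_mem / (1024 * 1024)));
}
```

This is why a 3-4 GB DTM can fail even on a card with 4 GB of memory: the single-allocation limit is often only a fraction of the total, so the data has to be split across several smaller buffers or processed in chunks.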
