Question

I'm writing an algorithm in OpenCL in which every work item needs to remember a fair amount of data, somewhere between a long[70] and a long[200] per work item.

Recent AMD devices have 32 KiB of __local memory, which is (for the given amount of data per work item) enough to store the data for 20-58 work items. However, from what I understand of the architecture (and especially from this drawing), each shader core also has a dedicated amount of private memory. I can't find its size, though.

Can anyone tell me how to find out how much private memory each kernel has?

I'm particularly curious about the HD7970, since I plan to buy some of these soon.

Edit: Problem solved, the answer is here in appendix D.


Solution

The answer was given by user talonmies in the comments, so I'll write it in a new answer here to close the question.

These values can be found in Appendix D of the AMD APP OpenCL Programming Guide http://developer.amd.com/sdks/amdappsdk/assets/amd_accelerated_parallel_processing_opencl_programming_guide.pdf (a similar document exists for NVIDIA). Apparently a register is 128 bits (4x32) on AMD devices and all modern high-end devices have 16384 registers per compute unit, so that's a remarkable 16384 x 16 bytes = 256 KB per compute unit.
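The arithmetic behind that figure can be sketched in plain C. The function names below are made up for illustration, and the second function is only a rough upper bound: it assumes each work item keeps its whole long[N] array in registers and uses no other registers, which a real compiler will not guarantee.

```c
#include <stddef.h>

/* Size in bytes of a register file with `num_registers` registers,
 * each `bits_per_register` bits wide. */
size_t register_file_bytes(size_t num_registers, size_t bits_per_register)
{
    return num_registers * (bits_per_register / 8);
}

/* Rough upper bound (assumption: nothing else lives in registers) on how
 * many work items could each hold `longs_per_item` 64-bit values in a
 * register file of `file_bytes` bytes. */
size_t max_resident_work_items(size_t file_bytes, size_t longs_per_item)
{
    return file_bytes / (longs_per_item * 8);
}
```

With the numbers from Appendix D, register_file_bytes(16384, 128) gives 262144 bytes, i.e. 256 KB per compute unit.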

OTHER TIPS

I think you are looking for __local memory. That is what 32KB of local data storage is referring to. I don't think you can poll the device to get the private memory amount.

You can allocate the __local buffer from the host by calling clSetKernelArg with the buffer size in bytes and a NULL argument value. I think it is best to use a static amount of memory per work item. Assuming that long[200] will be required for each work item, you would use the code below. It would also be a good idea to divide the work into groups that have the same (or similar) memory requirements, in order to get the most out of the LDS memory.

//'useLocalMem' is just a placeholder kernel name
__kernel void useLocalMem(__local long* localMem, const int localMemPerItem
       //more args...
       )
{
  //host has 'passed' localMemPerItem*get_local_size(0) long values in as localMem
  //this work item has access to all of it, but can choose to restrict
  //itself to only the portion it needs.
  //work group size will be limited to CL_DEVICE_LOCAL_MEM_SIZE/(8*localMemPerItem)
  int startIndex=localMemPerItem*get_local_id(0);
  //use localMem[startIndex+ ... ]
}
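On the host side, the sizing for this kernel could look roughly like the sketch below. The helper names are hypothetical, and the clSetKernelArg call is shown only in a comment because it needs a real kernel object. The value passed as device_local_mem_bytes is assumed to come from a CL_DEVICE_LOCAL_MEM_SIZE query.

```c
#include <stddef.h>

/* Bytes of __local memory one work group needs when every work item
 * gets `local_mem_per_item` long values (a long is 8 bytes in OpenCL C). */
size_t local_bytes_for_group(size_t local_mem_per_item, size_t work_group_size)
{
    return local_mem_per_item * work_group_size * 8;
}

/* Largest work-group size whose data still fits in the device's local
 * memory: CL_DEVICE_LOCAL_MEM_SIZE / (8 * localMemPerItem). */
size_t max_work_group_size(size_t device_local_mem_bytes, size_t local_mem_per_item)
{
    return device_local_mem_bytes / (8 * local_mem_per_item);
}

/* With a real kernel object, the buffer for argument 0 would be requested as:
 *   clSetKernelArg(kernel, 0,
 *                  local_bytes_for_group(per_item, group_size), NULL);
 */
```

For a 32 KiB device and long[200] per work item, max_work_group_size(32768, 200) is 20, matching the lower end of the estimate in the question. In practice the chosen size also has to respect CL_DEVICE_MAX_WORK_GROUP_SIZE.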

To answer how large the register file is in a 79xx series card: since it's based on the GCN architecture, it is 64KB per SIMD, as per the image in the AnandTech article: http://www.anandtech.com/print/5261

To answer your question of how to find out how much memory each kernel uses: you can run the AMD APP Profiler on your kernel. Its kernel occupancy section tells you how much space the kernel uses.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow