Question

I have a fairly small dataset, but it is too large to fit in the local (workgroup) or private memory of any GPU currently on the market. This means that every kernel must read the data from global memory on the GPU. If I replicate the data as multiple copies in global memory, can that increase performance or reduce latency, or is the memory controller restrictive, allowing only one core to access global memory at a time? If this is device specific, are there any models that have this feature?

Solution

This is bound by the video card's memory controller, so keeping multiple copies of the same data won't help you. I am not aware of a GPU that has more than one memory controller for global memory access.

Your memory access pattern will greatly affect the overall throughput of your kernel. Do you have a specific example or kernel that you need optimized?
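To illustrate why the access pattern matters, here is a rough Python model (not real GPU code) that counts how many memory segments a group of threads touches for a given stride. The group size of 32 threads, the 128-byte segment size, and the 4-byte element size are assumptions chosen for illustration; actual values depend on the device, and the function name `segments_touched` is made up for this sketch.

```python
# Simplified model of global memory access by one group of threads.
# All three constants are assumptions; real hardware varies.
GROUP_SIZE = 32       # assumed number of threads issuing loads together
SEGMENT_BYTES = 128   # assumed size of one memory transaction
ELEMENT_BYTES = 4     # e.g. one 32-bit float per thread


def segments_touched(stride, group_size=GROUP_SIZE):
    """Count distinct segments touched when thread i reads element i * stride."""
    segments = {
        (i * stride * ELEMENT_BYTES) // SEGMENT_BYTES
        for i in range(group_size)
    }
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(f"stride {stride}: {segments_touched(stride)} segment(s)")
```

Under these assumptions, neighbouring threads reading neighbouring elements (stride 1) share a single segment, while a stride of 32 elements forces a separate segment per thread. That is the kind of difference an access pattern can make, regardless of how many copies of the data exist.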

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow