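This is bound by the video card's global memory bandwidth, and keeping multiple copies of the same data won't help you. A GPU's global memory is typically spread across several memory controllers or channels, but addresses are interleaved across all of them as one address space, so every copy of your data competes for the same total bandwidth.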
Your memory access pattern will greatly affect the overall throughput of your kernel, in particular whether neighboring threads read neighboring addresses so their loads can be coalesced. Do you have a specific example/kernel that you need optimized?
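To see why the access pattern matters so much, here is a rough model (not a simulator of any real GPU) that counts how many memory segments one warp touches when thread `t` reads element `base + t * stride`. The 32-thread warp, 4-byte elements and 128-byte segment size are assumptions loosely based on NVIDIA's coalescing rules. Newer parts service loads in 32-byte sectors, which you can try via `segment_bytes`. The function name `segments_touched` is just made up for illustration.

```python
def segments_touched(stride, warp_size=32, elem_bytes=4, segment_bytes=128, base=0):
    """Count the distinct memory segments a warp touches when thread t reads
    element (base + t * stride). Simplified, hypothetical model of coalescing."""
    segments = set()
    for t in range(warp_size):
        addr = (base + t * stride) * elem_bytes  # byte address read by thread t
        segments.add(addr // segment_bytes)      # segment that address falls in
    return len(segments)


if __name__ == "__main__":
    useful_bytes = 32 * 4  # bytes the warp actually asked for
    for stride in (1, 2, 4, 8, 16, 32):
        n = segments_touched(stride)
        fetched = n * 128  # bytes moved if each touched segment is fetched whole
        print(f"stride {stride:2d}: {n:2d} segments, "
              f"{useful_bytes / fetched:.1%} of fetched bytes used")
    # Same unit-stride read, but shifted off a segment boundary by one element.
    print("stride 1, misaligned:", segments_touched(1, base=1), "segments")
```

Under this model a unit-stride, aligned read costs one segment per warp, while a stride of 32 elements costs 32 segments for the same amount of useful data. That is why the same kernel can run many times slower just from how it indexes memory, and why duplicating the data doesn't fix it.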