It could be possible under some conditions - check whether your CPU supports "DCA" (Direct Cache Access), and whether you can activate that feature. This might be useful: https://www.myricom.com/software/myri10ge/790-how-do-i-enable-intel-direct-cache-access-dca-with-the-linux-myri10ge-driver.html
I don't think you really need this, though. Going over the entire array sequentially should be very efficient, as the CPU easily recognizes it as a sequential stream and triggers the HW prefetcher. Since it's Ivy Bridge, even linear page crossing should be fast, since it can prefetch across into the next physical page. There may be a small additional gain in accessing several pages in parallel (also in terms of TLB-miss latencies), but eventually it all boils down to one question: can you saturate your memory bandwidth? A single core would probably hit a bottleneck at the core/L3 boundary, so the optimal way would be to distribute the work by running a HW thread on each core, each over a different segment (the chunk size could be one 4k page per iteration, but larger chunks would also enjoy the benefit of pagemap locality in each core).
However, you may have a bigger problem than accessing the data: convincing the L3 to keep it there. Ivy Bridge is said to use a dynamic replacement policy in the L3, meaning it's going to ask itself who's actually using all this data, and since you're just preloading it once, the answer would probably be "no one". At that point, the L3 may decide to avoid caching the array altogether, or to write newer blocks over the older ones.
The exact behavior depends on an implementation that was never published, but to "trick" it, I believe you'd have to access each data line more than once before it gets thrown away. Note that just accessing it twice in a row won't help, since by then it's already in the upper caches; you'd have to re-access it at some distance - not so small that the second access still hits L1/L2 rather than the L3, but not so large that the line has already been evicted from the L3. Some experimentation would be required, of course, to fine-tune this.
EDIT:
Here's a blog post covering Ivy Bridge's L3 replacement policy, which is exactly what you should worry about here -
http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/
The actual processing phase, of course, should behave nicely, since it should be detected as genuinely benefiting from the L3 caching; it's only the preload phase that might give you trouble. If the processing is relatively long, the initial cold misses may not be worth the effort of preloading - beware of premature optimization.