So I think there's something else going on here. Have you tried running the original sample on which your code is based? This is available on CodePlex.
Load the samples solution, build the Reduction project in Release mode, and then run it without the debugger attached. You should see output like this:
Running kernels with 16777216 elements, 65536 KB of data ...
Tile size: 512
Tile count: 128
Using device : NVIDIA GeForce GTX 570
Total : Calc
SUCCESS: Overhead 0.03 : 0.00 (ms)
SUCCESS: CPU sequential 9.48 : 9.45 (ms)
SUCCESS: CPU parallel 5.92 : 5.89 (ms)
SUCCESS: C++ AMP simple model 25.34 : 3.19 (ms)
SUCCESS: C++ AMP simple model using array_view 62.09 : 20.61 (ms)
SUCCESS: C++ AMP simple model optimized 25.24 : 1.81 (ms)
SUCCESS: C++ AMP tiled model 29.70 : 7.27 (ms)
SUCCESS: C++ AMP tiled model & shared memory 30.40 : 7.56 (ms)
SUCCESS: C++ AMP tiled model & minimized divergence 25.21 : 5.77 (ms)
SUCCESS: C++ AMP tiled model & no bank conflicts 25.52 : 3.92 (ms)
SUCCESS: C++ AMP tiled model & reduced stalled threads 21.25 : 2.03 (ms)
SUCCESS: C++ AMP tiled model & unrolling 22.94 : 1.55 (ms)
SUCCESS: C++ AMP cascading reduction 20.17 : 0.92 (ms)
SUCCESS: C++ AMP cascading reduction & unrolling 24.01 : 1.20 (ms)
Note that none of the examples take anywhere near as long as your code does, although it's fair to say that the CPU is faster here and that data copy time is a big contributing factor.
This is to be expected. Effective use of a GPU involves moving more than operations like reduction to the GPU. You need to move a significant amount of compute to make up for the copy overhead.
Some things you should consider:
- What happens when you run the sample from CodePlex?
- Are you running a release build with optimization enabled?
- Are you sure you are running against the actual GPU hardware and not against a WARP (software emulator) accelerator?
Some more information that would be helpful:
- What hardware are you using?
- How large is your data set, both the input data and the size of the partial result array?