By default, algorithm invocations like this execute on the device backend (i.e., the GPU in your case).
If you're using Thrust 1.7 or later, use the thrust::host execution policy to force an algorithm invocation to execute on the host (i.e., the CPU):
#include <thrust/execution_policy.h>
...
thrust::reduce(thrust::host, first, last);
...
thrust::transform_reduce(thrust::host,
                         first,
                         last,
                         MyOperation(data),
                         0,
                         thrust::plus<unsigned int>());
If you're using Thrust 1.6, you can retarget the invocations to the host by retagging an existing iterator:
#include <thrust/iterator/retag.h>
...
thrust::reduce(thrust::retag<thrust::host_system_tag>(first),
               thrust::retag<thrust::host_system_tag>(last));
...
thrust::transform_reduce(thrust::retag<thrust::host_system_tag>(first),
                         thrust::retag<thrust::host_system_tag>(last),
                         MyOperation(data),
                         0,
                         thrust::plus<unsigned int>());
If you're using a version of Thrust older than 1.6, you need to pass host_space_tag to counting_iterator as a template parameter:
thrust::reduce(thrust::counting_iterator<unsigned int, thrust::host_space_tag>(0),
               thrust::counting_iterator<unsigned int, thrust::host_space_tag>(N));