It seems that what you want to achieve depends on thrust::zip_iterator. You could either
- only replace thrust::sort_by_key with cub::DeviceRadixSort::SortPairs and keep thrust::gather, or
- zip values{1,2,3} into an array of structures before using cub::DeviceRadixSort::SortPairs.
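For the first option, a minimal sketch might look like the following. It assumes int keys and three float value arrays stored in thrust::device_vector. The helper name sort_three_by_key and its parameters are hypothetical, not taken from your code. CUB sorts (key, original index) pairs, and thrust::gather then reorders each value array with the sorted indices.

```cuda
#include <cassert>
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/gather.h>
#include <thrust/sequence.h>

// Hypothetical helper: sorts `keys` ascending and applies the same permutation
// to three value arrays. Assumes int keys and float values.
void sort_three_by_key(thrust::device_vector<int>& keys,
                       thrust::device_vector<float>& values1,
                       thrust::device_vector<float>& values2,
                       thrust::device_vector<float>& values3)
{
    const int n = static_cast<int>(keys.size());

    // Sort (key, original index) pairs instead of zipped values.
    thrust::device_vector<int> keys_out(n), idx_in(n), idx_out(n);
    thrust::sequence(idx_in.begin(), idx_in.end());  // 0, 1, ..., n-1

    const int* d_keys_in  = thrust::raw_pointer_cast(keys.data());
    int*       d_keys_out = thrust::raw_pointer_cast(keys_out.data());
    const int* d_idx_in   = thrust::raw_pointer_cast(idx_in.data());
    int*       d_idx_out  = thrust::raw_pointer_cast(idx_out.data());

    // First call (null temp storage) only computes the required temp size.
    void*  d_temp = nullptr;
    size_t temp_bytes = 0;
    cudaError_t err = cub::DeviceRadixSort::SortPairs(
        d_temp, temp_bytes, d_keys_in, d_keys_out, d_idx_in, d_idx_out, n);
    assert(err == cudaSuccess);

    thrust::device_vector<unsigned char> temp(temp_bytes);
    d_temp = thrust::raw_pointer_cast(temp.data());

    // Second call performs the actual sort.
    err = cub::DeviceRadixSort::SortPairs(
        d_temp, temp_bytes, d_keys_in, d_keys_out, d_idx_in, d_idx_out, n);
    assert(err == cudaSuccess);
    (void)err;  // avoid an unused-variable warning when NDEBUG is defined

    // Reorder every value array with the sorted indices, as before.
    thrust::device_vector<float> tmp(n);
    thrust::gather(idx_out.begin(), idx_out.end(), values1.begin(), tmp.begin());
    values1.swap(tmp);
    thrust::gather(idx_out.begin(), idx_out.end(), values2.begin(), tmp.begin());
    values2.swap(tmp);
    thrust::gather(idx_out.begin(), idx_out.end(), values3.begin(), tmp.begin());
    values3.swap(tmp);

    keys.swap(keys_out);
}
```

The two calls to SortPairs follow CUB's usual pattern for device-wide algorithms: the first call with a null pointer only reports how much temporary storage is needed. The second option would instead use a struct holding the three values as the SortPairs value type and drop the gathers.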
Update
After reading the implementation of thrust::gather in $CUDA_HOME/include/thrust/system/detail/generic/gather.inl, you can see that it is just a naive kernel like this:
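```cuda
// Naive gather: one thread per output element, out[i] = in[index[i]].
__global__ void gather(const int* index, const float* in, float* out, int len) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) { out[i] = in[index[i]]; }
}
```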
So I think the code in your question could be replaced by a single kernel without much effort, as long as all the data fits in a single thread block, since cub::BlockRadixSort only sorts the items held by one block.
In this kernel, you could first use the CUB block-wide primitive cub::BlockRadixSort<...>::SortBlockedToStriped to get the sorted indices stored in registers, and then perform a naive reordering copy, as thrust::gather does, to fill values{1,2,3}Out.
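Using SortBlockedToStriped rather than Sort gives coalesced writes (though not coalesced reads) when copying the values: in the striped output, consecutive sorted elements sit in consecutive threads, while the reads in[index[i]] still go to arbitrary addresses.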
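A minimal single-block sketch of that kernel might look like this. It assumes int keys, float values, and that all n elements fit in one block (here a hypothetical 128 threads x 4 items, i.e. at most 512 elements). The kernel, wrapper, and constant names are hypothetical. Unused slots are padded with the key INT_MAX so they sort to the end, which assumes no real key equals INT_MAX.

```cuda
#include <cassert>
#include <climits>
#include <cub/cub.cuh>

// Hypothetical configuration: one block of 128 threads, 4 items per thread,
// so this single-block sketch handles at most 512 elements.
constexpr int BLOCK_THREADS = 128;
constexpr int ITEMS_PER_THREAD = 4;

__global__ void sort_and_gather_kernel(const int* keys_in,
                                       const float* v1_in, const float* v2_in,
                                       const float* v3_in, int* keys_out,
                                       float* v1_out, float* v2_out,
                                       float* v3_out, int n)
{
    // Block-wide radix sort of (int key, int original index) pairs.
    using BlockSort = cub::BlockRadixSort<int, BLOCK_THREADS, ITEMS_PER_THREAD, int>;
    __shared__ BlockSort::TempStorage temp_storage;

    int keys[ITEMS_PER_THREAD];
    int idx[ITEMS_PER_THREAD];

    // Blocked arrangement: thread t holds elements t*ITEMS_PER_THREAD + k.
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        int i = threadIdx.x * ITEMS_PER_THREAD + k;
        keys[k] = (i < n) ? keys_in[i] : INT_MAX;  // padding sorts last (assumes real keys < INT_MAX)
        idx[k] = i;
    }

    // Sorted keys and indices stay in registers, in striped arrangement.
    BlockSort(temp_storage).SortBlockedToStriped(keys, idx);

    // Striped: item k of thread t has sorted rank k*BLOCK_THREADS + t, so
    // neighbouring threads write neighbouring addresses (coalesced writes).
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        int pos = k * BLOCK_THREADS + threadIdx.x;
        if (pos < n) {
            int src = idx[k];  // gather read: generally not coalesced
            keys_out[pos] = keys[k];
            v1_out[pos] = v1_in[src];
            v2_out[pos] = v2_in[src];
            v3_out[pos] = v3_in[src];
        }
    }
}

// Hypothetical host wrapper; all pointers must be device or managed memory.
void sort_three_by_key_single_block(const int* keys_in, const float* v1_in,
                                    const float* v2_in, const float* v3_in,
                                    int* keys_out, float* v1_out,
                                    float* v2_out, float* v3_out, int n)
{
    assert(n <= BLOCK_THREADS * ITEMS_PER_THREAD);  // must fit in one block
    sort_and_gather_kernel<<<1, BLOCK_THREADS>>>(keys_in, v1_in, v2_in, v3_in,
                                                 keys_out, v1_out, v2_out,
                                                 v3_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;  // avoid an unused-variable warning when NDEBUG is defined
}
```

Compared with the sort-then-gather version, the sorted indices never round-trip through global memory; the trade-off is the single-block size limit.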