Question

This is part of my project. The key step is sorting an array of numbers (say, 100 to 200 16-bit numbers; both the count and the bit width are fixed beforehand). I want to sort them within one block using the GPU's shared memory, because the requirements say the data cannot go off-chip in the middle of the computation.

I have read about radix sort and bitonic sort algorithms on the GPU, but they seem to be designed for large numbers of elements. I want to sort these 100 to 200 numbers very, very quickly.

I'd appreciate any ideas or help.


Solution

CUB has routines to do sorting operations within a single block, including examples. CUB also has routines that can run at the device level if you desire, to leverage all of the compute resources for larger problem sizes.

For 100+ numbers to be sorted, you're not likely to be able to write your own code that runs any faster.
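A minimal sketch of the CUB route, assuming unsigned 16-bit keys, at most 256 of them, and one block of 64 threads holding 4 keys each. These sizes and the names `block_sort_u16` and `sort_on_gpu` are illustrative choices, not part of the answer. It uses `cub::BlockRadixSort`: keys are read from global memory once at the start and written back once at the end, and in between they stay in registers and shared memory. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed launch shape: 64 threads x 4 keys per thread = 256 slots,
// enough for the 100-200 keys in the question.
constexpr int kThreads = 64;
constexpr int kItemsPerThread = 4;
constexpr int kCapacity = kThreads * kItemsPerThread;

__global__ void block_sort_u16(const uint16_t* in, uint16_t* out, int n) {
    // Block-wide radix sort; its scratch space lives in shared memory.
    using BlockSort = cub::BlockRadixSort<uint16_t, kThreads, kItemsPerThread>;
    __shared__ typename BlockSort::TempStorage temp_storage;

    // Blocked arrangement: thread t owns slots [t*kItemsPerThread, (t+1)*kItemsPerThread).
    uint16_t keys[kItemsPerThread];
    for (int i = 0; i < kItemsPerThread; ++i) {
        int idx = threadIdx.x * kItemsPerThread + i;
        // Pad unused slots with the largest key so they sort to the end.
        keys[i] = (idx < n) ? in[idx] : static_cast<uint16_t>(0xFFFF);
    }

    // Ascending sort; the result is again in blocked arrangement.
    BlockSort(temp_storage).Sort(keys);

    for (int i = 0; i < kItemsPerThread; ++i) {
        int idx = threadIdx.x * kItemsPerThread + i;
        if (idx < n) out[idx] = keys[i];
    }
}

// Hypothetical host helper: copy keys to the GPU, sort with one block, copy back.
std::vector<uint16_t> sort_on_gpu(const std::vector<uint16_t>& host_in) {
    int n = static_cast<int>(host_in.size());
    assert(n <= kCapacity);
    std::vector<uint16_t> host_out(n);
    if (n == 0) return host_out;

    uint16_t* d_in = nullptr;
    uint16_t* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(uint16_t));
    cudaMalloc(&d_out, n * sizeof(uint16_t));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(uint16_t), cudaMemcpyHostToDevice);

    block_sort_u16<<<1, kThreads>>>(d_in, d_out, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(host_out.data(), d_out, n * sizeof(uint16_t), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Padding with `0xFFFF` keeps the slot count fixed at compile time; real keys equal to `0xFFFF` are indistinguishable from padding, so the first `n` outputs are still correct. If indices must travel with the keys, `BlockRadixSort` also accepts a value type and a key-value `Sort` overload.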

If you have a very small group of numbers (say, fewer than 32) to sort at the block level, you may simply want to write your own code. In that case, the warp vote and shuffle instructions are likely to be useful.
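For the tiny case, here is one possible hand-written warp-level routine, assuming at most 32 values held one per lane as `unsigned`. It is a bitonic sorting network whose compare-exchange steps use `__shfl_xor_sync` to read the partner lane's value, so the sort itself never touches shared or global memory. This sketch uses only the shuffle instruction, not warp vote, and the names `warp_bitonic_sort`, `warp_sort_kernel` and `sort_warp_on_gpu` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <climits>
#include <vector>

// Sorts 32 values held one per lane across a full warp, ascending by lane index.
__device__ unsigned warp_bitonic_sort(unsigned v) {
    const unsigned lane = threadIdx.x & 31u;
    for (unsigned k = 2; k <= 32; k <<= 1) {        // length of the sequences being merged
        for (unsigned j = k >> 1; j > 0; j >>= 1) { // distance to the compare-exchange partner
            unsigned partner = __shfl_xor_sync(0xffffffffu, v, j);
            bool ascending = (lane & k) == 0;   // direction of this sub-sequence
            bool lower = (lane & j) == 0;       // is this lane the lower index of the pair?
            // Lower lane keeps the min in an ascending run, the max in a descending one.
            v = (lower == ascending) ? min(v, partner) : max(v, partner);
        }
    }
    return v;
}

// Must be launched with exactly one warp (32 threads) so every lane joins the shuffles.
__global__ void warp_sort_kernel(const unsigned* in, unsigned* out, int n) {
    unsigned lane = threadIdx.x & 31u;
    // Lanes beyond n hold UINT_MAX so they end up at the top.
    unsigned v = (lane < static_cast<unsigned>(n)) ? in[lane] : UINT_MAX;
    v = warp_bitonic_sort(v);
    if (lane < static_cast<unsigned>(n)) out[lane] = v;
}

// Hypothetical host helper for testing the warp sort with up to 32 values.
std::vector<unsigned> sort_warp_on_gpu(const std::vector<unsigned>& host_in) {
    int n = static_cast<int>(host_in.size());
    assert(n <= 32);
    std::vector<unsigned> host_out(n);
    if (n == 0) return host_out;

    unsigned* d_in = nullptr;
    unsigned* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(unsigned));
    cudaMalloc(&d_out, n * sizeof(unsigned));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(unsigned), cudaMemcpyHostToDevice);

    warp_sort_kernel<<<1, 32>>>(d_in, d_out, n);

    cudaMemcpy(host_out.data(), d_out, n * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

A 32-element bitonic network needs 1 + 2 + 3 + 4 + 5 = 15 compare-exchange steps, each costing one shuffle plus a min or max, which is why this stays attractive only while the data fits in a single warp.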

License: CC-BY-SA with attribution
Not affiliated with StackOverflow