Question

Hi, this is part of my project. The key point is that I have to sort an array of numbers (say 100 to 200 16-bit numbers; the count and the bit width are fixed beforehand). I want to sort them within one block using the GPU's shared memory. As part of the requirement, the data cannot go off-chip in the middle of the sort.

I have read about some radix sort and bitonic sort algorithms for the GPU, but they look like they are designed for large amounts of data. I want to sort these 100 to 200 numbers very quickly.

I would appreciate any ideas or help.


Solution

CUB has routines for sorting within a single thread block (for example, cub::BlockRadixSort), and its documentation includes examples. CUB also has device-level routines (for example, cub::DeviceRadixSort) that use all of the GPU's compute resources for larger problem sizes.
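As a rough illustration of the block-level approach, here is a minimal sketch that sorts up to 256 16-bit keys in one block with cub::BlockRadixSort. The kernel name, the 128 x 2 thread/item configuration, and the 0xFFFF padding are assumptions made for this example, not tuned values or anything from the question. The global-memory load and store are only there to make the sketch self-contained; in a real kernel the keys would already be in registers or shared memory, and the sort itself only uses registers and shared memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cub/cub.cuh>

// Assumed configuration: 128 threads x 2 keys per thread = 256 slots,
// enough for up to 200 keys. Illustrative values, not tuned.
constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 2;

// Hypothetical kernel: sorts n <= 256 keys within a single thread block.
__global__ void blockSortKernel(const unsigned short* in, unsigned short* out, int n)
{
    using BlockSort = cub::BlockRadixSort<unsigned short, BLOCK_THREADS, ITEMS_PER_THREAD>;
    __shared__ typename BlockSort::TempStorage temp;

    // Blocked arrangement: thread t owns slots t*ITEMS_PER_THREAD + i.
    // Unused slots are padded with the largest key so they sort to the end.
    unsigned short keys[ITEMS_PER_THREAD];
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = threadIdx.x * ITEMS_PER_THREAD + i;
        keys[i] = (idx < n) ? in[idx] : static_cast<unsigned short>(0xFFFF);
    }

    // Collective ascending sort across the block.
    BlockSort(temp).Sort(keys);

    // The result is again in blocked arrangement; write back the first n keys.
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = threadIdx.x * ITEMS_PER_THREAD + i;
        if (idx < n) out[idx] = keys[i];
    }
}

int main()
{
    const int n = 200;
    unsigned short h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<unsigned short>(rand() & 0xFFFF);

    unsigned short *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  n * sizeof(unsigned short));
    cudaMalloc(&d_out, n * sizeof(unsigned short));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned short), cudaMemcpyHostToDevice);

    blockSortKernel<<<1, BLOCK_THREADS>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(unsigned short), cudaMemcpyDeviceToHost);

    // Host-side check that the output is in non-decreasing order.
    bool ok = true;
    for (int i = 1; i < n; ++i) ok = ok && (h_out[i - 1] <= h_out[i]);
    printf("%s\n", ok ? "sorted" : "NOT sorted");

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

Because the count is fixed beforehand, the thread and item counts can be chosen at compile time to fit it; padding with the maximum key keeps the first n outputs correct even if some real keys equal 0xFFFF.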

With 100 or more numbers to sort, you are unlikely to write your own code that runs any faster.

If you have a very small group of numbers (say, fewer than 32) to sort at the block level, you may simply want to write your own code. In that case, the warp vote and shuffle instructions are likely to be useful.
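For the small case, one possible hand-written approach is a bitonic sort across a single warp using the shuffle instruction __shfl_xor_sync, with one value per lane and no shared memory at all. This is only a sketch: the function and kernel names are made up for the example, it assumes the block is launched with exactly 32 threads so that the full-warp mask is valid, and it uses shuffles only (a vote-based ranking scheme would be a different design).

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: ascending bitonic sort of 32 values, one per lane.
// All 32 lanes of the warp must call this together.
__device__ unsigned int warpBitonicSort(unsigned int v)
{
    const unsigned int lane = threadIdx.x & 31u;
    for (unsigned int k = 2; k <= 32; k <<= 1) {          // size of sorted runs being built
        for (unsigned int j = k >> 1; j > 0; j >>= 1) {   // compare distance
            unsigned int other = __shfl_xor_sync(0xFFFFFFFFu, v, j);
            bool ascending = ((lane & k) == 0);           // direction of this run
            bool lower     = ((lane & j) == 0);           // lower lane of the pair
            // The lower lane keeps the minimum in an ascending run,
            // and the maximum in a descending run.
            v = (lower == ascending) ? min(v, other) : max(v, other);
        }
    }
    return v;
}

// Hypothetical kernel: sorts n <= 32 16-bit keys with a single warp.
__global__ void warpSortKernel(const unsigned short* in, unsigned short* out, int n)
{
    const int lane = threadIdx.x;
    // Pad unused lanes with a value larger than any 16-bit key.
    unsigned int v = (lane < n) ? static_cast<unsigned int>(in[lane]) : 0xFFFFFFFFu;
    v = warpBitonicSort(v);
    if (lane < n) out[lane] = static_cast<unsigned short>(v);
}

int main()
{
    const int n = 20;
    unsigned short h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<unsigned short>(rand() & 0xFFFF);

    unsigned short *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  n * sizeof(unsigned short));
    cudaMalloc(&d_out, n * sizeof(unsigned short));
    cudaMemcpy(d_in, h_in, n * sizeof(unsigned short), cudaMemcpyHostToDevice);

    warpSortKernel<<<1, 32>>>(d_in, d_out, n);            // exactly one full warp
    cudaMemcpy(h_out, d_out, n * sizeof(unsigned short), cudaMemcpyDeviceToHost);

    // Host-side check that the output is in non-decreasing order.
    bool ok = true;
    for (int i = 1; i < n; ++i) ok = ok && (h_out[i - 1] <= h_out[i]);
    printf("%s\n", ok ? "sorted" : "NOT sorted");

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

The keys are widened to 32 bits so that the padding value sorts strictly after every real key; the whole sort takes 15 shuffle steps and never leaves registers.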

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow