CUB provides routines for sorting within a single thread block (`cub::BlockRadixSort`, for instance), and its documentation includes examples. It also provides device-wide routines (such as `cub::DeviceRadixSort`) that use all of the GPU's compute resources for larger problem sizes.
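As a rough illustration of the block-level route, here is a minimal sketch using `cub::BlockRadixSort`. The tile shape (128 threads, 4 keys per thread), the kernel name `blockSortKernel`, and the host helper `sortTileOnGpu` are assumptions made up for this example, and error checking is left out for brevity:

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <climits>
#include <vector>

// Hypothetical tile shape: 128 threads x 4 keys per thread = 512 keys per block.
constexpr int kThreads = 128;
constexpr int kItemsPerThread = 4;
constexpr int kTile = kThreads * kItemsPerThread;

// Sorts one tile of kTile ints in place, ascending, using a single block.
__global__ void blockSortKernel(int *data)
{
    using BlockSort = cub::BlockRadixSort<int, kThreads, kItemsPerThread>;
    __shared__ typename BlockSort::TempStorage temp;

    // Blocked arrangement: thread t owns kItemsPerThread consecutive keys.
    int keys[kItemsPerThread];
    const int base = threadIdx.x * kItemsPerThread;
    for (int i = 0; i < kItemsPerThread; ++i) keys[i] = data[base + i];

    // Collective sort across the block; the result stays in blocked arrangement.
    BlockSort(temp).Sort(keys);

    for (int i = 0; i < kItemsPerThread; ++i) data[base + i] = keys[i];
}

// Hypothetical host helper: sorts up to kTile ints (assumes h.size() <= kTile).
std::vector<int> sortTileOnGpu(std::vector<int> h)
{
    const size_t n = h.size();
    h.resize(kTile, INT_MAX);  // pad so the padding sorts to the end of the tile

    int *d = nullptr;
    cudaMalloc(&d, kTile * sizeof(int));
    cudaMemcpy(d, h.data(), kTile * sizeof(int), cudaMemcpyHostToDevice);
    blockSortKernel<<<1, kThreads>>>(d);
    cudaMemcpy(h.data(), d, kTile * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    h.resize(n);  // drop the padding again
    return h;
}
```

For data that does not fit in one block's tile, the device-wide routines are the more natural fit, since they handle the multi-block coordination for you.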
If you have 100 or more numbers to sort, you are unlikely to write your own code that runs any faster than CUB's.
If you only have a very small group of numbers to sort at the block level (say, fewer than 32), you may simply want to write your own code. In that case, the warp vote and shuffle instructions are likely to be useful.
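One possible hand-written approach for such a small group is a rank sort within a single warp: each lane holds one value, reads every other lane's value with `__shfl_sync`, counts how many values must come before its own, and writes its value to that position. This is only a sketch under the assumption that there are at most 32 values and that the kernel is launched with exactly one warp; the names `warpRankSort` and `sortWarpOnGpu` are made up for the example, and it uses only the shuffle instruction (a vote such as `__ballot_sync` could be used instead to count comparisons, but is not shown here):

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <vector>

// Sorts n <= 32 ints in place, ascending. Launch with exactly one warp: <<<1, 32>>>.
__global__ void warpRankSort(int *data, int n)
{
    const unsigned fullMask = 0xffffffffu;
    const int lane = threadIdx.x;  // 0..31

    // Lanes past n hold a sentinel so that all 32 lanes can join the shuffles.
    const int mine = (lane < n) ? data[lane] : INT_MAX;

    // rank = number of values that must precede mine in the sorted order.
    int rank = 0;
    for (int src = 0; src < 32; ++src) {
        const int other = __shfl_sync(fullMask, mine, src);
        // Ties are broken by lane index, so every lane gets a unique rank.
        if (other < mine || (other == mine && src < lane)) ++rank;
    }

    // All loads happened before the shuffles, so writing in place is safe.
    if (lane < n) data[rank] = mine;
}

// Hypothetical host helper (assumes h.size() <= 32; no error checking).
std::vector<int> sortWarpOnGpu(std::vector<int> h)
{
    const int n = static_cast<int>(h.size());
    if (n == 0) return h;

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    warpRankSort<<<1, 32>>>(d, n);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

The rank sort does 32 shuffles per lane regardless of the input, which is wasteful in comparison count but keeps everything in registers with no shared memory or explicit synchronization. A bitonic network built on `__shfl_xor_sync` would be another reasonable choice.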