I am not sure what "reduction functions" you are referring to.
CUBLAS is basically just a "like-for-like" implementation of BLAS for CUDA devices. It only provides the standard Level 1, 2 and 3 BLAS functions, plus exactly three extensions - geam (scaled matrix addition/transposition), dgmm (diagonal matrix-matrix product) and getrfBatched (batched LU factorisation for many small matrices). None of those functions will find the signed maximum value of a supplied vector or matrix; the closest thing is the Level 1 amax routine, but that returns the index of the element with the largest *absolute* value, which is not the same thing.
NVIDIA ship cudpp and thrust, either of which is probably better suited to this sort of operation. Also, CUBLAS 3.2 is two and a half years old.
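As a rough illustration of how simple this is in Thrust (the vector contents here are made up for the example), a signed-maximum reduction on the device is a one-liner with `thrust::max_element`:

```cuda
// Sketch: finding the signed maximum of a device vector with Thrust,
// which ships with the CUDA toolkit.
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <cstdio>

int main(void)
{
    float data[] = { -4.0f, 1.5f, -0.5f, 3.0f };  // illustrative values
    thrust::device_vector<float> d_vec(data, data + 4);

    // max_element runs a parallel reduction on the GPU and returns an
    // iterator to the largest (signed) element
    thrust::device_vector<float>::iterator it =
        thrust::max_element(d_vec.begin(), d_vec.end());

    printf("signed maximum = %f\n", float(*it));  // prints 3.0
    return 0;
}
```

Compile with nvcc; no extra libraries are needed beyond the toolkit itself.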
As a final comment, I would strongly recommend moving to either the CUBLAS 4.x or CUBLAS 5.x releases. The API and the performance of the library have both improved considerably, especially on newer hardware.