Since the CUBLAS V2 API was introduced (with CUDA 4.0, IIRC), it is possible to have routines which return a scalar or index to store those directly into a variable in device memory, rather than into a host variable (which entails a device to host transfer and might leave the result in the wrong memory space).
To use this, you need to use the cublasSetPointerMode
call to tell the CUBLAS context to expect pointers for scalar arguments to be device pointers by using the CUBLAS_POINTER_MODE_DEVICE
mode. This then implies that in a call like
cublasStatus_t cublasIsamax(cublasHandle_t handle, int n,
const float *x, int incx, int *result)
that result
must be a device pointer.