Why does CUBLAS use const pointers for parameters?

Question 1

1. const marks input arguments as read-only, both for the caller and for the compiler (which can affect optimisation).
2. Because passing a pointer, rather than a value, allows CUBLAS v2 routines to read the scalar from either host or device memory (this differs from the CUBLAS v1 API).
3. See above. CUBLAS v2 calls can now read scalar parameters from GPU memory, so intermediate memory transfers from host to device can be eliminated and the performance of some types of operations improved. CUBLAS_POINTER_MODE_HOST is one of the two pointer modes the CUBLAS v2 API can use, the other being CUBLAS_POINTER_MODE_DEVICE. cublasSetPointerMode selects the mode, which determines whether scalar inputs are read from, and scalar results written to, host or device memory.
4. No. A pointer converts implicitly to a pointer to const in C, but constness cannot be removed implicitly; that takes an explicit cast. C++ provides const_cast for this purpose.
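As a minimal sketch of the conversion rules in the last point, the following host-only C++ uses two hypothetical helpers, scaled and overwrite, which are not CUBLAS functions. It assumes nothing beyond the standard library and only illustrates that adding const is implicit while removing it needs const_cast.

```cpp
#include <cassert>

// Hypothetical host-side helper, not part of CUBLAS: it reads the scalar
// through a pointer to const, in the style of the alpha and beta parameters.
float scaled(const float *alpha, float x) {
    return *alpha * x;   // *alpha can be read here but not assigned to
}

// Removing const needs an explicit cast. Writing through the result is only
// valid because the caller's object was not declared const to begin with.
void overwrite(const float *p, float value) {
    float *writable = const_cast<float *>(p);
    *writable = value;
}

float demo_const_conversion() {
    float alpha = 2.0f;               // ordinary, non-const variable
    float y = scaled(&alpha, 3.0f);   // float* converts to const float* implicitly
    assert(y == 6.0f);
    overwrite(&alpha, 5.0f);          // fine: alpha itself is not const
    return scaled(&alpha, 3.0f);      // 5 * 3
}
```

Calling demo_const_conversion() exercises both directions. Writing through a const_cast pointer to an object that really was declared const would be undefined behaviour, which is why the cast is best avoided in practice.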

Question 2

In the example above, the const pointers are all input parameters that the function will not modify. You don't need to pass pointers that are themselves declared const here - the const qualifier just guarantees that the data you provide as input will not be written to.

The non-const C parameter is an output parameter: it points to data that the function will modify.

I don't know why alpha and beta are passed as pointers - this may just be a legacy of BLAS's FORTRAN origins.
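To make the input/output split concrete, here is a rough host-only sketch. host_sgemm is a hypothetical stand-in written for this answer, not a CUBLAS routine; it assumes column-major storage and no transposes, and only its const layout is modelled on the v2 parameter list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only GEMM (column-major, no transposes).
// Computes C = alpha * A * B + beta * C.
void host_sgemm(int m, int n, int k,
                const float *alpha,        // input scalar, read-only
                const float *A, int lda,   // input matrix, read-only
                const float *B, int ldb,   // input matrix, read-only
                const float *beta,         // input scalar, read-only
                float *C, int ldc) {       // output matrix, written to
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[row + p * lda] * B[p + col * ldb];
            }
            C[row + col * ldc] = *alpha * acc + *beta * C[row + col * ldc];
        }
    }
}

std::vector<float> demo_host_sgemm() {
    // None of these objects is declared const; the pointers convert
    // implicitly to pointer-to-const at the call site.
    std::vector<float> A = {1, 2, 3, 4};   // 2x2, columns (1,2) and (3,4)
    std::vector<float> B = {1, 0, 0, 1};   // 2x2 identity
    std::vector<float> C = {1, 1, 1, 1};
    float alpha = 2.0f, beta = 1.0f;
    host_sgemm(2, 2, 2, &alpha, A.data(), 2, B.data(), 2, &beta, C.data(), 2);
    assert(C[0] == 3.0f);                  // 2 * 1 + 1
    return C;
}
```

Only C changes across the call; A, B, alpha and beta are left untouched, which is exactly what the const qualifiers promise to the caller.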

Question 3

As to your 4th question, talonmies is correct: you need to cast to const. A good example of how to cast to const for cublas<t>gemmBatched is given in the CUDA samples: batchCUBLAS.

They give this line for example:

    status1 = cublasXgemmBatched(handle, params.transa, params.transb,
                                 params.m, params.n, params.k,
                                 &params.alpha,
                                 (const T_ELEM **) devPtrA_dev, rowsA,
                                 (const T_ELEM **) devPtrB_dev, rowsB,
                                 &params.beta,
                                 devPtrC_dev, rowsC, opts.N);

where, in the CUDA example, T_ELEM is float. Notice the 8th argument:

(const T_ELEM **) devPtrA_dev

which casts the pointer array to const T_ELEM **. devPtrA was set up in CUDA memory in the usual way, which can also be found in this CUDA sample.
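The reason an explicit cast is needed for a pointer-to-pointer can be reproduced on the host without CUBLAS. The sketch below uses a hypothetical function, sum_first_elements, whose parameter is declared as const float ** purely as an assumption to mirror the cast in the sample; float ** does not convert to that type implicitly in C++, even though float * converts to const float *.

```cpp
#include <cassert>

// Hypothetical batched-style routine, not a CUBLAS function: it takes an
// array of pointers to read-only data, declared here as const float **.
float sum_first_elements(const float **batch, int count) {
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += batch[i][0];   // read the first element of each array
    }
    return total;
}

float demo_batched_cast() {
    float a[2] = {1.0f, 2.0f};
    float b[2] = {3.0f, 4.0f};
    float *ptrs[2] = {a, b};    // decays to float **, as setup code usually builds it

    // sum_first_elements(ptrs, 2);   // rejected: no implicit float** -> const float**
    float total = sum_first_elements((const float **) ptrs, 2);  // explicit cast
    assert(total == 4.0f);      // 1 + 3
    return total;
}
```

The cast changes only the type seen by the callee; the pointer array and the data it refers to are the same objects as before.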

You can find more information about CUDA samples here: https://developer.nvidia.com/cuda-code-samples