Question

I recently wanted to do a simple CUDA matrix-vector multiplication. I found what looked like a suitable function in the cuBLAS library: cublas<t>gbmv. Here is the official documentation

But it is actually rather poor, and I didn't manage to work out what the kl and ku parameters mean. Moreover, I have no idea what the stride is (it must also be provided). There is a brief explanation of these parameters (page 37), but it looks like I need to know something more.

A search on the internet doesn't turn up much useful information on this question, mostly just references to different versions of the documentation.

So I have several questions to GPU/CUDA/cublas gurus:

  1. How do I find more understandable docs or guides about using cublas?
  2. If you know how to use this particular function, could you explain to me how to use it?
  3. Maybe the cuBLAS library is somewhat unusual and everyone uses something more popular, better documented, and so on?

Thanks a lot.


Solution

So BLAS (Basic Linear Algebra Subprograms) generally is an API to, as the name says, basic linear algebra routines. It includes vector-vector operations (level 1 blas routines), matrix-vector operations (level 2) and matrix-matrix operations (level 3). There is a "reference" BLAS available that implements everything correctly, but most of the time you'd use an optimized implementation for your architecture. cuBLAS is an implementation for CUDA.

The BLAS API was so successful at describing the basic operations that it has become very widely adopted. However, (a) the names are incredibly cryptic because of architectural limitations of the day (this was 1979, and the API was defined using names of six characters or fewer to ensure it would compile widely), and (b) because it is quite general, even the simplest function calls require a lot of extraneous arguments.

Because it's so widespread, it's often assumed that if you're doing numerical linear algebra, you already know the general gist of the API, so implementation manuals often leave out important details, and I think that's what you're running into.

The Level 2 and 3 routines generally have function names of the form TMMOO, where T is the numerical type of the matrix/vector (S/D for single/double-precision real, C/Z for single/double-precision complex), MM is the matrix type (GE for general, i.e. just a dense matrix you can't say anything else about; GB for a general banded matrix; SY for symmetric matrices; etc.), and OO is the operation (e.g., MV for matrix-vector multiply, MM for matrix-matrix multiply).

This all seems slightly ridiculous now, but it worked and works relatively well -- you quickly learn to scan these for familiar operations so that SGEMV is a single-precision general-matrix times vector multiplication (which is probably what you want, not SGBMV), DGEMM is double-precision matrix-matrix multiply, etc. But it does take some practice.

So if you look at the cuBLAS sgemv documentation, or at the documentation of the original BLAS routine, you can step through the argument list. First, the basic operation is

This function performs the matrix-vector multiplication y = α op(A) x + β y, where A is an m × n matrix stored in column-major format, x and y are vectors, and α and β are scalars.

where op(A) can be A, A^T (its transpose), or A^H (its conjugate transpose). So if you just want y = Ax, as is the common case, then α = 1, β = 0, and transa == CUBLAS_OP_N.

incx is the stride between consecutive elements of x; there are lots of situations where this comes in handy (for example, a stride equal to the leading dimension lets you treat a row of a column-major matrix as a vector), but if x is just a simple 1-d array containing the vector, then the stride is 1.

And that's about all you need for SGEMV.
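Putting it together, a minimal host-side sketch of the cublasSgemv call might look like the following (assumes a CUDA toolkit with cuBLAS is installed and a GPU is present; error checking is omitted for brevity):

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* y = A*x for a 2x3 single-precision matrix: alpha = 1, beta = 0,
 * transa = CUBLAS_OP_N.  A must be column-major on the device.
 * Error checking is omitted for brevity. */
int main(void)
{
    const int m = 2, n = 3;
    /* column-major: A = | 1 3 5 |
     *                   | 2 4 6 |  */
    float hA[] = {1, 2, 3, 4, 5, 6};
    float hx[] = {1, 1, 1};
    float hy[2];

    float *dA, *dx, *dy;
    cudaMalloc(&dA, sizeof(hA));
    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dy, sizeof(hy));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    /* lda = m (rows of A); incx = incy = 1 for plain contiguous vectors */
    cublasSgemv(handle, CUBLAS_OP_N, m, n,
                &alpha, dA, m, dx, 1, &beta, dy, 1);

    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);
    printf("y = (%g, %g)\n", hy[0], hy[1]);   /* expect (9, 12) */

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```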

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow