CUDA/CUBLAS: Accessing elements in an array

https://stackoverflow.com/questions/18573099

27-06-2022

Question

As a follow-up to a previous question here, I am trying to implement the following loop. It is a matrix-vector multiplication in which the vector is column K of the matrix Q, where K is the loop iterator:

EDIT: Q cannot be populated beforehand; its columns are filled in as the iterator K advances.

for (unsigned K=0;K<N;K++){   // Number of iterations loop
    //... do some stuff
    for (unsigned i=0; i<N; i++){
        float sum = 0;
        for (unsigned j=0; j<N; j++){
            sum += A[j][i]*Q[j][K];
        }
        v[i] = sum;
    }
    //... do some stuff
    // populate next column of Q
}

Where the dimensions of the arrays are:

A [N x N]

Q [N x (0.5N + 1)]

These arrays have been flattened in order to use them with cublasSgemv(). My question: is it possible to use cublasSgemv() by telling it where to start reading d_Q and what the increment between elements is (the arrays are row-major, as usual in C++), as in the call below?

EDIT: I multiplied the memory access increment by sizeof(float). It still doesn't work as far as I can tell.

Niter = 0.5*N + 1;
for (unsigned K=0;K<N;K++){
    cublasSgemv(handle, CUBLAS_OP_T, N, N, &alpha, d_A, N, (d_Q + sizeof(float)*K*(Niter)), (Niter), &beta, d_v , 1);
}

I don't think it is possible to index d_Q like that, as I am not getting any results.

SOLVED: the solution by @RobertCrovella is what I was looking for. Thanks.

Solution

It is possible to index through your flattened Q matrix the way you propose. Your call to Sgemv should be as follows:

cublasSgemv(handle, CUBLAS_OP_T, N, N, &alpha, d_A, N, (d_Q + K), (Niter), &beta, (d_v + K*N), 1);

The pointer to Q should point to the first element of the column in question, and since your matrix is row-major, this is just d_Q + K (using pointer arithmetic, not byte arithmetic, so no sizeof(float) factor). Niter is the stride (in elements, not bytes) between successive elements of the column in question. Note that your code as written would overwrite the results of one matrix-vector multiply with the next, since you are not indexing through d_v, the output vector. So I added some indexing on d_v: each result has N elements, so result K starts at d_v + K*N, and d_v needs room for one N-element result per column.
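The pointer-plus-stride addressing can be checked on the host before involving the GPU. The following is a minimal C++ sketch, not cuBLAS code: read_strided and column_of_row_major are hypothetical helper names introduced here for illustration. They walk memory the way a BLAS routine walks a strided vector argument, given a start pointer and an increment counted in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuBLAS function): gather n elements starting at x,
// stepping incx elements each time, the way BLAS reads a strided vector.
std::vector<float> read_strided(const float* x, std::size_t n, std::size_t incx) {
    assert(x != nullptr && incx > 0);
    std::vector<float> out(n);
    for (std::size_t j = 0; j < n; ++j) {
        out[j] = x[j * incx];  // float* arithmetic already counts in elements
    }
    return out;
}

// Column K of a flattened row-major matrix with `rows` rows and `cols` columns:
// the first element sits at Q + K, and each following element is one full row
// (cols elements) further on.
std::vector<float> column_of_row_major(const std::vector<float>& Q,
                                       std::size_t rows, std::size_t cols,
                                       std::size_t K) {
    assert(Q.size() == rows * cols && K < cols);
    return read_strided(Q.data() + K, rows, cols);
}
```

For a 3 x 2 matrix stored as {1, 2, 3, 4, 5, 6}, column_of_row_major(Q, 3, 2, 1) yields 2, 4, 6. Multiplying the offset by sizeof(float) on top of this would skip four times too far.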

As @JackOLantern points out, it should also be possible to do this in a single step without your loop, by calling Sgemm:

cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_T, N, Niter, N, &alpha, d_A, N, d_Q, (Niter), &beta, d_v, N);
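To have something to compare the device output against, a host reference can help. The sketch below is plain C++ and only an assumption about what you want to verify: it runs the loop from the question over flattened row-major A and Q for every column K, and stores result K in elements K*N through K*N + N - 1, which is where a column-major output with leading dimension N places column K. The function name reference_all_columns is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference (not a cuBLAS call): the question's loop,
// v[i] = sum_j A[j][i] * Q[j][K], evaluated for every column K of Q.
// A is N x N and Q is N x Niter, both flattened row-major.
// Result K occupies v[K*N] .. v[K*N + N - 1].
std::vector<float> reference_all_columns(const std::vector<float>& A,
                                         const std::vector<float>& Q,
                                         std::size_t N, std::size_t Niter) {
    assert(A.size() == N * N && Q.size() == N * Niter);
    std::vector<float> v(N * Niter, 0.0f);
    for (std::size_t K = 0; K < Niter; ++K) {
        for (std::size_t i = 0; i < N; ++i) {
            float sum = 0.0f;
            for (std::size_t j = 0; j < N; ++j) {
                sum += A[j * N + i] * Q[j * Niter + K];  // A[j][i] * Q[j][K]
            }
            v[K * N + i] = sum;
        }
    }
    return v;
}
```

Copy d_v back with cudaMemcpy and compare it element by element against this vector, using a small tolerance. If the device result instead matches the product with A in the other orientation, the transpose flag passed for A is the first thing to check.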

If your code is not working the way you expect, please provide a complete, compilable example.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow