Question

I have a problem computing `A.transpose*A` in CUDA.

Suppose A is an M*N matrix stored in column-major order. I am trying to use cublasSgemm_v2, the matrix-matrix multiplication API in cuBLAS,

like this:

cublasSgemm_v2(handle,CUBLAS_OP_T,CUBLAS_OP_N,N,N,M,&al,A,N,A,M,&beta,A_result,N)

Before calling this function I checked matrix A and it looks good, but the call reports that parameter number 8 is illegal. I don't know why.

So I decided to use another API to compute A.transpose*A: cublas<t>syrk(). It stores the result only in the lower or upper triangle of the matrix, which means the rest of the matrix is not referenced. How do I write a kernel to copy the elements to the symmetric part?

My other problem is that my program sometimes crashes (maybe one run in three) at the beginning of the code, in cudaMalloc, cublasCreate, or somewhere else. I only modified some code in the middle of the program, and it ran many times before. What might cause this?

Thank you

Solution

You have to read the cublas gemm documentation carefully.

There is a way to compute A' * A directly with cublas<T>gemm, but it's tricky.

  cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, N, K, M, &alpha,
    A, M, A, M, &beta, B, N);

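This is a bit of a hack. A is your M x N matrix stored in column-major order, and K = N. Because A is stored as M x N, its leading dimension is M in both places where A is passed; the call in the question passes N for the first one, which is likely what the error about parameter 8 points to. The result B is N x N, so its leading dimension is N. As a result you will get B = A' * A.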
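To check the argument mapping without a GPU, here is a minimal CPU-only C++ sketch. It is not cuBLAS code: gemm_tn_ref, ata_ref and mirror_lower_to_upper are hypothetical helper names that only imitate the column-major convention of gemm with CUBLAS_OP_T and CUBLAS_OP_N. The last helper shows, under the assumption that syrk filled only the lower triangle, how the missing half could be copied over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for C = alpha * A^T * B + beta * C in column-major storage.
// Same (m, n, k, lda, ldb, ldc) meaning as gemm with transa = T, transb = N:
//   A is stored as k x m with leading dimension lda >= k
//   B is stored as k x n with leading dimension ldb >= k
//   C is m x n with leading dimension ldc >= m
void gemm_tn_ref(int m, int n, int k, float alpha,
                 const float* A, int lda,
                 const float* B, int ldb,
                 float beta, float* C, int ldc) {
    // A leading dimension smaller than the stored row count is invalid.
    assert(lda >= k && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // A^T(i, p) = A(p, i); element (r, c) lives at r + c * ld.
                acc += A[p + i * lda] * B[p + j * ldb];
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// A^T * A for an M x N column-major A; the result is N x N.
// The arguments mirror the call above: m = N, n = N, k = M, lda = ldb = M, ldc = N.
std::vector<float> ata_ref(const std::vector<float>& A, int M, int N) {
    std::vector<float> B(static_cast<std::size_t>(N) * N, 0.0f);
    gemm_tn_ref(N, N, M, 1.0f, A.data(), M, A.data(), M, 0.0f, B.data(), N);
    return B;
}

// Copy the strictly lower triangle of an N x N column-major matrix
// into the upper triangle (assumes only the lower triangle is valid).
void mirror_lower_to_upper(std::vector<float>& S, int N) {
    for (int j = 0; j < N; ++j) {
        for (int i = j + 1; i < N; ++i) {
            S[j + i * N] = S[i + j * N];  // S(j, i) = S(i, j)
        }
    }
}
```

On the GPU, the same mirroring could be written as a kernel in which each thread handles one (i, j) pair with i > j; the loop body would stay the same.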

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow