I have an application that allocates a matrix and a vector on the device using cudaMalloc/cudaMemcpy. The matrix is stored column-major. I would now like to use a function from the cuBLAS library (cublasSgemv) to multiply them. It appears that I would have to allocate duplicates of the matrix and vector using cudaMalloc and initialize them from the host with cublasSetMatrix/cublasSetVector in order to use the cuBLAS API function. Obviously, duplicating all of this memory would be costly.

To my understanding, the cublasSetMatrix/cublasSetVector functions are just light wrappers around cudaMemcpy. Is it possible to pass the pointers to the arrays initialized with cudaMemcpy directly to the cuBLAS API function? Or, failing that, is it possible to lightly wrap the arrays in a way the API will recognize, so that I can avoid all of the memory duplication?

Solution

Yes, you can use cudaMemcpy instead of cublasSetMatrix/cublasGetMatrix. cuBLAS works equally well with device pointers that were allocated with cudaMalloc and filled with plain cudaMemcpy; no duplicate allocations are needed.
