Question

I'm doing some calculations and analyzing the strengths and weaknesses of different BLAS implementations. However, I have come across a problem.

I'm testing cuBLAS. Doing linear algebra on the GPU seems like a good idea, but there is one problem.

The cuBLAS implementation uses column-major format, and since that is not what I need in the end, I'm curious: is there a way to make BLAS perform a matrix transpose?


Solution

BLAS itself doesn't have a matrix transpose routine built in. The CUDA SDK includes a matrix transpose example, along with a paper that discusses optimal strategies for performing a transpose. Your best strategy is probably to feed your row-major inputs to cuBLAS using the transposed-input versions of the calls, perform the intermediate calculations in column-major order, and then transpose the final result afterwards using the SDK transpose kernel.
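As a minimal sketch of the first step, here is how a row-major product C = A·B could be expressed through cuBLAS's transpose flags. The device pointers d_A, d_B, d_C and the helper name are assumptions for illustration; the key observation is that a row-major m x k matrix is, byte for byte, a column-major k x m matrix, so CUBLAS_OP_T recovers the operand you actually want:

    #include <cublas_v2.h>

    // Hypothetical setup: d_A (m x k) and d_B (k x n) hold row-major data
    // on the device; d_C (m x n) receives the product in column-major order.
    void rowMajorGemm(cublasHandle_t handle,
                      const float *d_A, const float *d_B, float *d_C,
                      int m, int n, int k)
    {
        const float alpha = 1.0f;
        const float beta  = 0.0f;

        // Row-major A (m x k) reads as column-major A^T (k x m), so
        // CUBLAS_OP_T turns it back into the intended operand. The same
        // reasoning applies to B. The result in d_C comes out in
        // column-major order and still needs the final transpose step.
        cublasSgemm(handle,
                    CUBLAS_OP_T, CUBLAS_OP_T,
                    m, n, k,
                    &alpha,
                    d_A, k,   // leading dimension = row length of row-major A
                    d_B, n,   // leading dimension = row length of row-major B
                    &beta,
                    d_C, m);  // column-major m x n result
    }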


Edited to add: cuBLAS gained a transpose-capable routine in version 5, cublas&lt;t&gt;geam, which can perform matrix transposition in GPU memory and should be regarded as optimal for whatever architecture you are using.
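For example, transposing an m x n single-precision matrix could look like the sketch below. The buffer names d_A and d_AT are assumptions; geam computes C = alpha*op(A) + beta*op(B), so with alpha = 1 and beta = 0 it reduces to a pure transpose (the output pointer is passed as B only to give the call a valid pointer, matching the documented in-place mode since beta = 0 means B is never read):

    #include <cublas_v2.h>

    // Hypothetical buffers: d_A holds an m x n column-major matrix;
    // d_AT (n x m) receives its transpose, entirely on the GPU.
    void transposeOnGpu(cublasHandle_t handle,
                        const float *d_A, float *d_AT, int m, int n)
    {
        const float alpha = 1.0f;
        const float beta  = 0.0f;

        // The dimension arguments describe the output matrix, which
        // is n x m. op(A) = A^T because transa = CUBLAS_OP_T.
        cublasSgeam(handle,
                    CUBLAS_OP_T, CUBLAS_OP_N,
                    n, m,
                    &alpha, d_A, m,
                    &beta,  d_AT, n,
                    d_AT, n);
    }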
