Question

When accumulating the results of several matrix-vector multiplications (which is what BLAS does even when there is only one term to accumulate), one formally starts with a zero vector. But there is no overhead-free way to allocate an array of zeroes directly in CUDA device memory (or is there?), so the solution that jumps to mind is to take an array holding arbitrary values and, instead of initializing it to 0, pass beta = 0. to the first call of cublas<t>gemv or cusparse<t>csrmv. After all, the parameter is there, so why not use it?

Is this

  • a good idea? Or is the case β = 1 optimised so heavily that it performs better overall to initialize an array to 0 and then call cusparseDcsrmv(..., 1., zeroes_array)?
  • safe? Naïvely, floating-point numbers, as representations of elements of ℝ, should fulfill x ⋅ 0 = 0 ∀ x, but this naïve treatment is often rather deadly when dealing with floating point. I'm quite sure it is safe when the array was previously used for other operations on the same data type with well-behaved results, but is it also safe for an uninitialised block of freshly allocated device memory?

I'm mainly interested in the sparse case, since for dense matrices the O( n² ) complexity of the multiplication makes it unnecessary to worry much about the performance of the O( n ) allocation of the vector.


Solution

This should be perfectly safe, and is done all the time; any package that didn't behave correctly when β=0 would have a serious bug.

The cuSPARSE documentation for csrmv in particular says that "if beta is zero, y does not have to be a valid input", and the reference BLAS (distributed with LAPACK) says something similar for *gemv ("When BETA is supplied as zero then Y need not be set on input").
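To illustrate why that documented guarantee matters, here is a minimal sketch in plain C. It is not the cuBLAS or cuSPARSE source; the function names gemv_ref and gemv_naive are hypothetical, and the dense row-major layout is an assumption chosen for brevity. The first function overwrites y when beta is zero, which is the behaviour the quoted documentation promises; the second multiplies unconditionally and therefore propagates any NaN or infinity that happens to sit in uninitialised memory, because 0 · NaN and 0 · ∞ are both NaN in IEEE 754 arithmetic.

```c
#include <stddef.h>

/* Hypothetical reference-style y = alpha*A*x + beta*y for a dense,
   row-major m-by-n matrix A. When beta == 0 the old contents of y
   are never read, so y may hold arbitrary garbage on entry. */
void gemv_ref(size_t m, size_t n, double alpha, const double *A,
              const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];

        if (beta == 0.0)
            y[i] = alpha * acc;               /* overwrite, do not read y[i] */
        else
            y[i] = alpha * acc + beta * y[i]; /* accumulate */
    }
}

/* Hypothetical naive variant: always evaluates beta * y[i].
   With beta == 0 and y[i] equal to NaN or infinity, the product is NaN,
   so the result is poisoned by whatever was in memory. */
void gemv_naive(size_t m, size_t n, double alpha, const double *A,
                const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}
```

With A = [[1, 2], [3, 4]], x = [1, 1], alpha = 1 and beta = 0, gemv_ref writes (3, 7) into y even if y initially contains NaN or infinity, whereas gemv_naive leaves NaN in those entries. A library that honours the "y does not have to be a valid input" wording has to behave like the first variant.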

As for optimizations, a given implementation may have dedicated code paths for beta=1 and beta=0, but it may not; most likely, an implementation that lacks one also lacks the other.

Licensed under: CC-BY-SA with attribution