The solution I came up with was calling cblas_dcopy()
then cblas_dscal()
.
It is not the best of all worlds but it is still faster than the raw loop.
Question
I work in simulation software and one of the many operations done on arrays is scaling a vector by a number.
I have code like this:
//Just some initialization code, don't bother about this part
int n = 10000;
std::vector<double> input(n, 42.0);
std::vector<double> output(input.size());
double alpha = 69.0;
//the actual calculation:
for (size_t i = 0; i < n; ++i) {
output[i] = input[i] * alpha;
}
I have the MKL library available, so if my calculations are done "in-place" the following can be written:
cblas_dscal(n, alpha, &input[0], 1);
However, this will change the input
variable, which is not what I want.
I tried using the mkl_domatcopy()
but it is very slow for this operation.
Solution
The solution I came up with was calling cblas_dcopy()
then cblas_dscal()
.
It is not the best of all worlds but it is still faster than the raw loop.
OTHER TIPS
From MKL reference:
The cblas_?axpy routines perform a vector-vector operation defined as
y := a * x + y where: a is a scalar, x and y are vectors each with a number of elements that equals n.
There is a routine within the BLAS-like extensions called cblas_?axpby
Here is an excerpt from the documentation:
Scales two vectors, adds them to one another and stores result in the vector.
The difference to cblas_?axpy
is that you have second scaling parameter on the result vector y. In your case, you could just set b := 0.0
and thus have your out of place scaling in one call, instead of the two cblas_dcopy()
and cblas_dscal()
.
Your code could look like this:
//Just some initialization code, don't bother about this part
int n = 10000;
std::vector<double> input(n, 42.0);
std::vector<double> output(input.size());
double alpha = 69.0;
//the actual calculation:
cblas_daxpby (output.size(), alpha, input.data(), 1, 0.0, output.data(), 1);