I am timing some simple numerical operations to decide how (i.e., using which tools from which library) I am going to implement a computationally intensive simulation. The code below computes the sum of two inner products of vectors using (1) MTL4 version 4.0.9486 as is (i.e., without BLAS), (2) std::vectors and std::inner_product, and (3) std::valarrays. I picked a toy example of this specific form because it seemed like ideal ground for MTL4's expression templates.
To narrow everything down to a single question: is the following comparison fair, or does it (unintentionally) put any one of the three approaches at a disadvantage? I was a bit surprised that (2) is faster than (1). Whether the overall simulation will be faster is a different story, of course.
If anyone has any suggestions for more thorough tests that might reveal the strengths or weaknesses of each approach, I am eager to try them out.
Pardon the macros in the code; they are just std::cout<< statements and calls to <chrono> utilities.
Thanks in advance.
C++ code:
#include <iostream>
#include <valarray>
#include <vector>
#include <algorithm>
#include <numeric>   // std::inner_product, std::accumulate
#include <chrono>
#include <cmath>     // M_PI
#include <boost/numeric/mtl/mtl.hpp>

// Minimal definitions of the timing/printing macros (reconstructed here for
// completeness; as noted above, they are just <chrono> calls and std::cout<<
// statements).
#define NOW std::chrono::steady_clock::now();
#define DURATIONm std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
#define PRINTV(vec) { std::cout << #vec << " = "; \
    for (const auto& el : vec) std::cout << el << ' '; \
    std::cout << '\n'; }
#define PRINTME(name, value) std::cout << #name << ": " << value << '\n';

int main(int argc, const char* argv[])
{
    /* DOT PRODUCTS */
    constexpr int trials{15};
    std::vector<double> mtl_times(trials, 0.0), stl_times(trials, 0.0), valarray_times(trials, 0.0);
    constexpr size_t sz{10000000};
    double val = M_PI;
    mtl::dense_vector<double> m(sz, val), n(sz, val), p(sz, val), q(sz, val);
    std::vector<double> y(sz, val), z(sz, val), v(sz, val), w(sz, val);
    std::valarray<double> b(val, sz), c(val, sz), d(val, sz), e(val, sz);
    double x{0.0}, k{0.0}, aa{0.0};
    auto t0 = NOW
    auto t1 = t0;
    for (int i = 0; i < trials; ++i) {
        // MTL4 vectors
        t0 = NOW // call now() from <chrono>
        k = dot(m, n) + dot(p, q);
        t1 = NOW
        mtl_times[i] = DURATIONm // duration cast of (t1-t0).count()
        // STL vectors
        t0 = NOW
        x = std::inner_product(y.begin(), y.end(), z.begin(), 0.0)
          + std::inner_product(v.begin(), v.end(), w.begin(), 0.0);
        t1 = NOW
        stl_times[i] = DURATIONm
        // valarrays
        t0 = NOW
        aa = (b*c + d*e).sum();
        t1 = NOW
        valarray_times[i] = DURATIONm
    }
    std::cout << "MTL4: average time for dot product = "
              << std::accumulate(mtl_times.begin(), mtl_times.end(), 0.0)/mtl_times.size() << " msec\n";
    PRINTV(mtl_times)
    PRINTME(result, k)
    std::cout << '\n';
    std::cout << "STL vectors + std::inner_product: average time for dot product = "
              << std::accumulate(stl_times.begin(), stl_times.end(), 0.0)/stl_times.size() << " msec\n";
    PRINTV(stl_times)
    PRINTME(result, x)
    std::cout << '\n';
    std::cout << "valarrays: average time for dot product = "
              << std::accumulate(valarray_times.begin(), valarray_times.end(), 0.0)/valarray_times.size() << " msec\n";
    PRINTV(valarray_times)
    PRINTME(result, aa)
    return 0;
}
C++ Output:
MTL4: average time for dot product = 180.333 msec
mtl_times =
177 175 174 174 175 178 176 185 184 174 175 179 175 216 188
result: 1.97392e+08
STL vectors + std::inner_product: average time for dot product = 58.6 msec
stl_times = 56 55 56 57 57 56 57 56 57 55 55 58 56 58 90
result: 1.97392e+08
valarrays: average time for dot product = 64.4 msec
valarray_times = 63 64 63 64 65 63 63 63 64 63 63 63 64 64 77
result: 1.97392e+08
For the record, MatLab performs well:
MatLab code:
trials = 15;
times_ms = zeros(1, trials);
sz = 1e7;
val = pi;
x(sz) = val;
x(1:end-1) = val;
y(sz) = val;
y(1:end-1) = val;
v(sz) = val;
v(1:end-1) = val;
w(sz) = val;
w(1:end-1) = val;
z = 0;
for i = 1:trials
    tic
    z = x*y' + v*w';
    times_ms(i) = toc*1e3;
end
avg_time = sum(times_ms)/length(times_ms)
times_ms
z
MatLab output:
avg_time = 56.0687 msec
times_ms = 56.8919 57.2052 55.3179 55.5126 55.7660 55.3982 55.1044 55.4809 57.7229 56.1902 57.3888 56.5263 55.2830 55.4926 55.7501
z = 1.9739e+08
This is not surprising, since MatLab's built-in operations are optimised; however, there are other obstacles associated with using MatLab in the simulation.