Perhaps the biggest gain in both speed and development time would be to use an existing linear algebra library instead of re-inventing the wheel. Use a BLAS and LAPACK implementation that is tuned to your machine architecture; in particular, the vector instructions available on your machine and the cache sizes have a BIG impact on performance.
If your vectors are small enough, you may get a significant performance gain by allocating them on the stack; currently, you are allocating them on the heap.
By using template metaprogramming techniques you can eliminate many of the temporaries and unnecessary loops at compile time. To repeat the Wikipedia example here, say you have

```cpp
Vec x = alpha*(u - v);
```

where `alpha` is a scalar and `u` and `v` are `Vec`s. If you implement it in the fashion you are doing it, it will cost you at least 2 temporary vectors (one for `u - v` and one for the multiplication with `alpha`) and 2 or 3 passes through memory: one loop for `u - v`, one for the multiplication with `alpha`, and one more for the assignment if it is not optimized away. If you do template metaprogramming, `Vec x = alpha*(u - v);` will boil down to a single loop with no temporaries, and that is the best you can get. The gain becomes even bigger with more complicated expressions. At the moment you don't have these operations, but I guess it is only a matter of time until you will need them (`weightValueVector()` is an indication).
Of course, if you use a linear algebra library, you don't have to know or worry about any of this; you can concentrate on your application instead and get blazing fast code.