Three quick comments:
- Traditionally, `Matrix` classes have used dynamic allocation. You don't show your `Matrix` class, but if your data is `T myData[n][m];`, you might want to change it to `std::vector<T> myData;`, initializing it to the size `n * m` in the constructor, and calculating the single index in the `operator[]` (which should return a proxy if you want to do any bounds checking). Alternatively, you can use `operator()( int i, int j )` for accessing an element: whether `myMatrix( i, j )` or `myMatrix[i][j]`
is preferred for access depends on who you ask. Although this solution increases total memory use very slightly, it reduces the stack footprint to a couple of dozen bytes, regardless of the size of the matrix.
- Also traditionally, matrix classes haven't had the dimensions as
part of their template arguments. Whether this is a good thing
or not is arguable. You get far better type checking (and
errors at compile time, rather than runtime) with your solution,
but if the dimensions are arguments to the constructor, rather
than template arguments, you can read them from the command line
or a configuration file or whatever. It's the classical safety
vs. flexibility trade-off.
With regard to your problem, not having the dimensions as template parameters means that all matrices with element type `T` have the same type. You can thus access the internals of the matrix you return from your member function, and you no longer need the intermediate `T prod[n][m2]`. Of course, you could make all instantiations of `Matrix` friends, or simply use the access functions to set the values. At any rate, you do not want an intermediate `T prod[n][m2]`; this not only requires a lot of on-stack memory, it means that you'll have to copy the results.
- Finally, and this is somewhat more advanced: in the best matrix
classes, `operator*` does not return a matrix, but a helper class, along the lines of:

    // Template parameter list reconstructed from the names used below.
    template<typename T, typename L, typename R>
    class MatrixMultiply
    {
        L const* myLhs;
        R const* myRhs;
    public:
        typedef T value_type;
        MatrixMultiply( L const& lhs, R const& rhs )
            : myLhs( &lhs )
            , myRhs( &rhs )
        {
        }
        int getX() const
        {
            return myLhs->getX();
        }
        int getY() const
        {
            return myRhs->getY();
        }
        T get( int i, int j ) const
        {
            // calculateIJ is left undefined here; it needs i and j to
            // know which element to compute.
            return calculateIJ( myLhs, myRhs, i, j );
        }
    };

You then provide a templated constructor and assignment operator which use `getX()`, `getY()` and `get( i, j )`. Your `operator*` is also a template, which returns a `MatrixMultiply`:

    // Template arguments reconstructed; assumes L provides value_type.
    template<typename L, typename R>
    MatrixMultiply<typename L::value_type, L, R> operator*( L const& lhs, R const& rhs )
    {
        return MatrixMultiply<typename L::value_type, L, R>( lhs, rhs );
    }

(Note that if `L::value_type` and `R::value_type` aren't identical, this won't compile. Which is what you want, except that the error messages will be far from clear.)

The result is that you never actually build the intermediate, temporary matrices.

As you can imagine, the above solution is greatly simplified. You'll need additional code for error handling, and I don't think that the parallelization is trivial. But it avoids the construction of all intermediate matrices, even in complicated expressions.

(The same technique can be implemented with an abstract base class, say `MatrixAccessor`, with pure virtual getters, and deriving `Matrix` and all of the helpers like `MatrixMultiply` from it. IMHO, this is a lot more readable, and the error messages from the compiler will definitely be more understandable. The results will be the same as long as the compiler actually inlines all of the member functions. But that's a big if, since there can be significant function nesting.)
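
For the first comment, flat storage plus a proxy-returning `operator[]` might look roughly like this. `FlatMatrix` and `Row` are made-up names, and the row-major layout and the choice to throw `std::out_of_range` are assumptions of the sketch; the point is only that the proxy is what gives `m[i][j]` a place to check the column index.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical matrix with heap storage in a single std::vector.
template <typename T>
class FlatMatrix
{
    int myRows;
    int myCols;
    std::vector<T> myData;      // one block of rows * cols elements
public:
    // Proxy for one row: knows where the row starts and how long it is,
    // so the second [] can be bounds checked.
    class Row
    {
        T* myFirst;
        int myCols;
    public:
        Row(T* first, int cols) : myFirst(first), myCols(cols) {}

        T& operator[](int j) const
        {
            if (j < 0 || j >= myCols)
                throw std::out_of_range("FlatMatrix: column index");
            return myFirst[j];
        }
    };

    FlatMatrix(int rows, int cols)
        : myRows(rows), myCols(cols)
        , myData(static_cast<std::size_t>(rows) * cols)
    {
    }

    // m[i] checks the row index and returns the proxy; m[i][j] then
    // checks the column index inside Row::operator[].
    Row operator[](int i)
    {
        if (i < 0 || i >= myRows)
            throw std::out_of_range("FlatMatrix: row index");
        return Row(myData.data() + static_cast<std::size_t>(i) * myCols,
                   myCols);
    }

    // m(i, j) computes the single index directly; checked in debug
    // builds only.
    T& operator()(int i, int j)
    {
        assert(i >= 0 && i < myRows && j >= 0 && j < myCols);
        return myData[static_cast<std::size_t>(i) * myCols + j];
    }

    int rows() const { return myRows; }
    int cols() const { return myCols; }
};
```

The object itself holds two `int`s and a `std::vector`, so its stack footprint stays at a couple of dozen bytes whatever the dimensions; the elements live on the heap.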