Question

I'm writing templated matrix class, and I get stack overflows when returning by value from operators: +,-,* for larger matrices. I would prefer to somehow return by reference to relieve the stack and to avoid extra copying, but then, I would have to return an object constructed with new and break the general rule of "using delete for every new". I can't return by value because of copying overhead and stack limit problems, and I'm also unable to return by reference because of memory leaks, so what I should do then?

Here's my product function (Matrix contains 2D array elems):

    template<typename T, unsigned int n, unsigned int m> template<unsigned int m2>
Matrix<T,n,m2> Matrix<T,n,m>::operator*(Matrix<T,m,m2>& M) {
    T prod[n][m2];
    if(n*m < GPUAccelerationThreshold)
        for(int i = 0; i < n; i++)
            for(int j = 0; j < m2; j++) {
                prod[i][j] = elems[i][0] * M(0, j); 
                for(int p = 1; p < m; p++)
                    prod[i][j] += elems[i][p] * M(p, j); 
            }
    else {
        array_view<T, 2> product(n, m2, *prod);
        array_view<T, 2> a(n, m, *elems);
        array_view<T, 2> b(m, m2, M.elems[0]);

        parallel_for_each(
            product.extent, 
             [=](index<2> idx) restrict(amp) {
                int row = idx[0];
                int col = idx[1];
                for (int inner = 0; inner < m; inner++) {
                    product[idx] += a(row, inner) * b(inner, col);
                }
            }
        );
        product.synchronize();
    }


    return Matrix<T,n,m2>(prod);
}

I'm writing this class because I want to boost some matrix operations on GPU(with MS amp). I've searched for an existing solution, found GPU accelerated linear algebra libs, but what I couldn't find in them was a simple matrix class with +,-,* operators. Maybe someone could recommend me any?

Was it helpful?

Solution

Three quick comments:

  • Traditionally, Matrix classes have used dynamic allocation. You don't show your Matrix class, but if your data is:
    T myData[n][m];
    
    you might want to change it to:
        std::vector myData;
    
    , initializing it to the size n * m in the constructor, and calculating the single index in the operator[] (which should return a proxy if you want to do any bounds checking). Alternatively, you can use operator()( int i, int j ) for accessing an element: whether myMatrix( i, j ) or myMatrix[i][j] is prefered for access depends on who you ask. Although this solution increases total memory use slightly (but very slightly), it reduces the stack footprint to a couple of dozen bytes, regardless of the size of the matrix.
  • Also traditionally, matrix classes haven't had the dimensions as part of their template arguments. Whether this is a good thing or not is arguable. You get far better type checking (and errors at compile time, rather than runtime) with your solution, but if the dimensions are arguments to the constructor, rather than template arguments, you can read them from the command line or a configuration file or whatever. It's the classical safety vs. flexibility trade off. With regards to your problem, not having the dimensions as template parameters means that all matrices of type T have the same type. You can thus access the internals of the matrix you return from your member function, and you no longer need the intermediate T prod[n][m2]. Of course, you could make all instantiations of Matrix friend, or simply use the access functions to set the values. At any rate, you do not want an intermediate T prod[n][m2]; this not only requires a lot of on stack memory, it means that you'll have to copy the results.
  • Finally, and this is somewhat more advanced: in the best matrix classes, operator* does not return a matrix, but a helper class, along the lines of: template class MatrixMultiply { L const* myLhs; R const* myRhs; public: typedef T value_type; MatrixMultiply( L const& lhs, R const& rhs ) : myLhs( &lhs ) , myRhs( &rhs ) { } int getX() const { return myLhs->getX(); } int getY() const { return myRhs->getY(); } T get( int i, int j ) const { return calculateIJ( myLhs, myRhs ); } }; You then provide a templated constructor and assignment operator which uses getX(), getY() and get( i, j ). Your operator* is also a template, which returns a MatrixMultiply: template MatrixMultiply operator*( L const& lhs, R const& rhs ) { return MatrixMultiply( lhs, rhs ); } (Note that if L::value_type and R::value_type aren't identical, this won't compile. Which is what you want, except that the error messages will be far from clear.) The result is that you never actually build the intermediate, temporary matrices. As you can imagine, the above solution is greatly simplified. You'll need additional code for error handling, and I don't think that the parallelization is trivial. But it avoids the construction of all intermediate matrices, even in complicated expressions. (The same technique can be used using an abstract base class, say MatrixAccessor, with pure virtual getters, and deriving Matrix and all of the helpers like MatrixMultiply from it. IMHO, this is a lot more readable, and the error messages from the compiler will definitely be more understandable. The results will be the same as long as the compiler actually inlines all of the member functions. But that's a big if, since there can be significant function nesting.)

OTHER TIPS

There is no easy way to solve this. You can't return stack local variables as reference, because the memory "behind" the variable will go away when you return. So you have to have some dedicated storage somewhere. It doesn't HAVE to come from new/delete, but you do need to have some sort of storage when making copies of the data.

One solution would of course to have three-operand operation, so instead of:

a = b + c;

you use a function:

add(a, b, c);

where a, b and c are references.

It does make the code a lot messier, but I can't think of any more obvious way to solve the problem - if you don't want to write your own allocator/delete function (or a garbage collector).

Actually i can't fully understand your idea... Binary operators takes two arguments and create the result. In fact you can see that you are returning a new created object. So it's the only way to write programs: to allocate memory, to use it, and then delete it. In fact I even don't understand what your constructor does. If it simply copies the pointer to "prod" then the result matrix is broken when you return it from function because the "prod" memory will be deleted as soon as function returns(because it's created on a stack). So you can't return it by reference too.

The way i see the solution in to allocate memory in the matrix constructor. If you are making it as a template depending on matrix size, the size of the matrix is known from template parameters (I find it quite strange to make templates with matrix size as argument.. What is the point of that?). So you allocate memory in constructor with "new" and delete it in destructor with "delete". So this memory will be allocated according to RAII methodology that works pretty well in OOP. Then you implement methods such as setElement(i, j, value) and set the elements in new created matrix in your binary operator and return it.

There are some problems however that i want you to take care of. The copy constructor must really copy the matrix not just a pointer(or several destructors will try to destroy the same memory) or you can program the "lazy copy" model that actually copies the matrix on changing (see wiki). Or you can make the copy constructor private without realization (to prevent copying at all). If you are not allowed to create such methods as "setElement" because you don't want the user of your library to change matrix values, you can access to private data and methods (even in objects that we got as an argument or newly created) in such operators because you are inside the class method.

If you need to pass raw pointer to other calculation function as it's done in "else" part, you can make a constructor from a pointer that will copy just a pointer (but this is dangerous way. If you pass a pointer you must not access it nowhere because the matrix class is now the boss) or copy data totally(wich is slower but you can pass there pointers from stack or pointers you need to take control after) depending on your wish, and destructor will clean it when matrix destroys. Or you even can create private method such as "getRawMatrix()" which will return the raw pointer to data from matrix and pass this pointer to your calculation functions or even simply get the raw data pointer because you are inside a method of the matrix class.

Usually you allocate memory in a constructor and make "lazy copy" model because matrices can be huge. Inside the class is allowed to access the private data and members, that's the classes are made. I hope it will be usefull..

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top