C++ Eigen Matrix Operations vs. Memory Allocation Performance

Question 1

You can manage your own memory in a way that fits your needs and use Eigen::Map instead of Eigen::Matrix to perform calculations with it. Just make sure the data is aligned properly, or tell Eigen if it isn't (through the MapOptions template parameter of Eigen::Map, e.g. Eigen::Unaligned).

See the Eigen::Map reference documentation for details.

Here is a short example:

#include <iostream>
#include <Eigen/Core>


int main() {
    int mydata[3 * 4]; // Manage your own memory as you see fit
    int* data_ptr = mydata;

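    // Eigen::Unaligned tells Eigen not to assume any alignment of data_ptr
    Eigen::Map<Eigen::MatrixXi, Eigen::Unaligned> mymatrix(data_ptr, 3, 4);

    // Use mymatrix like you would any other matrix; writes go straight to mydata
    mymatrix = Eigen::MatrixXi::Zero(3, 4);
    std::cout << mymatrix << '\n';

    // A Map cannot be resized, so assigning a 3x6 matrix to it triggers a
    // failed assertion in debug mode (uncomment the next two lines to see it).
    // To change how Eigen handles assertions, see
    // http://eigen.tuxfamily.org/dox-devel/TopicAssertions.html
    // mymatrix = Eigen::MatrixXi::Ones(3, 6);
    // std::cout << mymatrix << '\n';
}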

Question 2

To gather my comments into a complete idea, here is how I would approach it.

Memory allocation in Eigen is fairly advanced stuff, IMO, and the library does not expose many places to hook into it. The best bet is to wrap the Eigen objects themselves in some kind of resource manager, as the OP did.

I would make it a simple bin that holds Matrix<Scalar, Dynamic, Dynamic> objects. This way you template on the Scalar type and get a manager for matrices of arbitrary size.

Whenever you request an object, check whether the manager has a free object of the desired size; if it does, return a reference to it, and if not, allocate a new one. When you want to release the object, mark it as free in the resource manager. I don't think anything more complicated is needed, although you could of course implement more sophisticated logic.

To ensure thread safety, I would put a lock (e.g. a mutex) in the manager and initialize it in the constructor if needed. Both acquiring and releasing an object would then have to take the lock.

However, this depends on the work schedule. If each thread works on its own arrays, I would consider creating one resource manager instance per thread, so that the threads don't block each other. A global lock or a global manager could easily become a bottleneck if you have, say, 12 cores doing heavy allocation and deallocation, effectively serializing your application through that one lock.

Question 3

You can try replacing your default memory allocator with jemalloc or tcmalloc. It's easy to try out without recompiling, thanks to the LD_PRELOAD mechanism.

I think it works for most C++ projects as well.

Question 4

You could allocate memory for some common matrix sizes with operator new or operator new[] before calling that function, store the void* pointers somewhere, and let the function itself retrieve a memory block of the right size. After that, you can use placement new to construct the matrix in that block. Details are given in More Effective C++ (Scott Meyers), Item 8.