Question

I have a binary file storing float32 values (9748422*5 of them, roughly 190MB in size). From this collection I'm creating a set of Eigen::VectorXd vectors (each with 5 components), so 9748422 of them. The underlying type is double, so I expect roughly double the input size for storing them.
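
For reference, the back-of-the-envelope numbers I'm comparing against (assuming 4-byte floats and 8-byte doubles):

    input:    9748422 * 5 * 4 bytes ≈ 195 MB
    expected: 9748422 * 5 * 8 bytes ≈ 390 MB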

But, as luck would have it, the process requires a total of 2.5GB. This is a log of the PROCESS_MEMORY_COUNTERS:

    PageFaultCount: 0x000A3C40
    PeakWorkingSetSize: 0xA3C42000
    WorkingSetSize: 0xA3C42000
    QuotaPeakPagedPoolUsage: 0x00004ED8
    QuotaPagedPoolUsage: 0x00004ED8
    QuotaPeakNonPagedPoolUsage: 0x000057A8
    QuotaNonPagedPoolUsage: 0x000057A8
    PagefileUsage: 0xA3A9B000
    PeakPagefileUsage: 0xA3A9B000

I've tracked Eigen's internal allocator, and it indeed seems to "allocate" exactly the size I compute on paper. However, Eigen uses aligned allocation for most of its dynamic vectors. Could this be generating this amount of havoc? If nothing comes to mind, could you recommend another place to look to find out why this is happening?

I cannot provide a compilable (online) cpp example, but here's a skeleton of what I'm doing:

#include <vector>
#include <windows.h>
#include <Eigen/Dense>

struct SSCCE_struct
{
    Eigen::VectorXd m_data;
};

typedef std::vector<SSCCE_struct*> TVector;

int main(int argc, char* argv[])
{
    TVector outputVertices;
    HANDLE bpcHandle;
    bpcHandle = CreateFileA("D:\\sample.bpc",
        GENERIC_READ,
        FILE_SHARE_READ,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        NULL);

    LARGE_INTEGER len_li;
    GetFileSizeEx(bpcHandle, &len_li);
    INT64 len = len_li.QuadPart;

    unsigned long long noPoints = len / 20;   // 5 floats * 4 bytes per point
    unsigned long noPointsRead = 0;
    unsigned long long currPointIdx = 0;

    outputVertices.resize( noPoints );

    DebugTrace( "No points %llu \n", noPoints );   // DebugTrace is our own logging helper

    float buffer[ 5 * 1024 ];
    DWORD noBytesRead = 0;
    do
    {
        ReadFile(bpcHandle, buffer, sizeof(buffer), &noBytesRead, NULL);
        noPointsRead = noBytesRead / 20;
        for (unsigned long idx = 0; idx < noPointsRead; ++idx )
        {
            // One heap allocation for the struct itself...
            outputVertices[ currPointIdx + idx ] = new SSCCE_struct();

            // ...and another one inside resize() for the VectorXd's storage.
            outputVertices[ currPointIdx + idx ]->m_data.resize(5);

            for (unsigned kdx = 0; kdx < 5; ++kdx)
            {
                outputVertices[ currPointIdx + idx ]->m_data[ kdx ] = buffer[ 5 * idx + kdx ];
            }
        }

        currPointIdx += noPointsRead;

    } while (noBytesRead);

    CloseHandle(bpcHandle);
}

Later edit:

I performed the test indicated in David's answer and the solution is to avoid dynamic allocations altogether. There are several combinations one can try out, and here are the results for each of them:

1.

struct SSCCE_struct
{
    Eigen::Matrix<double,1,5> m_data;
};

typedef std::vector<SSCCE_struct*> TVector;

Yielding 1.4 GB (1.1 GB waste)

2.

struct SSCCE_struct
{
    Eigen::VectorXd m_data;
};

typedef std::vector<SSCCE_struct*> TVector;

Yielding 2.5 GB (2.2 GB waste)

3.

struct SSCCE_struct
{
    Eigen::Matrix<double,1,5> m_data;
};

typedef std::vector<SSCCE_struct> TVector;

Yielding 381 MB (with 40 MB of waste - totally reasonable and, perhaps, predictable).
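
Dividing the measured waste by the point count gives a rough idea of the per-point overhead (back-of-the-envelope figures, not precise measurements):

    combination 2: ~2.2 GB / 9748422 points ≈ 225 bytes of overhead per point
    combination 1: ~1.1 GB / 9748422 points ≈ 113 bytes of overhead per point
    combination 3:  ~40 MB / 9748422 points ≈   4 bytes of overhead per point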


Solution

You've got a lot of pointers here, and each pointed-to object is a separate heap allocation with its own overhead. The pointers refer to small objects, so that overhead is significant.

On top of that, dynamically sized objects necessarily have more overhead than fixed size objects, because fixed size objects do not need to store their matrix dimensions at runtime.
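
A quick way to see the in-object part of the difference is a sizeof comparison (a minimal sketch; the exact numbers depend on the platform, and for the dynamic vector the 40 bytes of coefficient data live in a separate heap block on top of what sizeof reports):

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // Dynamic vector: a pointer to heap storage plus a size field;
    // the 5 doubles themselves are a separate, aligned heap allocation.
    std::cout << sizeof(Eigen::VectorXd) << '\n';

    // Fixed size vector: the 5 doubles (40 bytes) are stored inline,
    // with no heap allocation and no runtime dimensions.
    std::cout << sizeof(Eigen::Matrix<double, 5, 1>) << '\n';
}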

Here are the sources of your pointer overhead:

  1. Eigen::VectorXd uses dynamically allocated storage. That means a pointer.
  2. You store the objects in std::vector<SSCCE_struct*>. And that's another pointer, with overhead.

The most efficient way to store these objects is to remove the indirection. You can do that by switching to:

  1. Matrix<double, 5, 1>. This is a fixed size object and so has no indirection. What's more, as explained above, it does not need to store the matrix dimensions at runtime because they are known at compile time. For such a small object that is significant.
  2. Store the objects in std::vector<SSCCE_struct>. Again, you lose one level of indirection.

With these changes, the memory usage of your program, when compiled with release settings, drops to 383MB on my machine. That's much more in line with your expectations.
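
Put together, the suggested layout looks roughly like this (a sketch reusing the names from the question; the read loop stays the same apart from dropping the new and the resize):

#include <vector>
#include <Eigen/Dense>

struct SSCCE_struct
{
    // Fixed size: the 5 doubles are stored inline, no heap allocation,
    // no runtime dimensions.
    Eigen::Matrix<double, 5, 1> m_data;
};

// Store the structs by value: one contiguous block for all points.
typedef std::vector<SSCCE_struct> TVector;

// Per point, the fill loop reduces to:
//     for (unsigned kdx = 0; kdx < 5; ++kdx)
//         outputVertices[currPointIdx + idx].m_data[kdx] = buffer[5 * idx + kdx];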

The big difference seems to be between Eigen::VectorXd and the fixed size object. If I use Eigen::VectorXd and std::vector<SSCCE_struct> then the memory usage jumps to 918MB. When I then go to std::vector<SSCCE_struct*> it makes a further jump to 1185MB.

These measurements will be highly dependent on the compiler. I've used VS2013 compiling 32 bit code.

OTHER TIPS

I am not allowed to comment, so I will post one more answer, even though I think that the above answers actually explained the source of the wasted memory (huge number of allocations).

I understand that you want to work on a lot of 5-component points, so you are using a vector of SSCCE_struct pointers, i.e.

typedef std::vector<SSCCE_struct*> TVector;

Have you considered using

Eigen::Matrix< double, Eigen::Dynamic, 5 > outputVertices;
outputVertices.resize( noPoints, 5 );

which would avoid the wasted memory. I would also consider this for vectorization (to help Eigen/the compiler better vectorize whatever it is that you are doing with the points), even though 5 is not as convenient a number for vectorization as 4 or 8.
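
A minimal sketch of how the fill loop could look with such a single matrix (assuming the same buffer and counters as in the question; one row per point, one column per component):

Eigen::Matrix<double, Eigen::Dynamic, 5> outputVertices(noPoints, 5);

// inside the read loop, after ReadFile:
for (unsigned long idx = 0; idx < noPointsRead; ++idx)
    for (unsigned kdx = 0; kdx < 5; ++kdx)
        outputVertices(currPointIdx + idx, kdx) = buffer[5 * idx + kdx];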

Edit: I realised I am 3 years late the moment I clicked post...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow