Question

Suppose you want to read the data from a large text file (~300 MB) into an array of vectors, vector<string> *Data (assume that the number of columns is known).

// the file is opened with ifstream; the initial value of s is set, etc.


Data = new vector<string>[col];
string u;
int i = 0;

do
{       
    istringstream iLine(s);

    i=0;
    while(iLine >> u)
    {
        Data[i].push_back(u);
        i++;
    }
}
while(getline(file, s));

This code works fine for small files (<50 MB), but memory usage grows explosively when reading a large file. I'm pretty sure the problem is in creating an istringstream object on each pass through the loop. However, defining istringstream iLine; outside of both loops, loading each line into the stream with iLine.str(s);, and resetting the stream after the inner while-loop (iLine.str(""); iLine.clear();) causes the same order of memory explosion (see the sketch after the questions below). The questions that arise are:

  1. Why does istringstream behave this way?
  2. If this is the intended behavior, how can the above task be accomplished?
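
For reference, the reuse variant described above looks roughly like this; it produced the same order of memory growth:

istringstream iLine;          // constructed once, outside both loops

do
{
    iLine.str(s);             // load the current line into the stream
    iLine.clear();            // reset eof/fail flags left over from the previous pass

    i = 0;
    while(iLine >> u)
    {
        Data[i].push_back(u);
        i++;
    }
}
while(getline(file, s));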

Thank you

EDIT: In regard to the first answer, I do release the memory allocated for the array later in the code:

for(long i=0;i<col;i++)
    Data[i].clear();
delete []Data;

FULL COMPILE-READY CODE:

#include <tchar.h>

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
ofstream testfile;
testfile.open("testdata.txt");

srand(time(NULL));

for(int i = 1; i<1000000; i++)
{
    for(int j=1; j<100; j++)
    {
        testfile << rand()%100 << " ";
    }

    testfile << endl;
}

testfile.close();

vector<string> *Data;

clock_t begin = clock();

ifstream file("testdata.txt"); 

string s;

getline(file,s);

istringstream iss(s);

string nums;

int col=0;

while(iss >> nums)
{
    col++;
}

cout << "Columns #: " << col << endl;

Data = new vector<string>[col];

string u;
int i = 0;

do
{

    istringstream iLine(s);

    i=0;

    while(iLine >> u)
    {
        Data[i].push_back(u);
        i++;

    }

}
while(getline(file, s));

cout << "Rows #: " << Data[0].size() << endl;

for(long i=0;i<col;i++)
    Data[i].clear();
delete []Data;

clock_t end = clock();

double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

cout << elapsed_secs << endl;

getchar();
return 0;
}

The solution

vector<> grows its memory geometrically. A typical pattern is to double the capacity whenever it needs to grow. That can leave a lot of space allocated but unused if your loop ends right after such a threshold. You could try calling shrink_to_fit() on each vector when you are done.
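
As a minimal sketch of that suggestion, assuming the Data array and col count from the question (trim_columns is a hypothetical helper name):

#include <string>
#include <vector>
using namespace std;

// Ask each column vector to release its unused capacity.
void trim_columns(vector<string>* Data, int col)
{
    for(int i = 0; i < col; i++)
    {
        Data[i].shrink_to_fit();                 // C++11; a non-binding request
        // Pre-C++11 equivalent, the "swap trick":
        // vector<string>(Data[i]).swap(Data[i]);
    }
}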

Additionally, memory allocated by the C++ allocators (or even plain malloc()) is often not returned to the OS, but kept in a process-internal free-memory pool. This can lead to further apparent growth, and it can make the effect of shrink_to_fit() invisible from outside the process.

Finally, if you have lots of small strings ("2-digit numbers"), the per-object overhead of a string may be considerable. Even if the implementation uses a small-string optimization, I'd assume that a typical string occupies no less than 16 or 24 bytes (size, capacity, and a data pointer or small-string buffer), and probably more on a platform where size_type is 64 bits. That is a lot of memory for 3 bytes of payload.
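
A quick sketch to check the per-object overhead on your platform; the 3 GB figure in the comment is a back-of-the-envelope estimate assuming 32 bytes per string object:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    // Size of the string object itself, excluding any heap-allocated buffer.
    // Commonly 24-32 bytes on a 64-bit platform.
    cout << "sizeof(std::string): " << sizeof(string) << endl;

    // The test file holds roughly 99 million two-digit tokens (~300 MB of text).
    // At 32 bytes per object that is about 3 GB of string objects alone,
    // before counting the vectors that hold them.
    return 0;
}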

So I assume you are seeing the normal behaviour of vector<>.

Other tips

I seriously suspect this is not an istringstream problem (especially given that you get the same result with the iLine construction outside the loop).

Possibly, this is just the normal behavior of std::vector. To test that, run the exact same lines but comment out Data[i].push_back(u);, and see whether your memory grows the same way. If it doesn't, then you know where the problem is.

Depending on your library, vector::push_back will expand the capacity by a factor of 1.5 (Microsoft) or 2 (GNU libstdc++) every time it needs more room.
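
You can observe the growth factor of your own implementation with a short sketch like this:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> v;
    size_t last_cap = v.capacity();

    for(int i = 0; i < 1000; i++)
    {
        v.push_back(i);
        if(v.capacity() != last_cap)   // a reallocation just happened
        {
            cout << "size " << v.size()
                 << " -> capacity " << v.capacity() << endl;
            last_cap = v.capacity();
        }
    }
    return 0;
}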

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow