Question

In order to improve performance reading from a file, I'm trying to read the entire content of a big (several MB) file into memory and then use a istringstream to access the information.

My question is, which is the best way to read this information and "import it" into the string stream? A problem with this approach (see bellow) is that when creating the string stream the buffers gets copied, and memory usage doubles.

#include <fstream>
#include <sstream>

using namespace std;

int main() {
  ifstream is;
  is.open (sFilename.c_str(), ios::binary );

  // get length of file:
  is.seekg (0, std::ios::end);
  long length = is.tellg();
  is.seekg (0, std::ios::beg);

  // allocate memory:
  char *buffer = new char [length];

  // read data as a block:
  is.read (buffer,length);

  // create string stream of memory contents
  // NOTE: this ends up copying the buffer!!!
  istringstream iss( string( buffer ) );

  // delete temporary buffer
  delete [] buffer;

  // close filestream
  is.close();

  /* ==================================
   * Use iss to access data
   */

}
Was it helpful?

Solution

std::ifstream has a method rdbuf(), that returns a pointer to a filebuf. You can then "push" this filebuf into your stringstream:

#include <fstream>
#include <sstream>

int main()
{
    std::ifstream file( "myFile" );

    if ( file )
    {
        std::stringstream buffer;

        buffer << file.rdbuf();

        file.close();

        // operations on the buffer...
    }
}

EDIT: As Martin York remarks in the comments, this might not be the fastest solution since the stringstream's operator<< will read the filebuf character by character. You might want to check his answer, where he uses the ifstream's read method as you used to do, and then set the stringstream buffer to point to the previously allocated memory.

OTHER TIPS

OK. I am not saying this will be quicker than reading from the file

But this is a method where you create the buffer once and after the data is read into the buffer use it directly as the source for stringstream.

N.B.It is worth mentioning that the std::ifstream is buffered. It reads data from the file in (relatively large) chunks. Stream operations are performed against the buffer only returning to the file for another read when more data is needed. So before sucking all data into memory please verify that this is a bottle neck.

#include <fstream>
#include <sstream>
#include <vector>

int main()
{
    std::ifstream       file("Plop");
    if (file)
    {
        /*
         * Get the size of the file
         */
        file.seekg(0,std::ios::end);
        std::streampos          length = file.tellg();
        file.seekg(0,std::ios::beg);

        /*
         * Use a vector as the buffer.
         * It is exception safe and will be tidied up correctly.
         * This constructor creates a buffer of the correct length.
         *
         * Then read the whole file into the buffer.
         */
        std::vector<char>       buffer(length);
        file.read(&buffer[0],length);

        /*
         * Create your string stream.
         * Get the stringbuffer from the stream and set the vector as it source.
         */
        std::stringstream       localStream;
        localStream.rdbuf()->pubsetbuf(&buffer[0],length);

        /*
         * Note the buffer is NOT copied, if it goes out of scope
         * the stream will be reading from released memory.
         */
    }
}

This seems like premature optimization to me. How much work is being done in the processing. Assuming a modernish desktop/server, and not an embedded system, copying a few MB of data during intialization is fairly cheap, especially compared to reading the file off of disk in the first place. I would stick with what you have, measure the system when it is complete, and the decide if the potential performance gains would be worth it. Of course if memory is tight, this is in an inner loop, or a program that gets called often (like once a second), that changes the balance.

Another thing to keep in mind is that file I/O is always going to be the slowest operation. Luc Touraille's solution is correct, but there are other options. Reading the entire file into memory at once will be much faster than separate reads.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top