How to create a std::vector-like class which can binary read/write huge chunks?

Question 1

If you don't need the interface of the vector:

auto p = unique_ptr<T[]>{ new T[big_n] };

It won't initialize the array if T is POD, otherwise it calls default constructors (default-initialization).

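In C++14 you can use std::make_unique<T[]>(big_n), but note that it value-initializes the elements (zeroing them for POD types), so it does not avoid the initialization cost. C++20 adds std::make_unique_for_overwrite for the default-initializing case.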
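To make the unique_ptr<T[]> approach concrete, here is a minimal sketch of a binary read into a default-initialized array. The helper name read_block and its n_read out-parameter are made up for illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <type_traits>

// Hypothetical helper: reads up to n objects of type T from fp into a
// freshly allocated, default-initialized array. n_read receives the
// number of complete objects actually read.
template <typename T>
std::unique_ptr<T[]> read_block(std::FILE* fp, std::size_t n, std::size_t& n_read)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte reads are only safe for trivially copyable types");
    assert(fp != nullptr);
    // new T[n] without parentheses default-initializes: no zero fill for trivial T.
    std::unique_ptr<T[]> buf(new T[n]);
    // fread returns the count of complete objects read, which may be less than n.
    n_read = std::fread(buf.get(), sizeof(T), n, fp);
    return buf;
}
```

The caller has to carry the element count separately, which is the main thing you give up compared to a vector.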

Question 2

Using vector::reserve() and then writing into vector::data() is a dirty hack and undefined behavior. Please don't do that.

The way to solve this problem is to use a custom allocator, such as the one in this answer. I have just tested it: it works fine with clang 3.5 trunk but doesn't compile with gcc 4.7.2.

Although, as others have already pointed out, unique_ptr<T[]> will serve your needs just fine.
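For readers who cannot follow the link, a default-initializing allocator can be sketched roughly as below. This is an illustration of the general technique (an adaptor whose zero-argument construct() default-initializes instead of value-initializing), not necessarily the exact code from the linked answer. The names default_init_allocator, uninit_vector and read_vector are assumptions chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: construct() with no arguments default-initializes,
// so trivial types are left uninitialized instead of being zeroed.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;
public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    template <typename U>
    void construct(U* ptr)
        noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;  // note: U, not U()
    }
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;

// Hypothetical helper: one fread straight into the vector's storage.
template <typename T>
uninit_vector<T> read_vector(std::FILE* fp, std::size_t n) {
    assert(fp != nullptr);
    uninit_vector<T> v(n);                             // elements not zeroed
    v.resize(std::fread(v.data(), sizeof(T), n, fp));  // keep only what was read
    return v;
}
```

Unlike the reserve() trick, the elements here really exist (size() == n) before fread writes to them, so writing through data() is well defined.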

Question 3

If using Boost is an option for you, boost::container::vector has supported explicitly default-initializing elements since version 1.55, using the syntax:

using namespace boost::container;
vector<T> vector(37283, default_init);

at creation or

using namespace boost::container;
vector.resize(37283, default_init);

after creation. This results in the nice syntax:

using T = unsigned;  // but can be any trivially copyable type
FILE* fp = fopen( outfile.c_str(), "rb" );  // "rb": binary mode, no newline translation
boost::container::vector<T> x(big_n, boost::container::default_init);
fread( x.data(), sizeof(T), big_n, fp );
fclose( fp );

In my tests performance is identical to using std::vector with a default-initializing allocator.

EDIT: Unrelated aside, I'd use an RAII wrapper for FILE*:

struct FILE_deleter {
  void operator () (FILE* f) const {
    if (f) fclose(f);
  }
};
using FILE_ptr = std::unique_ptr<FILE, FILE_deleter>;

using T = unsigned;  // but can be any trivially copyable type
FILE_ptr fp{fopen( outfile.c_str(), "rb" )};  // "rb": binary mode, no newline translation
boost::container::vector<T> x(big_n, boost::container::default_init);
fread( x.data(), sizeof(T), big_n, fp.get() );

I'm a bit OCD about RAII.

EDIT 2: Another option, if you absolutely MUST produce a std::vector<T>, and not a boost::container::vector<T> or std::vector<T, default_allocator<T>>, is to fill your std::vector<T> from a custom iterator pair. Here's one way to make an fread iterator:

template <typename T>
class fread_iterator :
  public boost::iterator_facade<fread_iterator<T>, T,
                                std::input_iterator_tag, T> {
  friend boost::iterator_core_access;

  bool equal(const fread_iterator& other) const {
    return (file_ && feof(file_)) || n_ <= other.n_;
  }

  T dereference() const {
    // is_trivially_copyable is sufficient, but libstdc++
    // (for whatever reason) doesn't have that trait.
    static_assert(std::is_pod<T>::value,
                 "Jabberwocky is killing user.");
    T result;
    fread(&result, sizeof(result), 1, file_);
    return result;
  }

  void increment() { --n_; }

  FILE* file_;
  std::size_t n_;

public:
  fread_iterator() : file_(nullptr), n_(0) {}
  fread_iterator(FILE* file, std::size_t n) : file_(file), n_(n) {}
};

(I've used boost::iterator_facade to reduce the iterator boilerplate.) The idea here is that the compiler can elide the move construction of dereference's return value so that fread will read directly into the vector's memory buffer. It will likely be less efficient due to calling fread once per item vs. just once for the allocator modification methods, but nothing too terrible since (a) the file data is still only copied once from the stdio buffer into the vector, and (b) the whole point of buffering IO is so that granularity has less impact. You would fill the vector using its assign(iterator, iterator) member:

using T = unsigned;  // but can be any trivially copyable type
FILE_ptr fp{fopen( outfile.c_str(), "rb" )};  // "rb": binary mode, no newline translation
std::vector<T> x;
x.reserve(big_n);
x.assign(fread_iterator<T>{fp.get(), big_n}, fread_iterator<T>{});

Throwing it all together and testing side-by-side, this iterator method is about 10% slower than using the custom allocator method or boost::container::vector. The allocator and boost method have virtually identical performance.
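If Boost is not available, a similar input iterator can be written by hand. The sketch below is a simplified variant, not the iterator_facade version above: it caches the current element inside the iterator (so there is one extra copy per element and T must be default-constructible) and it turns into the end iterator on a short read. The name counted_fread_iterator is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <iterator>
#include <type_traits>
#include <vector>

template <typename T>
class counted_fread_iterator {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte reads are only safe for trivially copyable types");

    std::FILE* file_ = nullptr;
    std::size_t remaining_ = 0;  // 0 marks the end iterator
    T value_{};                  // most recently read element

    // Reads the next element, or becomes the end iterator on a short read.
    void fetch() {
        if (remaining_ != 0 && std::fread(&value_, sizeof(value_), 1, file_) != 1)
            remaining_ = 0;
    }

public:
    using iterator_category = std::input_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = const T*;
    using reference = const T&;

    counted_fread_iterator() = default;  // end iterator
    counted_fread_iterator(std::FILE* file, std::size_t n)
        : file_(file), remaining_(n) {
        assert(file != nullptr || n == 0);
        fetch();  // prime the first element
    }

    reference operator*() const { return value_; }
    counted_fread_iterator& operator++() {
        if (remaining_ != 0) { --remaining_; fetch(); }
        return *this;
    }
    counted_fread_iterator operator++(int) {
        auto old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const counted_fread_iterator& a,
                           const counted_fread_iterator& b) {
        return a.remaining_ == b.remaining_;
    }
    friend bool operator!=(const counted_fread_iterator& a,
                           const counted_fread_iterator& b) {
        return !(a == b);
    }
};
```

It is used the same way as the Boost version: x.assign(counted_fread_iterator<T>{fp.get(), big_n}, counted_fread_iterator<T>{}). I have not benchmarked this variant.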

Question 4

Since you are upgrading to C++11, why not use file streams as well? I just tried reading a 17 MB file into a char* buffer using ifstream and then writing the contents to a file using ofstream.

I ran the same application in a loop 15 times; the maximum time it took was 320 ms and the minimum was 120 ms.

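#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <memory>

// Reads the whole file into a NUL-terminated buffer.
// Returns a null pointer if the file cannot be opened.
std::unique_ptr<char[]> ReadToEnd(const char* filename)
{
    // Open positioned at the end (ios::ate) so tellg() reports the file size.
    std::ifstream inpfile(filename, std::ios::in | std::ios::binary | std::ios::ate);
    std::unique_ptr<char[]> ret;
    if (inpfile.is_open())
    {
        auto sz = static_cast<std::size_t>(inpfile.tellg());
        inpfile.seekg(0, std::ios::beg);  // offset 0 from the beginning
        ret.reset(new char[sz + 1]);
        ret[sz] = '\0';
        inpfile.read(ret.get(), sz);
    }

    return ret;
}


int main(int argc, char* argv [])
{
    if (argc < 2)
        return 1;

    auto data = ReadToEnd(argv[1]);
    if (!data)
        return 1;

    // strlen stops at the first NUL byte, so this count is only
    // correct for data without embedded zeros.
    const auto len = std::strlen(data.get());
    std::cout << "Num of characters in file:" << len << "\n";

    std::ofstream outfile("output.txt", std::ios::binary);
    outfile.write(data.get(), len);
}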

Output

D:\code\cpp\ConsoleApplication1\Release>ConsoleApplication1.exe d:\code\cpp\SampleApp\Release\output.txt
Num of characters in file:18805057
Time taken to read the file, d:\code\cpp\SampleApp\Release\output.txt:152.008 ms.
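One caveat with the listing above: it measures the buffer with strlen, which stops at the first NUL byte, so it undercounts genuinely binary data. A possible variant that returns the byte count together with the buffer is sketched below. FileBuffer and ReadAllBytes are names invented for this example, and I have not timed this version.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <memory>

// Hypothetical result type: the bytes plus how many of them are valid.
struct FileBuffer {
    std::unique_ptr<char[]> data;
    std::size_t size = 0;
};

// Reads the whole file in binary mode. On failure to open the file,
// returns an empty FileBuffer (null data, size 0).
FileBuffer ReadAllBytes(const char* filename)
{
    assert(filename != nullptr);
    FileBuffer buf;
    std::ifstream in(filename, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in.is_open())
        return buf;

    const std::streamoff end = in.tellg();  // size, since we opened at the end
    if (end < 0)
        return buf;

    in.seekg(0, std::ios::beg);
    buf.data.reset(new char[static_cast<std::size_t>(end)]);  // not zero-filled
    in.read(buf.data.get(), static_cast<std::streamsize>(end));
    buf.size = static_cast<std::size_t>(in.gcount());  // bytes actually read
    return buf;
}
```

Writing the data back out is then outfile.write(buf.data.get(), buf.size) on a stream opened with std::ios::binary.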