Why is istream/ostream slow

Question 1

Actually, IOStreams don't have to be slow! It is a matter of implementing them in a reasonable way to make them fast, though. Most standard C++ library don't seem to pay too much attention to implement IOStreams. A long time ago when my CXXRT was still maintained it was about as fast as stdio - when used correctly!

Note that there are few performance traps for users laid out with IOStreams, however. The following guidelines apply to all IOStream implementations but especially to those which are tailored to be fast:

When using std::cin, std::cout, etc. you need to call std::sync_with_stdio(false)! Without this call, any use of the standard stream objects is required to synchronize with C's standard streams. Of course, when using std::sync_with_stdio(false) it is assumed that you don't mix std::cin with stdin, std::cout with stdout, etc.
Do not use std::endl as it mandates many unnecessary flushes of any buffer. Likewise, don't set std::ios_base::unitbuf or use std::flush unnecessarily.
When creating your own stream buffers (OK, few users do), make sure they do use an internal buffer! Processing individual characters jumps through multiple conditions and a virtual function which makes it hideously slow.

Question 2

There are several reasons why [i]ostreams are slow by design:

Shared formatting state: every formatted output operation has to check all formatting state that might have been previously mutated by I/O manipulators. For this reason iostreams are inherently slower than printf-like APIs (especially with format string compilation like in Rust or {fmt} that avoid parsing overhead) where all formatting information is local.
Uncontrolled use of locales: all formatting goes through an inefficient locale layer even if you don't want this, for example when writing a JSON file. See N4412: Shortcomings of iostreams.
Inefficient codegen: formatting a message with iostreams normally consists of multiple function calls because arguments and I/O manipulators are interleaved with parts of the message. For example, there are three function calls (godbolt) in
```
std::cout << "The answer is " << answer << ".\n";
```
compared to just one (godbolt) in the equivalent printf call:
```
printf("The answer is %d.\n", answer);
```
Extra buffering and synchronization. This can be disabled with sync_with_stdio(false) at the cost of poor interoperability with other I/O facilities.

Question 3

Perhaps this can give some idea of what you're dealing with:

#include <stdio.h>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <fstream>
#include <time.h>
#include <string>
#include <algorithm>

unsigned count1(FILE *infile, char c) { 
    int ch;
    unsigned count = 0;

    while (EOF != (ch=getc(infile)))
        if (ch == c)
            ++count;
    return count;
}

unsigned int count2(FILE *infile, char c) { 
    static char buffer[8192];
    int size;
    unsigned int count = 0;

    while (0 < (size = fread(buffer, 1, sizeof(buffer), infile)))
        for (int i=0; i<size; i++)
            if (buffer[i] == c)
                ++count;
    return count;
}

unsigned count3(std::istream &infile, char c) {    
    return std::count(std::istreambuf_iterator<char>(infile), 
                    std::istreambuf_iterator<char>(), c);
}

unsigned count4(std::istream &infile, char c) {    
    return std::count(std::istream_iterator<char>(infile), 
                    std::istream_iterator<char>(), c);
}

unsigned int count5(std::istream &infile, char c) {
    static char buffer[8192];
    unsigned int count = 0;

    while (infile.read(buffer, sizeof(buffer)))
        count += std::count(buffer, buffer+infile.gcount(), c);
    count += std::count(buffer, buffer+infile.gcount(), c);
    return count;
}

unsigned count6(std::istream &infile, char c) {
    unsigned int count = 0;
    char ch;

    while (infile >> ch)
        if (ch == c)
            ++count;
    return count;
}

template <class F, class T>
void timer(F f, T &t, std::string const &title) { 
    unsigned count;
    clock_t start = clock();
    count = f(t, 'N');
    clock_t stop = clock();
    std::cout << std::left << std::setw(30) << title << "\tCount: " << count;
    std::cout << "\tTime: " << double(stop-start)/CLOCKS_PER_SEC << "\n";
}

int main() {
    char const *name = "equivs2.txt";

    FILE *infile=fopen(name, "r");

    timer(count1, infile, "ignore");

    rewind(infile);
    timer(count1, infile, "using getc");

    rewind(infile);
    timer(count2, infile, "using fread");

    fclose(infile);

    std::ifstream in2(name);
    timer(count3, in2, "ignore");

    in2.clear();
    in2.seekg(0);
    timer(count3, in2, "using streambuf iterators");

    in2.clear();
    in2.seekg(0);
    timer(count4, in2, "using stream iterators");

    in2.clear();
    in2.seekg(0);
    timer(count5, in2, "using istream::read");

    in2.clear();
    in2.seekg(0);
    timer(count6, in2, "using operator>>");

    return 0;
}

Running this, I get results like this (with MS VC++):

ignore                          Count: 1300     Time: 0.309
using getc                      Count: 1300     Time: 0.308
using fread                     Count: 1300     Time: 0.028
ignore                          Count: 1300     Time: 0.091
using streambuf iterators       Count: 1300     Time: 0.091
using stream iterators          Count: 1300     Time: 0.613
using istream::read             Count: 1300     Time: 0.028
using operator>>                Count: 1300     Time: 0.619

and this (with MinGW):

ignore                          Count: 1300     Time: 0.052
using getc                      Count: 1300     Time: 0.044
using fread                     Count: 1300     Time: 0.036
ignore                          Count: 1300     Time: 0.068
using streambuf iterators       Count: 1300     Time: 0.068
using stream iterators          Count: 1300     Time: 0.131
using istream::read             Count: 1300     Time: 0.037
using operator>>                Count: 1300     Time: 0.121

As we can see in the results, it's not really a matter of iostreams being categorically slow. Rather, a great deal depends on exactly how you use iostreams (and to a lesser extent FILE * as well). There's also a pretty substantial variation just between these to implementations.

Nonetheless, the fastest versions with each (fread and istream::read) are essentially tied. With VC++ getc is quite a bit slower than either istream::read or and istreambuf_iterator.

Bottom line: getting good performance from iostreams requires a little more care than with FILE * -- but it's certainly possible. They also give you more options: convenience when you don't care all that much about speed, and performance directly competitive with the best you can get from C-style I/O, with a little extra work.

Question 4

While this question is quite old, I'm amazed nobody has mentioned iostream object construction.

That is, whenever you create an STL iostream (and other stream variants), if you step into the code, the constructor calls an internal Init function. In there, operator new is called to create a new locale object. And likewise, is destroyed upon destruction.

This is hideous, IMHO. And certainly contributes to slow object construction/destruction, because memory is being allocated/deallocated using a system lock, at some point.

Further, some of the STL streams allow you to specify an allocator, so why is the locale created NOT using the specified allocator?

Using streams in a multithreaded environment, you could also imagine the bottleneck imposed by calling operator new every time a new stream object is constructed.

Hideous mess if you ask me, as I am finding out myself right now!

Question 5

On a similar topic, STL says: "You can call setvbuf() to enable buffering on stdout."

https://web.archive.org/web/20170329163751/https://connect.microsoft.com/VisualStudio/feedback/details/642876/std-wcout-is-ten-times-slower-than-wprintf-performance-bug-in-c-library