Why do you want to avoid flushing stdout?

https://softwareengineering.stackexchange.com/questions/386269

20-02-2021

Question

I stumbled upon a question on Code Review, and in one answer the feedback was to avoid std::endl because it flushes the stream. The full quote is:

I'd advise avoiding std::endl in general. Along with writing a new-line to the stream, it flushes the stream. You want the new-line, but almost never want to flush the stream, so it's generally better to just write a \n. On the rare occasion that you actually want the flush, do it explicitly: std::cout << '\n' << std::flush;.

The poster did not explain this, neither in the post nor in the comments. So my question is simply this:

Why do you want to avoid flushing?

What made me even more curious was that the poster says it's very rare that you want to flush. I have no problem imagining situations where you want to avoid flushing, but I still thought that, in general, you would want to flush when you print a newline. After all, isn't that the reason why std::endl flushes in the first place?

To address the close votes in advance:

I do not consider this opinion-based. Which approach you should prefer may be opinion-based, but there are objective reasons to take into account. The answers so far prove this: flushing affects performance.

Solution

The short and simple answer is that using std::endl can and will slow output by a huge margin. In fact, I'm reasonably convinced that std::endl is responsible for most of the notion that C++ iostreams are substantially slower than C-style I/O.

For example, consider a program like this:

#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <iomanip>
#include <algorithm>
#include <iterator>
#include <stdio.h>

char fmt[] = "%s\n";
static const int count = 3000000;
static char const *const string = "This is a string.";
static std::string s = std::string(string) + "\n";

void show_time(void (*f)(), char const *caption) { 
    clock_t start = clock();
    f();
    clock_t ticks = clock()-start;
    std::cerr << std::setw(30) << caption 
        << ": " 
        << (double)ticks/CLOCKS_PER_SEC << "\n";
}

void use_printf() {
    for (int i=0; i<count; i++)
        printf(fmt, string);
}

void use_puts() {
    for (int i=0; i<count; i++) 
        puts(string);        
}

void use_cout() { 
    for (int i=0; i<count; i++)
        std::cout << string << "\n";
}

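void use_cout_unsync() { 
    // Note: calling sync_with_stdio after the standard streams have already
    // been used (as they have by the earlier tests) is implementation-defined.
    std::cout.sync_with_stdio(false);
    for (int i=0; i<count; i++)
        std::cout << string << "\n";
    std::cout.sync_with_stdio(true);
}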

void use_stringstream() { 
    std::stringstream temp;
    for (int i=0; i<count; i++)
        temp << string << "\n";
    std::cout << temp.str();
}

void use_endl() { 
    for (int i=0; i<count; i++)
        std::cout << string << std::endl;  // writes '\n', then flushes std::cout on every iteration
}

void use_fill_n() { 
    std::fill_n(std::ostream_iterator<char const *>(std::cout, "\n"), count, string);
}

void use_write() {
    for (int i = 0; i < count; i++)
        std::cout.write(s.data(), s.size());
}

int main() { 
    show_time(use_printf, "Time using printf");
    show_time(use_puts, "Time using puts");
    show_time(use_cout, "Time using cout (synced)");
    show_time(use_cout_unsync, "Time using cout (un-synced)");
    show_time(use_stringstream, "Time using stringstream");
    show_time(use_endl, "Time using endl");
    show_time(use_fill_n, "Time using fill_n");
    show_time(use_write, "Time using write");
    return 0;
}

With standard output redirected to a file, this produces the following results:

             Time using printf: 0.208539
               Time using puts: 0.103065
      Time using cout (synced): 0.241377
   Time using cout (un-synced): 0.181853
       Time using stringstream: 0.223617
               Time using endl: 4.32881
             Time using fill_n: 0.209951
              Time using write: 0.102781

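Using std::endl slowed the program down by a factor of about 20 in this case. With shorter strings, the slowdown would likely be even greater, because the cost of each flush stays roughly the same while the amount of data written per flush shrinks.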
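To see where that cost comes from, it helps to count flushes. std::endl is specified to behave like os.put(os.widen('\n')) followed by os.flush(), and flush() calls rdbuf()->pubsync(), which in turn calls the stream buffer's virtual sync(). The sketch below assumes a hypothetical CountingBuf that records every character and counts sync() calls; the class and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical stream buffer: collects every character written to it and
// counts how many times the stream asks it to flush. It has no put area,
// so each character arrives through overflow().
class CountingBuf : public std::streambuf {
public:
    std::string data;
    int flushes = 0;

protected:
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            data.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // std::ostream::flush() calls pubsync(), which calls this.
    int sync() override {
        ++flushes;
        return 0;
    }
};

// Writes `lines` lines ending in either std::endl or '\n' and returns how
// many flushes reached the buffer.
int count_flushes(int lines, bool use_endl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl)
            out << "This is a string." << std::endl;  // '\n', then out.flush()
        else
            out << "This is a string." << '\n';       // just the character
    }
    // Same bytes either way: 17 characters plus a newline per line.
    assert(buf.data.size() == static_cast<std::size_t>(lines) * 18);
    return buf.flushes;
}
```

The output is identical in both modes; only the number of flushes differs (one per line with std::endl, none with '\n'). When the target is std::cout writing to a file, each of those flushes typically becomes a separate write(2) system call, which fits the endl timing above.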

There are a few cases where you really do want to flush a stream manually, but they are honestly few and far between.

Most of the time a stream needs to be flushed (e.g., you print a prompt, then wait for some input), it happens automatically, unless you've used something like std::cin.tie(nullptr) and/or std::ios_base::sync_with_stdio(false) to prevent that.
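The automatic flush in the prompt case comes from tie(). By default std::cin.tie() returns &std::cout (and, since C++11, so does std::cerr.tie()), and every input operation first flushes the stream it is tied to. The helper ask() below is a hypothetical sketch that relies on that tie and therefore contains no std::flush.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper: prints a prompt and reads one whitespace-delimited
// word. If `in` is tied to `out` (as std::cin is tied to std::cout by
// default), the read below flushes `out` first, so the prompt is visible
// before the program waits for input.
std::string ask(std::istream& in, std::ostream& out, const std::string& prompt) {
    assert(in.tie() == &out && "the prompt relies on the tie to be flushed");
    out << prompt;      // may stay in out's buffer for now
    std::string answer;
    in >> answer;       // the istream sentry calls in.tie()->flush() first
    return answer;
}
```

Typical use would be std::string name = ask(std::cin, std::cout, "Your name: "); after std::cin.tie(nullptr), the prompt could still be sitting in the buffer while the program blocks on input.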

That leaves only a tiny number of truly unusual situations where you have good reason to flush a stream manually. They are rare enough that it is well worth using std::flush when they happen, so that anybody reading the code can see you are flushing the stream intentionally (and, more often than not, it also merits a comment explaining why this is one of the rare cases where flushing the stream really makes sense).

Other answers

Each time a process produces output, it has to call a function that actually does the work. In most cases, that function is ultimately write(2). On a multitasking operating system, the call to write() will trap into the kernel, which has to stop the process, handle the I/O, do other things while any blockages are cleared, put it on the ready queue and get it running again when the time comes. Collectively, you can call all of that activity system call overhead. If that sounds like a lot, it is.

Flushing a buffered stream* after writing a small amount of data or having no buffer at all incurs that overhead each time you do it:

1\n  (System call that writes two bytes)
2\n  (System call that writes two bytes)
3\n  (System call that writes two bytes)
4\n  (System call that writes two bytes)
5\n  (System call that writes two bytes)

This is how it was done in the very early days, until someone figured out that it was burning a lot of system time. The overhead could be kept down by accumulating output in a buffer until it was full or the program decided that it must be sent immediately. (You might want to do the latter if you're producing output sporadically that needs to be seen or consumed.) Avoiding a flush at the end of each line cuts the number of system calls, and the overhead incurred:

1\n
2\n
3\n
4\n
5\n  (Flush) (System call that writes ten bytes)
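A rough POSIX sketch of the two listings above, assuming a Unix-like system with write(2). The names write_all, write_each_line and write_buffered are hypothetical; each returns how many system calls it made, so the difference between the strategies is visible directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unistd.h>

// Calls write(2) until every byte of `s` has been accepted, continuing
// after short writes. Returns the number of system calls made, or -1 on
// error (this sketch does not retry on EINTR).
int write_all(int fd, const std::string& s) {
    int calls = 0;
    std::size_t done = 0;
    while (done < s.size()) {
        ssize_t n = ::write(fd, s.data() + done, s.size() - done);
        ++calls;
        if (n <= 0)
            return -1;
        done += static_cast<std::size_t>(n);
    }
    return calls;
}

// First listing: no buffering, one system call per short line.
int write_each_line(int fd, int lines) {
    int calls = 0;
    for (int i = 1; i <= lines; ++i) {
        int c = write_all(fd, std::to_string(i) + "\n");
        assert(c > 0);
        calls += c;
    }
    return calls;
}

// Second listing: accumulate the lines in memory, then flush them with as
// few system calls as possible (normally one).
int write_buffered(int fd, int lines) {
    std::string buffer;
    for (int i = 1; i <= lines; ++i)
        buffer += std::to_string(i) + "\n";
    return write_all(fd, buffer);
}
```

Both functions deliver the same ten bytes for five lines; the unbuffered version simply pays the system call overhead five times instead of once.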

*Note that standard output, in the operating-system sense, is a file descriptor associated with a process and given a well-known number (1 on POSIX systems). This differs from stdout as defined by C, C++ and other languages: those are identifiers for buffered streams that live entirely in userland and write to the standard output descriptor. The write() system call itself does no userland buffering.
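The distinction in the note can be observed with a pipe. In this POSIX-only sketch (stdio_vs_write is a hypothetical name), a fully buffered FILE* and a raw write(2) share one descriptor. The raw write reaches the kernel first even though it is issued second, because the stdio bytes wait in a userland buffer until fflush().

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>   // fdopen is POSIX and is declared here
#include <string>
#include <unistd.h>

// Writes through a fully buffered stdio stream and directly with write(2)
// on the same pipe, then returns the bytes in the order the kernel got them.
std::string stdio_vs_write() {
    int p[2];
    if (::pipe(p) != 0)
        return {};
    FILE* f = ::fdopen(p[1], "w");            // userland buffer over fd p[1]
    if (f == nullptr) {
        ::close(p[0]);
        ::close(p[1]);
        return {};
    }
    ::setvbuf(f, nullptr, _IOFBF, BUFSIZ);    // full buffering, as for stdout to a file

    ::fputs("buffered\n", f);                 // stays in f's buffer for now
    ssize_t n = ::write(p[1], "direct\n", 7); // unbuffered: reaches the kernel at once
    assert(n == 7);
    ::fflush(f);                              // the buffered bytes are written now
    ::fclose(f);                              // also closes p[1]

    std::string received;
    char chunk[64];
    ssize_t got;
    while ((got = ::read(p[0], chunk, sizeof chunk)) > 0)
        received.append(chunk, static_cast<std::size_t>(got));
    ::close(p[0]);
    return received;
}
```

The result is "direct\n" followed by "buffered\n": the order in which the data reached the descriptor, not the order of the calls in the source.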

Why flushing is to be avoided:

Because I/O works best when the operating system can handle relatively large amounts of data at once. Frequent flushes of small amounts of data cause slowdowns, sometimes very significant ones.

Why you should almost never flush manually:

There are automatic flushes that cover most use cases. For example, when a program writes to an interactive console, the standard library typically line-buffers the output, so it is flushed after every newline. When you write to a file, the data is written once the buffer has filled up, and also when the file is closed.
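The close-time flush is easy to rely on through RAII. In this sketch, save_line and read_file are hypothetical helpers and the file name used with them is arbitrary. No std::flush or std::endl appears, yet the data has been handed to the operating system by the time save_line returns.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Writes a line without any explicit flush. When `out` goes out of scope,
// its destructor closes the file, and closing flushes whatever is still
// buffered.
void save_line(const std::string& path, const std::string& text) {
    std::ofstream out(path);
    out << text << '\n';
}

// Reads the whole file back as a string.
std::string read_file(const std::string& path) {
    std::ifstream in(path);
    assert(in && "file should exist after save_line");
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```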

When you should flush manually:

When you need the output to appear immediately. For example, a spinner or progress bar that repeatedly overwrites the current line, or output to a file that really must be updated at specific moments.
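A minimal sketch of the progress-bar case, with progress_line and draw_progress as hypothetical names. The bar is redrawn with '\r' instead of '\n', so line buffering alone would never push it out, and the explicit std::flush carries a comment saying why it is there.

```cpp
#include <cassert>
#include <ostream>
#include <string>

// Hypothetical helper: builds one frame, a 20-character bar followed by
// "done/total".
std::string progress_line(int done, int total) {
    assert(total > 0 && done >= 0 && done <= total);
    const int width = 20;
    const int filled = done * width / total;
    return "[" + std::string(filled, '#') + std::string(width - filled, ' ') +
           "] " + std::to_string(done) + "/" + std::to_string(total);
}

// Redraws the bar in place. '\r' returns the cursor to the start of the
// line; no '\n' is written, so a line-buffered console would not show the
// update on its own. The flush is deliberate: each frame should appear now.
void draw_progress(std::ostream& out, int done, int total) {
    out << '\r' << progress_line(done, total) << std::flush;
}
```

In a loop this would be called as draw_progress(std::cout, i, n) for each step, followed by a single std::cout << '\n' once the work is done.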

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange