Question

I want to write programs that behave like Unix utilities. In particular, I want to use them with pipes, e.g.:

grep foo myfile | ./MyTransformation [--args] | cut -f2 | ...

Three aspects make me wonder how to handle I/O:

  1. According to sources like the Useless Use of Cat Award, it would be good to support both reading from stdin and reading from a file (at the beginning of a pipeline). How is this best accomplished? I'm used to using <getopt.h> / <cgetopt> for parsing arguments. I could check whether there is a file argument besides my options and read from it; if not, read from stdin. That would mean that stdin is ignored if an input file is supplied. Is this desirable?

  2. According to this question, C++ synchronizes cout and cin with stdio and hence does not buffer well. This leads to a huge decrease in performance. A solution is to disable synchronization: cin.sync_with_stdio(false);. Should a program for use in pipes always disable synchronization with stdio for cin and cout? Or should it avoid cin and cout and instead use its own form of buffered I/O?

  3. Since cout will be used for program output (unless an output file is specified), status messages (verbosity such as % done) have to go somewhere else. cerr/stderr seems like an obvious choice. However, status messages are not errors.

In summary, I wonder about the I/O handling of such programs in C++. Can cin and cout be used despite the problems described above? Should I/O be handled differently, for example by reading from and writing to buffered files, with stdin and stdout as the defaults? What would be the recommended way to implement such behavior?


Solution

The standard idiom if there are no options is:

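#include <fstream>
#include <iostream>
#include <string>

void process( std::istream& source );   // the actual filter, defined elsewhere

int returnCode = 0;
char const* programName = "";           // set from argv[0] in main

void
processFile( std::string const& filename )
{
    if ( filename == "-" ) {
        process( std::cin );
    } else {
        std::ifstream in( filename.c_str() );
        if ( !in.is_open() ) {
            std::cerr << programName << ": cannot open " << filename << std::endl;
            returnCode = 1;
        } else {
            process( in );
        }
    }
}

int
main( int argc, char** argv )
{
    programName = argv[0];
    if ( argc == 1 ) {
        processFile( "-" );
    } else {
        for ( int i = 1; i != argc; ++ i ) {
            processFile( argv[i] );
        }
    }
    // Flush before checking, so that a failed write affects the exit status.
    std::cout.flush();
    return std::cout ? returnCode : 2;
}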

There are many variants, however. I found myself doing this so often that I wrote a MultiFileInputStream class whose (template) constructor takes a pair of iterators; it then executes more or less the same code as the above. (All of the significant code is, as usual, in the corresponding streambuf.) Similarly, I have a class to parse out the options (which looks like an immutable std::vector<std::string> once the options have been parsed). So the above would become:

int
main( int argc, char** argv )
{
    CommandLine& args = CommandLine::instance();
    args.parse( argc, argv );
    MultiFileInputStream src( args.begin(), args.end() );
    process( src );
    return ProgramStatus::instance().returnCode();
}

(ProgramStatus is another useful class, which handles error output, and the return code. And flushes std::cout and adjusts the error code when you call returnCode() on it.)
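The real ProgramStatus class is not shown here, so the following is only a guess at its general shape, based on the description above. The member names setProgramName and error, and the default name "program", are assumptions made for illustration.

```cpp
#include <iostream>
#include <string>

// Illustrative stand-in only: not the answer author's actual class.
class ProgramStatus
{
public:
    static ProgramStatus& instance()
    {
        static ProgramStatus theInstance;
        return theInstance;
    }

    // Hypothetical setter; typically called with argv[0].
    void setProgramName( std::string const& name ) { myName = name; }

    // Records a (possibly non-fatal) error and returns std::cerr, already
    // prefixed with the program name, so the caller can append the message.
    std::ostream& error()
    {
        if ( myCode < 1 ) {
            myCode = 1;
        }
        return std::cerr << myName << ": ";
    }

    // Flushes std::cout; a failed write raises the code to 2.
    int returnCode()
    {
        std::cout.flush();
        if ( !std::cout && myCode < 2 ) {
            myCode = 2;
        }
        return myCode;
    }

private:
    ProgramStatus() : myName( "program" ), myCode( 0 ) {}
    ProgramStatus( ProgramStatus const& );              // not copyable
    ProgramStatus& operator=( ProgramStatus const& );   // not assignable

    std::string myName;
    int myCode;
};
```

With something like this, a failed open becomes ProgramStatus::instance().error() << "cannot open " << filename << '\n';, and main ends with return ProgramStatus::instance().returnCode();.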

I'm sure that anyone writing Unix filter programs has developed similar classes.
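Without such classes, the case with options that the question raises can probably be handled with plain POSIX getopt: parse the options first, then treat whatever is left as file names, falling back to stdin when nothing is left. This is a minimal sketch; the Options struct, the parseOptions function and the single -v flag are made up for illustration.

```cpp
#include <unistd.h>   // getopt, optind (POSIX, not standard C++)

// Hypothetical option set for illustration: only -v (verbose).
struct Options
{
    bool verbose;     // true if -v was given
    int  firstFile;   // index in argv of the first non-option argument
    bool ok;          // false if an unknown option was seen
};

Options parseOptions( int argc, char** argv )
{
    Options result = { false, argc, true };
    int c;
    while ( ( c = getopt( argc, argv, "v" ) ) != -1 ) {
        if ( c == 'v' ) {
            result.verbose = true;
        } else {
            // Unknown option: getopt has already written a message to stderr.
            result.ok = false;
        }
    }
    // After the loop, optind is the index of the first remaining argument.
    result.firstFile = optind;
    return result;
}
```

In main, firstFile == argc would then mean "no file arguments, read stdin" (processFile( "-" )); otherwise the loop over processFile( argv[i] ) starts at firstFile instead of 1. With this scheme stdin is simply not read when files are named, which is also how cat and grep behave.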

With regards to question 2: sync_with_stdio is a static member of std::ios_base, so you can call it without an object: std::ios_base::sync_with_stdio( false );. I find this less misleading, since the call affects all of the standard iostream objects. If the IO handling is a bottleneck, by all means do it, but most of the time, I don't bother. It's rare for such programs to need any sort of optimization. (Note that if you do call sync_with_stdio( false ), then you should not use any C style IO. But I can't see any reason to use it anyway.)

With regards to question 3: error messages go to std::cerr, always. You also want to be sure to return a non-zero return code, even if the error wasn't fatal. Something like:

myprog file1 > tmp && mv tmp file1

is all too common, and if you have some problem, and don't generate the output, it's a disaster if you don't return a non-zero error code. (That's why I always flush and then check the status of std::cout. A long, long time ago, a user of my program did the above, with a very large file, and the disk was full. It wasn't quite as full afterwards. Since then: always flush std::cout, and check that it worked, before returning OK.)

OTHER TIPS

Are you sure you want to use C++? Most operating systems rely more on C and assembly than on C++. If you're going to write applications, then C++ could be a good choice, but an operating system and its utilities, shell and helper programs are usually coded in C. You can look through your Linux or BSD implementation to see how it handles pipes, standard input and standard output. If you think that C is something for you, you could read "The C Programming Language" by Kernighan and Ritchie; it has many examples of how to write a good C program that uses pipes, standard I/O and arguments.

Licensed under: CC-BY-SA with attribution