Question

In the following C++ code, I realised that gcount() was returning a larger number than I wanted, because getline() consumes the final newline character but doesn't store it in the buffer.

What I still don't understand is the program's output, though. For input "Test\n", why do I get " est\n"? How come my mistake affects the first character of the string rather than adding unwanted rubbish onto the end? And how come the program's output is at odds with the way the string looks in the debugger ("Test\n", as I'd expect)?

#include <fstream>
#include <vector>
#include <string>
#include <iostream>

using namespace std;

int main()
{
    const int bufferSize = 1024;
    ifstream input( "test.txt", ios::in | ios::binary );

    vector<char> vecBuffer( bufferSize );
    input.getline( &vecBuffer[0], bufferSize );
    string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() );
    cout << strResult << "\n";

    return 0;
}

Solution

I've also duplicated this result, Windows Vista, Visual Studio 2005 SP2.

When I figure out what the heck is happening, I'll update this post.

edit: Okay, there we go. The problem (and the different results people are getting) comes from the \r. What happens is you call input.getline and put the result in vecBuffer. The getline function strips off the \n, but leaves the \r in place.

You then copy vecBuffer into a string, but take the length from input.gcount(), which gives you one char too many: gcount counts the \n that getline extracted from the stream, whereas vecBuffer holds a terminating null character in its place.

The resulting strResult is:

-       strResult   "Test"
        [0] 84 'T'  char
        [1] 101 'e' char
        [2] 115 's' char
        [3] 116 't' char
        [4] 13 '␍'  char
        [5] 0   char

So then "Test" is printed, followed by a carriage return (puts the cursor back at the start of the line), a null character (overwriting the T), and finally the \n, which correctly puts the cursor on the new line.

So you either have to strip out the \r, or write a function that gets the string length directly from vecBuffer, checking for null characters.
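
Here is a rough sketch of that second option, combined with stripping the \r. The names readLineWithoutCr and demoFirstLine are made up for illustration, and a string stream stands in for test.txt so the snippet does not depend on a file. It assumes the line has no embedded null characters, since strlen would stop at the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads one line with
// istream::getline and sizes the string from the buffer itself rather than
// from gcount(), then drops a trailing '\r' left over from a "\r\n" ending.
std::string readLineWithoutCr(std::istream& input, std::size_t bufferSize = 1024)
{
    assert(bufferSize > 0);
    std::vector<char> buffer(bufferSize);   // value-initialised to '\0'
    input.getline(&buffer[0], static_cast<std::streamsize>(bufferSize));

    // getline null-terminates the buffer when bufferSize > 0, so strlen
    // finds the real length without counting the consumed delimiter.
    std::string result(&buffer[0], std::strlen(&buffer[0]));

    // Remove the carriage return that binary mode leaves in place.
    if (!result.empty() && result[result.size() - 1] == '\r')
        result.erase(result.size() - 1);
    return result;
}

// Small usage example on simulated file contents.
std::string demoFirstLine()
{
    std::istringstream input("Test\r\nNext\n");
    return readLineWithoutCr(input);
}
```

If the stream really is a file, passing the ifstream from the question to readLineWithoutCr should work the same way, since ifstream is an istream.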

OTHER TIPS

I've duplicated Tommy's problem on a Windows XP Pro Service Pack 2 system with the code compiled using Visual Studio 2005 SP2 (actually, it says "Version 8.0.50727.879"), built as a console project.

If my test.txt file contains just "Test" and a CR, the program spits out " est" (note the leading space) when run.

If I had to take a wild guess, I'd say that this version of the implementation has a bug where it is treating the Windows newline character like it should be treated in Unix (as a "go to the front of the same line" character), and then it wipes out the first character to hold part of the next prompt or something.


Update: After playing with it a bit, I'm positive that is what is going on. If you look at strResult in the debugger, you will see that it copied over a decimal 13 value at the end. That's CR ('\r'), which in Windows-land is the first half of the "\r\n" line ending, and which a terminal treats as "return to the beginning of the line". If I instead change your constructor to read:

string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 );

...(so that the trailing null character isn't copied; the CR still is, but nothing is printed after it on that line) then it prints out "Test" like you'd expect.

I am pretty sure that the T is actually getting written and then overwritten. Running the same program in an rxvt window (cygwin) produces the expected output. You can do a couple of things. If you get rid of the ios::binary in your open, the stream will autoconvert \r\n to \n on Windows and things will work like you expect.

You can also open up your text file in the binary editor by clicking on the little down arrow on the open file dialog's open button and selecting open with...->Binary Editor. This will let you look at your file and confirm that it does indeed have \r\n and not just \n.

Edit: I redirected the output to a file and it is writing out:

Test\r\0\r\n

The reason you are getting the \0 is that gcount returns 6 (6 characters were removed from the stream) but the final delimiter is not copied to the buffer; a '\0' is stored in its place. When you construct the string, you are actually telling it to include the '\0'. std::string has no problem with the embedded 0 and outputs it as asked. Some shells apparently output a blank character for it, overwriting the T, while others don't print anything and the output looks okay, but it is still probably wrong because it has the embedded '\0'.

cout << strResult.c_str() << "\n";

Changing the last line to this will stop on the \0 and also get the output expected.
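
If you want to check those numbers yourself, something like the sketch below should do it. LineProbe and probeLine are names I made up, and the string stream only simulates a binary-mode file containing "Test\r\n"; it is not the original program.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: records what getline and gcount report for one line.
struct LineProbe {
    std::streamsize extracted;   // value of gcount() after getline
    std::string withGcount;      // string built the way the question builds it
};

LineProbe probeLine(const std::string& fileContents)
{
    std::istringstream input(fileContents);
    std::vector<char> buffer(1024);
    input.getline(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    LineProbe probe;
    probe.extracted = input.gcount();   // counts the consumed '\n' as well

    // Guard the iterator arithmetic below.
    assert(probe.extracted <= static_cast<std::streamsize>(buffer.size()));

    // Same construction as in the question: the last char copied is the
    // '\0' that getline stored where the delimiter would have been.
    probe.withGcount.assign(buffer.begin(), buffer.begin() + probe.extracted);
    return probe;
}
```

For the input "Test\r\n", extracted comes out as 6 and withGcount holds 'T', 'e', 's', 't', '\r', '\0'. Going through c_str() cuts that off at the '\0', which leaves "Test\r".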

I tested your code using Visual Studio 2005 SP2 on Windows XP Pro SP3 (32-bit), and everything works fine.

Licensed under: CC-BY-SA with attribution