C++ reading from file puts three weird characters

https://stackoverflow.com/questions/10417613

05-06-2021
|

Question

When i read from a file string by string, >> operation gets first string but it starts with "ï»¿i" . Assume that first string is "street", than it gets as "ï»¿istreet".

Other strings are okay. I tried for different txt files. The result is same. First string starts with "ï»¿i". What is the problem?

Here is my code :

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file datafile.txt";
    exit(1);   // call system to stop
}

while(!inFile.eof()) {
    string word;

    inFile >> word;
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}

Solution

You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.

To detect and ignore the marker you could try this (untested) function:

bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.seekg(0);
    return false;
}

OTHER TIPS

With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.

// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.seekg(0);
}

To use:

ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}

Here is another two ideas.

if you are the one who create the files, save they length along with them, and when reading them, just cut all the prefix with this simple calculation: trueFileLength - savedFileLength = numOfByesToCut
create your own prefix when saving the files, and when reading search for it and delete all what you found before.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow