Dumbfounded in search of a working C & C++ method for carriage return removal (reading a .csv file)

StackOverflow https://stackoverflow.com/questions/21811502

  •  12-10-2022

Question

I am reading lines from a .csv file via:

vector <string> vec_str;
char line[40];
while (1) {
    memset (line, 0, 40 * sizeof (char));
    if (fgets (line, 0, f_str_csv) == NULL) {
        break;
    }
    //and then making strings out of NL trimmed line, and storing them in vec_str as
    vec_str.push_back (string (line, strlen (line)).substr (0, strlen (line) - 1));
}

I do not seem to be able to get rid of the carriage return at the end of a line read from the csv file. This becomes apparent when I parse the string through strtok and sscanf via:

vector <string>::iterator vec_str_it = vec_str.begin ();
strncpy (line, (*vec_str_it).c_str (), (*vec_str_it).length ());

char *buffer = NULL;
int data[2], i = 0;
char str[10];
buffer = strtok (line, ",");
while (buffer != NULL) {
    cout << "[" << buffer << "]" << endl;
    if (i == 2)
        sscanf (buffer, "%s", str);
    else
        sscanf (buffer, "%d", &data[i]);
    buffer = NULL;
    buffer = strtok (NULL, ",");
    i++;
}

This gives me the output:

[10]
[20]
[James K.
]

for the input 10,20,James K., which is the line I read from the csv file.

What is going wrong here?

Edit: Also, for later lines, if a shorter name appears at the end of the line, like 11,31,Wu S. after the James K. line, I get remnants of James K. in the buffer after the 2nd iteration, as the result shows:

[11]
[31]
[Wu S.
K.
]

Can someone please tell me how to avoid this misbehaviour of carriage returns?


Solution

Here's how to do it using std::getline and std::ifstream:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <sstream>
#include <vector>

class line {
public:
  operator std::string() const {
    return data_;
  }

  friend std::ostream& operator<<(std::ostream& out, const line& self) {
    out << self.data_ << std::endl;
    return out;
  }

  friend std::istream& operator>>(std::istream& in, line& self) {
    std::getline(in, self.data_);
    return in;
  }

private:
  std::string data_;
};

class csv_part {
public:
  operator std::string() const {
    return data_;
  }

  friend std::ostream& operator<<(std::ostream& out, const csv_part& self) {
    out << self.data_;
    return out;
  }

  friend std::istream& operator>>(std::istream& in, csv_part& self) {
    std::getline(in, self.data_, ',');
    return in;
  }

private:
  std::string data_;
};

int main() {
  std::ifstream f_str_csv("myfile.csv");
  if(f_str_csv.is_open()) {
    std::vector<std::string> vec_str;
    // Read all lines from file
    std::copy(std::istream_iterator<line>(f_str_csv),
              std::istream_iterator<line>(),
              std::back_inserter(vec_str));
    // loop through all lines read
    for(std::vector<std::string>::const_iterator it = vec_str.begin();
        it != vec_str.end();
        ++it) {
      std::istringstream is(*it);
      // Print every part of the line (separated with a comma),
      // separated with a pipe symbol (|)
      std::copy(std::istream_iterator<csv_part>(is),
                std::istream_iterator<csv_part>(),
                std::ostream_iterator<csv_part>(std::cout, "|"));
      std::cout << std::endl;
    }
  } else {
    std::cerr << "Could not open input file" << std::endl;
  }
}

Notice how you can pass a third argument to std::getline for it to use as the delimiter in place of the default '\n'. This is particularly useful for parsing every line read as a comma-separated list.
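One thing the code above does not do by itself is remove a '\r': when a file with CRLF line endings is read where "\r\n" is not translated to "\n", std::getline stops at '\n' and leaves the '\r' at the end of the string. A minimal sketch of one way to handle that before splitting on commas follows. The helper names strip_trailing_cr and split_csv_line are hypothetical (they are not part of the answer), and quoted or escaped fields are not handled.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): drop one trailing '\r'
// left over from a CRLF line ending.
inline void strip_trailing_cr(std::string& s) {
    if (!s.empty() && s[s.size() - 1] == '\r') {
        s.erase(s.size() - 1);
    }
}

// Hypothetical helper: split one CSV line on commas after removing a
// trailing '\r'. No quoting or escaping is handled; this is only meant
// for simple input such as "10,20,James K.".
inline std::vector<std::string> split_csv_line(std::string line) {
    strip_trailing_cr(line);
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    // The third argument to std::getline is the delimiter to stop at.
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}
```

Calling split_csv_line on each string obtained from std::getline(f_str_csv, line) would give the three fields of a line such as 10,20,James K. without the carriage return attached to the last one.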

Other suggestions

Reading the lines of a CSV file that might contain \r and/or \n at the end of a line, using a C FILE:

#include <cstdio>
#include <cstring>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

int main()
{
    FILE* f_str_csv = stdin;
    std::vector<std::string> vec_str;
    const unsigned Length = 4096;
    char line[Length];
    while (std::fgets(line, Length, f_str_csv) != NULL) {
        std::size_t n = std::strlen(line);
        if(n) {
            if(n + 1 == Length) {
                // Overflow
                throw std::overflow_error("Overflow");
            }
            switch(line[n-1]) {
                case '\n':
                case '\r':
                if(--n) {
                    switch(line[n-1]) {
                        case '\n':
                        case '\r':
                        --n;
                    }
                }
            }
            if(n) {
                char* ln = line;
                switch(ln[0]) {
                    case '\n':
                    case '\r':
                    ++ln;
                    --n;
                }
                if(n) {
                    vec_str.push_back (std::string (ln, ln + n));
                }
            }
        }
    }
    for(unsigned i = 0; i < vec_str.size(); ++i)
        std::cout << vec_str[i] << '\n';
}

From the man-page for fgets:

fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.

Hence you can skip the

memset (line, 0, 40 * sizeof (char));

and you should pass the buffer size, not zero, as the second argument, for example fgets (line, sizeof line, f_str_csv).
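A minimal sketch of that fix, assuming a line always fits in the buffer: pass the real buffer size to fgets, then cut the string at the first '\r' or '\n' with strcspn. The helper name read_lines and the 4096-byte buffer are assumptions made for illustration, not part of the question's code.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: read every line from an already opened FILE* and
// store it without its line ending. Lines longer than the buffer would
// arrive in several pieces; that case is not handled in this sketch.
inline std::vector<std::string> read_lines(std::FILE* f) {
    std::vector<std::string> lines;
    char line[4096];
    // fgets writes the terminating '\0' itself, so no memset is needed.
    while (std::fgets(line, sizeof line, f) != NULL) {
        // strcspn returns the index of the first '\r' or '\n', or the
        // string length if neither occurs; terminate the string there.
        line[std::strcspn(line, "\r\n")] = '\0';
        lines.push_back(line);
    }
    return lines;
}
```

The leftover "K." in the question's second output is also consistent with strncpy being called with length() as the count: in that case strncpy copies no terminating null byte, so whatever a longer previous line left in the buffer stays visible after the new text. Copying length() + 1 bytes, or working on the std::string directly, would avoid that.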

Here is how easy it can be when boost and std::getline are used:

#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <boost/algorithm/string.hpp>

int main(int argc, const char *argv[])
{
    std::fstream file;  
    file.open("input.txt"); 

    std::vector<std::vector<std::string>> lines;

    // Loop on the result of std::getline instead of on file.eof(), so that
    // no extra empty line is processed after the last read fails.
    for (std::string line; std::getline(file, line); ) {

        std::vector<std::string> words; 

        // Split the words in the line - "\t ," defines the set of delimiters.
        boost::split(words, line, boost::is_any_of("\t ,"));

        lines.push_back(words); 
    }

    // Output each word. You can replace the range-based for loops if you
    // don't use C++11.
    for (const auto& line : lines)
    {
        for (const auto& word : line)
        {
            std::cout << word << " "; 
        }
        std::cout << std::endl; 
    }

    file.close(); 

    return 0;
}

And for the data you want to use stored in a text file "input.txt":

11,31,James K.
11,31,Wu S.

Compiling the program with g++ -std=c++11 main.cpp -o main and executing it results in:

11 31 James K. 
11 31 Wu S. 
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow