Question

I have a huge file to parse. Previously, it was separated by either space or comma and I used sscanf(string, "%lf %lf ", &aa, &bb); to get the data into my program.

But now the data format has changed to "122635.670399999","209705.752799999", with both commas and quotation marks, and I have no idea how to deal with it.

Actually, my previous code was found online, and I had a really hard time finding proper documentation for this kind of problem. It would be great if you could recommend some.


Solution

Rather than reading a string, stripping the commas and quotes from it, and then converting the data to numbers, I'd probably create a locale object that classifies commas and quotes as white space, imbue the stream with that locale, and read the numbers without further ado.

// here's our ctype facet:
#include <cstddef>
#include <locale>
#include <vector>

class my_ctype : public std::ctype<char> {
public:
    // static, because the constructor calls it before the base class is built
    static mask const *get_table() {
        static std::vector<std::ctype<char>::mask>
            table(classic_table(), classic_table()+table_size);

        // tell it to classify quotes and commas as "space":
        table['"'] = (mask)space;
        table[','] = (mask)space;
        return &table[0];
    }
    my_ctype(std::size_t refs=0) : std::ctype<char>(get_table(), false, refs) { }
};

Using that, we can read the data something like this:

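#include <iostream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

int main() {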
    // Test input from question:
    std::string input("\"122635.670399999\",\"209705.752799999\"");

    // Open the "file" of the input (from the string, for test purposes).
    std::istringstream infile(input);

    // Tell the stream to use the locale we defined above:
    infile.imbue(std::locale(std::locale(), new my_ctype));

    // Read the numbers into a vector of doubles:
    std::vector<double> numbers{std::istream_iterator<double>(infile),
                               std::istream_iterator<double>()};

    // Print out the sum of the numbers to show we read them:
    std::cout << std::accumulate(numbers.begin(), numbers.end(), 0.0);
}

Note that once we've imbued the stream with a locale using our ctype facet, we can just read numbers as if the commas and quotes didn't exist at all. Since the ctype facet classifies them as white-space, they're completely ignored beyond acting as separators between other stuff.

I'm pointing this out primarily to make clear that there's no magic in any of the processing after that. There's nothing special about using istream_iterator instead of (for example) double value; infile >> value; if you prefer to do that. You can read the numbers any of the ways you'd normally read numbers that were separated by white space -- because as far as the stream cares, that's exactly what you have.
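
For a large file you would typically read one line at a time and pull the numbers out of each line with plain operator>>. The following is only a sketch of that idea: the names quote_comma_ws and parse_line are made up for illustration, and the facet is a restatement of the one above so that the block compiles on its own.

```cpp
#include <cstddef>
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same idea as the facet above: treat '"' and ',' as white space.
class quote_comma_ws : public std::ctype<char> {
    static const mask* make_table() {
        // Start from the classic "C" table, then reclassify two characters.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table['"'] = space;
        table[','] = space;
        return table.data();
    }
public:
    explicit quote_comma_ws(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Hypothetical helper: extract every number found on one line of input.
std::vector<double> parse_line(const std::string& line) {
    std::istringstream in(line);
    in.imbue(std::locale(std::locale(), new quote_comma_ws));

    std::vector<double> values;
    double value;
    while (in >> value)          // quotes and commas are skipped as white space
        values.push_back(value);
    return values;
}
```

Called from a std::getline loop over the real file, parse_line would return both values for each line of the form shown in the question. Imbuing the file stream itself once, as in the example above, avoids constructing a string stream per line.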

OTHER TIPS

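If you have comma-separated data in a string, you can simply remove the " characters from it. Say the string is str1:

str1.erase(std::remove(str1.begin(), str1.end(), '"'), str1.end());

This erases every occurrence of " (std::remove comes from <algorithm>).

    // Then convert the string to numbers, reading the comma into a char to skip it.
    // double rather than float, so the digits in the sample data are not lost.
    double d1, d2;
    char comma;
    std::stringstream ss(str1);
    ss >> d1 >> comma >> d2;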
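
If you would rather keep the sscanf call from the question, one more option is to put the quotes and the comma into the format string as literal characters. This is a minimal sketch, assuming every line holds exactly two quoted numbers; parse_quoted_pair is a hypothetical helper name.

```cpp
#include <cstdio>

// Hypothetical helper: parse one line of the form "x","y" into two doubles.
// Returns true only if both numbers were converted.
bool parse_quoted_pair(const char* line, double& a, double& b) {
    // Non-space characters in the format must match the input exactly;
    // each space in the format skips any amount of white space (including none).
    return std::sscanf(line, " \"%lf\" , \"%lf\"", &a, &b) == 2;
}
```

Checking the return value against 2 matters here: a line in the old space-separated format fails on the first literal quote, and sscanf then reports fewer conversions instead of filling in both values.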
Licensed under: CC-BY-SA with attribution