I suggest a better method:
1. Read the entire line with std::getline.
2. Extract the fields using std::string::substr and the field widths.
3. Trim the field strings as necessary.
4. Process the fields.
5. Repeat from step 1 until the read fails.
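The steps above could be sketched roughly as follows. This is only an illustration: the Record type, the helper names (trim, field, parse_line, read_records) and the column widths (26, 12 and 9 characters) are assumptions made for the example, not something taken from the actual file, so adjust them to the real layout.

```cpp
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Hypothetical record type; the members mirror the variables in the question.
struct Record {
    std::string country;
    std::string capital;
    std::string area;
    std::string population;
};

// Assumed column widths (illustrative only; use the real file's layout).
constexpr std::size_t kCountryWidth = 26;
constexpr std::size_t kCapitalWidth = 12;
constexpr std::size_t kAreaWidth = 9;

// Step 3: strip leading and trailing blanks from a field.
std::string trim(const std::string& s) {
    const char* blanks = " \t\r";
    const std::size_t first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

// Step 2: cut the field starting at `pos` that is `width` characters wide.
// A line that ends before the column starts yields an empty field.
std::string field(const std::string& line, std::size_t pos, std::size_t width) {
    if (pos >= line.size()) return "";
    return trim(line.substr(pos, width));
}

// Split one fixed-width line into its four fields.
Record parse_line(const std::string& line) {
    Record r;
    std::size_t pos = 0;
    r.country = field(line, pos, kCountryWidth);
    pos += kCountryWidth;
    r.capital = field(line, pos, kCapitalWidth);
    pos += kCapitalWidth;
    r.area = field(line, pos, kAreaWidth);
    pos += kAreaWidth;
    r.population = field(line, pos, std::string::npos);  // rest of the line
    return r;
}

// Steps 1 and 5: read whole lines until the read fails.
std::vector<Record> read_records(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        if (trim(line).empty()) continue;      // skip blank lines
        records.push_back(parse_line(line));   // step 4 would process the fields here
    }
    return records;
}
```

With a std::ifstream opened on the data file, read_records(file) would return one Record per non-blank line. The numeric fields are left as strings here; converting them (for example with std::stoul) would be part of step 4.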
Reading space-separated data with spaces data
Question
I'm reading lines from a text file where the data is separated in columns by spaces similar to this:
UNITED STATES OF AMERICA  WASHINGTON  9629047  291289535
CHINA                     PEKING      9596960  1273111290
I had previously handled similar data using the following code:
ifstream readThis("somefile.txt", ios::in);
while (readThis >> country >> capital >> area >> population) {
// some code...
}
This worked fine when the data had no embedded spaces (unlike "UNITED STATES OF AMERICA"). What happens now is that as soon as a space is encountered, the rest of the data is saved to the next variable (i.e. "UNITED" goes to country, "STATES" goes to capital, and so on). What I'm about to do feels pretty hacky, so I was hoping there'd be a better way of handling the data. Here's what I'm thinking of doing now:
- Read the entire line with std::getline.
- Go through the line character by character.
- Store the characters in the proper variable until we've read 2 spaces in a row.
- At this point ignore any whitespace and read until we reach a non-whitespace character.
This method looks more like an exercise from K&R and probably isn't the C++ way of doing this. I should mention that the data is all properly aligned (the "columns" are all the same width). I'm thinking there has to be a way to read "aligned" data properly (basically the opposite of cout << setw(20) << "Hello" << ...).
Any ideas welcomed. Thanks!
Solution
Other hints
That is a clear case for regular expressions if I know one:
#include <iostream>
#include <sstream>
#include <string>
#include <boost/regex.hpp>

int main() {
    // Columns are separated by two or more spaces; single spaces belong to a field.
    std::istringstream i {
        "UNITED STATES OF AMERICA    WASHINGTON, DC  2233232    23232323\n"
        "POPULAR REPUBLIC OF CHINA   BEIJING         23232323   23232323\n"
        "BRAZIL                      BRASILIA        232323233  2323323\n" };
    boost::regex r { R"(^(.*?)\s\s+(.*?)\s\s+(\d+)\s\s+(\d+))", boost::regex::perl };
    std::string line;
    while( std::getline(i, line) ) {
        boost::smatch m;
        if( !boost::regex_match(line, m, r) )
            continue;  // skip lines that do not have all four columns
        auto country = m[1].str();
        auto capital = m[2].str();
        auto area = m[3].str();
        auto pop = m[4].str();  // fourth capture group is the population
        std::cout << capital << ", " << country << ";\n";
    }
}
Notice that #include <regex> and the use of std::regex, std::smatch and std::regex_match only work if you are using libc++; the implementation in GNU libstdc++ (up to 4.8) isn't working.
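For reference, the same idea with the standard library instead of Boost might look like the sketch below. It assumes a standard library whose <regex> actually works (libc++, or a libstdc++ newer than the 4.8 series mentioned above). The Row type and the parse_row helper are names made up for this example, and the pattern additionally tolerates trailing whitespace.

```cpp
#include <regex>
#include <string>

// Hypothetical row type for the four columns.
struct Row {
    std::string country;
    std::string capital;
    std::string area;
    std::string population;
};

// Returns true and fills `out` when the whole line consists of four columns
// separated by two or more whitespace characters; returns false otherwise.
bool parse_row(const std::string& line, Row& out) {
    // Lazy groups stop at the first run of 2+ whitespace characters,
    // so single spaces inside a name stay part of the field.
    static const std::regex pattern(
        R"(^(.*?)\s\s+(.*?)\s\s+(\d+)\s\s+(\d+)\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;
    out = Row{m[1].str(), m[2].str(), m[3].str(), m[4].str()};
    return true;
}
```

Inside a std::getline loop, a line for which parse_row returns false can simply be skipped, as the Boost version above does with continue.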