It seems to me that the correct data structure here is a std::unordered_map<std::string,std::vector<std::string>>, not the unordered_map<std::string,int> your current implementation attempts. That's because the fields you want to store look more like strings; some aren't ints at all.
The first step is to extract the field names so that they can later be used as unordered_map keys. Then extract the data rows, tokenizing each into fields. Finally, for each field name, push_back that field's data for the given CSV row. Here's an example (it uses some C++11 constructs):
#include <string>
#include <iostream>
#include <vector>
#include <unordered_map>
#include <sstream>

std::vector<std::string> split( const std::string& );

int main() {
    // Sample data for a self-contained example.
    std::vector<std::string> raw_data {
        "USN,Name,DOB,Sem,Percentage",
        "111,abc,07/03,3,88",
        "112,cde,18/07,4,77"
    };

    // Ordered container for field names, unordered for field vectors.
    auto field_names = split( raw_data[0] );
    std::unordered_map<std::string,std::vector<std::string>> parsed;

    // Store fields as vector elements within our unordered map.
    for( auto it = std::begin(raw_data) + 1; it != std::end(raw_data); ++it ) {
        auto fields = split( *it );
        auto field_it = std::begin(fields);
        for( auto name_it = std::begin(field_names);
             name_it != std::end(field_names);
             ++name_it, ++field_it ) {
            parsed[*name_it].push_back(*field_it);
        }
    }

    // Dump our data structure to verify it's correct.
    for( const auto& fn : field_names ) {
        std::cout << fn << "\t";
    }
    std::cout << "\n";
    for( size_t ix = 0; ix != parsed[field_names[0]].size(); ++ix ) {
        for( const auto& fn : field_names ) {
            std::cout << parsed[fn][ix] << "\t";
        }
        std::cout << "\n";
    }
    std::cout << std::endl;
    return 0;
}

std::vector<std::string> split( const std::string& instring ) {
    std::vector<std::string> output;
    std::istringstream iss(instring);
    std::string token;
    while( getline( iss, token, ',' ) ) {
        output.push_back(token);
    }
    return output;
}
In my example I start with the input data contained in a vector named raw_data. In your case, you're pulling the data from a file, so I'm dealing only with the data structure build-up, on the assumption that file handling isn't a core part of your question. You should be able to adapt the tokenizing and build-up of the data structure from my example pretty easily.
Also, I understand you are using tr1::unordered_map, which probably means you aren't using C++11. Nevertheless, my C++11-isms are really just syntactic sugar that you can downgrade to equivalent C++03 code without too much work.
Note, this is a relatively naive approach to CSV parsing. It makes assumptions that may hold for your CSV data, but won't hold for all forms of CSV. For example, it doesn't deal with quoted fields (which allow embedded commas within a field), nor with backslash-escaped commas, nor a multitude of other CSV parsing challenges.
If your data set is less well-behaved than this parser can handle, it would behoove you to seek out a full-fledged CSV parsing library rather than fiddling around with rolling your own parser. ...at least that's what I would do if I were tasked with parsing less trivial forms of CSV.