Question

This question is based on a previous question I asked:

Multi level hash/dictionary creation in C++

I have this structure which I need to store in an unordered_map. I am using the config file described in the previous question. I need to use the values in the first line as keys, and store the subsequent values as arrays under those keys.

So far I have been able to do this:

#include <cstdio>
#include <cstring>
#include <string>
#include <unordered_map>

using namespace std::tr1;

int main()
{
    unordered_map <std::string, int> m;
    FILE *f;

    char c[255];
    char * pnt;

    f = fopen("Config.csv", "r");
    if (f == NULL)
        return 1;

    while (fgets(c, 255, f) != NULL)  // read the next line
    {
        pnt = strtok(c, ",");
        while( pnt!= NULL ) 
        {

            //the code for storing the values goes here.
            pnt = strtok( NULL, "," );
        }
    }
    fclose(f);
    return 0;
}

My CSV file looks like this:

USN,Name,DOB,Sem,Percentage
111,abc,07/03,3,88
112,cde,18/07,4,77

Solution

It seems to me that the correct data structure to use is an std::unordered_map<std::string,std::vector<std::string>>, not the unordered_map<std::string,int> your current implementation attempts. This is because the fields you want to store are better represented as strings; some of them aren't ints at all.

The first step is to extract the field names so that they can later be used as unordered_map keys. Then start extracting the data rows, tokenizing them into fields. Next, for each field name, push_back the field data for the given CSV row. Here's an example (uses some C++11 constructs):

#include <string>
#include <iostream>
#include <vector>
#include <unordered_map>
#include <sstream>

std::vector<std::string> split ( std::string );

int main () {

  // Sample data for a self-contained example.
  std::vector<std::string> raw_data {
    "USN,Name,DOB,Sem,Percentage",
    "111,abc,07/03,3,88",
    "112,cde,18/07,4,77"
  };


  // Ordered container for field names, unordered for field vectors.
  auto field_names = split( raw_data[0] );
  std::unordered_map<std::string,std::vector<std::string>> parsed;


  // Store fields as vector elements within our unordered map.
  for( auto it = std::begin(raw_data) + 1; it != std::end(raw_data); ++it ) {
    auto fields = split( *it );
    auto field_it = std::begin(fields);
    for( auto name_it = std::begin(field_names);
         name_it != std::end(field_names);
         ++name_it,
         ++field_it
    ) {
      parsed[*name_it].push_back(*field_it);
    }
  }


  // Dump our data structure to verify it's correct.
  for( auto fn : field_names ) {
    std::cout << fn << "\t";
  }
  std::cout << "\n";
  for ( size_t ix = 0; ix != parsed[field_names[0]].size(); ++ix ) {
    for( auto fn : field_names ) {
      std::cout << parsed[fn][ix] << "\t";
    }
    std::cout << "\n";
  }
  std::cout << std::endl;


  return 0;
}


std::vector<std::string> split ( std::string instring ) {
  std::vector<std::string> output;
  std::istringstream iss(instring);
  std::string token;
  while( getline( iss, token, ',' ) ) {
    output.push_back(token);
  }
  return output;
}

In my example I'm starting with the input data contained in a vector named raw_data, whereas in your case you're pulling the data from a file. I'm focusing on building up the data structure, on the assumption that file handling isn't a core part of your question. You should be able to adapt the tokenizing and build-up of the data structure from my example fairly easily.
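For the file-handling side, here's a minimal sketch of loading each line of your file into the same shape of vector as raw_data, using C++ streams rather than the C FILE API (read_lines is a helper name I've made up, not part of the original code):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read every line of a file into a vector of strings, producing the
// same kind of input as the raw_data vector in the example above.
std::vector<std::string> read_lines(const std::string& filename) {
  std::vector<std::string> lines;
  std::ifstream in(filename.c_str());
  std::string line;
  while (std::getline(in, line)) {  // one line per vector element
    lines.push_back(line);
  }
  return lines;
}
```

You could then replace the raw_data initializer in the example with `auto raw_data = read_lines("Config.csv");`.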

Also, I understand you are using tr1::unordered_map, which probably means that you aren't using C++11. Nevertheless, my C++11-isms are really just leveraging syntactic sugar that you can downgrade to equivalent C++03 compatibility without too much work.
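As an illustration of that downgrade, the map-building loop can be written in plain C++03 style, with explicit iterator types in place of auto and range-for. (build_table and table_t are names I've invented for this sketch; under TR1 the header may be <tr1/unordered_map> depending on your compiler.)

```cpp
#include <string>
#include <vector>
#include <unordered_map>

typedef std::unordered_map<std::string, std::vector<std::string> > table_t;

// Build the field-name -> column map using only C++03 constructs:
// explicit iterator types instead of auto, no range-for, no brace-init.
table_t build_table(const std::vector<std::string>& field_names,
                    const std::vector<std::vector<std::string> >& rows) {
  table_t parsed;
  for (std::vector<std::vector<std::string> >::const_iterator row = rows.begin();
       row != rows.end(); ++row) {
    std::vector<std::string>::const_iterator field = row->begin();
    for (std::vector<std::string>::const_iterator name = field_names.begin();
         name != field_names.end() && field != row->end();
         ++name, ++field) {
      parsed[*name].push_back(*field);  // append this row's value to the column
    }
  }
  return parsed;
}
```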

Note, this is a relatively naive approach to CSV parsing. It makes assumptions that may work for your CSV data, but might not work for all forms of CSV. For example, it doesn't deal with quoting of fields to allow for embedded commas within fields. Nor does it deal with backslash-escaped commas, nor a multitude of other CSV parsing challenges.

If your data set is less well-behaved than this parser can deal with, it would behoove you to seek out a full-fledged CSV parsing library rather than fiddling around with rolling your own parser. At least, that's what I would do if I were tasked with parsing less trivial forms of CSV.
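To illustrate one of those challenges: a minimal quote-aware variant of split, which lets double-quoted fields contain embedded commas, might look like the sketch below. This is my own example, not part of the original answer, and it still ignores escaped quotes ("") and embedded newlines, so it is nowhere near a complete CSV parser.

```cpp
#include <string>
#include <vector>

// Split a CSV line on commas, except where the comma appears inside a
// double-quoted field. The quote characters themselves are dropped.
std::vector<std::string> split_quoted(const std::string& line) {
  std::vector<std::string> out;
  std::string field;
  bool in_quotes = false;
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (c == '"') {
      in_quotes = !in_quotes;       // toggle quoted state
    } else if (c == ',' && !in_quotes) {
      out.push_back(field);         // end of field
      field.clear();
    } else {
      field += c;
    }
  }
  out.push_back(field);             // final field has no trailing comma
  return out;
}
```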

Licensed under: CC-BY-SA with attribution