Assign values from an input file efficiently

https://stackoverflow.com/questions/23559651

18-07-2023

Question

I am getting an input file with a series of data blocks of the following format.

CO2               Gurvich,1991 pt1 p27 pt2 p24.                                 
       3 g 9/99 C   1.00O   2.00    0.00    0.00    0.00 0     44.00950    -393510.000
          200.000  1000.000 7 -2.0 -1.0  0.0  1.0  2.0  3.0  4.0  0.0         9365.469
       4.943650540E+04-6.264116010E+02 5.301725240E+00 2.503813816E-03-2.127308728E-07
      -7.689988780E-10 2.849677801E-13 0.000000000E+00-4.528198460E+04-7.048279440E+00
         1000.000  6000.000 7 -2.0 -1.0  0.0  1.0  2.0  3.0  4.0  0.0         9365.469
       1.176962419E+05-1.788791477E+03 8.291523190E+00-9.223156780E-05 4.863676880E-09
      -1.891053312E-12 6.330036590E-16 0.000000000E+00-3.908350590E+04-2.652669281E+01
         6000.000 20000.000 7 -2.0 -1.0  0.0  1.0  2.0  3.0  4.0  0.0         9365.469
      -1.544423287E+09 1.016847056E+06-2.561405230E+02 3.369401080E-02-2.181184337E-06
       6.991420840E-11-8.842351500E-16 0.000000000E+00-8.043214510E+06 2.254177493E+03

They represent certain values for chemical reactions (in this case, carbon dioxide). I need to extract values based on character position. Each line has 80 characters, and the fields have different meanings.

To explain a little more: in the first line, the first 16 characters give the species name or formula (CO2), and characters 19 to 80 are notes. In the second line, characters 1-2 give one value, characters 4-9 another, and so on.

For line 3, characters 1-22 give temperature ranges, and I need to divide the values into different variables. So,

         200.000  1000.000

needs to become "double V1=200.000" and "double V2=1000.000". Character 23 is always 7, BUT sometimes there is no space between the values in characters 1-22 and character 23. And so on.

My main question is: what would be a good approach to solve this? I was thinking of splitting each line into separate char variables and assigning the values from the input file to them, but I am not sure whether that is a good approach.

Also, the format of lines 3, 4 and 5 repeats differently for each block of information.

I hope the question is clear and that it is not a bad question. I don't really need a code answer, just a pointer in the right direction. Thanks!

Solution

If the data is fixed width, then splitting it at the right places is really quite easy. Something along these lines:

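  // Fragment: needs <iostream>, <string> and <vector>.
  std::string input;
  int lineno = 0;
  std::string compound;
  std::vector<double> data;

  while (std::getline(std::cin, input))
  {
      if (input.empty())
      {
          continue;            // Skip blank lines.
      }
      if (input[0] != ' ')     // Detect new "first line"
      {
          lineno = 0;
      }
      else
      {
          lineno++;
      }
      // In the sample, lines 3, 4 and 5 (lineno 2, 3, 4) repeat as a group,
      // so fold every later group back onto 2, 3, 4.
      const int kind = (lineno < 2) ? lineno : 2 + (lineno - 2) % 3;
      switch (kind)
      {
          case 0:
          {
              if (data.size() != 0)
              {
                  // Save "data" from previous "chunk".
              }
              data.clear();
              // The name runs up to the first space (or the whole line if there is none).
              const std::string::size_type i = input.find(' ');
              compound = input.substr(0, i);
              // May want to keep comment too (if i != npos): comment = input.substr(i);
              // You would have to strip extra spaces.
          }
          break;

          case 1:
          case 2:
          {
              // Not sure what you want to do here, as I don't
              // know how the data is grouped. But should roughly follow
              // the "default" variety.
          }
          break;

          default:
          {
              // 16-character fields. The 6 appears to account for the
              // leading indentation of the sample lines as posted.
              const std::string::size_type fieldsize = 16;

              // Only read complete fields: std::stod throws if a trailing
              // fragment holds no number.
              for (std::string::size_type i = 6; i + fieldsize <= input.size(); i += fieldsize)
              {
                  data.push_back(std::stod(input.substr(i, fieldsize)));
              }
          }
          break;
      }
  }
  // After the loop, save "data" for the last "chunk" as well.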
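
For the temperature-range line the question asks about, the same fixed-column idea can be sketched as below. This is only an illustration under assumptions: the column layout (columns 1-11 and 12-22 for the two temperatures, column 23 for the count, five 16-character coefficient fields) is inferred from the question's description and sample, and the names `field`, `to_double`, `TemperatureRange`, `parse_range_line`, `parse_coefficient_line` and the `offset` parameter are made up for this example. Because each field is cut by position rather than by whitespace, it does not matter whether a space separates neighbouring values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the fixed-width field that starts at zero-based column `start`.
// A line that is too short yields an empty (or shortened) field.
std::string field(const std::string& line, std::size_t start, std::size_t width)
{
    if (start >= line.size())
        return std::string();
    return line.substr(start, width);
}

// Convert a field to double. Blank fields become 0.0, because
// std::stod would throw std::invalid_argument on them.
double to_double(const std::string& text)
{
    if (text.find_first_not_of(" \t\r") == std::string::npos)
        return 0.0;
    return std::stod(text);
}

struct TemperatureRange
{
    double low;
    double high;
    int coefficient_count;
};

// Assumed layout (1-based columns): 1-11 low, 12-22 high, 23 count.
// `offset` skips any extra leading indentation in the input.
TemperatureRange parse_range_line(const std::string& line, std::size_t offset = 0)
{
    TemperatureRange range;
    range.low = to_double(field(line, offset, 11));
    range.high = to_double(field(line, offset + 11, 11));
    const std::string count = field(line, offset + 22, 1);
    const bool is_digit = !count.empty() && count[0] >= '0' && count[0] <= '9';
    range.coefficient_count = is_digit ? count[0] - '0' : 0;
    return range;
}

// Assumed layout: up to five fields of 16 characters each.
std::vector<double> parse_coefficient_line(const std::string& line, std::size_t offset = 0)
{
    const std::size_t width = 16;
    std::vector<double> values;
    for (std::size_t k = 0; k < 5; ++k)
    {
        const std::size_t start = offset + k * width;
        if (start + width > line.size())
            break;  // stop at the first incomplete field
        values.push_back(to_double(field(line, start, width)));
    }
    return values;
}
```

In the read loop above, `parse_range_line(input, 6)` would go in the `case 2` branch and `parse_coefficient_line(input, 6)` in the `default` branch, assuming the six-character indentation of the posted sample is really present in the file.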

Licensed under: CC-BY-SA with attribution
