Why can I not convert this regex return to a string to an integer or straight to an integer? (C++)

StackOverflow https://stackoverflow.com/questions/21451570

Question

I've read a few StackExchange posts and other pages on converting strings to integers, but this is not working. This is the last thing I tried:

if (infile.is_open())
{
        while (getline (infile,line))
        {

            regex_match(line,matches,exp);

            regex_match((string)matches[1], time0, exp_time);

            buffer << time0[1];
            str = buffer.str();

            str.append("\0");


            cout << atoi(str.c_str()) << '\n';

            last_match = matches[2];
            buffer.str(string());
        }
        infile.close();
}

I can't think of any other ways. I tried the usual conversion from string to char * to integer. I tried converting it to a string and then using stoi() to convert it to an integer. I tried appending a null character ("\0") to it, and I tried appending it in the buffer, too. I also tried atof() and stof(). stoi() and stof() both crash the program. atoi() and atof() both return 0, always.


Here's an SSCCE, with the problem featured (atoi(str.c_str()) should not be 0):

#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#endif

#include <iostream>
#include <fstream>
#include <string>
#include <regex>

#include <sstream>

using namespace std;



int main(int argc, char* argv[])
{
    regex exp("^(.+),(.+),.+,.+,(.+),.+,.+$");
    regex exp_time("^(.+)-(.+)-(.+)");
    smatch matches;
    smatch time0;
    string line;
    ifstream infile(argv[1]);
    string last_match;
    stringstream buffer;
    string str;


    int i = 0;

    if (infile.is_open())
    {
        while (getline(infile, line))
        {

            regex_match(line, matches, exp);

            regex_match((string)matches[1], time0, exp_time);

            buffer << time0[1];
            str = buffer.str();

            str = time0[1].str();
            str.append("\0");



            cout << atoi(str.c_str()) << " " << time0[1] << '\n';

            last_match = matches[2];
            buffer.str(string());
            i++;
        }
        infile.close();
    }

    return 0;
}

The input would be a csv file with these values:

1996-09-04,19.00,19.25,18.62,18.87,528000,0.79
1996-09-03,19.00,19.37,18.75,19.00,1012800,0.79
1996-08-30,19.87,20.12,19.37,19.62,913600,0.82
1996-08-29,20.87,21.12,19.75,19.75,1987200,0.82
1996-08-28,20.12,22.12,20.12,21.12,5193600,0.88
1996-08-27,19.75,20.37,19.75,20.12,1897600,0.84
1996-08-26,20.12,20.12,19.75,19.75,388800,0.82
1996-08-23,19.75,20.25,19.75,19.75,1024000,0.82
1996-08-22,18.62,20.00,18.25,19.87,1921600,0.83
1996-08-21,19.12,19.25,18.25,18.62,688000,0.78
1996-08-20,19.62,19.62,19.12,19.12,494400,0.80
1996-08-19,19.37,19.62,19.37,19.62,428800,0.82
1996-08-16,19.50,19.87,19.12,19.37,864000,0.81

You would run the program with program.exe filename.csv

Here's a shorter program with the problems more apparent:


Solution

Your problem is in this line:

regex_match((string)matches[1], time0, exp_time);

You can't pass a temporary as the subject string of a regex match, because the string contents have to still be around when you query the match results. The result of (string)matches[1] is destroyed at the end of the current full expression (i.e. at the next semicolon); when you get around to querying time0[1] on the next line, the time0 match is referring to a string that doesn't exist any more, which is undefined behaviour.

Other tips

Let's understand it with an example: this is what's happening in my VS2012 environment:


There's an error in the buffer << time0[1]; line.

In that line I'm actually calling operator<< on a std::ostream, passing it the result of std::match_results::operator[], which is a reference to a std::sub_match object.

That object can be converted to a string_type (an alias for the basic_string type whose character type is the one the iterator refers to), since a conversion is defined for it.

So I'm effectively doing something like:

buffer << (string with the contents of sub_match object).

At that point the string must exist and be valid. A rapid inspection with the debugger shows that something's missing:


The "first" field, which is an iterator to the beginning of the match, is missing. That iterator is a bidirectional iterator pointing into your string, so something must have happened to your string.

If you take a look at how (again, in a VS2012 environment) the regex_match function is defined:

template<class _StTraits,
    class _StAlloc,
    class _Alloc,
    class _Elem,
    class _RxTraits> inline
    bool regex_match(
        const basic_string<_Elem, _StTraits, _StAlloc>& _Str, // <--- take a look here
        match_results<typename basic_string<_Elem, _StTraits, _StAlloc>::
            const_iterator, _Alloc>& _Matches,
        const basic_regex<_Elem, _RxTraits>& _Re,
        regex_constants::match_flag_type _Flgs =
            regex_constants::match_default)
    {   // try to match regular expression to target text
    return (_Regex_match(_Str.begin(), _Str.end(),
        &_Matches, _Re, _Flgs, true));
    }

it is clear that it takes a reference to a const basic_string; it does NOT copy the string, nor does it do anything special with rvalues.

You can simulate the same behavior with the following code:

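#include <string>

std::string::const_iterator myFirstElement; // every random-access iterator is a bidirectional iterator

// A const reference lets a temporary bind in standard C++
// (binding a temporary to a non-const reference is an MSVC extension).
void takeAReference(const std::string& mystring)
{
  // Here mystring is valid!
  myFirstElement = mystring.begin();
}


int main(int argc, char* argv[])
{

  takeAReference(std::string("hello dear"));

  // The iterator is NO LONGER VALID! Inspecting or using it here is undefined behaviour
  // ....
}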

and try it for yourself. On my machine this definitely doesn't work, and even if it appeared to work you can be sure that sooner or later it will let you down.

So that's the reason why you're having weird results. A simple solution could be to just extend your string's scope of visibility:

int main(int argc, char* argv[])
{
  regex exp("^(.+),(.+),.+,.+,(.+),.+,.+$");
  regex exp_time("^(.+)-(.+)-(.+)");
  smatch matches;
  smatch time0;
  string line;
  ifstream infile("testfile.txt");
  string last_match;
  stringstream buffer;
  string str;


  int i = 0;

  if (infile.is_open())
  {
    while (getline(infile, line))
    {

      regex_match(line, matches, exp);

      std::string first_date = (string)matches[1]; // <-- named string stays alive while time0 is used

      regex_match(first_date, time0, exp_time);

      buffer << time0[1];
      str = buffer.str();

      str = time0[1].str();
      str.append("\0");

      cout << atoi(str.c_str()) << " " << time0[1] << '\n';

      last_match = matches[2];
      buffer.str(string());
      i++;
    }
    infile.close();
  }

  return 0;
}

Are you sure your regex is matching what you want?

For example, the regex "^(.+)-(.+)-(.+)$" would match a whole line of your example input file if it were applied to one, such as:

1996-09-04,19.00,19.25,18.62,18.87,528000,0.79

because the .+ parts will match anything (including the , and - characters).

So if you want to match just 1996-09-04, you could try the regex \d{4}-\d{1,2}-\d{1,2} or something like that. You can try out the regex in this online regex-tool

Also, the other regex ^(.+),(.+),.+,.+,(.+),.+,.+$ looks suspicious to me: do you really want to match any line that has 6 commas with at least 1 character between them? Remember that .+ is very greedy.

UPDATE: I really think your first regex is too greedy, see example here

int atoi (const char * str);

Try using a char array instead of a string.

I think the KISS principle can be applied here to get a better solution than using regex. Simply read in each field using istream. Regex is overkill IMHO.

#include <iostream>
#include <string>
#include <fstream>
using namespace std;

struct date_t
{
  int year, month, day;
};

struct data_t
{
  date_t date;
  float f1, f2, f3, f4;
  int i;
  float f5;
};

istream & operator>>(istream & in, date_t &date)
{
  char d1, d2;  // dummy chars for the hyphens
  return in >> date.year >> d1 >> date.month >> d2 >> date.day;
}

istream & operator>>(istream & in, data_t &data)
{
  char d1, d2, d3, d4, d5, d6;  // dummy chars for the commas
  return in >> data.date >> d1 >> data.f1 >> d2 >> data.f2 >> d3
    >> data.f3 >> d4 >> data.f4 >> d5 >> data.i >> d6 >> data.f5;
}

ostream & operator<<(ostream & out, const date_t &date)
{
  return out << date.year << '-' << date.month << '-' << date.day;
}

ostream & operator<<(ostream & out, const data_t &data)
{
  return out << data.date << ',' << data.f1 << ',' << data.f2 << ','
    << data.f3 << ',' << data.f4 << ',' << data.i << ',' << data.f5;
}


int main(int argc, char* argv[])
{
  ifstream infile(argv[1]);

  data_t data;
  while(infile >> data) {
    cout << "Here is the data: " << data << endl;
  }

  infile.close();

  return 0;
}

Heck, iostream is kind of overkill too. Here is a C solution using fscanf.

#include <stdio.h>

struct date_t
{
  int year, month, day;
};

struct data_t
{
  struct date_t date;
  float f1, f2, f3, f4;
  int i;
  float f5;
};

int read_data(FILE *fid, struct data_t *data)
{
  return fscanf(fid, "%d-%d-%d,%f,%f,%f,%f,%d,%f",
      &(data->date.year), &(data->date.month), &(data->date.day),
      &(data->f1), &(data->f2), &(data->f3), &(data->f4), &(data->i), &(data->f5));
}

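int main(int argc, char* argv[])
{
  if (argc < 2) {
    fprintf(stderr, "usage: %s filename.csv\n", argv[0]);
    return 1;
  }

  /* "r" is the standard text mode; "rt" is an MSVC extension */
  FILE *fp = fopen(argv[1], "r");
  if (fp == NULL) {
    perror("fopen");
    return 1;
  }

  struct data_t data;

  /* fscanf returns the number of fields converted; 9 means a complete row */
  while(read_data(fp, &data) == 9) {
    printf("Here is your data: %d-%02d-%02d,%.2f,%.2f,%.2f,%.2f,%d,%.2f\n",
      data.date.year, data.date.month, data.date.day,
      data.f1, data.f2, data.f3, data.f4, data.i, data.f5);
  }

  fclose(fp);
  return 0;
}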

See how much shorter and easier to understand that is? The scanf format specifier can easily capture the format of your data, and it's much simpler to use than regex. Note that you don't have to split the data into tokens and then parse each token. You get the parsed, numeric output right away.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow