Question

I am trying to find out how many regex matches are in a string. I'm using an iterator to walk over the matches and an integer to record how many there were.

long int before = GetTickCount();
string text;

boost::regex re("^(\\d{5})\\s(\\d{8})\\s(.*)\\s(.*)\\s(.*)\\s(\\d{8})\\s(.{1})$");
char * buffer;
long length;
long count;
ifstream f;


f.open("c:\\temp\\test.txt", ios::in | ios::ate);
length = f.tellg();
f.seekg(0, ios::beg);

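buffer = new char[length];

f.read(buffer, length);
// The buffer is not null-terminated, and text-mode newline translation can
// make the byte count smaller than length, so copy only what was read.
text.assign(buffer, f.gcount());
f.close();
delete[] buffer;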
boost::sregex_token_iterator itr(text.begin(), text.end(), re, 0);
boost::sregex_token_iterator end;

count = 0;
for(; itr != end; ++itr)
{
    count++;
}

long int after = GetTickCount();
cout << "Found " << count << " matches in " << (after-before) << " ms." << endl;

In my example, count always returns 1, even if I put code in the for loop to show the matches (and there are plenty). Why is that? What am I doing wrong?

Edit

TEST INPUT:

12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N

OUTPUT (without matches):

Found 1 matches in 16 ms.

If I change the for loop to this:

count = 0;
for(; itr != end; ++itr)
{
    string match(itr->first, itr->second);
    cout << match << endl;
    count++;
}

I get this as output:

12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
12345 12345678 SOME NAME SOMETHING 88888888 N
Found 1 matches in 47 ms.

Solution

Heh. Your problem is your regex. Change your (.*)s to (.*?)s (assuming that's supported). You think you're seeing each line being matched, but in fact you're seeing the entire text matched as one single match, because your pattern is greedy.

To see the issue illustrated, change the debug output in your loop to:

cout << "[" << match << "]" << endl;
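The greedy versus non-greedy difference can be sketched in a few lines. This is only an illustration under assumptions: std::regex is swapped in for Boost.Regex so the example needs no external library, the ^ and $ anchors are dropped, and the names count_matches, sample_text, greedy_pattern and lazy_pattern are made up for the sketch. The two engines differ in details (std::regex's . never matches a newline, although \s does), so this shows the effect of the quantifiers rather than reproducing the exact Boost behaviour.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Count non-overlapping matches of `pattern` in `text`, using the same
// "advance the iterator and increment a counter" loop as the question.
std::size_t count_matches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::size_t count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        ++count;
    return count;
}

// Three records in the format of the question's test input.
const std::string sample_text =
    "12345 12345678 SOME NAME SOMETHING 88888888 N\n"
    "12345 12345678 SOME NAME SOMETHING 88888888 N\n"
    "12345 12345678 SOME NAME SOMETHING 88888888 N\n";

// Greedy groups, as in the question (anchors omitted for this sketch).
const std::string greedy_pattern =
    "(\\d{5})\\s(\\d{8})\\s(.*)\\s(.*)\\s(.*)\\s(\\d{8})\\s(.)";

// Non-greedy groups, as suggested in this answer.
const std::string lazy_pattern =
    "(\\d{5})\\s(\\d{8})\\s(.*?)\\s(.*?)\\s(.*?)\\s(\\d{8})\\s(.)";
```

With this sample, count_matches(sample_text, greedy_pattern) should come out as 1, because the greedy groups stretch across the line breaks until the last record, while count_matches(sample_text, lazy_pattern) should come out as 3, one per line.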

OTHER TIPS

Don't know much about boost, but does (end - itr) work?
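For what it's worth, regex iterators are forward iterators, so subtracting them is not expected to compile, but std::distance can walk the range and count the steps. A minimal sketch, assuming std::regex in place of Boost.Regex (the same idea should carry over to the Boost iterators); count_with_distance is a hypothetical helper name:

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count matches by measuring the distance between the first match
// iterator and the default-constructed end-of-sequence iterator.
std::size_t count_with_distance(const std::string& text, const std::regex& re)
{
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;  // end-of-sequence iterator
    return static_cast<std::size_t>(std::distance(first, last));
}
```

This still visits every match once, so it is no faster than the explicit loop. It only removes the hand-written counter.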

Since you're saying that even when you output the results, the count is still one, you might look at a couple things to help diagnose it:

  • Try outputting count each loop iteration and see what happens. If this only outputs once, then the loop is only running once, and what you thought were multiple matches were really one big long match.
  • If that works, try using another variable name entirely: it's possible that you are getting some scope shadowing where you have declared more than one count variable.

If that loop is executing multiple times, then the problem is not in how you are using boost. No matter what you are doing, boost does not have the ability to modify a variable that you don't pass to it. (Of course, if you are passing count in to boost somewhere, then that's another possibility.)

In all likelihood, the first (.*) you have is matching everything up until nearly the end of the input (newlines included). Try replacing those with ([^ ]*) (anything but a space, so the matching stops when it finds a space).
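A rough sketch of the ([^ ]*) variant, again assuming std::regex instead of Boost.Regex and leaving out the ^ and $ anchors. The function name extract_records is invented for the example. It collects the seven captured fields of each record so you can check that every group stops at the next space:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Return the seven captured fields of every record found in `text`.
std::vector<std::vector<std::string>> extract_records(const std::string& text)
{
    // [^ ]* cannot cross a space, so each group stops at the next separator.
    static const std::regex re(
        "(\\d{5})\\s(\\d{8})\\s([^ ]*)\\s([^ ]*)\\s([^ ]*)\\s(\\d{8})\\s(.)");

    std::vector<std::vector<std::string>> records;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        std::vector<std::string> fields;
        for (std::size_t i = 1; i < it->size(); ++i)  // skip group 0 (the whole match)
            fields.push_back((*it)[i].str());
        records.push_back(fields);
    }
    return records;
}
```

The number of records returned is the match count the question is after. Note that this pattern only works when the three middle fields themselves contain no spaces.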

Can you paste the input and also the output?

If count returns 1, that means there is only one match in your string text.

Licensed under: CC-BY-SA with attribution