Question

Is it possible to count how many times a substring appears in a string using regex matching with GNU libc regexec()?

Solution

No, regexec() only finds one match per call. If you want to find the next match, you have to call it again further along the string.

If you only want to search for plain substrings, you are much better off using the standard C string.h function strstr(); then you won't have to worry about escaping special regex characters.
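
As a rough sketch of the strstr() approach, the helper below counts non-overlapping occurrences of a plain substring. The name count_substring is made up for this illustration, and returning 0 for an empty needle is an assumption chosen to keep the loop finite.

```c
#include <string.h>

/* Count non-overlapping occurrences of "needle" in "haystack".
   Returns 0 for an empty needle to avoid an endless loop. */
static int count_substring (const char * haystack, const char * needle)
{
    size_t len = strlen (needle);
    int count = 0;
    const char * p = haystack;

    if (len == 0) {
        return 0;
    }
    /* strstr() returns a pointer to the next occurrence, or NULL. */
    while ((p = strstr (p, needle)) != NULL) {
        count ++;
        p += len;   /* use p ++ here instead to count overlapping matches */
    }
    return count;
}
```

For example, count_substring ("abcabcabc", "abc") returns 3, and count_substring ("aaaa", "aa") returns 2 because matches are not allowed to overlap.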

OTHER TIPS

regexec() fills its fourth parameter, "pmatch", with match offsets. "pmatch" is a fixed-size array, so to find further matches you call the function again.

I found this code, which originally used two nested loops, and modified it. The original code is at http://www.lemoda.net/c/unix-regex/index.html:

#include <regex.h>

/* Count the non-overlapping matches of the compiled regex "r" in
   "to_match". */
static int match_regex (regex_t * r, const char * to_match)
{
    /* "p" is a pointer into the string which points to the end of the
       previous match. */
    const char * p = to_match;
    /* "m" receives the offsets of the whole match; subexpression
       offsets are not needed for counting. */
    regmatch_t m[1];
    int number_of_matches = 0;
    /* After the first match, "p" is no longer at the start of the line. */
    int eflags = 0;
    while (regexec (r, p, 1, m, eflags) == 0) {
        number_of_matches ++;
        p += m[0].rm_eo;
        if (m[0].rm_so == m[0].rm_eo) {
            /* Empty match: step one character to avoid looping forever. */
            if (*p == '\0') {
                break;
            }
            p ++;
        }
        eflags = REG_NOTBOL;
    }
    return number_of_matches;
}

Sorry for posting a separate answer. I do not have the 50 reputation needed to comment on @Oscar Raig Colon's answer.

pmatch does not hold all the matching substrings. It stores the offsets of the whole match in pmatch[0] and the offsets of each subexpression in the entries after it. The key is to understand what a subexpression is: "\(\)" in BRE, "()" in ERE. If the regular expression contains no subexpressions, regexec() returns only the offsets of the first matching string and puts them in pmatch[0].
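
To illustrate what pmatch holds, here is a small sketch using an ERE with two subexpressions. The function name match_key_value and the pattern "([a-z]+)=([0-9]+)" are invented for this example.

```c
#include <regex.h>

/* Match the ERE "([a-z]+)=([0-9]+)" against "s" and store the offsets
   in "pm", which must have room for 3 entries:
   pm[0] = whole match, pm[1] = first "()", pm[2] = second "()".
   Returns 0 on a match, REG_NOMATCH if nothing matched, or -1 if the
   pattern failed to compile. */
static int match_key_value (const char * s, regmatch_t pm[3])
{
    regex_t re;
    int rc;

    if (regcomp (&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0) {
        return -1;
    }
    rc = regexec (&re, s, 3, pm, 0);
    regfree (&re);
    return rc;
}
```

Called on "x width=42 height=7", this reports only the first match: pm[0] covers offsets 2 to 10 ("width=42"), pm[1] covers 2 to 7 ("width") and pm[2] covers 8 to 10 ("42"). The later "height=7" is not reported; finding it takes another regexec() call starting after the first match.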

You can find an example at http://pubs.opengroup.org/onlinepubs/007908799/xsh/regcomp.html:

The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)

(void) regcomp (&re, pattern, 0);
/* "p" marks where the next search starts; offsets in pm are relative to it */
p = buffer;
/* this call to regexec() finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) {    /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    p += pm.rm_eo;
    /* This call to regexec() finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
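
One detail worth spelling out: the offsets in pm are relative to the pointer passed to regexec(), not to the start of the line, so a caller that wants positions in the original line has to add the distance already consumed. The sketch below does that and also steps past empty matches. The function name find_all and its parameters are assumptions made for this illustration.

```c
#include <regex.h>

/* Find up to "max" non-overlapping matches of the BRE "pattern" in "line".
   Stores offsets relative to the start of "line" in "out".
   Returns the number of matches stored, or -1 if the pattern is invalid. */
static int find_all (const char * pattern, const char * line,
                     regmatch_t * out, int max)
{
    regex_t re;
    regmatch_t pm;
    const char * p = line;
    int n = 0;
    int eflags = 0;

    if (regcomp (&re, pattern, 0) != 0) {
        return -1;
    }
    while (n < max && regexec (&re, p, 1, &pm, eflags) == 0) {
        /* pm is relative to p; convert to offsets from the start of line */
        out[n].rm_so = (regoff_t) (p - line) + pm.rm_so;
        out[n].rm_eo = (regoff_t) (p - line) + pm.rm_eo;
        n ++;
        p += pm.rm_eo;
        if (pm.rm_so == pm.rm_eo) {
            /* empty match: step forward so the loop terminates */
            if (*p == '\0') {
                break;
            }
            p ++;
        }
        /* later searches no longer start at the beginning of the line */
        eflags = REG_NOTBOL;
    }
    regfree (&re);
    return n;
}
```

With the pattern "ab" and the line "xabyabz", this stores two matches, at offsets 1 to 3 and 4 to 6. With the pattern "^a" and the line "aaa" it stores only one, because REG_NOTBOL stops "^" from matching at the later starting points.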
Licensed under: CC-BY-SA with attribution