Question

I have a string

static const char *str = ",<ABC$>EEEEEEEE$>,EEE<$WQWERRERR<$ABC>,2233My Name Is $Tom,<ABC$>$>,TTTTTTTEEE<$WQWERRERR<$ABC>,2233My Name Is Jake,<ABC$>$>";

I want to extract all the text between the header <$ABC>, and the footer ,<ABC$>.

I am currently using this expression:

static const char *regex = "<\\$ABC>,([^>]*),<ABC\\$>";

Note that I use [^>]* rather than .* so that the match stops before the footer, which emulates a non-greedy match.

I print the matches with:

    struct slre_cap caps[2];
    int i, j = 0, str_len = strlen(str);

    while (j < str_len &&
           (i = slre_match(regex, str + j, str_len - j, caps, 2)) > 0) {
        printf("Found Data: [%.*s]\n", caps[0].len, caps[0].ptr);
        j += i;
    }

My output looks like

 Found Data: [2233My Name Is $Tom]
 Found Data: [2233My Name Is Jake]

Everything works up to this point. As a curve ball, I then introduced a > character between the header and the footer, e.g. <$ABC>,2233My Name Is> $Tom,<ABC$> (a binary serialization protocol such as protobuf-embedded-c, which I plan to use later, may produce one). Now the result is different, because [^>]* can no longer span the payload. Is there a way to negate the whole word ABC, rather than the single character >, to get the non-greedy behaviour? To my knowledge, other regex libraries do this with negative lookahead/lookbehind, which is not available in SLRE. Is it possible to simulate this behaviour in SLRE? Thanks.


Solution

Maybe regular expressions are not the right tool here. You can find your subexpressions simply by searching for your fixed delimiting strings:

const char *p = str;

for (;;) {
    const char *q, *r;
    int len;

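    r = strstr(p, "<$ABC>");    /* find the opening delimiter */
    if (r == NULL) break;
    r += 6;                     /* skip its 6 characters */

    q = strstr(r, "<ABC$>");    /* r is already past the opener */
    if (q == NULL) break;

    len = (int)(q - r);
    printf("'%.*s'\n", len, r);
    p = q + 6;                  /* resume after the closing delimiter */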
}

Edit: As pointed out in the comments, the captured string must start after the opening delimiter, but strstr returns a pointer to the delimiter's first character. Hence the length of the delimiting string, here 6, must be added to r.

In the same way, if the captured string should include the delimiters, r stays as it is after the strstr search and the length of the end delimiter must be added to q:

    r = strstr(p, "<$ABC>");
    if (r == NULL) break;

    q = strstr(r + 6, "<ABC$>");
    if (q == NULL) break;
    q += 6;

    len = q - r;
    printf("'%.*s'\n", len, r);
    p = q;
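
The same idea can be wrapped in a small reusable helper. What follows is only a sketch under a few assumptions: the names next_between and print_all_between are made up for illustration (they are not part of SLRE or the C library), the delimiters are passed in as non-empty strings so that the commas from the question's header and footer can be included, and the input is NUL-terminated text.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the next span enclosed by the literal
 * strings `open` and `close` in `s` (both assumed non-empty).
 * On success, stores the span start in *start and its length in *len,
 * and returns the position just past the closing delimiter so the
 * caller can keep scanning. Returns NULL when no complete span is left.
 */
static const char *next_between(const char *s, const char *open,
                                const char *close,
                                const char **start, size_t *len)
{
    const char *b = strstr(s, open);
    if (b == NULL) return NULL;
    b += strlen(open);                 /* content begins after the header */

    const char *e = strstr(b, close);  /* first footer after the header */
    if (e == NULL) return NULL;

    *start = b;
    *len = (size_t)(e - b);
    return e + strlen(close);
}

/* Print every span found in `s` and return how many there were. */
static int print_all_between(const char *s, const char *open,
                             const char *close)
{
    const char *start;
    size_t len;
    int count = 0;

    while ((s = next_between(s, open, close, &start, &len)) != NULL) {
        printf("Found Data: [%.*s]\n", (int)len, start);
        count++;
    }
    return count;
}
```

Calling print_all_between(str, "<$ABC>,", ",<ABC$>") from main would walk the whole input. Because the search looks for the complete footer string, a stray > inside the payload does not end the span early. Note that strstr stops at the first NUL byte, so truly binary payloads that may contain zero bytes would need a length-based search instead.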

OTHER TIPS

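Replace ([^>]*) with (.*?). The dot (.) matches any character, whereas [^>] in your pattern matches any character other than >. The ? after the * makes the quantifier non-greedy, so the capture ends at the first footer that follows.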

Licensed under: CC-BY-SA with attribution