Question

I have the following code, which matches a regex against a string that contains many duplicate values. I want to print only the unique matches. What is the best way to do that? Should I add the matches to an array, make it unique, and only then print the results? Thanks!

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <pcre.h>

int main() {
  pcre *myregexp;
  const char *error;
  int erroroffset;
  int offsetcount;
  int offsets[(0+1)*3]; // (max_capturing_groups+1)*3
  const char *result;
  char *subject = "9,5,3,2,5,6,3,2,5,6,3,2,2,2,5,0,5,5,6,6,1,";
  myregexp = pcre_compile("\\d,", PCRE_MULTILINE|PCRE_DOTALL|PCRE_NEWLINE_ANYCRLF, &error, &erroroffset, NULL);

  if (myregexp != NULL) {
    offsetcount = pcre_exec(myregexp, NULL, subject, strlen(subject), 0, 0, offsets, (0+1)*3);

    while (offsetcount > 0) {

      if (pcre_get_substring(subject, offsets, offsetcount, 0, &result) >= 0) {
        printf("%s\n", result);
        pcre_free_substring(result); // release the copy made by pcre_get_substring
      }

      offsetcount = pcre_exec(myregexp, NULL, subject, strlen(subject), offsets[1], 0, offsets, (0+1)*3);
    }

  } else {
      printf("Syntax error in REGEX at offset %d: %s\n", erroroffset, error);
  }

}

This outputs:

bash$ ./regex
9,
5,
3,
2,
5,
6,
3,
2,
5,
6,
3,
2,
2,
2,
5,
0,
5,
5,
6,
6,
1,

and I need:

bash$ ./regex
0,
1,
2,
3,
5,
6,
9,

Solution

Yes, collect the matches in an array and deduplicate from there.

A regex cannot return only the unique matches. Regex search-and-replace can collapse adjacent repeats such as doubled newlines or runs of spaces, but it does not help when a duplicate can appear anywhere in the input and each match has to be compared against all the earlier ones.

Here is an example that deduplicates array a into array b:

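#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Input array, containing duplicates. */
    const char *a[5];
    int a_len = 5;

    a[0] = "a";
    a[1] = "b";
    a[2] = "b";
    a[3] = "a";
    a[4] = "c";

    /* Output array; it can hold at most a_len unique entries. */
    const char *b[a_len];
    int b_len = 0;

    int already_exists;
    int i, j;
    for (i = 0; i < a_len; i++)
    {
        /* Check whether a[i] has already been copied to b. */
        already_exists = 0;
        for (j = 0; j < b_len; j++)
        {
            if (!strcmp(a[i], b[j]))
            {
                already_exists = 1;
                break;
            }
        }

        /* First occurrence: append it to b. */
        if (!already_exists)
        {
            b[b_len] = a[i];
            b_len++;
        }
    }

    /* Print the unique values in order of first appearance. */
    for (i = 0; i < b_len; i++)
    {
        printf("%s\n", b[i]);
    }

    return 0;
}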

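For small arrays like these, this nested-loop approach is probably the fastest option. It compares every element against every unique value found so far, which is quadratic in the worst case, so for bigger arrays I would suggest sorting the array first and deduplicating the sorted result, where duplicates end up next to each other.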
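Deduplicating a sorted array could look roughly like the sketch below. It assumes the matches have already been collected as an array of C strings; sort_unique and compare_strings are hypothetical helper names made up for this illustration, not part of PCRE or the C library. Sorting with qsort also happens to produce the ascending order shown in the question's desired output.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a `const char *`, so dereference once. */
static int compare_strings(const void *lhs, const void *rhs)
{
    const char *const *a = lhs;
    const char *const *b = rhs;
    return strcmp(*a, *b);
}

/*
 * Hypothetical helper: sorts `items` in place, then compacts the array so
 * that each distinct string appears once at the front.
 * Returns the number of unique strings.
 */
static size_t sort_unique(const char **items, size_t count)
{
    if (count == 0)
        return 0;

    qsort(items, count, sizeof *items, compare_strings);

    size_t unique = 1; /* items[0] is always kept */
    for (size_t i = 1; i < count; i++) {
        /* After sorting, duplicates are adjacent: compare with the last kept item. */
        if (strcmp(items[i], items[unique - 1]) != 0)
            items[unique++] = items[i];
    }
    return unique;
}
```

To use it with the matching loop from the question, one option is to store each result pointer in an array instead of printing it right away, call sort_unique(matches, match_count) once the loop ends, and print the first n entries, where n is the returned count. Because the helper overwrites slots while compacting, strings obtained from pcre_get_substring would need to be tracked in a separate list if they are to be released with pcre_free_substring afterwards.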

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow