Question

Currently I'm trying to create a regex that can match 3 digits under certain conditions. I've made various attempts, but none of them works as a single expression: each one either gives false positives or matches the wrong numbers.

In words: I want to match ANY 3 digits that are

  • Appearing at the start of a string
  • Appearing somewhere inside the string
  • (End of the string is NOT possible)

IF:

  • There is no other 3-digit group matching these conditions (otherwise the match is ambiguous)
  • The group is not followed by "p" or "i"
  • The group is not preceded by "x"

In examples (the number in () is what I want to match):

  • This is (321) an example.
  • (321) also
  • including (321) //basically not possible, but can't hurt.
  • this (321) has another group with a p: 122p
  • this (321) has another group with a I: 123i
  • this x235 should be ignored cause (123) is what i want to match.
  • (123) is what i want, not x111 or 125p or 999i
  • in this 111 case there is no solution 555

(I need it captured as (1 digit)(2 digits), but that is only a small modification of a 3-digit match.)

My last attempt looked like this:

(?:[^x]|^)(\d{1})(\d{2})[^pi]

However, it fails on the last case. I tried to cover this by checking preg_match_all(...) === 1, to make sure only one result is matched.

But now a test string like "101 202" passes: the first match consumes "101 " (including the whitespace), so 202 no longer has a preceding character available and is not matched. The check then concludes that 101 is the only valid solution, which is wrong.
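The effect can be reproduced outside PHP as well. Here is a minimal sketch in Python, assuming its re module treats this pattern the same way PCRE does; the helper name find_groups is made up for illustration:

```python
import re

# The attempt from above: the characters before and after the digits
# are consumed as part of each match.
ATTEMPT = re.compile(r"(?:[^x]|^)(\d{1})(\d{2})[^pi]")

def find_groups(text):
    """Return the (1 digit, 2 digits) tuples of all non-overlapping matches."""
    return ATTEMPT.findall(text)

# "101 " is consumed together with the space, and "202" has no trailing
# character for [^pi], so only one match is reported.
print(find_groups("101 202"))

# " 555" at the end of the string has nothing left for [^pi] to match,
# so again only one match is reported.
print(find_groups("in this 111 case there is no solution 555"))
```

Both calls report exactly one match, so a "count equals 1" check accepts both strings.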

Any idea?

Note: It should work across different regex engines, no matter whether PHP, JavaScript, Java, .NET or Ook! :)

Solution 2

We can write the numbers you are looking for like this:

re_n = (?:[^x]|^)\d\d\d(?:[^ip]|$)

Then the whole expression is:

^(?!.*re_n.*re_n.*$).*(re_n)

The negative lookahead directly after the line-start anchor rejects any line that contains two such numbers; the remainder of the expression then matches and captures the single valid number.
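To illustrate how the two pieces compose, here is a rough sketch in Python rather than Perl (an assumption made for readability; the names RE_N, UNIQUE and find_unique are made up here). Note that the captured text includes the neighbouring characters, because re_n consumes them:

```python
import re

# One candidate: three digits plus the character before and after them.
RE_N = r"(?:[^x]|^)\d\d\d(?:[^ip]|$)"

# The lookahead fails the whole match if two candidates occur on the line;
# otherwise the trailing part captures the single candidate.
UNIQUE = re.compile(rf"^(?!.*{RE_N}.*{RE_N}.*$).*({RE_N})")

def find_unique(line):
    """Return the unique candidate (with its neighbours), or None."""
    m = UNIQUE.match(line)
    return m.group(1) if m else None

print(repr(find_unique("This is 321 an example.")))
print(repr(find_unique("in this 111 case there is no solution 555")))
```

The first call yields the digits with a space on either side; the second yields None because two candidates are present.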

The interpolated expression looks ugly:

/^(?!.*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*$).*((?:(?:[^x]|^)\d\d\d(?:[^ip]|$)))/

This Perl code:

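use strict;
use warnings;

# One candidate: three digits, not preceded by "x", not followed by "i" or "p".
# The neighbouring characters are part of the match, so they show up in $1.
my $re_n = qr/(?:[^x]|^)\d\d\d(?:[^ip]|$)/;
while (<DATA>) {
    chomp;
    # Reject lines holding two candidates, then capture the single one.
    if (/^(?!.*$re_n.*$re_n.*$).*($re_n)/) {
        print "$_: $1\n";
    } else {
        print "$_: NONE\n";
    }
}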

__DATA__
This is 321 an example.
321 also
including 321 //basically not possible, but can't hurt.
this 321 has another group with a p: 122p
this 321 has another group with a I: 123i
this x235 should be ignored cause 123 is what i want to match.
123 is what i want, not x111 or 125p or 999i
in this 111 case there is no solution 555

Produces:

This is 321 an example.:  321 
321 also: 321 
including 321 //basically not possible, but can't hurt.:  321 
this 321 has another group with a p: 122p:  321 
this 321 has another group with a I: 123i:  321 
this x235 should be ignored cause 123 is what i want to match.:  123 
123 is what i want, not x111 or 125p or 999i: 123 
in this 111 case there is no solution 555: NONE
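One caveat that may be worth checking: because re_n consumes the neighbouring characters, two groups separated by a single character (such as "101 202" from the question) cannot both be found by the lookahead. A possible variant uses zero-width lookarounds instead, assuming the target engine supports lookbehind (not every engine does, for example older JavaScript engines). The sketch below is in Python, and all names in it are hypothetical:

```python
import re

# Variant from the answer: neighbours are consumed by each candidate.
CONSUMING_N = r"(?:[^x]|^)\d\d\d(?:[^ip]|$)"
# Assumed alternative: neighbours are only inspected, never consumed.
LOOKAROUND_N = r"(?<!x)\d{3}(?![ip])"

def build(n):
    """Compile 'exactly one candidate n on the line' for a candidate pattern n."""
    return re.compile(rf"^(?!.*{n}.*{n}).*({n})")

CONSUMING = build(CONSUMING_N)
LOOKAROUND = build(LOOKAROUND_N)

def find_unique(pattern, line):
    """Return the captured candidate, or None if there is none or more than one."""
    m = pattern.match(line)
    return m.group(1) if m else None

# The consuming variant misses that "101 202" holds two candidates.
print(repr(find_unique(CONSUMING, "101 202")))
# The lookaround variant sees both and rejects the line.
print(repr(find_unique(LOOKAROUND, "101 202")))
print(repr(find_unique(LOOKAROUND, "This is 321 an example.")))
```

With lookarounds the capture contains only the three digits, so no trimming of neighbouring characters is needed afterwards.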

OTHER TIPS

I'm not sure this is exactly what you want, but give it a try:

JAVASCRIPT

var myregexp = /(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/m;

http://regex101.com/r/jY6mG9

PHP

preg_match_all('/(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/m', $code, $result, PREG_PATTERN_ORDER);

http://regex101.com/r/oW1tJ7

JAVA

Pattern regex = Pattern.compile("(?:\\b[\\s]?|[^x])([\\d]{1}[\\d]{2})(?:[^pi]|[\\s]?\\b)", Pattern.MULTILINE);

RUBY

regexp = /(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/

http://rubular.com/r/OHgMLS2gGs

PYTHON

reobj = re.compile(r"(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)", re.MULTILINE)

https://pythex.org

C (PCRE)

myregexp = pcre_compile("(?:\\b[\\s]?|[^x])([\\d]{1}[\\d]{2})(?:[^pi]|[\\s]?\\b)", PCRE_MULTILINE, &error, &erroroffset, NULL);
Licensed under: CC-BY-SA with attribution