BEGINNER: REGEX Match numeric sequence except where the word “CODE” exists on a line

StackOverflow https://stackoverflow.com/questions/1955237

  •  21-09-2019

Question

I've been able to stumble my way through regular expressions for quite some time, but alas, I cannot help a friend in need.

My "friend" is trying to match all lines in a text file that match the following criteria:

  1. Only a 7 to 10 digit number (0123456 or 0123456789)
  2. Only a 7 to 10 digit number, then a dash, then another two digits (0123456-01 or 0123456789-01)
  3. Match any of the above except where the word Code/code or Passcode/passcode appears before the numbers to match (such as "Access code: 16434629" or "Passcode 5253443-12")
  4. EDIT: Only need the numbers that match, nothing else.

Here is the nastiest regex I have ever seen that "he" gave me:

^(?=.*?[^=/%:]\b\d{7,10}((\d?\d?)|(-\d\d))?\b)((?!Passcode|passcode|Code|code).)*$

...

Question: Is there a way to use a short regex to find all lines that meet the above criteria?

Assume PCRE. My friend thanks you in advance. ;-)

BTW - I have not been able to find any other questions listed in stackoverflow.com or superuser.com which can answer this question accurately.

EDIT: I'm using Kodos Python Regex Debugger to validate and test the regex.

Solution

(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?

Commented version:

(?<!                 # Begin zero-width negative lookbehind. (Makes sure the following pattern can't match before this position)
(?:                  # Begin non-capturing group
[Pp]asscode          # Either Passcode or passcode
|                    # OR
[Cc]ode              # Either Code or code
)                    # End non-capturing group
.*                   # Any characters
)                    # End lookbehind
[0-9]{7,10}          # 7 to 10 digits
(?:                  # Begin non-capturing group
-[0-9]{2}            # dash followed by 2 digits
)                    # End non-capturing group
?                    # Make last group optional
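
A caveat worth checking before relying on the pattern above: the lookbehind contains `.*`, so it is variable-width. Python's `re` module (which the Kodos debugger mentioned in the question is presumably built on) refuses to compile such a lookbehind, and PCRE has traditionally had the same restriction. A minimal check, assuming Python 3 and only the standard library; the helper name `compiles` is made up for illustration:

```python
import re

# The one-line pattern from the answer above, unchanged.
LOOKBEHIND_PATTERN = r"(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for a variable-width lookbehind, among other pattern errors.
        return False


print(compiles(LOOKBEHIND_PATTERN))                      # variable-width lookbehind
print(compiles(r"(?<!code )[0-9]{7,10}(?:-[0-9]{2})?"))  # fixed-width lookbehind
```

The first call prints `False` and the second prints `True`, which is why a lookahead anchored at the start of the line is the more portable way to express the "no code before the number" rule.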

Edit: final version after comment discussion -

/^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)/

(result in first capture buffer)
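
To try the final pattern, here is a minimal sketch in Python (an assumption, chosen because the question mentions a Python regex debugger; the answer itself uses Perl-style `/.../` delimiters). The helper name `extract_numbers` and the sample lines are made up for illustration. Lines are processed one at a time because `\D` also matches newlines.

```python
import re

# Final pattern from the answer, without the surrounding / delimiters.
FINAL_PATTERN = re.compile(
    r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)"
)


def extract_numbers(text):
    """Return the first capture group for every line that matches."""
    results = []
    for line in text.splitlines():
        match = FINAL_PATTERN.match(line)
        if match:
            results.append(match.group(1))
    return results


sample = "\n".join([
    "0123456",
    "0123456789-01",
    "Order 7654321-09 shipped",
    "Access code: 16434629",
    "Passcode 5253443-12",
])

print(extract_numbers(sample))
# ['0123456', '0123456789-01', '7654321-09']
```

Two limitations of the pattern as written: it returns at most one number per line, and it has no word boundaries, so an 11-digit run would yield its first 10 digits.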

OTHER TIPS

You can get by with a nasty regex you have to get help with ...

... or you can use two simple regexes. One that matches what you want, and one that filters what you don't want. Simpler and more readable.

Which one would you like to read?

$foo =~ /(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?/

or

$foo =~ /\d{7,10}(-\d{2})?/ and $foo !~ /code/i;

Edit: case-insensitivity.
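
The two-regex idea carries over to other languages. Below is a rough Python equivalent, where the function name `find_number` is hypothetical and the filter is simply a case-insensitive `code` (an assumption based on the question's Code/Passcode criteria, not a copy of the Perl filter above):

```python
import re

WANTED = re.compile(r"\d{7,10}(?:-\d{2})?")    # what we want to keep
UNWANTED = re.compile(r"code", re.IGNORECASE)  # what disqualifies a line


def find_number(line):
    """Return the first matching number, or None if the line is filtered out."""
    if UNWANTED.search(line):
        return None
    match = WANTED.search(line)
    return match.group(0) if match else None


for line in ["0123456-01", "Access code: 16434629", "ref 0123456789"]:
    print(line, "->", find_number(line))
```

Note that this version rejects a line if "code" appears anywhere on it, including after the number, which is slightly stricter than the question's "before the numbers" rule.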

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow