Regex for beginning of line -or- last number in sentence
13-06-2021
Question
new Regex(@"^[a-zA-Z]+\b +\b[a-zA-Z]?\b +\b[a-zA-Z]+$")
This is meant to match names such as:
John Smith
John B Goode
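A quick way to check the pattern against sample names is sketched below. The class name OriginalPatternCheck and the method IsName are hypothetical, introduced only for illustration. One thing this check brings out: as posted, the optional initial sits between two mandatory runs of spaces, each bounded by \b, so a name without a middle initial does not get through.

```csharp
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // The pattern exactly as posted in the question.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z]+\b +\b[a-zA-Z]?\b +\b[a-zA-Z]+$");

    // True when the whole input is accepted by the pattern.
    public static bool IsName(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

With this sketch, OriginalPatternCheck.IsName("John B Goode") is true, while OriginalPatternCheck.IsName("John Smith") is false, because both " +" runs need at least one space and no \b exists between two adjacent spaces.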
I am trying to modify this regex for the following cases:
some text before 12359 (John B? Goode) 10249?
That is, the name sometimes comes after a number, at the end of the string, and is optionally followed by a final number.
I have tried
new Regex(@"^|[0-9]+([a-zA-Z]+\b +\b[a-zA-Z]?\b +\b[a-zA-Z]+) *[0-9]*?$")
but that does not work because:
- the ^|[0-9]+ part now only matches numbers and no longer the beginning of the line
- the group is always an empty string, and the pattern matches something like
sometext 12354
(the first number must not be at the end of the line).
Update
This is all water under the bridge, because I found more names at the end of the lines of data, so this approach will not work.
However, the cause of my problem was that I had not put the OR in a group.
Solution
You need parentheses around the alternation:
(^|[0-9]+)
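Without them, the | splits the whole pattern into two alternatives, so your expression is effectively equivalent to this:
new Regex(@"^|()")
The first alternative, ^, always matches the empty string at the start of the input, so the first match is always that empty match and the group stays empty.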
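The effect of the parentheses can be seen in a small sketch. The class AlternationDemo and the method FirstCapture are hypothetical names, and the pattern is shortened to "number, spaces, one word" purely for illustration; it is not the full name pattern from the question.

```csharp
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Ungrouped: parsed as "^" OR "[0-9]+ +([a-zA-Z]+)$".
    public static readonly Regex Ungrouped =
        new Regex(@"^|[0-9]+ +([a-zA-Z]+)$");

    // Grouped: "start of string OR digits and spaces", followed by the word.
    public static readonly Regex Grouped =
        new Regex(@"(?:^|[0-9]+ +)([a-zA-Z]+)$");

    // Returns the text of capture group 1 from the first match.
    // An unmatched group yields an empty string in .NET.
    public static string FirstCapture(Regex regex, string input)
    {
        return regex.Match(input).Groups[1].Value;
    }
}
```

For the input "abc 123 Smith", FirstCapture with Ungrouped returns an empty string, since the first match is the empty match at the start, while FirstCapture with Grouped returns "Smith".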
OTHER TIPS
Edit (re Alan Moore's info)
Another try. The problem statement is unclear as to whether you're going for FULL validation or just trying to extract the name while validating only the text around it.
If you are attempting extraction with 100% validation, then you need to be concerned about the BOL.
Otherwise, you only need to worry about the EOL.
For 100% validation:
(?:^|[0-9]+\ +)([a-zA-Z]+\ +(?:[a-zA-Z]\ +)?[a-zA-Z]+)(?:\ +[0-9]+)?$
Expanded:
(?: ^                     # BOL
  | [0-9]+ \ +            # or, leading numbers + space
)
(                         # Capt 1
    [a-zA-Z]+             # first name
    \ +                   # space
    (?: [a-zA-Z] \ + )?   # optional middle initial + space
    [a-zA-Z]+             # last name
)                         # End Capt 1
(?: \ + [0-9]+ )?         # optional space + trailing numbers
$                         # EOL
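In C#, the expanded form can be used directly with RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. The following is a minimal sketch; the class NameValidator and the method TryExtractName are hypothetical names chosen for the example.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // The expanded pattern from above; "\ " is an escaped (literal) space.
    private static readonly Regex Pattern = new Regex(@"
        (?: ^                     # BOL
          | [0-9]+ \ +            # or, leading numbers + space
        )
        (                         # Capt 1
            [a-zA-Z]+             # first name
            \ +                   # space
            (?: [a-zA-Z] \ + )?   # optional middle initial + space
            [a-zA-Z]+             # last name
        )                         # End Capt 1
        (?: \ + [0-9]+ )?         # optional space + trailing numbers
        $                         # EOL
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns true and sets name to capture group 1 when the input validates;
    // otherwise returns false and sets name to an empty string.
    public static bool TryExtractName(string input, out string name)
    {
        Match m = Pattern.Match(input);
        name = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

With this sketch, "John Smith", "John B Goode" and "some text before 12359 John B Goode 10249" all validate, and the last one yields "John B Goode"; "sometext 12354" is rejected.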
Or, if you just want to extract the text, only the EOL anchor is needed and some restrictions can be loosened:
\b([a-zA-Z]+(?:\s+[a-zA-Z.]+)*)[\s\d]*$
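A sketch of the extraction-only variant in C# follows. The class NameExtractor and the method Extract are hypothetical names. One assumption is flagged loudly here and in the code: the first word is written as [a-zA-Z]+ so that it can be longer than one letter; with a bare [a-zA-Z] the capture could only begin at a single-letter word.

```csharp
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Assumption: "[a-zA-Z]+" for the first word, so it may span several letters.
    // Only the end of the string is anchored; nothing before the name is validated.
    private static readonly Regex Pattern =
        new Regex(@"\b([a-zA-Z]+(?:\s+[a-zA-Z.]+)*)[\s\d]*$");

    // Returns the trailing run of words (capture group 1), or "" if there is none.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Because the restrictions are loosened, this extracts rather than validates: "some text before 12359 John B. Goode 10249" gives "John B. Goode", but "sometext 12354" also gives "sometext".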