RegExp: How to exclude whitespace
20-06-2021
Question
Considering Ruby 1.8.7 or JavaScript.
I have the following string: (GMT+02:00) Istanbul
and I want to capture everything after the )
(note the whitespace included after the close parentheses)
The regexp I have created almost works, except that it includes the undesired whitespace.
\s\D*
=> " Istanbul"
How can I fix that, and is this regexp a good way of doing this?
EDIT
The string can be something else, such as (GMT+01:00) West Central Africa
In this case, I want West Central Africa
So, some answers will not work.
Sorry, I forgot to mention that.
Thanks.
Solution
In Ruby:
irb> line = '(GMT+01:00) West Central Africa'
irb> line.sub(/^.*\)\s/, '')
=> "West Central Africa"
In JavaScript:
js> var line = '(GMT+01:00) West Central Africa'
js> line.replace(/^.*\)\s/, '')
West Central Africa
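The same substitution can be wrapped in a small helper. This is only a minimal JavaScript sketch: the function name stripOffset is hypothetical, and it assumes every label starts with a parenthesised offset followed by one whitespace character.

```javascript
// Hypothetical helper (not from the original answer): remove the leading
// "(GMT+hh:mm) " prefix from a time zone label.
// Note: .* is greedy, so everything up to the LAST ")" that is followed by
// whitespace is removed.
function stripOffset(line) {
  return line.replace(/^.*\)\s/, '');
}

console.log(stripOffset('(GMT+02:00) Istanbul'));            // Istanbul
console.log(stripOffset('(GMT+01:00) West Central Africa')); // West Central Africa
```

If a zone name could itself contain a closing parenthesis, the greedy .* would strip more than the offset, so a stricter prefix pattern would be needed in that case.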
OTHER TIPS
A positive look-behind assertion is one option.
(?<=\s)[\D]+
(Tested with Python's regex library.)
To extract the text after a GMT offset definition like the one in your example (note that [\D]+ also matches spaces, so it runs on until the first digit or the end of the string)...
(?<=\([\D]{3}[\+\-][\d]{2}:[\d]{2}\)\s)[\D]+
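For illustration, here is how the two look-behind patterns behave in JavaScript. This is a sketch under one important assumption: the engine supports look-behind, which JavaScript only gained in ES2018 and which, as far as I know, Ruby 1.8.7 does not offer. The helper name firstMatch is hypothetical.

```javascript
// Assumes an ES2018+ engine with look-behind support.
const afterWhitespace = /(?<=\s)[\D]+/;
const afterOffset = /(?<=\([\D]{3}[\+\-][\d]{2}:[\d]{2}\)\s)[\D]+/;

// Hypothetical helper: return the matched text, or null if nothing matches.
function firstMatch(pattern, line) {
  const m = line.match(pattern);
  return m ? m[0] : null;
}

const label = '(GMT+01:00) West Central Africa';
console.log(firstMatch(afterWhitespace, label)); // West Central Africa
console.log(firstMatch(afterOffset, label));     // West Central Africa
```

The first pattern accepts any preceding whitespace, while the second only matches text that directly follows a well-formed offset such as (GMT+01:00).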
You can insert optional whitespace, \s*, between every other character in your regex, although admittedly it will get a bit lengthy.
Do the following:
\S+$
It matches the run of non-whitespace characters at the end of the line.
If you want to match only word characters (letters, digits, and underscore), you can use the following:
\w+$
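A quick JavaScript check of both end-anchored patterns, as a sketch (the helper name lastToken is hypothetical). It shows that they return only the final word, so they fit the Istanbul example but not multi-word names.

```javascript
// Hypothetical helper: return the text matched at the end of the line, or null.
function lastToken(pattern, line) {
  const m = line.match(pattern);
  return m ? m[0] : null;
}

console.log(lastToken(/\S+$/, '(GMT+02:00) Istanbul'));            // Istanbul
console.log(lastToken(/\w+$/, '(GMT+02:00) Istanbul'));            // Istanbul
console.log(lastToken(/\S+$/, '(GMT+01:00) West Central Africa')); // Africa
console.log(lastToken(/\w+$/, '(GMT+01:00) West Central Africa')); // Africa
```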
When you say capture, if you want a named capture and to ignore the rest, you can do the following:
(?:.+\s)(?<Country>.+)
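In JavaScript, the named group can be read from the groups property of the match. This is a sketch that assumes an ES2018+ engine (named groups are not available in older engines), and the function name captureCountry is hypothetical.

```javascript
// Assumes an ES2018+ engine with named capture groups.
const pattern = /(?:.+\s)(?<Country>.+)/;

// Hypothetical helper: return the named capture, or null if there is no match.
function captureCountry(line) {
  const m = line.match(pattern);
  return m ? m.groups.Country : null;
}

console.log(captureCountry('(GMT+02:00) Istanbul'));            // Istanbul
console.log(captureCountry('(GMT+01:00) West Central Africa')); // Africa
```

Because the first .+ is greedy, the non-capturing group consumes everything up to the last whitespace character, so only the final word ends up in the named group.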
A very simple expression:
[^)]*\)\s*(\w+)
Explanation:
[^)]*   any character except ')' (0 or more times, matching as many as possible)
\)      a literal ')'
\s*     whitespace (\n, \r, \t, \f, and " ") (0 or more times, matching as many as possible)
\w+     word characters (a-z, A-Z, 0-9, _) (1 or more times, matching as many as possible)
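As a sketch of how this expression is used in JavaScript, the wanted text is read from capture group 1. Since \w+ stops at the first space, the second pattern below swaps in (.+) to keep multi-word names. That variant and the helper name afterParen are my own assumptions, not part of the original answer.

```javascript
const firstWord = /[^)]*\)\s*(\w+)/;
// Assumed variant: (.+) instead of (\w+), to capture the rest of the line.
const restOfLine = /[^)]*\)\s*(.+)/;

// Hypothetical helper: return capture group 1, or null if there is no match.
function afterParen(pattern, line) {
  const m = line.match(pattern);
  return m ? m[1] : null;
}

console.log(afterParen(firstWord, '(GMT+02:00) Istanbul'));             // Istanbul
console.log(afterParen(firstWord, '(GMT+01:00) West Central Africa'));  // West
console.log(afterParen(restOfLine, '(GMT+01:00) West Central Africa')); // West Central Africa
```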