Matching Unicode control characters except for three with Regular Expressions
23-08-2019
Question
I need a regular expression that matches all Unicode control characters except carriage return (0x0d), line feed (0x0a) and tab (0x09). Currently, my regular expression looks like this:
/\p{C}/u
I just need to define these three exceptions now.
Solution
I think you can use a negative lookahead here, combined with a character class.
/(?![\x{000d}\x{000a}\x{0009}])\p{C}/u
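This uses a negative lookahead to assert that the next character is not one of those listed in the character class. A lookahead is zero-width and consumes nothing, so \p{C} then matches that same character if it belongs to the Unicode category "Other". Note that \p{C} covers more than control characters: it also includes format, surrogate, private-use and unassigned code points. If you only want control characters, \p{Cc} is the narrower class.
I used the Perl syntax for specifying single Unicode code points.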
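For illustration, here is a minimal sketch of the same lookahead idea in Python. Python's built-in `re` module does not support `\p{...}`, so this sketch swaps in the explicit C0/C1 ranges (the `Cc` category: U+0000 to U+001F and U+007F to U+009F) in place of `\p{C}`. The names `CONTROL_EXCEPT_WS` and `strip_controls` are made up for this example.

```python
import re

# Assumption: Python's `re` lacks \p{C}, so the control ranges (category Cc)
# are written out by hand. The negative lookahead rejects CR, LF and tab
# before the character class gets a chance to consume them.
CONTROL_EXCEPT_WS = re.compile(r"(?![\r\n\t])[\x00-\x1f\x7f-\x9f]")

def strip_controls(text: str) -> str:
    """Remove control characters but keep carriage return, line feed and tab."""
    return CONTROL_EXCEPT_WS.sub("", text)

sample = "a\x00b\tc\r\nd\x1be\x85f"
cleaned = strip_controls(sample)
print(repr(cleaned))  # 'ab\tc\r\ndef'
```

The lookahead and the character class both look at the same position: the lookahead only vetoes, and the class does the actual matching.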
More discussion on lookarounds here
(Note that this has not been tested, but I think the concept is correct.)
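Since the answer is untested, one way to gain some confidence in the concept is a brute-force check over every code point. The sketch below is an assumption-laden stand-in: it tests a Python approximation of the pattern (explicit `Cc` ranges instead of `\p{C}`, because the standard `re` module has no `\p{...}`), and compares it against `unicodedata.category`. It does not test the original PCRE-style expression itself.

```python
import re
import unicodedata

# Assumption: explicit Cc ranges stand in for \p{C}, which `re` cannot express.
PATTERN = re.compile(r"(?![\r\n\t])[\x00-\x1f\x7f-\x9f]")
KEPT = {"\r", "\n", "\t"}

def expected_match(ch: str) -> bool:
    """A character should match if it is a control character (Cc) and not one of the three exceptions."""
    return unicodedata.category(ch) == "Cc" and ch not in KEPT

# Walk the whole Unicode range and collect any disagreement between
# the regex and the category lookup.
all_chars = [chr(cp) for cp in range(0x110000)]
matched = [ch for ch in all_chars if PATTERN.fullmatch(ch)]
mismatches = [ch for ch in all_chars
              if bool(PATTERN.fullmatch(ch)) != expected_match(ch)]

print(len(matched), len(mismatches))
```

An empty `mismatches` list means the lookahead excludes exactly the three characters and nothing else from the control set.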
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow