RegEx doesn't accept %
29-04-2021
Question
What's wrong with this regex?
/^[\p{L}\p{N}]+/u
When my senior entered "% openminded", the regex returned false. I need it to accept these formats:
% openminded
100% openminded
openminded 100%
What do I need to add to the expression so that it accepts the input even if the user enters % or any other special character first?
Solution
The percent sign is not a \pS symbol. It’s a \pP punctuation character, as explained by uniprops:
$ uniprops %
U+0025 ‹%› \N{PERCENT SIGN}
\pP \p{Po}
All Any ASCII Assigned Basic_Latin Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn POSIX_Graph POSIX_Print POSIX_Punct Print Punctuation X_POSIX_Graph X_POSIX_Print X_POSIX_Punct
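If uniprops isn't at hand, the same check can be roughly approximated in any engine that supports Unicode property escapes. The sketch below assumes JavaScript (the question doesn't say which language is in use, but its /…/u literal is valid there); generalCategoryGroup is a hypothetical helper name, not part of any library.

```javascript
// Report which broad Unicode general category a single character falls in.
// Property escapes such as \p{P} require the `u` flag in JavaScript.
// NOTE: generalCategoryGroup is a made-up helper for illustration only.
function generalCategoryGroup(ch) {
  if (/^\p{L}$/u.test(ch)) return "Letter";
  if (/^\p{N}$/u.test(ch)) return "Number";
  if (/^\p{P}$/u.test(ch)) return "Punctuation";
  if (/^\p{S}$/u.test(ch)) return "Symbol";
  return "Other";
}

console.log(generalCategoryGroup("%")); // Punctuation (GC=Po)
console.log(generalCategoryGroup("$")); // Symbol (GC=Sc)
console.log(generalCategoryGroup("+")); // Symbol (GC=Sm)
```

This only distinguishes the top-level groups; it doesn't give the two-letter subcategory (Po, Sc, Sm) that uniprops prints.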
You should familiarize yourself with the general category (and perhaps script) that your favorite characters belong to. Here’s some sample output from running unichars:
$ unichars -gs '[\pP\pS]' '\p{Block=Basic_Latin}'
U+0021 ! GC=Po SC=Common EXCLAMATION MARK
U+0022 " GC=Po SC=Common QUOTATION MARK
U+0023 # GC=Po SC=Common NUMBER SIGN
U+0024 $ GC=Sc SC=Common DOLLAR SIGN
U+0025 % GC=Po SC=Common PERCENT SIGN
U+0026 & GC=Po SC=Common AMPERSAND
U+0027 ' GC=Po SC=Common APOSTROPHE
U+0028 ( GC=Ps SC=Common LEFT PARENTHESIS
U+0029 ) GC=Pe SC=Common RIGHT PARENTHESIS
U+002A * GC=Po SC=Common ASTERISK
U+002B + GC=Sm SC=Common PLUS SIGN
U+002C , GC=Po SC=Common COMMA
U+002D - GC=Pd SC=Common HYPHEN-MINUS
U+002E . GC=Po SC=Common FULL STOP
U+002F / GC=Po SC=Common SOLIDUS
U+003A : GC=Po SC=Common COLON
U+003B ; GC=Po SC=Common SEMICOLON
U+003C < GC=Sm SC=Common LESS-THAN SIGN
U+003D = GC=Sm SC=Common EQUALS SIGN
U+003E > GC=Sm SC=Common GREATER-THAN SIGN
U+003F ? GC=Po SC=Common QUESTION MARK
U+0040 @ GC=Po SC=Common COMMERCIAL AT
U+005B [ GC=Ps SC=Common LEFT SQUARE BRACKET
U+005C \ GC=Po SC=Common REVERSE SOLIDUS
U+005D ] GC=Pe SC=Common RIGHT SQUARE BRACKET
U+005E ^ GC=Sk SC=Common CIRCUMFLEX ACCENT
U+005F _ GC=Pc SC=Common LOW LINE
U+0060 ` GC=Sk SC=Common GRAVE ACCENT
U+007B { GC=Ps SC=Common LEFT CURLY BRACKET
U+007C | GC=Sm SC=Common VERTICAL LINE
U+007D } GC=Pe SC=Common RIGHT CURLY BRACKET
U+007E ~ GC=Sm SC=Common TILDE
So either add the right general category to your class, like
[\pL\pN\p{Po}]
or just add the specific character you need. BTW, anything that wants \pL almost always wants \pM (combining marks), too.
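As a rough illustration of that advice, here is a minimal sketch run against the three inputs from the question. It again assumes JavaScript, where the short \pL form is a syntax error under the u flag, so the braces are spelled out. The `strict` pattern is my own assumption, not something the answer proposes: it also allows spaces and anchors the end, in case the intent is to validate the whole string.

```javascript
// The asker's original pattern: letters and numbers only, anchored at the start.
const original = /^[\p{L}\p{N}]+/u;

// The answer's suggested class, with \p{M} added for combining marks.
const widened = /^[\p{L}\p{M}\p{N}\p{Po}]+/u;

// Hypothetical stricter variant (an assumption, not from the answer):
// also permits space separators and anchors the end of the input.
const strict = /^[\p{L}\p{M}\p{N}\p{Po}\p{Zs}]+$/u;

const inputs = ["% openminded", "100% openminded", "openminded 100%"];
for (const s of inputs) {
  console.log(JSON.stringify(s), original.test(s), widened.test(s), strict.test(s));
}
```

The original pattern rejects only "% openminded": it has no end anchor, so the other two inputs pass on their leading run of letters or digits. With \p{Po} in the class, all three inputs match.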