Question

What's wrong with the regex /^[\p{L}\p{N}]+/u? When my senior entered "% openminded", the regex returned false. I need it to accept these formats:

% openminded
100% openminded
openminded 100%

What do I need to add to the expression so that it accepts the input even if the user enters % or any other special character first?


Solution

The percent sign is not a \pS symbol. It’s a \pP punctuation, as explained by uniprops:

$ uniprops %
U+0025 ‹%› \N{PERCENT SIGN}
    \pP \p{Po}
    All Any ASCII Assigned Basic_Latin Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn POSIX_Graph POSIX_Print POSIX_Punct Print Punctuation X_POSIX_Graph X_POSIX_Print X_POSIX_Punct
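
If uniprops is not installed, the same check can be made from the regex engine itself. The following is a minimal sketch in JavaScript, assuming an engine that supports Unicode property escapes with the u flag; the variable names are illustrative only. Note that JavaScript requires the braced form \p{P}, not the Perl shorthand \pP.

```javascript
// Check which general category '%' falls under, using Unicode property escapes.
const percent = '%';

const isPunctuation = /^\p{P}$/u.test(percent);      // any Punctuation (P)
const isOtherPunctuation = /^\p{Po}$/u.test(percent); // Other_Punctuation (Po)
const isSymbol = /^\p{S}$/u.test(percent);           // any Symbol (S)

console.log({ isPunctuation, isOtherPunctuation, isSymbol });
// → { isPunctuation: true, isOtherPunctuation: true, isSymbol: false }
```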

You should familiarize yourself with the general category (and perhaps script) that your favorite characters belong to. Here’s some sample output from running unichars:

$ unichars -gs '[\pP\pS]' '\p{Block=Basic_Latin}'
U+0021 ‭ !  GC=Po SC=Common       EXCLAMATION MARK
U+0022 ‭ "  GC=Po SC=Common       QUOTATION MARK
U+0023 ‭ #  GC=Po SC=Common       NUMBER SIGN
U+0024 ‭ $  GC=Sc SC=Common       DOLLAR SIGN
U+0025 ‭ %  GC=Po SC=Common       PERCENT SIGN
U+0026 ‭ &  GC=Po SC=Common       AMPERSAND
U+0027 ‭ '  GC=Po SC=Common       APOSTROPHE
U+0028 ‭ (  GC=Ps SC=Common       LEFT PARENTHESIS
U+0029 ‭ )  GC=Pe SC=Common       RIGHT PARENTHESIS
U+002A ‭ *  GC=Po SC=Common       ASTERISK
U+002B ‭ +  GC=Sm SC=Common       PLUS SIGN
U+002C ‭ ,  GC=Po SC=Common       COMMA
U+002D ‭ -  GC=Pd SC=Common       HYPHEN-MINUS
U+002E ‭ .  GC=Po SC=Common       FULL STOP
U+002F ‭ /  GC=Po SC=Common       SOLIDUS
U+003A ‭ :  GC=Po SC=Common       COLON
U+003B ‭ ;  GC=Po SC=Common       SEMICOLON
U+003C ‭ <  GC=Sm SC=Common       LESS-THAN SIGN
U+003D ‭ =  GC=Sm SC=Common       EQUALS SIGN
U+003E ‭ >  GC=Sm SC=Common       GREATER-THAN SIGN
U+003F ‭ ?  GC=Po SC=Common       QUESTION MARK
U+0040 ‭ @  GC=Po SC=Common       COMMERCIAL AT
U+005B ‭ [  GC=Ps SC=Common       LEFT SQUARE BRACKET
U+005C ‭ \  GC=Po SC=Common       REVERSE SOLIDUS
U+005D ‭ ]  GC=Pe SC=Common       RIGHT SQUARE BRACKET
U+005E ‭ ^  GC=Sk SC=Common       CIRCUMFLEX ACCENT
U+005F ‭ _  GC=Pc SC=Common       LOW LINE
U+0060 ‭ `  GC=Sk SC=Common       GRAVE ACCENT
U+007B ‭ {  GC=Ps SC=Common       LEFT CURLY BRACKET
U+007C ‭ |  GC=Sm SC=Common       VERTICAL LINE
U+007D ‭ }  GC=Pe SC=Common       RIGHT CURLY BRACKET
U+007E ‭ ~  GC=Sm SC=Common       TILDE

So either add the right general category to your class, like

 [\pL\pN\p{Po}]

or just add the specific character you need. BTW, anything that wants \pL almost always wants \pM too, because combining marks (such as accents stored as separate code points) belong to \pM, not \pL.
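
As a rough illustration of the suggestion above, here is a sketch in JavaScript (an assumption, since the question does not say which language the regex runs in) comparing the original class with one widened by \p{Po} and \p{M}. The names original, widened, and samples are made up for the example.

```javascript
// Original pattern from the question: letters and numbers only.
const original = /^[\p{L}\p{N}]+/u;

// Widened class: also allows combining marks and Other_Punctuation such as '%'.
const widened = /^[\p{L}\p{M}\p{N}\p{Po}]+/u;

const samples = ['% openminded', '100% openminded', 'openminded 100%'];

for (const s of samples) {
  // Neither regex uses the g flag, so test() keeps no state between calls.
  console.log(JSON.stringify(s), original.test(s), widened.test(s));
}
// → "% openminded" false true
// → "100% openminded" true true
// → "openminded 100%" true true
```

Both patterns are anchored only at the start, so they check just the leading run of characters. To validate the whole string, the class would also need to allow the space and the pattern would need a closing $.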

Licensed under: CC-BY-SA with attribution