What are the Unicode groups and block ranges that can be specified in `\p{name}`?
19-04-2021
Question
What are the Unicode groups and block ranges that can be specified in the character class \p{name}, e.g. \p{IsGreek}?
Where is the list of names and descriptions available?
Solution
Regular-Expressions.info has lists.
You can also consult the man pages of PCRE itself, which say:
Sets of Unicode characters are defined as belonging to certain scripts. A character from one of these sets can be matched using a script name. For example:
\p{Greek} \P{Han}
Those that are not part of an identified script are lumped together as "Common". The current list of scripts is:
Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian, Lydian, Malayalam, Meetei_Mayek, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samaritan, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, Yi.
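To see script matching in action, here is a minimal sketch in Java rather than PCRE. It assumes `java.util.regex` (Java 7 or later), where a script is written with an `Is` prefix (`\p{IsGreek}`), whereas PCRE uses the bare name (`\p{Greek}`). The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptMatchDemo {
    // Hypothetical helper: returns every maximal run of characters
    // that belong to the given Unicode script (e.g. "Greek", "Han").
    static List<String> runsOfScript(String script, String text) {
        // Java spells a script property as \p{Is<Script>}.
        Pattern p = Pattern.compile("\\p{Is" + script + "}+");
        Matcher m = p.matcher(text);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // "abc", three Greek letters, two Han characters, "def".
        // Unicode escapes keep the source file ASCII-only.
        String text = "abc \u03b1\u03b2\u03b3 \u6f22\u5b57 def";

        System.out.println(runsOfScript("Latin", text));        // [abc, def]
        System.out.println(runsOfScript("Greek", text).size()); // 1
        System.out.println(runsOfScript("Han", text).size());   // 1
    }
}
```

The spaces between the words belong to the "Common" script mentioned above, so they end each run.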
Other tips
Here you can find a list of the Unicode character properties (general categories) that you can specify inside the braces: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Categories
You can also match Unicode blocks or scripts. Information about those is available at http://www.regular-expressions.info/unicode.html#block and http://www.regular-expressions.info/unicode.html#script.
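The difference between a category, a block, and a script is easiest to see on concrete code points. The sketch below again assumes Java's `java.util.regex`, where `\p{Lu}` is a general category, `\p{InGreek}` is a block, and `\p{IsGreek}` is a script. Other flavors spell these differently: in .NET, for instance, `\p{IsGreek}` names a block, not a script, so check your engine's documentation. `UnicodePropertyDemo` and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // Hypothetical helper: does the single code point match \p{property}?
    static boolean matches(String property, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.matches("\\p{" + property + "}", s);
    }

    public static void main(String[] args) {
        int capitalAlpha = 0x0391; // GREEK CAPITAL LETTER ALPHA
        int alphaPsili = 0x1F00;   // GREEK SMALL LETTER ALPHA WITH PSILI

        // General category: uppercase letter.
        System.out.println(matches("Lu", capitalAlpha));      // true
        // Block: the Greek block covers U+0370..U+03FF.
        System.out.println(matches("InGreek", capitalAlpha)); // true
        // Script: Greek.
        System.out.println(matches("IsGreek", capitalAlpha)); // true

        // U+1F00 belongs to the Greek script but lies in the
        // Greek Extended block, so the block test fails.
        System.out.println(matches("IsGreek", alphaPsili));   // true
        System.out.println(matches("InGreek", alphaPsili));   // false
    }
}
```

A script can span several blocks, which is why matching by script is usually the better choice when the goal is "any character of this writing system".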