Question

What are the Unicode groups and block ranges that can be specified in the character class \p{name}?

e.g.

\p{IsGreek}

Where is the list of names and descriptions available?

Solution

Regular-Expressions.info has lists of the Unicode categories, scripts, and blocks.

You can also consult the PCRE man pages themselves, which say:

Sets of Unicode characters are defined as belonging to certain scripts. A character from one of these sets can be matched using a script name. For example:

\p{Greek}
\P{Han}

Those that are not part of an identified script are lumped together as "Common". The current list of scripts is:

Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian, Lydian, Malayalam, Meetei_Mayek, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samaritan, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, Yi.

OTHER TIPS

Here is a list of the Unicode character properties (general categories) that you can specify inside the braces: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Categories
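To get a feel for what the category codes on that page mean, here is a minimal Python sketch. Note that Python's built-in `re` module does not support `\p{...}`, so this does not run a regex at all; it only uses the standard `unicodedata` module to look up each character's general category. The helper names `general_categories` and `has_category` are made up for this example.

```python
import unicodedata

def general_categories(text):
    # Pair every character with its two-letter general category code
    # (e.g. Lu = uppercase letter, Nd = decimal digit).
    return [(ch, unicodedata.category(ch)) for ch in text]

def has_category(text, prefix):
    # Keep the characters whose category starts with the given prefix.
    # A one-letter prefix such as "L" covers Lu, Ll, Lt, Lm and Lo,
    # which is roughly what \p{L} selects in engines that support it.
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]

sample = "A\u03b19 \u4e2d."  # A, Greek alpha, 9, space, a CJK ideograph, full stop
print(general_categories(sample))
print(has_category(sample, "L"))
```

The second call keeps only the three letters from the sample, whatever script they belong to, because general category and script are independent properties.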

You can also match Unicode blocks or scripts. Blocks are described at http://www.regular-expressions.info/unicode.html#block and scripts at http://www.regular-expressions.info/unicode.html#script.
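The difference between a block and a script can be sketched in Python. A block is just a contiguous range of code points, so it can be emulated with an explicit range even in an engine without `\p{...}` support, such as Python's built-in `re`. The sketch below assumes the "Greek and Coptic" block, U+0370 to U+03FF; the names `GREEK_AND_COPTIC` and `find_block_runs` are invented for illustration.

```python
import re

# Assumption: the "Greek and Coptic" block spans U+0370..U+03FF.
# This emulates a block match with a plain code point range, because
# the standard re module has no \p{...} syntax.
GREEK_AND_COPTIC = re.compile(r"[\u0370-\u03FF]+")

def find_block_runs(text):
    # Return every run of consecutive characters that fall inside the block.
    return GREEK_AND_COPTIC.findall(text)

# U+1F00 is a Greek letter, but it lives in the separate "Greek Extended"
# block, so the range above does not match it.
print(find_block_runs("abc \u03b1\u03b2\u03b3 def \u1f00"))
```

A script property would treat both the plain and the extended letters as Greek, which is why script names are usually the better choice when the goal is "any character of this writing system" rather than "any code point in this range".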

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow