Question

I want to insert the Arabic letters in the pattern just like the English letters

pattern="[a-zA-Z0-9-_. ]{1,30}"

I have no idea how to accomplish the action.

Was it helpful?

Solution

The range for Arabic and Persian are shared so this code could be used for Arabic too.

[أ-يa-zA-Z]

This is the reference for finding the character range of Unicode languages:

preg_replace and preg_match arabic characters

http://unicode.org/charts/

OTHER TIPS

The HTML5 pattern attribute follows JavaScript regular expression syntax, which makes things rather awkward. You cannot test character properties, for example. Instead, you need to list down the allowed characters or ranges of characters.

Using the normative Scripts.txt file (by the Unicode Consortium), which defines the script (writing system) of all characters, I constructed the following:

pattern=
"[a-zA-Z0-9-_. \
\u0620-\u063F\u0641-\u064A\u066E-\u066F\u0671-\u06D3\u06D5\
\u06E5-\u06E6\u06EE-\u06EF\u06FA-\u06FC\u06FF\u0750-\u077F\
\u08A0\u08A2-\u08AC\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFD8F\
\uFD92-\uFDC7\uFDF0-\uFDFB\uFE70-\uFE74\uFE76-\uFEFC]{1,30}"

Starting from the set of all characters with script defined to be Arabic, I picked up those that are declared as letters (General Category Lo or Lm), and then omitted those beyond BMP, the Basic Multilingual Plane.

Characters outside BMP are used very rarely, and to represent them in JavaScript syntax, you would need to either include the characters themselves or use two \u notations per character (one for each component of a surrogate pair). This does not sound realistic.

This is of course a “hardwired” solution: it may need updates if new Arabic letters are added to Unicode or the script of a character is changed from or to Arabic (which is highly unlikely). But I don’t expect to see new Arabic letters added to BMP during my lifetime.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top