Question

In my project, users can register with a publicly viewable nickname. I would like to allow that name to contain characters from any script (Arabic, Latin, Cyrillic, Japanese, etc.) but prevent control characters, punctuation, and non-alphabetic symbols such as ✇ or ✈.

I've found a lot of examples for filtering alphanumeric characters from various individual scripts, but I don't want to have to spend days digging through encoding tables to try and allow every script through manually.

Any recommendations?


Solution

In JavaScript, when you want to deal with Unicode in regular expressions, the usual solution is to give up.

The next most usual solution is to use XRegExp, which does happen to have the classes you seem to need:

var unicodeWord = XRegExp('^\\p{L}+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true
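If pulling in a library is not an option, the same \p{L} class can be sketched with the built-in RegExp, assuming a runtime that supports ES2018 Unicode property escapes and the u flag (recent Node.js and current browsers). The helper names isLetterOnlyNickname and isLetterOrMarkNickname are made up for this illustration. The second pattern also accepts combining marks (\p{M}), because some scripts, Devanagari for example, write vowels as marks that \p{L} alone rejects.

```javascript
// Minimal sketch, assuming native support for \p{...} with the `u` flag (ES2018).
// Helper names are hypothetical, not part of any library.

// Letters only, the same class as the XRegExp example above.
const letterOnly = /^\p{L}+$/u;

// Letters plus combining marks, for scripts that attach vowels or accents as marks.
const letterOrMark = /^[\p{L}\p{M}]+$/u;

function isLetterOnlyNickname(name) {
  // NFC normalization prefers precomposed letters where they exist.
  return letterOnly.test(name.normalize('NFC'));
}

function isLetterOrMarkNickname(name) {
  return letterOrMark.test(name.normalize('NFC'));
}

console.log(isLetterOnlyNickname('Русский')); // true
console.log(isLetterOnlyNickname('日本語'));   // true
console.log(isLetterOnlyNickname('✈'));       // false
```

Whether to allow marks, digits (\p{N}), or spaces is a policy choice for the nickname rules; the patterns above only cover the "letters from any script" part of the question.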

OTHER TIPS

I've used \p{Latin} before in Perl to select all Latin characters. There is a whole list of options about half-way down on this page: http://www.regular-expressions.info/unicode.html.

It seems that this could carry over to JavaScript through the XRegExp library, which supports the same \p{...} syntax.

Edit 2: Alternatively, make up a list of disallowed characters to check against; \p{Common} (the Unicode script shared by punctuation, digits, and symbols) would be a starting point.
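A rough sketch of that deny-list idea, written for the native JavaScript RegExp rather than Perl or XRegExp. It assumes ES2018 property escapes, where the script has to be spelled \p{Script=Common} instead of \p{Common}. The function name containsCommonScript is hypothetical.

```javascript
// Minimal sketch of the deny-list approach, assuming ES2018 Unicode property escapes.
// containsCommonScript is a hypothetical helper name.

// Matches any character whose Unicode Script property is Common:
// spaces, ASCII punctuation, digits, and dingbats such as ✈ fall in this group.
const commonScript = /\p{Script=Common}/u;

function containsCommonScript(name) {
  return commonScript.test(name);
}

console.log(containsCommonScript('Русский')); // false
console.log(containsCommonScript('abc!'));    // true
console.log(containsCommonScript('✈'));       // true
```

This is only a starting point, as the answer says. A few characters that legitimately appear in names are, as far as I know, also classified as Common (the Japanese prolonged sound mark ー is one), so a real deny list would likely need exceptions.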

Edit: Apparently my memory of doing this is from many eons ago. I cannot get it to work with my current Perl build (which is a special case), so it may be completely off base.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow