Question

In my project, users can register with a publicly viewable nickname. I would like to allow that name to contain characters from any script (Arabic, Latin, Cyrillic, Japanese, etc.) but prevent control characters, punctuation, and non-alphabetic symbols such as ✇ or ✈.

I've found a lot of examples for filtering alphanumeric characters from various individual scripts, but I don't want to have to spend days digging through encoding tables to try and allow every script through manually.

Any recommendations?

Solution

In JavaScript, when you want to deal with Unicode in regular expressions, the usual solution is to give up.

The next most usual solution is to use XRegExp, which does happen to have the Unicode classes you seem to need:

var unicodeWord = XRegExp('^\\p{L}+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true
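
For what it's worth, engines that implement ES2018 Unicode property escapes can run much the same check without a library. The following is only a minimal sketch under that assumption: the helper name `isLetterOnlyNickname` is hypothetical, and allowing combining marks (`\p{M}`) after the first letter is an addition to the pattern above, made on the assumption that nicknames in scripts that write vowels as separate marks (Devanagari, for example) should pass.

```javascript
// Hypothetical helper: true when the nickname consists only of letters
// from any script, optionally followed by more letters or combining marks.
// Assumes an engine with ES2018 Unicode property escapes (the `u` flag).
function isLetterOnlyNickname(name) {
  // Normalize to NFC so composed and decomposed spellings are treated alike.
  const normalized = name.normalize('NFC');
  // First character must be a letter; the rest may be letters or marks.
  return /^\p{L}[\p{L}\p{M}]*$/u.test(normalized);
}

console.log(isLetterOnlyNickname('Русский')); // true
console.log(isLetterOnlyNickname('日本語'));   // true
console.log(isLetterOnlyNickname('✈'));       // false
console.log(isLetterOnlyNickname('abc!'));    // false
```

Whether digits, spaces, or marks should be accepted is a policy decision; the character class is the place to adjust it.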

Other tips

I've used \p{Latin} before in Perl to select all Latin characters. There is a whole list of options about halfway down this page: http://www.regular-expressions.info/unicode.html.

It seems that this could carry over to JavaScript, since XRegExp supports the same \p{...} syntax.

Edit 2: Alternatively, make up a list of disallowed characters to check against; \p{common} would be a starting point.
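
A rough sketch of that blocklist idea, assuming a JavaScript engine with ES2018 Unicode property escapes. Note the swap: instead of \p{common} it uses general categories (C for control and other, P for punctuation, S for symbols, Z for separators), which is my assumption about what "not allowed" means here. The name `findDisallowed` is hypothetical.

```javascript
// Hypothetical blocklist of general categories assumed to be unwanted:
// C = control/format/unassigned, P = punctuation, S = symbols, Z = separators.
const disallowed = /[\p{C}\p{P}\p{S}\p{Z}]/gu;

// Returns every disallowed character found; an empty array means none.
function findDisallowed(name) {
  // With the `g` flag, match() returns all matches, or null when there are none.
  return name.match(disallowed) || [];
}

console.log(findDisallowed('Русский'));    // []
console.log(findDisallowed('jet✈plane!')); // [ '✈', '!' ]
```

Unlike an allowlist, this lets through anything not named in the class, including digits and combining marks, so it is more permissive by default.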

Edit: apparently my memory of doing this is from many eons ago. I cannot get it to work with my current Perl build (which is a special case), so it may be completely off base.

Licensed under: CC-BY-SA with attribution