Test if string contains only letters (a-z + é ü ö ê å ø etc..)

https://stackoverflow.com/questions/2013451

19-09-2019

Question

I want to match a string to make sure it contains only letters.

I've got this and it works just fine:

var onlyLetters = /^[a-zA-Z]*$/.test(myString);

BUT

Since I speak another language too, I need to allow all letters, not just A-Z. For example:

é ü ö ê å ø

Does anyone know if there is a global 'alpha' term that includes all letters to use with RegExp? Or even better, does anyone have some kind of solution?

Thanks a lot

EDIT: Just realized that you might also want to allow '-' and ' ' in case of a double name like 'Mary-Ann' or 'Mary Ann'.

Solution

I don’t know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names or user nicknames, I’d suggest you enter the characters yourself and don’t use the whole set of ‘alpha’ characters you’ll find in Unicode, because you probably won’t see a visual difference between the following letters:

А ≠ A ≠ Α  # cyrillic, latin, greek

In such cases it’s better to specify the allowed letters manually if you want to minimise account faking and such.
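To make the point concrete, here is a small sketch (the variable names are made up for illustration, not taken from the answer) showing that the three look-alike capitals are distinct code points, and that a hand-written Latin whitelist only lets one of them through:

```javascript
// Three capitals that render almost identically but are different characters.
var lookalikes = [
    { script: 'cyrillic', ch: '\u0410' },
    { script: 'latin',    ch: 'A' },
    { script: 'greek',    ch: '\u0391' }
];

// Collect each character's code point as a hex string.
var codePoints = lookalikes.map(function (entry) {
    return entry.ch.charCodeAt(0).toString(16);
});

// A manual whitelist such as [a-zA-Z] accepts only the Latin one.
var acceptedByLatinWhitelist = lookalikes.filter(function (entry) {
    return /^[a-zA-Z]$/.test(entry.ch);
}).map(function (entry) {
    return entry.script;
});
```

A generic "any letter" test would accept all three, which is exactly what makes faked nicknames possible.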

Addition

Well, if it’s for a field which is supposed to be non-unique, I would allow Greek as well. I wouldn’t feel good about forcing users to change their name to a latinised version.

But for unique fields like nicknames you need to give the other visitors of your site some assurance that it’s really the nickname they think it is. It’s bad enough that people already fake accounts by interchanging I and l. Of course, it’s something that depends on your users; but to be sure I think it’s better to allow basic Latin + diacritics only. (Maybe have a look at this list: Latin-derived_alphabet)

As an untested suggestion (with ‘-’, ‘_’ and ‘ ’):

/^[a-zA-Z\-_ ’'‘ÆÐƎƏƐƔĲŊŒẞÞǷȜæðǝəɛɣĳŋœĸſßþƿȝĄƁÇĐƊĘĦĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊĲĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịĳĵķƙĸĺļłľŀŉńn̈ňñņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ]+$/.test(myString)

Another edit: I have added the apostrophe for people with names like O’Neill or O’Reilly. (And the straight and the reversed apostrophe for people who can’t enter the curly one correctly.)

OTHER TIPS

var onlyLetters = /^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/.test(myString) // skips × (U+00D7) and ÷ (U+00F7), which are not letters

You can't do this in JS, at least not in engines that predate ES2018. Those have very limited regex and normalizer support, so you would need to construct a lengthy and unmaintainable character array with all possible Latin characters with diacritical marks (I guess there are around 500 different ones). Instead, delegate the validation to the server side, which can use another language with more regex capabilities, if necessary with the help of Ajax.

In a full-fledged regex environment you could just test if the string matches \p{L}+. Here's a Java example:

boolean valid = string.matches("\\p{L}+");

Alternatively, you could also normalize the text to get rid of the diacritical marks and check if it contains [A-Za-z]+ only. Here's a Java example again:

import java.text.Normalizer;
import java.text.Normalizer.Form;
string = Normalizer.normalize(string, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = string.matches("[A-Za-z]+");

PHP supports similar functions.
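For what it's worth, newer JavaScript engines can run a similar check directly. A minimal sketch, assuming an engine that implements ES2018 Unicode property escapes; the function names are made up for illustration:

```javascript
// Requires the "u" flag. Engines without ES2018 support reject this
// pattern with a SyntaxError, so treat it as an assumption about the runtime.
function isOnlyLetters(str) {
    return /^\p{L}+$/u.test(str);
}

// Variant for double names: also allows a space and a hyphen.
function isLettersSpaceOrHyphen(str) {
    return /^[\p{L} \-]+$/u.test(str);
}
```

Note that \p{L} does not cover combining marks, so decomposed input such as n followed by U+0308 fails unless \p{M} is added to the class.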

When I tried to implement @Debilski's solution JavaScript didn't like the extended Latin characters -- I had to code them as JavaScript escapes:

// The huge unicode escape string is equal to ÆÐƎƏƐƔĲŊŒẞÞǷȜæðǝəɛɣĳŋœĸſßþƿȝĄƁÇĐƊĘĦ
// ĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎ
// ƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊ
// ĲĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịĳĵķƙĸĺļłľŀŉńn̈ňñ
// ņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭ
// ŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ

function isAlpha(string) {
    var patt = /^[a-zA-Z\u00C6\u00D0\u018E\u018F\u0190\u0194\u0132\u014A\u0152\u1E9E\u00DE\u01F7\u021C\u00E6\u00F0\u01DD\u0259\u025B\u0263\u0133\u014B\u0153\u0138\u017F\u00DF\u00FE\u01BF\u021D\u0104\u0181\u00C7\u0110\u018A\u0118\u0126\u012E\u0198\u0141\u00D8\u01A0\u015E\u0218\u0162\u021A\u0166\u0172\u01AFY\u0328\u01B3\u0105\u0253\u00E7\u0111\u0257\u0119\u0127\u012F\u0199\u0142\u00F8\u01A1\u015F\u0219\u0163\u021B\u0167\u0173\u01B0y\u0328\u01B4\u00C1\u00C0\u00C2\u00C4\u01CD\u0102\u0100\u00C3\u00C5\u01FA\u0104\u00C6\u01FC\u01E2\u0181\u0106\u010A\u0108\u010C\u00C7\u010E\u1E0C\u0110\u018A\u00D0\u00C9\u00C8\u0116\u00CA\u00CB\u011A\u0114\u0112\u0118\u1EB8\u018E\u018F\u0190\u0120\u011C\u01E6\u011E\u0122\u0194\u00E1\u00E0\u00E2\u00E4\u01CE\u0103\u0101\u00E3\u00E5\u01FB\u0105\u00E6\u01FD\u01E3\u0253\u0107\u010B\u0109\u010D\u00E7\u010F\u1E0D\u0111\u0257\u00F0\u00E9\u00E8\u0117\u00EA\u00EB\u011B\u0115\u0113\u0119\u1EB9\u01DD\u0259\u025B\u0121\u011D\u01E7\u011F\u0123\u0263\u0124\u1E24\u0126I\u00CD\u00CC\u0130\u00CE\u00CF\u01CF\u012C\u012A\u0128\u012E\u1ECA\u0132\u0134\u0136\u0198\u0139\u013B\u0141\u013D\u013F\u02BCN\u0143N\u0308\u0147\u00D1\u0145\u014A\u00D3\u00D2\u00D4\u00D6\u01D1\u014E\u014C\u00D5\u0150\u1ECC\u00D8\u01FE\u01A0\u0152\u0125\u1E25\u0127\u0131\u00ED\u00ECi\u00EE\u00EF\u01D0\u012D\u012B\u0129\u012F\u1ECB\u0133\u0135\u0137\u0199\u0138\u013A\u013C\u0142\u013E\u0140\u0149\u0144n\u0308\u0148\u00F1\u0146\u014B\u00F3\u00F2\u00F4\u00F6\u01D2\u014F\u014D\u00F5\u0151\u1ECD\u00F8\u01FF\u01A1\u0153\u0154\u0158\u0156\u015A\u015C\u0160\u015E\u0218\u1E62\u1E9E\u0164\u0162\u1E6C\u0166\u00DE\u00DA\u00D9\u00DB\u00DC\u01D3\u016C\u016A\u0168\u0170\u016E\u0172\u1EE4\u01AF\u1E82\u1E80\u0174\u1E84\u01F7\u00DD\u1EF2\u0176\u0178\u0232\u1EF8\u01B3\u0179\u017B\u017D\u1E92\u0155\u0159\u0157\u017F\u015B\u015D\u0161\u015F\u0219\u1E63\u00DF\u0165\u0163\u1E6D\u0167\u00FE\u00FA\u00F9\u00FB\u00FC\u01D4\u016D\u016B\u0169\u0171\u016F\u0173\u1EE5\u01B0\u1E83\u1E81\u0175\u1E85\u01BF\u00FD\u1EF3\u0177\u00FF\u0233\u1EF9\u01B4\u017A\u017C\u017E\u1E93]+$/;
    return patt.test(string);
}
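Writing those escapes by hand is error-prone, so here is a rough sketch of how such a string could be generated from the readable characters. The helper name toUnicodeEscapes is hypothetical, and it assumes the characters are in the Basic Multilingual Plane, which holds for the list above:

```javascript
// Replace every non-ASCII character with its \uXXXX escape so the
// result can be pasted into a regex literal or passed to new RegExp().
function toUnicodeEscapes(str) {
    return str.replace(/[^\x00-\x7F]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16).toUpperCase();
        // Left-pad to four hex digits, e.g. "C6" -> "00C6".
        return '\\u' + ('0000' + hex).slice(-4);
    });
}

// Build a character class from a readable list of extra letters.
var extraLetters = 'ÆÐæð';
var escapedPattern = new RegExp('^[a-zA-Z' + toUnicodeEscapes(extraLetters) + ']+$');
```

Here escapedPattern.source contains only ASCII, so the file encoding no longer matters.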

This can be tricky; unfortunately, JavaScript has pretty poor support for internationalization. To do this check you'll have to create your own character class. This is because, for instance, \w is the same as [0-9A-Z_a-z], which won't help you much, and there isn't anything like [[:alpha:]] in JavaScript. But since it sounds like you're only going to use one other language, you can probably just add those other characters to your character class.

By the way, I think you'll need a + or * in your regexp there if myString can be longer than one character.

The full example:

/^[a-zA-Zéüöêåø]*$/.test(myString);
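If the list of extra characters is likely to grow (for instance with the '-' and ' ' from the question's edit), one possible way to keep it readable is to build the class from a plain string. This is only a sketch; allowedExtra and escapeForClass are hypothetical names:

```javascript
// Extra characters to allow besides a-z and A-Z.
var allowedExtra = "éüöêåø-' ";

// Escape the characters that are special inside a character class: \ ] [ ^ -
function escapeForClass(chars) {
    return chars.replace(/[\\\]\[^-]/g, '\\$&');
}

// "+" instead of "*" so that an empty string is rejected.
var nameRegex = new RegExp('^[a-zA-Z' + escapeForClass(allowedExtra) + ']+$');

function isValidName(str) {
    return nameRegex.test(str);
}
```

With this list isValidName accepts 'Mary-Ann', 'Mary Ann' and "O'Neill", but rejects 'Zoë' until ë is added to allowedExtra.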

There should be, but in many regex engines such a class is localization dependent. Thus, é ü ö ê å ø might not be accepted if you're on a US localization, for example. To ensure your web site does what you want across all localizations, you should explicitly write out the characters in a form similar to what you are already doing.

The only standard one I am aware of though is \w, which matches alphanumeric characters plus the underscore (in JavaScript it is exactly [A-Za-z0-9_], so it would not match é or ø anyway). You could do it the "standard" way by running two regexes, one to verify that every character matches \w and another to verify that neither \d (a digit) nor _ occurs, which would leave an alpha-only string. Again, I'd strongly urge you not to use this technique, as what \w represents differs between regex engines and localizations, but this does answer your question.

I don't know anything about Javascript, but if it has proper unicode support, convert your string to a decomposed form, then remove the diacritics from it ([\u0300-\u036f\u1dc0-\u1dff]). Then your letters will only be ASCII ones.
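In JavaScript this could look roughly like the sketch below, assuming an engine that provides String.prototype.normalize (ES2015 or later). The function names are made up for illustration:

```javascript
// Decompose to NFD, then drop the combining marks from the two ranges
// mentioned above.
function stripDiacritics(str) {
    return str.normalize('NFD').replace(/[\u0300-\u036f\u1dc0-\u1dff]/g, '');
}

// After stripping, only plain ASCII letters should remain.
function isLettersAfterStripping(str) {
    return /^[A-Za-z]+$/.test(stripDiacritics(str));
}
```

One caveat: letters such as ø, æ and ł have no canonical decomposition, so they survive the stripping and a name like 'Søren' still fails this check.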

You could always use a blacklist instead of a whitelist. That way you only remove the characters you do not need.

You could use a blacklist - a list of characters to exclude.
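A blacklist check might be sketched like this. The set of excluded characters is only an assumption (ASCII digits, the underscore and common ASCII punctuation); adjust it to whatever you actually want to reject:

```javascript
// Characters that must not appear: digits, underscore, ASCII punctuation.
// Hyphen, apostrophe and space are deliberately left out so names like
// 'Mary-Ann' still pass.
var forbidden = /[0-9_!"#$%&()*+,.\/:;<=>?@\[\\\]^`{|}~]/;

function passesBlacklist(str) {
    // Reject empty input, then reject anything containing a forbidden character.
    return str.length > 0 && !forbidden.test(str);
}
```

The trade-off is that everything not listed gets through, including look-alike letters from other scripts.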

Also, it is important to verify the input on server-side, not only on client-side! Client-side can be bypassed easily.

There are some shortcuts to achieve this in other regular expression dialects - see this page. But I don't believe there are any standardised ones in JavaScript - certainly none that would be supported by all browsers.

I'm using a converter before checking, but it's still not friendly for all languages. I'm not sure that's possible.

function noExtendedChars( input_name ){

    var whitelist = [
        ['a',  'à','á','â','ä','æ','ã','å','ā'],
        ['c',  'ç', 'ć', 'č'],
        ['e',  'è','é','ê','ë','ē','ė','ę'],
        ['i',  'ï','í','ī','į','î'],
        ['l',  'ł'],
        ['n',  'ñ', 'ń'],
        ['o',  'ô', 'ö', 'ò', 'ó', 'œ', 'ø', 'ō', 'õ' ],
        ['s',  'ß', 'ś', 'š' ],
        ['u',  'û', 'ü', 'ù', 'ú', 'ū'],
        ['y',  'ÿ'],
        ['z',  'ž', 'ź', 'ż']
        ];

    // Each row: the ASCII replacement first, then the variants mapped to it.
    for( var b=0; b < whitelist.length; b++ ){
        var r=  whitelist[b];
        for ( var a=1; a < r.length; a++ ){
            input_name = input_name.replace( new RegExp( r[a], "gi") , r[0]);
        }
    }
    return input_name;

}

var regexp = /\B\#[a-zA-Z\x7f-\xff]+/g; 
var result = searchText.match(regexp);

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow