Question

How can I check in C++ whether a character is a letter of a particular alphabet? In general I need something like this:

bool is_german(wchar_t ch);
bool is_russian(wchar_t ch);
bool is_japanese(wchar_t ch);

and so on.

EDIT 1. Can I do this without defining the character sets of every language I need? Or is there perhaps a library that offers something like this:

std::vector<wchar_t> alphabet = GetEnglishAlphabet(); // alphabet = {L'a', L'b', L'c', ...}

EDIT 2. In case anyone is interested, I've found this in Qt:

Script QChar::script() const


Solution 2

You can use std::isalpha, defined in <locale>. Remember to set the correct locale first: http://www.cplusplus.com/reference/locale/isalpha/

EDIT:

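#include <locale>

// Locale names are platform-specific: "en-US" follows the Windows/MSVC naming; glibc expects e.g. "en_US.UTF-8".
std::locale loc("en-US");
bool isAlpha1 = std::isalpha(L'a', loc);  // true
bool isAlpha2 = std::isalpha(L'&', loc);  // false
bool isAlpha3 = std::isalpha(L'1', loc);  // false
// A wide literal is needed here: 'Ж' in a narrow literal is not a single char.
bool isAlpha4 = std::isalpha(L'Ж', loc);  // Cyrillic, not US; wide-char classification may still report true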

You can find the language strings here:

http://msdn.microsoft.com/en-us/library/39cwe7zf.aspx

http://msdn.microsoft.com/en-US/goglobal/bb896001.aspx

OTHER TIPS

For a roll-your-own solution, I would generally expect something like this:

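#include <algorithm>
#include <vector>

std::vector<wchar_t> german   = { /* ... german chars ... */ };
std::vector<wchar_t> japanese = { /* ... japanese chars ... */ };
std::vector<wchar_t> russian  = { /* ... russian chars ... */ };

// Linear search: true if the candidate appears in the given alphabet's list.
bool is_in_alphabet(const std::vector<wchar_t>& language, wchar_t candidate) {
   return std::find(language.begin(), language.end(), candidate) != language.end();
}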
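As one concrete instance of the roll-your-own approach, here is a hypothetical is_german with the signature from the question. It does the same linear search, but keeps the letters in a std::wstring_view (C++17) instead of a vector. The letter set (the 26 Latin letters plus ä, ö, ü, ß and their capitals, including capital ẞ) is one reasonable choice, not a definitive one; the \u escapes keep the source file encoding-neutral.

```cpp
#include <string_view>

// Hypothetical helper: German letters in both cases, searched linearly.
bool is_german(wchar_t ch) {
    static constexpr std::wstring_view letters =
        L"abcdefghijklmnopqrstuvwxyz"
        L"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        L"\u00E4\u00F6\u00FC\u00DF"    // ä ö ü ß
        L"\u00C4\u00D6\u00DC\u1E9E";   // Ä Ö Ü ẞ
    // The view excludes the terminating L'\0', so is_german(L'\0') is false.
    return letters.find(ch) != std::wstring_view::npos;
}
```

For a few dozen letters a linear scan is fine; with much larger sets (e.g. thousands of kanji) a sorted container or range checks scale better.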

There is also the ICU library, which has a ublock_getCode function that tells you the Unicode block a character belongs to. Note, however, that this cannot tell you the exact language, since the same letters are used in the alphabets of several languages.
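If adding ICU is not an option, a rough approximation is to hard-code the code-point ranges you need. The sketch below is an assumption-laden stand-in, not ICU's API: the helper names are hypothetical, the ranges come from the Unicode block charts, and it takes char32_t so that code points above U+FFFF would fit even where wchar_t is 16 bits.

```cpp
// Hypothetical helpers; ranges taken from the Unicode block charts.
constexpr bool in_range(char32_t c, char32_t lo, char32_t hi) {
    return c >= lo && c <= hi;
}

// Whole Cyrillic block (U+0400 to U+04FF). It also holds a few non-letters
// (e.g. combining marks) and letters of many languages besides Russian.
constexpr bool in_cyrillic_block(char32_t c) {
    return in_range(c, 0x0400, 0x04FF);
}

// Exactly the 33 letters of the modern Russian alphabet, both cases:
// А..Я and а..я are contiguous (U+0410 to U+044F); Ё and ё sit outside that run.
constexpr bool is_russian(char32_t c) {
    return in_range(c, 0x0410, 0x044F) || c == 0x0401 || c == 0x0451;
}

// Hiragana, Katakana and the main CJK Unified Ideographs block only.
// Kanji in this block are shared with Chinese, so this is not "is Japanese".
constexpr bool is_kana_or_cjk(char32_t c) {
    return in_range(c, 0x3040, 0x309F)   // Hiragana
        || in_range(c, 0x30A0, 0x30FF)   // Katakana
        || in_range(c, 0x4E00, 0x9FFF);  // CJK Unified Ideographs
}
```

The block checks share the limitation mentioned above: U+0456 (Ukrainian і) passes in_cyrillic_block but not is_russian, and a character like 日 (U+65E5) passes is_kana_or_cjk whether the text is Japanese or Chinese.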

Licensed under: CC-BY-SA with attribution