Question

I want a single 'to lower' function (applied to a word) that works correctly for two languages, for example English and Russian. What should I do? Should I use std::wstring for this, or can I get by with std::string? I also want it to be cross-platform, and I don't want to reinvent the wheel.


Solution

The canonical library for this kind of thing is ICU:

http://site.icu-project.org/

There is also a Boost wrapper, Boost.Locale:

http://www.boost.org/doc/libs/1_55_0/libs/locale/doc/html/index.html

See also this question: Is there an STL and UTF-8 friendly C++ Wrapper for ICU, or other powerful Unicode library

First make sure that you understand the concept of locales, and that you have a firm grasp of what Unicode and, more generally, character encodings are all about.
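To make the encoding point concrete, here is a minimal standard-library sketch of why a byte-wise "to lower" on std::string cannot handle Russian text stored as UTF-8. The helper names countCodePoints and asciiToLower are hypothetical, and the sketch assumes the source file and the string literals are UTF-8 encoded and that the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: lowercases only the ASCII letters A-Z, one byte at
// a time. Every byte of a multi-byte UTF-8 sequence is >= 0x80, so
// Cyrillic letters pass through untouched.
std::string asciiToLower(std::string text) {
    for (char& ch : text) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>(ch - 'A' + 'a');
        }
    }
    return text;
}
```

With these definitions, std::string("Привет").size() is 12 while countCodePoints("Привет") is 6, and asciiToLower("Hello, МИР") lowercases only the Latin letters. Case conversion therefore has to operate on decoded characters, which is the work that ICU and Boost.Locale take care of.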

Some good reads for a quick start:

http://joelonsoftware.com/articles/Unicode.html

http://en.wikipedia.org/wiki/Locale

OTHER TIPS

I think this solution is OK. I'm not sure it suits every situation, but it quite possibly does.

#include <locale>
#include <codecvt>
#include <string>

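// Lowercases a UTF-8 encoded string; requires the en_US.UTF-8 locale to be installed.
std::string toLowerCase(const std::string& word) {
    // Converts between UTF-8 bytes and wide characters.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::locale loc("en_US.UTF-8");
    std::wstring wword = conv.from_bytes(word);
    for (std::size_t i = 0; i < wword.length(); ++i) {
        // Lowercase the decoded wide character, not the raw byte word[i].
        wword[i] = std::tolower(wword[i], loc);
    }
    return conv.to_bytes(wword);
}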
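One caveat for cross-platform use: the std::locale constructor throws std::runtime_error when the named locale is not installed, and locale names differ between systems. The following is a hedged sketch of one way to guard against that. The names localeOrClassic and toLowerCaseWith are hypothetical, and falling back to the classic "C" locale is an assumption made for illustration. Note also that std::wstring_convert and <codecvt> are deprecated since C++17.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale, or the classic "C"
// locale when that name is not available on the current system.
std::locale localeOrClassic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Same approach as the function above, but the caller supplies the locale.
// Which non-ASCII letters get lowercased depends on that locale's data.
std::string toLowerCaseWith(const std::string& word, const std::locale& loc) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::wstring wword = conv.from_bytes(word);
    for (wchar_t& ch : wword) {
        ch = std::tolower(ch, loc);
    }
    return conv.to_bytes(wword);
}
```

A call such as toLowerCaseWith(text, localeOrClassic("en_US.UTF-8")) then no longer throws on a machine without that locale, but with the fallback only ASCII letters are guaranteed to be converted, so Russian text may come back unchanged.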
Licensed under: CC-BY-SA with attribution