Question

I need to load an HTML template file (using std::ifstream), add some content and then save it as a complete web page. It would be simple enough if not for Polish characters. I've tried all combinations of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8 and ANSI/utf-8, and none of them worked for me: some characters were always displayed incorrectly, and some were not displayed at all.

I could paste a lot of code and files here but I'm not sure if that would even help. But maybe you could just tell me: what format/encoding should the template file have, what encoding should I declare in it for the web page and how should I load and save that file to get proper results?

(If my question is not specific enough or you do require code/file examples, let me know.)

Edit: I've tried the library suggested in the comment:

std::string fix_utf8_string(std::string const & str)
{
    std::string temp;
    utf8::replace_invalid(str.begin(), str.end(), back_inserter(temp));
    return str;
}

Call:

fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");

Throws: utf8::not_enough_room - what am I doing wrong?


Solution

Not sure if that's the (perfect) way to go but the following solution worked for me!

I saved my HTML template file as ANSI (or at least that's what Notepad++ says) and changed every write-to-file-stream-operation:

file << std::string("some text with polish chars: ąśżźćńłóę");

to:

file << ToUtf8("some text with polish chars: ąśżźćńłóę");

where:

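// Requires <windows.h> and <string>.
std::string ToUtf8(const std::string & ansiText)
{
    // MultiByteToWideChar() fails on zero-length input, so handle it up front.
    if (ansiText.empty())
        return std::string();
    int ansiSize = static_cast<int>(ansiText.size());
    // Step 1: ANSI (code page 1250) -> wide char; the first call only returns the required size.
    int wideSize = MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, &wideText[0], wideSize);
    // Step 2: wide char -> UTF-8 (CP_UTF8 is code page 65001).
    int utf8Size = WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize, NULL, 0, NULL, NULL);
    // Sized to fit, so long strings cannot overflow a fixed buffer.
    std::string utf8Text(utf8Size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wideText.c_str(), wideSize, &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}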

The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string in two steps: first from ANSI (code page 1250) to wide characters (UTF-16), then from wide characters to UTF-8 (code page 65001); more here: http://www.chilkatsoft.com/p/p_348.asp. The best part is that I didn't have to change anything else: no switch from std::ofstream to std::wofstream, no third-party library, and no change in how I use the file stream, apart from converting the strings to UTF-8, which is necessary.
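To make the two steps concrete without the Windows API, here is a minimal portable sketch of the same idea: each Windows-1250 byte is first mapped to a Unicode code point, and the code point is then encoded as UTF-8. The names Cp1250ToCodePoint, AppendUtf8 and PolishCp1250ToUtf8 are made up for this illustration, and the table covers only ASCII and the 18 Polish letters (an assumption made to keep it short); any other byte is replaced with '?'.

```cpp
#include <cstdint>
#include <string>

// Sentinel for bytes this illustrative table does not cover.
static const std::uint32_t kUnmapped = 0xFFFFFFFFu;

// Hypothetical helper: Windows-1250 byte -> Unicode code point.
// Only ASCII and the Polish letters are handled here.
static std::uint32_t Cp1250ToCodePoint(unsigned char byte)
{
    if (byte < 0x80) return byte;  // ASCII is the same in both encodings
    switch (byte) {
        case 0xA5: return 0x0104;  // Ą
        case 0xB9: return 0x0105;  // ą
        case 0xC6: return 0x0106;  // Ć
        case 0xE6: return 0x0107;  // ć
        case 0xCA: return 0x0118;  // Ę
        case 0xEA: return 0x0119;  // ę
        case 0xA3: return 0x0141;  // Ł
        case 0xB3: return 0x0142;  // ł
        case 0xD1: return 0x0143;  // Ń
        case 0xF1: return 0x0144;  // ń
        case 0xD3: return 0x00D3;  // Ó
        case 0xF3: return 0x00F3;  // ó
        case 0x8C: return 0x015A;  // Ś
        case 0x9C: return 0x015B;  // ś
        case 0x8F: return 0x0179;  // Ź
        case 0x9F: return 0x017A;  // ź
        case 0xAF: return 0x017B;  // Ż
        case 0xBF: return 0x017C;  // ż
        default:   return kUnmapped;
    }
}

// Appends the UTF-8 form of a code point below U+0800 (one or two bytes),
// which is enough for every code point the table above can return.
static void AppendUtf8(std::string & out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Step 1 (byte -> code point) followed by step 2 (code point -> UTF-8).
std::string PolishCp1250ToUtf8(const std::string & ansiText)
{
    std::string utf8Text;
    for (unsigned char byte : ansiText) {
        std::uint32_t cp = Cp1250ToCodePoint(byte);
        AppendUtf8(utf8Text, cp != kUnmapped ? cp : static_cast<std::uint32_t>('?'));
    }
    return utf8Text;
}
```

The Windows functions do the same job with a complete code page table, so on Windows they remain the simpler choice; the sketch is mainly useful for seeing why a single ANSI byte such as 0xB9 (ą) becomes the two bytes 0xC4 0x85 in the output file.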

It should probably work for other languages too, given the matching code page in place of 1250, although I did not test that.
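Since every string written this way ends up as UTF-8, the saved page should also declare UTF-8 as its encoding, otherwise the browser may guess wrong. A minimal sketch of that, assuming the body text has already been converted to UTF-8; BuildUtf8Page and SavePage are hypothetical names used only for this example:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: wraps already-UTF-8 body text in a minimal page
// that declares its encoding in the <head>.
std::string BuildUtf8Page(const std::string & utf8Body)
{
    return "<!DOCTYPE html>\n"
           "<html>\n<head>\n<meta charset=\"utf-8\">\n</head>\n"
           "<body>\n" + utf8Body + "\n</body>\n</html>\n";
}

// Hypothetical helper: writes the bytes as they are. std::ios::binary keeps
// the stream from translating line endings on Windows.
bool SavePage(const std::string & path, const std::string & page)
{
    std::ofstream file(path.c_str(), std::ios::binary);
    file << page;
    return file.good();
}
```

A plain std::ofstream is enough here because UTF-8 is just a sequence of bytes; the stream does not need to know anything about the encoding.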

Licensed under: CC-BY-SA with attribution