The encoding used for storing the data outside the program is the only one that matters.
That data is likely to be used from other software. Someone will want to write those strings, and they'll probably use some kind of specialised editor or (gasp) a general-purpose text editor. UTF-8 has much better support from other software than UTF-16, which is why I would recommend it.
Inside the program, what encoding you use doesn't matter, as long as you do it consistently and don't mix them up in stupid ways.
Obviously, if you use the same encoding inside the program as you do outside of it, you don't need to perform any conversions and the risk of mixing them up and producing mojibake is not there.
The thing with pugixml using `wchar_t` is that the encoding it uses then depends on the size of `wchar_t`. If the size is 2, it uses UTF-16; if the size is 4, it uses UTF-32. pugixml can also use UTF-8 with `char`: that is its default mode, and defining the `PUGIXML_WCHAR_MODE` macro is what switches it to `wchar_t`, so you can leave the macro undefined and use UTF-8 instead.
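To make the size rule concrete, here is a small sketch that decides at compile time which encoding a `wchar_t` string holds on the current platform. It does not use pugixml at all; the names `WideEncoding`, `wide_encoding` and `wide_units_for` are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper type: which encoding a wchar_t string holds here.
enum class WideEncoding { Utf16, Utf32 };

// Decide purely from sizeof(wchar_t), following the rule described above.
constexpr WideEncoding wide_encoding() {
    static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                  "unexpected wchar_t size");
    return sizeof(wchar_t) == 2 ? WideEncoding::Utf16 : WideEncoding::Utf32;
}

// How many wchar_t units one code point occupies under that encoding:
// UTF-16 needs a surrogate pair above U+FFFF, UTF-32 always needs one unit.
constexpr std::size_t wide_units_for(char32_t code_point) {
    return (wide_encoding() == WideEncoding::Utf16 && code_point > 0xFFFF) ? 2 : 1;
}
```

On a typical Windows toolchain `wchar_t` is 2 bytes, and on typical Linux and macOS toolchains it is 4 bytes, so the same source can end up with different internal encodings.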
If you use the `wchar_t` API, stick to `wstring`. Remember: since we're inside the program, it doesn't matter if it's going to be UTF-16 or UTF-32, as long as we're consistent. If you use the `char` API, stick to `string`. You could, I guess, perform conversions from `wchar_t` to `char16_t` and use `u16string`s, but that wouldn't give much benefit.
The saving and loading functions in pugixml take an `xml_encoding` parameter that lets you pick what encoding the data will have outside the program, and that doesn't have to match what you use internally. Pick whichever you find the most convenient.
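To show what "internal and external encodings differ" means at the boundary, here is a rough, self-contained sketch of turning an internal `wstring` into UTF-8 bytes. pugixml does this kind of conversion for you when you ask for a UTF-8 encoding on save; this is not its implementation, just an illustration, and `append_utf8` and `wide_to_utf8` are hypothetical names. It assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to a UTF-8 string (1 to 4 bytes).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wchar_t string to UTF-8. The input is treated as UTF-16 when
// sizeof(wchar_t) == 2 (surrogate pairs are combined) and as UTF-32 otherwise.
// Assumption: input is well formed; unpaired surrogates are not rejected.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = static_cast<std::uint32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine high and low surrogate into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The caller never has to care whether `wchar_t` is 2 or 4 bytes wide: the internal representation stays consistent, and only this one boundary function knows about the external encoding.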